Skip to content

Minutes 2021 11 10

Corentin Wallez edited this page Nov 16, 2021 · 1 revision

GPU Web 2021-11-10

Chair: Corentin

Scribe: Ken/Corentin

Location: Google Meet

Tentative agenda

  • CTS updates
  • Milestone triage (timebox 15m) #2073 (currently triaging V1 issues, start at #794)
  • Streaming implementations and indirect draws/dispatches #2189
  • Specify the importExternalTexture of WebCodec VideoFrame #2124 (Kai/Shaobo)
  • Refactor descriptors for attachment load/store/read-only #2215
  • Binding to zero-sized range of a buffer #686
  • Ensure extending fillBuffer is feature detectable #2278
  • Agenda for next meeting

Attendance

  • Apple
    • Myles C. Maxfield
  • Google
    • Austin Eng
    • Brandon Jones
    • Corentin Wallez
    • Kai Ninomiya
    • Ken Russell
    • Loko Kung
    • Shrek Shao
  • Microsoft
    • Rafael Cintron

CTS updates

  • KN: there continue to be more shader tests!
  • Expanding compressed texture format support in tests. ETC/ASTC formats for all relevant parameterized tests. Increasing scope of ~100 different tests.
  • KR: Please contribute. We're missing a large number of tests so far still.
  • MM: we're starting our impl, have grand visions about how to do it. Don't want to write any platform-specific tests - i.e., don't write tests that aren't contributed to the CTS.
  • CW: great idea. If we had CTS and a great way to run it we would have done the same. Test coverage is partial. Execution tests are ~ok though good coverage in other areas. How will you know which tests need to be written and which are already there?
  • MM: no good plan yet - was planning to just look at the test suite.
  • KN: hoping it's well organized. If you open the standalone page, you can look at the whole list of tests. Useful way to see what's tested, rather than looking at all code / files, since one file can have multiple tests in it.
  • MM: for initial impl I don't think it'll be that hard to start. Can just look for what tests call a particular function.
  • MM: we're also looking at using your shared header.
  • CW: it's an impl detail. We have a JSON file that generates Emscripten, this one, and one that Dawn/Chromium use. Everything that's in there and Dawn - it's huge. Things that are different - device creation, just didn't get to it. Similar for wgpu. Hopefully it works at well. Feel free to raise issues on the repo.

Milestone triage (timebox 15m) #2073 (currently triaging V1 issues, start at #794)

  • #794 pipeline statistics clarifications
    • Fingerprinting concern: muck with the output data in the compute shader.
    • KN: not sure why we should do it.
    • MM: Intel proposed it; maybe they should give feedback.
    • CW: can discuss during APAC-timed meeting. They really wanted timestamp queries.
    • CW: non-precise occlusion queries are core. precise ones are non-core.
    • MM: not going to fight for this. ok with me if you see the utility.
    • CW: would like to discuss with Mozilla/Intel but think they'll be OK with it.
    • KN: will make sure we bring up the topic. Add to TODO list.
  • #814 Development-only API features
    • KN: think we can close this. Separate doc.
    • MM: two feasible ways: flag to browser, or web inspector API that talks back and forth with page.
    • CW: separate doc works great. When written, wanted place where APIs behind flag can be tested.
    • KN: close this?
    • CW: resolved, closed.
  • #842
    • MM: is this needed for MVP?
    • CW: no. If we do pixel format reinterpretation it might be a bonus.
    • MM: can always add something like this afterward.
    • CW: post-V1?
    • KN: yes.
  • #851 Optional GPUBindGroupLayout entries
    • CW: DM was coming up with process for making certain entries optional. minBufferBindingSize, optional, can be figured out at draw time. Why is this different for other things like viewDimension? Meta-question.
    • CW: createBindGroup-time validation if supplied in layout. Otherwise, draw-time validation.
    • CW: just discussing the rationale for how the API is.
    • KN: don't want to close it right now, but OK to close if DM's OK with closing it.
  • #950
    • CW: purely editorial.
    • MM: still need for v1.
    • KN: marked Needs-Spec.
  • #959 Multisample coverage on Metal
    • CW: just clarification. Stays in V1.
  • #962 Ordering of promise resolution.
    • KN: still want for V1.
    • MM: can you write a sophisticated app without ordering guarantees?
    • KN: problem is, probably de facto ordered differently between browsers, then will have problem. IF we can guarantee resolution's random, people would deal with it.
    • MM: thinking about it, colleagues feel strongly that it should be tightly specified when Promises resolve.
  • #965 … integer formats
    • KN: more or less figured out.
    • KN: question is for unsigned integer formats. What if when you spec a load value that doesn't fit into the format?
    • MM: what's the answer?
    • KN: don't remember. Do IDL conversions to the appropriate format, I think.
    • KN: think we can figure this out.
    • CW: this is definitely V1, need to clarify what happens with this. Doesn't matter what we choose, but need consistent behavior
  • #1001 to look at next time.

Streaming implementations and indirect draws/dispatches #2189

  • MM: background: starting to implement in WebKit. One observation: our GPU process is 2-3 yrs old now. "IV drip" of commands is better than a big dump of commands. Borne out in benchmarks, and we have intuition for this as well. Take it as a given.
  • MM: I'm having messages between Web process & GPU process per draw call. Instead of 1 message for submit call I have a lot of messages. Recording a draw call is its own message. Maybe too fine-grained. Want it finer-grained than 1 submit.
  • KN: FWIW that's what we do. We don't flush all of them but serialize each.
  • MM: figuring out how it'll work with indirect draws. For those, IIRC, 2 impl strategies. One that makes the most sense for oru devices is to run compute shader for validation. If we do this - trigger that lets us know whether we need to run it is the app issues an indirect draw/dispatch. Now, we have to "go back in time" and inject compute shader dispatch. How to do that when we're streaming commands to Metal as fast as web app sends them?
  • MM: web app says in a command buffer, draw this, draw that, now draw indirect. Upon draw indirect, have already sent these commands to Metal.
  • MM: most straightforward: put compute shader invocation in its own small command buffer, and whenever we submit the cmdbuf we're recording now, we submit another before it. Seems most natural.
  • MM: in order for that to work, indirect buffer can't be modified between beginning of cmdbuf and the call to drawIndirect.
  • MM: other soln proposed in the thread - think it's worse.
  • CW: makes sense. Problem: cost of splitting command buffers.
  • MM: earlier in thread - split at pass boundaries. Probably bad idea. Wrote worst-case benchmark that splits them. Real-world results probably wouldn't be as bad, but should probably still avoid it.
  • KR: Can we detect when the application modified the indirect buffer and only issue it the validation commands there?
  • KR: In ANGLE's Metal backend they schedule a small command buffer before the command, and <sorry didn't get that>
  • MM: that was the second counter-proposal in this issue. In metal the application says "draw this, that, drawIndirect". At that point it is too late to split at the beginning of the pass we're currently recording. So we need an earlier trigger. Like in a previous pass in the command buffer the application wrote to an indirect buffer. You might be doing an indirect call that uses that buffer. Won't be uncommon for applications to put an indirect usage if they applications
  • KR: issuing perf warnings?
  • MM: Safari issuing it won't have the same effect as "dominant browser" issuing them.
  • CW: nitpick. "Dominant browser" depends on the ecosystem. iOS has 1 browser.
  • MM: people don't develop on iOS usually.
  • CW: but they do test there.
  • MM: thesis: 2 paths. (1) if you issue a DrawIndirect, the indirect buffer couldn't have been modified since beginning of currently recorded cmdbuf.
  • (2) if app modifies indirect buffer, split the current renderpass …
  • MM: Kai made good point - (1) isn't very intuitive. Reasonable to say, but think it's worth it.
  • CW: WebGPU's very declarative. Declare: in this renderpass I'm going to use indirect stuff.
  • KN: exactly what I was going to say.
  • MM: if "indirect stuff" includes the specific buffers, I'm fine with that.
  • CW: do we need that?
  • MM: maybe not.
  • CW: in Renderpassdescriptor, "usesIndirectCommands". That would be the thing that causes the split. You'll need to validate stuff.
  • MM: think that fits the bill.
  • CW: a bit annoying to have to declare this, but it's OK.
  • MM: think it's better than the solution I proposed.
  • KN: was thinking we'd need to list the buffers, but don't think we do. Or, buffer up commands for renderpasses (only), and don't ship them until later. Know you don't want to do this.
  • CW: if you stream commands - draw in JS causes draw in GPU process - how do you not submit work if validation fails? How do you protect yourself against validation errors in the state? Drop the MetalCommandEncoder?
  • MM: that's the error handling mechanism this group proposed ~3 yrs ago. If any validation issue, "isError" flag is set.
  • MM: comfortable with a single boolean on each pass's descriptor. Presumably would have warning, if they say they use indirect calls but they didn't.
  • KN: alternative: Make prevalidation of indirect buffer explicit. You have to issue a command that prevalidates the buffer before you can use it in indirect draws/dispatches. Not sure it's better. Prevents you from having to do splits. But since don't need to do a lot of splits, probably OK.
  • MM: soln might not be a good one. One cmdbuf with bunch of passes, and one used indirect commands - split it up a million times. Better to coalesce.
  • CW: don't need to split. Optimization territory. Can detect if passes don't write to indirect buffer. Unlikely you'll have a bunch of renderpasses all writing to the buffer. Don't need to do the split until indirect buffer's written to. Put all validation for compute shaders into first split.
  • MM: trigger: you have this bit set, and you wrote to an indirect buffer => split.
  • CW: 2 bits. One, true->false when you set the bit on the RenderPassEncoder, adn gets set back to true when you write to an indirect buffer.
  • MM: would like some more time to think about this. Might be reasonable, but need more time to think.
  • KN: could we have both? Explicitly name the buffers, then you won't get a split, it'll inject the validation.
  • CW: no, because validation depends on the state, and the offset in the buffer. Depends on current index buffer size, for example.
  • MM: would need to have all the information.
  • KN: yes, that's right. Flag is very promising.
  • CW: if we can dot the flag, that's better for us.

Specify the importExternalTexture of WebCodec VideoFrame #2124 (Kai/Shaobo)

  • CW: are we OK spec'ing something in VideoFrame though not all browsers have it yet? Want to implement in Chrome, want to make sure would be reasonable if other browsers implement it.
  • CW: slight change to lifetime of VideoFrame. More a question of how browsers are implemented.
  • KN: for imports from HTMLVideoElement: spec'd in cautious way. Auto-detach external texture very quickly, in deterministic way.
  • KN: this works, but say video's running at 24 FPS and animating at 60, will have to import multiple times. Devs would say, should I hold on to this somehow? Copy the video into a separate texture? Browser'd need to cache imports to avoid unnecessary synchronization work.
  • KN: what can we do to make this less confusing? One way: make external textures not invalid immediately; only upon requestAnimationFrameCallback. At some point in the future, on a future animation frame, the external texture would get detached. Check every frame to make sure external texture hasn't been detached, and re-import if it has.
  • KN: timing: tied to animation frames rather than video.
  • KN: requirement: browsers can keep video frame alive for arbitrary time. Easy in WebCodecs - VideoFrame's strong referenced. Maybe cause copy-on-write if held on to for too long. Need to make sure other browsers can do this.
  • BJ: presumably if video were paused you could hold on to the frame for longer?
  • KN: yes. Various scenarios: paused, at end of video, streaming+buffering. Those frames are showing up on screen therefore must exist.
  • MM: why is this WebCodec interop part of MVP?
  • KN: it's not WebCodecs. Just HTMLVideoElement.
  • CW: we can come back to this next week. Can people talk with their video colleagues and find out what's possible?
  • MM: What does WebGL do?
  • KR: We are not going to ship it. Had to generate the exact video format to the application. WebGPU's method is easier for the developers.
  • MM: So WebGPU would be the first Web3D API that supports 0 copy interop with videos.
  • KR: The prototype on WebGL was done, good perf on 8k videos. The standardization process would have been really long, so better to spend it on WebGPU. Results show it is important for WebGPU to get it right.
  • MM: not comfortable saying anything on this today.
  • KN: possible to get input from folks familiar with your video stack?
  • MM: will talk with Dean about it.
  • CW: can talk about WebCodecs interop next week. Multiple big apps interested in WebCodecs/WebGPU interop. Interaction's really tiny. Would help these adopters of WebGPU. That's why we're interested in it for V1.
  • MM: given some feedback that non-Chrome ppl have given - would be reasonable to say that WebCodecs interop with WebGPU requires a copy for MVP, and we can revisit after MVP.
  • CW: if we decide this then we'll go to the WebCodecs group and have them spec that importExternalTexture accepts VideoFrame. The spec interaction is trivial and customers have told us that they need this urgently.
  • KR: just to point out, WebGL already accepts VideoFrame as input to texImage2D.
  • MM: that requires a copy.
  • KR: yes but it's about the spec interaction. HTMLVideoElement actually has VideoFrames internally in its impl; WebCodecs just makes this explicit. Customers need this urgently, in particular for video processing apps.

Refactor descriptors for attachment load/store/read-only #2215

Binding to zero-sized range of a buffer #686

Ensure extending fillBuffer is feature detectable #2278

  • BJ: Implementing fillBuffer, found that adding a fourth argument afterwards is not feature-detectable and would cause issues.
  • BJ: We could split functions, rename this one to zeroBuffer and later add a fillBuffer.
  • BJ: Could have a limit that says what the max value is. Or maybe we just implement fillBuffer with the fourth value right now.
  • CW: let's discuss next week, see if Mozilla has feedback.
  • RC: can have fillBufferWithValue.
  • BJ: that's one of the proposals. We already have fillBuffer, and if we don't mind putting in the extra spec work, let's do it now. What's the format? Byte? 16-bit value?
  • KN: already determined it's a byte because of Metal.
  • BJ: no practical problem from usability standpoint.
  • MM: no problem, just spec the real thing, it'll be fine.

Agenda for next meeting

  • GPUAdapter.name considered harmful #2191
  • texture_2d_depth for loading depth values might be unnecessary? #2094
  • CW: Whatever carry-over and additional stuff. If you have items you want to discuss, please add them here.
    • Expose blocking operations on workers #2217
    • Texture view reinterpretation between srgb and non-srgb #168 (comment)
    • Extension documents #2301
Clone this wiki locally