Minutes 2021 06 21

GPU Web 2021-06-21

Chair: Corentin Wallez

Scribe: Ken, others

Location: Google Meet

Tentative agenda

CTS updates
Investigation: Compressed Texture Formats #144 (decide what extensions are guaranteed to be supported in WebGPU)
Storage buffer access: compatibility with the pipeline layout #1837
Figure out whether indirectBuffer OOB is a validation error #1834
Remove the read-only texture storage bindings #1794
- Allow read-write storage on selected texture formats #1772
- "Sampled" is ambiguous in the spec #1785
Agenda for next meeting

Attendance

Apple
- Myles C. Maxfield
Google
- Austin Eng
- Brandon Jones
- Corentin Wallez
- Kai Ninomiya
- Ken Russell
- Shrek Shao
Microsoft
- Damyan Pepper
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
Francois Guthmann
Michael Shannon
Mehmet Oguz Derin
Timo de Kort

CTS updates

CW: Pretty comprehensive vertex state tests have landed.
- Dawn's going to use programmable vertex pulling for robust buffer access on Metal. Imp't that we have robust tests of that code path.
CW: been a bunch of other smaller tests. Buffer, array init, …
KR: Is there a place with starter CTS tasks?
KN: we have a few. Can share them somewhere. We don't track testing items on Github - need them in-repo. No easy way to link to them. But can try to make a list of them.
KR: Hoping more folks can contribute to the CTS. There's a lot of medium priority tasks on Chromium side that we'll get longer to get to, so please chime in!

Investigation: Compressed Texture Formats #144 (decide what extensions are guaranteed to be supported in WebGPU)

CW: biggest question: what does WebGPU guarantee? Biggest discussion - has to be a choice of BC or (something else). The something else is the question: ETC? ETC or ASTC? ETC and ASTC?
- MM: from application writer's point of view does it matter? Impl has to support BC or something else. App will try BC, if not that, something else, etc.
KN: not trivial for app to support a format unless they support transcoding. lot of pipeline work for each compressed texture format. Being guaranteed that they only have to support 2 formats would be a benefit.
JG: ppl often don't want to make runtime decisions about these things. Asset generation-time thing. Mobile assets block, desktop assets block. Baked into quality matrix. Ppl avoid compressed textures because there are too many unknowns. IF they can't count on it it's hard. Looking at WebGL specs you have ~5 formats you might need to support. Being able to commit to, "it's just these two", app wouldn't need to supply uncompressed version.
BJ: even in cases where you're doing runtime transcoding - i.e., everybody does Basis everywhere - can transcode to ASTC or even RGB - you have a guarantee in terms of memory estimation if you have a guaranteed format. 8bpp, 4bpp. Alternative, need to plan for 8x more memory usage in uncompressed path. If I know my mem usage will be bounded, runtime transcoding benefits.
KN: decreases number of configurations people have to test.
KN: I wrote a comment on the issue, these are the things we need to decide. https://github.com/gpuweb/gpuweb/issues/144#issuecomment-858106611 .
KR: Even encoded formats benefit from targeting what the format was compressed for. Targeting ETC and going to ASTC can produce ringing. Let's expose what the hardware supports.
KN: don't' think there's a good reason not to expose ETC when it's available. Question is about ASTC. Not sure it reduces the feature support matrix to require it.
MM: our goal is to support all the formats our hardware supports. We don't plan to transcode on the fly. As long as we can do that, we don't have a position here.
CW: looking in terms of hardware support: seems WebGPU can mandate either BC or ETC and ASTC. OpenGL ES is a different story.
KR: thought Intel was pressing for ASTC on desktop.
CW: That would be fine. either BC, or both ETC/ASTC.
MM: on Apple Silicon we support the compressed texture formats from desktop but also the mobile platforms, so we'll want to support everything. Big effort by hardware team.
JG: super appreciate that. :)
JG: would be nice if our requirement was ASTC rather than ETC.
KN: strongly suspect ETC would be huge pain for a lot of people. Don't think it's worth it. Not a lot of work for us - just passing bytes through. Requiring ASTC would be nice. On mobile, require ES 3.2 class hardware.
JG: support for ASTC was high before it was put in the ES core. Wish we had more coverage data.
CW: think we have consensus that the min spec is either BC, or ETC+ASTC.
JG: would propose requiring ASTC for core. And consider non-ASTC support basically for portability.
CW: it is the option alongside BC. Proposal is, if browser doesn't advertise BC support, it has to expose ETC+ASTC. It's one of the proposals, it'll be marked consensus after this meeting. Call for someone to write this PR, and the ETC and ASTC extensions?
- JG: is it quick? I could write it.
- KN: PR is a bit of work. Have to find block sizes, etc. Not that bad, though. Just ASTC LDR 2D.
- JG: I'll do that.

Storage buffer access: compatibility with the pipeline layout #1837

CW: in spec right now, one place where we can promote RO storage buffers to storage buffers, and another place we can't. Implementation details leaked into the spec. When you have RO storage buffer in your shader, the spec says it can't be promoted to writable by the layout. Validation error. But creation of implicit bind groups - 2 bindings, on the same bind point - then RO storage buffer gets promoted to writable storage buffer. Inconsistent. DM's suggestion was to change implicit layout to not do that, match pipeline layout tables.
MM: thought in WGSL call we decided it was a bad idea to allow one buffer to be bound to two different bind points?
CW: if it's RO it's not a problem. We discussed this last week in this call. About aliasing.
MM: if you have one buffer bound RW and RO, write into it, then read from the RO binding, some devices will behave one way and others another way.
CW: should not be allowed. Should fix the spec. Can't have two RW bind points pointing to the same storage buffer.
MM: could say the use case isn't enough to justify the pain. Either bound multiple places as RO, or one place as RW.
DM: think it's complicated. Usage scope may contain multiple different draw calls in multiple different bind groups. Do you need to define something where storage buffer is seen by the same shader? Or same usage scope? Becomes tricky.
CW: tricky to define for render passes. For this patch, some people might want huge buffer they see subranges of, but don't know how big the ranges are. Or allocate memory out of a bigger buffer. Bind multiple times at different offsets, shader doesn't know.
DM: if there is a restriction it'd be written explicitly and not piggyback off usage semantics? Because they wouldn't handle it.
MM: yes. And validation handled differently, while encoding. Bound the buffer different ways for different draw calls. Not objecting; think it's probably the right way to go. But different level of validation.
DM: current validation collects usages over the usage scope. Goes through command buffer & tries to find conflicts. Validation you mention will be similar, but more detailed.
CW: concern - could imagine order-independent transparency, if we require bind point always to be the same for the buffer - at end of pass you collate OIT results, but need to use same bind point.
MM: can't do that in the same render pass.
CW: ah, right, data race. Still - have different materials or something, bind point happens to move. Weird we require something to be the same bind point for the whole pass.
MM: you're imagining - bind RO buffer at bind point 7, then later same buffer RW at bind point 8, that should be valid?
CW: yes.
MM: that makes sense.
CW: at least for render passes - you're opting into data races. Sort of the point of GPUs. You opt into data races and if you do it well you get performance.
MM: think you convinced me that this is essentially the same thing as binding the same buffer at 2 different bind points in the same draw call.
CW: ok. Little difference now - dispatches in compute calls.
MM: that's different. Scope's a single dispatch call, so not a big deal.
CW: trying to think of TF.js/WebGPU - would they suballocate tensors inside a bigger buffer? Shader's compiled not knowing which bind point goes into which heap. Issue for composability? Don't know if it's practical. Have something that writes 2 tensors, or similar.
CW: think we've solved the issue in the PR now. But derailed into whether storage buffer aliasing is allowed. Let's think about it offline, and think about storage buffer aliasing for dispatches later.

Figure out whether indirectBuffer OOB is a validation error #1834

KN: last comment on the issue proposes 2 strawman options.
KN: q is: have different draw call variants. Indexed, indirect, etc. Each could validate certain things. Like non-indexed can validate draws are inside the buffers. Indexed can verify that the call's inside the vertex buffer, but not whether the indices are in range. Indirect, similar issues.
KN: do we want to always validate when it's easy to do so? Or omit it for consistency with the other entry points? Could do it all the time, or never, etc.
KN: another level for the draw count buffer, when we have multi-draw indirect.
KN: editors discussed this. While ago, we decided not to do any of this validation, but no-op the draw calls so it's sufficient. But not sure whether this consistency is useful. Think we should throw validation errors.
MM: agree consistency across draw calls isn't useful. Consistency across impls is. For indirect calls; our plan is to implement that with a tiny compute shader, so no-op'ing the draw call will be required for our impl, so should be allowed semantics.
KN: only adding validation for the ones where it's trivial, where the data's on the CPU.
MM: q is, should there be an error reported in ErrorScope.
KN: or the call's clamped. Or get warning but not error.
JG: potentially more impl complexity to do things two ways. Could handle direct draw calls differently than indirect ones.
MM: difference between reporting error in ErrorScope and no-op'ing draw call is immaterial. Scary when one impl clamps the draw call and another doesn't.
KN: we'll already have that issue for other things. DrawIndexedIndirect - what if indirect call goes outside bounds, what if an index goes out of bounds. Would only affect the trivially checkable draw calls.
MM: OK, so once you're doing vertex fetch in your shader you can't no-op the draw call.
CW: we're going to clamp Draw and DrawIndirect but if you go out of bounds on the index buffer we'll no-op (DrawIndexedIndirect, if you go out of bounds on the index buffer).
KN: other impl could build segment tree for the index buffer.
MM: we could actually check these things, because we have the data structure.
Discussion about segment tree generation and validation on the GPU.
CW: based on this discussion - behavior on OOB stuff will differ anyway. Should differ, if we want impl choices. Q is: what part do we agree on forcing to be done in a certain way? Think validation of indirect buffer range is trivial.
KN: have to do it anyway - just question of whether we generate error or not.
MM: right now you're saying e.g. indirect buffer has to be min 16 bytes long.
CW: other one we can do on CPU side - DrawIndexed - validate range of index buffer, raise validation error. 3 arithmetic ops.
KN: all of these, have to do the checks for security. Then do we no-op, or raise validation error? No perf different.
DM: then we should definitely do validation error.
KN: maybe cases where you'd upgrade to indirect call and wouldn't do it on the CPU.
MM: when would that upgrade be done?
KN: hypothetical only.
MM: might do it for e.g. bundles. Haven't thought about it.
CW: even then, have all the state on the CPU. When DrawIndexed is called, you have the index buffer.
MM: indirect buffer too short -> DrawNotIndexed, vertex buffer too small, same class. Few classes of behaviors. Should this class succeed with no side-effects, or fail with an error message? Think impl complexity is roughly equal. Error messages sound better than no error messages.
DM: does Myles' point include vertex ranges, index ranges, etc.?
MM: my point is that all of those are in the same class and shold be treated identically.
JG: feels like, if no philosophical reason...requiring an error can lock us into certain impl choices today. Probably quick to check if vertex
KR: Having actual validation errors make it easier to write CTS tests because you only need to check for the validation error, not a range of behavior.
JG: But we have parts that don't have validation errors.
BJ: the parts of hte API that are easier to validate are the parts that beginners will hit first. Surfacing reasonable error messages there are worthwhile even if we can't produce those errors for harder calls like indirect ones.
JG: wonder whether we can offer same fidelity for indirect calls.
KN: we could issue warnings, then not lock ourselves into behavior later on.
BJ: could enable this with a feature. That's the Vulkan robustness path. Enable certain feature to enable more errors, without locking behavior in.
KN: if we require error now, we can take an error out later. Didn't think of that. We generally consider it non-breaking to take an error out.
MM: that's an argument for making it an error today.
JG: yes.
DM: seems we're mostly in consensus. Let's move to PR.
MM: if I understood - later, could have extension to opt into more deep validation?
BJ: would be excited to see that. Non-trivial, but having great validation at perf cost would be great.
MM: worth standardizing?
JG: in FF we have prefs to put more perf warnings in your console. Good to have. Useful. Developers can solve their own problems.
MM: think we have consensus for issue at hand.
KN: yup.

Remove the read-only texture storage bindings #1794

Allow read-write storage on selected texture formats #1772
"Sampled" is ambiguous in the spec #1785
DM: CW and I discussed. CW not happy with removal of RO usage. I agreed to see how it'd look if we kept it with the new semantics. #1864 shows it off. Only compatible with itself. Not compatible with sampled, or WO storage. Available only on some texture formats. Texture format caps adjusted to show it.
DM: for example: RG32, don't have RO storage. R32 do. RGBA also do have.
DM: use is vastly more restricted than what there is today. If ppl still want to use them, they can. Main motivation for using them - make the driver use different caching policy on resource. Lower latency, less caching.
MM: any numbers on that?
DM: I do not. Would be equally happy just removing them. Having numbers would be nice.
MM: can we add these in later if we determine they're valuable for perf?
CW: Yes. Other reason I think it's sad to remove them - think we'll want this as soon as we want RW storage textures. Caps will be there. For symmetry etc, think we'll want RO storage textures. Have W, RW, and RO.
MM: think symmetry is weak argument for having an API.
CW: true, but - intuition is there'll be a use case for it.
DM: interesting you're saying when we have RW storage textures. I assume it'll be needed less then. Assume people will use RW caps.
CW: I remember why - in the future we'll want to optimize impl based on that. Right now, in compute passes, heavy use of barriers. Our impl hasn't optimized them. If you have RW texture, have to sync between every dispatch using that texture. Shader could have written it. RO texture - don't need to sync between subsequent dispatches. Better capture the intent of the author, so we can better optimize on their behalf.
JG: one idea for that - even if marked RW in shader, we can tell in shader whether it's statically written to at all. Could cheat a bit. It's RW, but if you don't read from it or write to it, we don't insert certain barriers. Would be weird if we did validation based on that.
DM: don't think that will affect validation. Can already have 2 things in flight using RW. Our storages are composable.
JG: OK.
MM: in Metal, in pipeline descriptor you say which buffers are readable/writable, but not textures. If you were doing this for buffers, thumbs up - but for textures, it's unneeded data for us.
CW: we have this already for buffers (RO storage buffers). Isn't there a thing in Metal compute encoders where I'll do my own barriers?
MM: there is.
CW: we'd take advantage of that.
DM: Jeff's point still stands. Can take advantage of it without extra APIs.
CW: more brittle for developer. Can't use type system to enforce it for them.
JG: some ppl do/don't write const-correct C++ code. Make it harder to accidentally write something.
DM: would rather see feature in WebGPU to allow compute synchronization scope to be widened to the whole pass. Would address same thing from a different angle.
CW: putting the onus on the developer. (...)
MM: thing that would motivate me - if somebody provided hard evidence that'd be compelling. Like, I wrote a program two ways, and it's faster at runtime because of lack of barriers or something.
CW: we wouldn't be able to do that for a while.
MM: not in WebGPU, like in Vulkan.
JG: or D3D.
MM: if numbers show this is valuable, thumbs up.
JG: could we not add this later?
CW: we could. Easy to write synthetic benchmark.
MM: I'd be convinced by moderately synthetic one. Even if trivial, but reasonable, workload.
DM: impl can optimize all the things you'd do with the API. Use case you show won't prove anything.
MM: by looking at the shader you mean?
DM: yes.
MM: agree argument's not fully compelling. Also agree that magic perf cliffs are bad. If big perf cliff, want it to be part of the API. Tiny one, authors can just fall off it.
CW: I'll think about what benchmark we want here.
MM: doesn't have to be elaborate. Just couple hours' work.

Agenda for next meeting

Storage buffer aliasing in dispatches.
- DM: think it should be discussed more in WGSL group than here./
- JG: OK. Is there an issue for that in particular?
- DM: think Alan Baker had one, not sure of number.
DM: limits? Unless we want to resolve them in Editors.
- CW: can probably resolve offline
- KN: agree.
CW: maybe cancel next week's meeting?
- DM: yay - we have a week off at Mozilla next week. :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2021 06 21

GPU Web 2021-06-21

Tentative agenda

Attendance

CTS updates

Investigation: Compressed Texture Formats #144 (decide what extensions are guaranteed to be supported in WebGPU)

Storage buffer access: compatibility with the pipeline layout #1837

Figure out whether indirectBuffer OOB is a validation error #1834

Remove the read-only texture storage bindings #1794

Agenda for next meeting

Clone this wiki locally