Minutes 2021 08 16

GPU Web 2021-08-16

Chair: Corentin

Scribe: Ken

Location: Google Meet

Tentative agenda

Administrivia
- APAC friendly meeting and office hours?
CTS update
Should we constrain the location of user input-output stage variables WGSL #1962
Pad implicit mapping sizes up to multiples of 4 bytes? #2010
Do pipeline blend states need to be validated against the pipeline render targets? #2025
(stretch) Problem with the default queue #1977
- Associate command encoding with a queue #1278
Agenda for next meeting

Attendance

Google
- Austin Eng
- Brandon Jones
- Corentin Wallez
- Gregg Tavares
- Kai Ninomiya
- Ken Russell
- Shrek Shao
Microsoft
- Damyan Pepper
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
Francois Guthmann
Michael Shannon
Mehmet Oguz Derin

Administrivia

APAC friendly meeting and office hours?
MM: how about trying a few meetings and see how they go.
CW: when does CSSWG meet?
MM: 4-5 PM California time.
CW: OK. Can't attend these, but will set these up for early September. JG could you chair these?
JG: sure.
CW: 4-5 PM first Monday of the month, for couple of months, let's see how it goes. Intend to put stuff on agenda things for people who can't attend the regular meeting.

CTS update

(outstanding tasks: https://bugs.chromium.org/p/dawn/issues/list?q=Type%3DTask-CTS&can=2)
KN: backlog of pull requests to review. :) Lot of contributions from Intel (thanks!).
KN: possible testing-related thing people could do - getting tests running on other impls?
KN: lots of testing work going on. Probably will see velocity going down a bit, because we've been pushing to get tests in place before branch point for Chrome's Origin Trial. However still tons of testing work to do.
MM: when was branch point?
- KN: last week.
- MM: congratulations!
- CW: fingers crossed we won't have to disable it. :)

Should we constrain the location of user input-output stage variables WGSL #1962

CW: Jiawei found in at least Vulkan - limit to number of user input-output variables between vertex and fragment. Also has to take builtins into account. At least on Vulkan, locations need to be constrained to valid range - if you can only have 16 user I/Os, they need to be allocated between 0-15.
CW: should we bring this constraint into the WGSL spec? Or assume impls will renumber user I/Os?
CW: note that renumbering likely doesn't need pipeline time shader recompilation. We require I/Os to match between vertex and fragment. If you see vertex entry point, you know exactly what kind of fragment it will be used with and can do the renumbering there.
MM: downside of renumbering?
CW: complexity in impl. Also: if in future we say we relax the fact that user I/Os have to match exactly, then renumbering will sometimes need to happen at pipeline compilation time.
MM: sounds like an argument to do the renumbering now.
DM: it's not an argument for doing the renumbering now.
CW: vertex output/fragment input interfaces matching exactly. If in future this restriction is relaxed, it'll be impossible to do the renumbering for all kinds of shader interfaces.
MM: realistically that's if we want vertex to output more data streams frag takes as input. Then this change'd be incompatible because we'd need to do shader compilation later.
CW: yes. Only tradeoff/downside I can see.
MM: complexity seems low.
DM: this closes the door to allowing vert/frag to mismatch. DXIL requires interface to match? Or just HLSL?
CW: it's D3D12.
DM: the idea is that we want something to be generated at shader module creation time. If doing everything at pipeline creation this restriction doesn't make sense.
CW: D3D12 pipelines fail creation if you provide vert/frag that don't match I/Os.
DM: you can fill them out at pipeline creation if that's where you're generating your shaders. This restriction exists because we want to generate DXIL at shader module creation and patch it later. Seems best for now to restrict the locations because it keeps the doors open.
CW: the flexibility loss seems significant. (? is this correct?)
JG: Feel it's minor flexibility to use arbitrary numbers.
KR: Saw this in WebGL applications that drop fragment inputs without dropping the corresponding vertex output.
MM: makes sense - author wants to swap out fragment shader, but only uses a subset of the vertex shader output.
JG: guess we could do it at pipeline creation time. Hard to do at shader module creation time.
CW: from this discussion - seems likely people will want flexibility to have the fragment input be a subset of the vertex output in the future. Because of incompatibility with renumbering, maybe we should take the constraint on the locations for now.
MM: there was distinction between values having to be less than 16 but sparse, and allowing them to be > 16 but sparse. Interested in that distinction. If we wanted to support one but not the other would that be impl'd any differently?
CW: I think if we say they can be less than 16 and sparse then we don't have to do anything. Can just pass the values to the driver. No API that requires the locations to be dense starting from 0.
MM: even DXIL?
CW: that's the only one I'm not sure about. DXIL should be better than DXBC, so if DXBC allows it then DXIL should too.
MM: OK. So those two options are distinct?
CW: yes.
CW: I'd like to suggest that we take in the restriction, and can lift it later if we feel that we can.
KR: pointed out history where OpenGL allows vertex output and fragment input to mismatch, and we tried to maket shi stricter in WebGL but found there were a lot of applications being ported from OpenGL to the web that we broke. We reversed the change to allow vertex/fragment interfaces to mismatch.
DM: Dealing with a lot of API limitations for our different targets. OpenGL has triangle fans that we don't. It seems natural to take the limitations as a starting point.
GT: pointed out debugging case - you comment out a function call and a fragment input is no longer used.
CW: fair point. Think less likely to comment out fragment inputs in the interface though rather than just their use in the shader. May not remove static use though.
KN: argument to entry points are considered static use.
MM: thanks DM for adding that!
CW: getting back to the restriction on the locations - think discussion shows that we don't know if we'll allow subsetting, or with arbitrary numbers, or both. Implications of when shaders get compiled. Think we should take in the restriction on the numbering and see where we get.
MM: think that's fine. Think we should figure this out before the first browser ships. Seems natural for authors to want to do this. Would be unfortunate if we shipped a version of WebGPU where they have to match and another where they don't, and authors have to do feature detection.
RC: D3D team said you can subset as long as the fragment shader consumes the first ones (can't skip in the middle of the locations).
JG: so you can't skip the second one, only e.g. the 5th one.
DM: even skipping the last one is problematic. We'll put SV_Position at the end. Shouldn't rely on the consecutive semantics.
RC: maybe things were more relaxed in WebGL because there was a link phase.
CW: shaders are compiled at draw call time so WebGL could fix up everything.
MM: I'll open an issue and we can move on.
- https://github.com/gpuweb/gpuweb/issues/2038

Pad implicit mapping sizes up to multiples of 4 bytes? #2010

KN: so far think mostly DM and I have discussed this. Right now we allow buffers to be created with any size, down to a single byte - not 4 byte aligned. But many things you can't do with those last bytes. mappedAtCreation - can't create buffer with unaligned size. mapAsync uses size of buffer by default - will fail unless you explicitly set size, and you won't be able to use last 1-3 bytes.
KN: talked about restricting buffers to be size-aligned to 4 bytes. Wasn't appetite for that.
KN: talking about papering over this corner for mapping. If mapping were allowed to go couple of bytes past end of buffer, to next multiple of 4, would work as expected. Couple of bytes past end you could write to, wouldn't go into the resource.
KN: need to figure out the semantics for that. But would make it so you can do mappedAtCreation for last few bytes of these buffers. There are GPU commands that can access the last couple bytes of buffer. Buffer->texture copies. Need to make sure validation rules don't let you access those bytes. Might only exist in the mapping.
KN: could see some pitfalls. E.g. doing direct mapping with pointer from GPU: would have to round up actual resource size to multiple of 4. Probably OK, but need to make sure last couple bytes not accessed by API calls.
KN: interest in trying to paper over this weird corner, so we don't have an annoying corner case in the validation?
GT: do you need to make sure they're not accessed? What about known value?
KN: known value would work. Also considered rounding sizes up to 4 bytes. Don't think significantly any different than validating things are multiple of 4 bytes.
KN: potentially complicated thing. Would want to implement before spec'ing. Might run into more corner cases.
MM: if I make buffer size 3, then mapAsync, what happens when I write into the last byte?
KN: no specific proposal yet. Think that byte would be preserved - if you map again would see it - but GPU validation wouldn't let you access the byte.
MM: OK, so would prevent you from mapping the last parts of the buffer.
KN: right, unless you padded buffer allocations to multiple of 4.
MM: reason I didn't like rounding up buffer allocations - author confusion argument. Think different from this case. Buffer's still size 3. Silently make the mapping commands succeed where they would have failed.
KN: right, not visible to user except extra byte.
MM: and under the hood the buffer would be padded?
KN: yes, but they wouldn't be able to tell.
MM: well, map, write, unmap, map again - value would be preserved.
KN: right, but couldn't tell GPU resource was rounded up.
MM: what's the downside of preserving resource size?
KN: seems complicated to spec the semantics. Rest of buffer's preserved. Think extra padding byte should follow the same rules so we don't have to worry about it.
MM: are you proposing - if I make buffer length 3 then explicitly map size 4 that should fail? But if I leave off the 4 it'll succeed with length 4?
KN: thought it would be allowed to map explicitly with length 4. mapAsync, etc. would be allowed to go past end of buffer up to next multiple of 4, explicitly or implicitly. Observable. Only affects mapping.
MM: I don't hate the idea.
CW: would it affect robust access behavior of vertex shaders?
MM: not if you padded the length of the buffer. Robust buffer access wouldn't be triggered.
CW: but then other things could see the data.
KN: don't think that's a problem. Thought about it. Most places we say, robust buffer access has to stay within bounds of binding.
DM: no, that's not the case. RBA works at the scope of the whole buffer, not the binding range.
CW: either way, if there's some specific robust buffer access behavior, or if alternative is that you map, set bytes, remap and bytes are gone - think both cases are fine.
KN: think robustness behavior doesn't matter that much. OK if you see those extra few bytes. Annoying to write in the spec. Doesn't matter in practice.
KR: unless users start relying on undefined behavior.
MM: think that's beauty of this proposal. Size of binding has to be less then e.g. 7, not the rounded-up 8.
KN: but if you access last byte somehow and that incurs robust buffer access it'll work on most impls but not all.
CW: think browser impl is allowed to put 0s there. Next time you map you get 0s.
KN: maybe better to put the 0s there.
DM: you can't bind a buffer that's mappable. Problem's if you copy, etc. Not mapping with binding.
MS: why can't you return the 7-byte getMappedRange?
CW: we require mappings' offset/size are 4-byte aligned.
KN: offset because of typed arrays because largest typed array size is Float64. That's not the reason for the size restriction though. Think that came from a native mapping restriction.
CW: think it came from copies.
KN: yes, makes sense. Often have to copy out of the staging memory into actual buffer.
MM: and can't copy 7 bytes?
CW: no, unless you want to write a compute shader. Fetch-modify-write. Metal e.g. requires using words. (Also Vulkan, maybe D3D12)
MM: so we'd map 8 bytes but expose 7? Last byte would be undefined? Copy, and in order to get 3 remaining bytes we'd have to copy 4? Now you get garbage in your buffer because we had to copy undefined data into it?
CW: that's why I think we can say there 0'd out.
KN: not garbage - what the user put in there. Would spec it gives 0s.
MM: if buffer's big and you try to map 7 bytes…
KN: can't map 7 bytes. Mapping size has to be multiple of 4 so you don't have that problem.
MM: I wouldn't object to this.
DM: is this complexity that we're opting into? Can we just say we don't want MVP with this complexity and we'll consider later?
KN: probably.
MM: trick is - we expect authors will want to use halves - they work really well on our hardware. You'd want to create a buffer of size 6 and map all of it.
CW: we've run into at least 2 users running into this with mappedAtCreation.
JG: but is this something users will run into all the time or just the first?
CW: say you're Sketchfab, test models have all even number of 16-bit verts.
JG: less worried about people working with content they didn't expect - more concerned about portability bugs between browsers/hardware.
MM: think there are more issues - calculate size of buffer with complex math, then map the whole thing -
KR: think we should just round up the buffer allocation to 4 bytes.
CW: slight testing concern - buffer copies and ensuring we can touch all bytes.
MM: think it's user-hostile to be allowed to allocate buffer of size 7 then bind to size 8.
MS: so right now when we allocate buffer we round up?
KN: I'd prefer slightly - instead of implicitly rounding up buffer sizes, validate that they've been rounded up to size 4. Adds new validation rule though, annoying.
CW: we've spent 25 minutes on this. How do we move forward?
KN: unless we want to add new validation rule, think we can just table this until post-1.0.
JG: situation now - even if nobody likes it it's probably tolerable. Not enough consensus to do something different.
KN: only breaking change here would be validation that buffer sizes must be a multiple of 4. None of the others are breaking changes. If we don't want to that, we can make other changes whenever we want.
CW: OK. Table until post-MVP or post-V1. See how many people are cut by the paper cut.
KR: (didn't say this) think we should call this out in WebGPU notes to developers - esp since a couple have run into this already.

Do pipeline blend states need to be validated against the pipeline render targets? #2025

BJ: think KN or SS know more about this.
KN: fairly straightforward. Probably need to get info from somebody - like D3D folks. Blending state can read from destination alpha. Dest format might not have alpha. What happens when you read from it? One indication that it might be uninitialized data (what you get when you read from src alpha and don't output alpha from fragment shader).
CW: can we say dest alpha is ==1 always? Muck with pipeline blend states we give to target API?
KN: think it'd have to be 0 because have to use dest blend factor of 0. Think it's possible.
CW: if it's not there, pretend it's 0.
DM: why would we do this instead of validating it?
CW: less validation.
DM: if something doesn't make sense to do we shouldn't let it work somehow.
KN: should check backend APIs. Couldn't find anything in D3D/Vulkan about this, even if it's uninitialized. There's spec text about srcAlpha but not destAlpha.
RC: can check, but generally agree with DM. Patching at runtime seems like a lot more work then validating and forbidding it.
KN: generally agree, but if there's generally accepted solution in the hardware, we should use it.
MM: little sample program that triggers behavior you're seeing would help. Would like someone to write that.
KN: simple. Pipeline with color blend state that uses a factor that uses dest alpha, but attachment doesn't have an alpha channel.
MM: can forward that on.
CW: let's investigate and see what Vulkan does. Someone on Google side can do that.

(stretch) Problem with the default queue #1977

Associate command encoding with a queue [#1278](https://github.com/gpuweb/gpuweb/pull/1278)

Agenda for next meeting

#2025 - one we just discussed.
Want to revisit default queue? #1977
- DM: don't know yet.
If we don't have many topics, maybe should make this a WGSL meeting.
DM: sounds good.
CW: JG can you ask if WGSL group wants the meeting slot?
JG: yes, but generally don't like having back-to-back WGSL meetings. Lot of things that come out of meetings we can't actionably revisit the next day. Don't think we have enough new content for 2 full hours. Prefer to cancel here.
CW: I'll take AI to start null bindgroup vs. unsetting bindgroups. Also please add agenda items on Github project.
JG: in particular, with an eye towards what we're missing for MVP. And after that, MVP -> V1. Even things that'd be difficult discussions, we have to wrap up.
CW: SecureContext. :D
MM: chance I'll finish investigation of counters on Apple Silicon by next week.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2021 08 16

GPU Web 2021-08-16

Tentative agenda

Attendance

Administrivia

CTS update

Should we constrain the location of user input-output stage variables WGSL #1962

Pad implicit mapping sizes up to multiples of 4 bytes? #2010

Do pipeline blend states need to be validated against the pipeline render targets? #2025

(stretch) Problem with the default queue #1977

Agenda for next meeting

Clone this wiki locally