
Minutes 2022 06 01


GPU Web 2022-06-01

Chair: Corentin

Scribe: Ken / others

Location: Google Meet

Tentative agenda

  • Administrivia
  • CTS Update
  • “Tacit Resolution” Queue
  • Streaming implementations and indirect draws/dispatches #2189
  • GPURenderBundleEncoderDescriptor needs maxCommandCount if it is to be implementable on MTLIndirectCommandBuffer #2612
  • constrain attribute alignment value to divide device alignment limits #2505
  • texture_2d_depth for loading depth values might be unnecessary? #2094
  • Agenda for next meeting

Attendance

  • Apple
    • Myles C. Maxfield
  • Google
    • Brandon Jones
    • Corentin Wallez
    • Gregg Tavares
    • Kai Ninomiya
    • Ken Russell
  • Mozilla
    • Jim Blandy
    • Kelsey Gilbert
  • Mehmet Oguz Derin
  • Michael Shannon

Administrivia

  • Corentin will miss the next few meetings (until July) due to choir rehearsals
  • Kelsey will chair the meetings
  • Corentin will help prepare the agenda for the meetings
  • It's becoming increasingly difficult to find agenda items! This is great!

CTS Update

  • Please contribute to the CTS tests!

“Tacit Resolution” Queue

  • Use PredefinedColorSpace instead of GPUPredefinedColorSpace #2967
    • WebGPU starts supporting display-p3 color space!
    • If CSS later gains new enums (there are proposals), WebGPU will automatically support them.
      • KG: kind of a concern - tension between wanting to reuse enums, but we can't support things immediately all the time. If things added to CSS, most of us not participating there - won't know.
      • CW: think TypeError will handle this. For developers, easiest is - throw TypeError if you don't support the enum. Seems to be the web platform way.
      • KG: not 100% sure about TypeError, but likely some sort of error. Needs to be feature-detectable. Otherwise seems good idea.
    • MM: 1) what does feature detection look like. Other option, tying this into an extension. 2) matters whether browsers will try to implement new color space across every web API together. Or will they implement it piecewise.
    • CW: For feature detection: can rely on TypeError. Happens in 2D canvas for example, when creating with an unsupported color space. Same for WebGPU: CopyExternalImageToTexture, ImportTexture, Canvas.configure(). (See the first sketch after this list.)
    • CW: looking at Chromium's impl: when you add support for a new color space it sort of is automatically implemented everywhere. Probably cheap for other browsers too.
    • MM: what does Mozilla think about this?
    • KG: don't know. We're in initial development on the first of these - display-p3. Initial rollout is restricted to certain configurations, just WebGL, certain OSs. Expanding from there. Will gate this more. We'll push out support for this before Canvas 2D color spaces.
    • MM: this matches our development practice. New colorspaces in Canvas 2D is different from WebGL.
    • KN: also true for us. Once you have 2 colorspaces everywhere, adding a third everywhere is cheap. Adding colorspace support in the first place is not easy.
    • MM: not sure about that given our architectural differences.
    • KG: think going from 1->2 is hardest, 2->n is easier.
    • MM: how's this observable? In JS you can't iterate over enum members. A string coming from one enum or another - no JS affected by this change. What would change: an author looks at the spec, and only half of the specified enum values work!
    • KG: agreed. Observable. Not too worried about adding e.g. rec-2020 and people asking where is WebGPU's support for them? Think we're satisfied.
    • CW: maybe no longer in tacit resolution queue - people in agreement, land it now. As we get closer to multi-browser shipment, how about a list of things we expect browsers to implement gradually. Canvas readback and this one, for examples.
    • MM: super weak preference to keep these enums separate because of different implementation timelines.
    • KN: do we really need to come back to this CG to add a new colorspace? Easier to just put things in our impl.
    • KG: would like to have discussion in standards body. Probably will be color spaces spec'd by CSS that we won't support.
    • KN: PredefinedColorSpace defined only for canvases I think.
    • CW: maybe put this on agenda as regular item to discuss?
    • KR: a lot of these are discussed in the ColorWeb community group. Members of this CG sit there, so think it's not necessary to duplicate the enum.
    • KN: agreed, these are canvas-only.
    • CW: can we agree to merge this now, and file a request to rename PredefinedColorSpace to something like CanvasColorSpace?
    • KG: I'll create a new PR against another working group.
    • MM: I see.
  • Clarify how to interpret stencilClearValue #2964
    • CW: what about bits in the stencil clear value outside the first byte?
    • CW: think they should be masked off. (API-dependent results otherwise - a corner case.) (See the stencil sketch after this list.)
    • KN: stencils tend to be used as masks. Nice to treat them as an array of bits. Only place native APIs don't behave this way - Intel on Metal. Rest of them have the usual behavior.
    • KG: as long as we can polyfill it sounds fine.
    • KN: yes, easy.
    • KG: let's do it.
    • MM: what are the operations on stencil values? inc/dec? Can't set individual bits?
    • KG: you can or.
    • CW: think you can replace.
    • KN: a bunch of things you can do with stencil ops.
    • CW: let's say - it doesn't matter, go with this.
    • KN: we could validate it, give you error if you go outside the stencil aspect size. Doesn't seem important.
  • Remove depth24unorm-stencil8 #2950
    • CW: not well specified, not useful right now. If we want this format, can talk about it more.
    • MM: surprised you can't chop up these, and that as resolution we're removing the useful composite type.
    • CW: this doesn't give more functionality than depth24plus_stencil8.
    • CW: add'l functionality would be copying depth aspect to buffer. Can't do because of padding byte. For V1, wouldn't provide add'l capability, so worthless - drop it.
    • MM: add'l capability is knowing the precision of the depth buffer?
    • CW: I guess.
    • MM: it's already optional, behind an extension.
    • CW: idea here - we need to rework it, so remove it from the spec so people don't get used to a specific behavior.
    • KN: not too worried about that - as written, doesn't have any extra caps, just guaranteeing the precision. Hard time imagining app would optionally use fixed precision format and then fall back in any other way than using depth24plus. Might use depth32float. When do you need exact rather than min precision? Another issue about being able to find out what size of depth24plus you got. If we need that in order to remove this, happy to do that - we resolved to do that a while back. We can get that in.
    • CW: OK. If someone cares enough about not having this please comment on the PR.
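
A minimal sketch of the TypeError-based feature detection CW described above, assuming the `colorSpace` member on the canvas configuration lands as proposed in #2967; the exact member name, accepted values, and the TypeError behavior for unsupported values depend on the final PR.

```ts
// Passing a color space value the browser doesn't recognize is assumed to throw
// a TypeError, which is the feature-detection mechanism discussed above.
function configureWithPreferredColorSpace(
  context: GPUCanvasContext,
  device: GPUDevice,
  format: GPUTextureFormat,
): string {
  try {
    context.configure({ device, format, colorSpace: "display-p3" });
    return "display-p3";
  } catch {
    // Fall back to the default sRGB color space.
    context.configure({ device, format, colorSpace: "srgb" });
    return "srgb";
  }
}
```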
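
A short sketch of the masking behavior discussed for #2964, assuming the resolution above (extra bits in stencilClearValue are masked to the stencil aspect's size) and assuming a `device` already exists; with a "depth24plus-stencil8" attachment only the low 8 bits are kept, so 0x1FF behaves like 0xFF.

```ts
// Assumes `device` is an existing GPUDevice.
const depthStencilTexture = device.createTexture({
  size: [256, 256],
  format: "depth24plus-stencil8",
  usage: GPUTextureUsage.RENDER_ATTACHMENT,
});

const passDescriptor: GPURenderPassDescriptor = {
  colorAttachments: [],
  depthStencilAttachment: {
    view: depthStencilTexture.createView(),
    depthLoadOp: "clear",
    depthStoreOp: "store",
    depthClearValue: 1.0,
    stencilLoadOp: "clear",
    stencilStoreOp: "store",
    // The stencil aspect is 8 bits, so under the proposed masking rule this
    // clears the stencil buffer to 0xFF rather than producing an error.
    stencilClearValue: 0x1FF,
  },
};
```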

Streaming implementations and indirect draws/dispatches #2189

  • MM: max number of draws (?) - could be really high, up to 1 billion.
  • CW: I don't see a resolution. Maybe I missed the meeting?
  • CW: having to validate this - we add a counter to draw calls.
  • MM: it's a single counter, super cheap.
  • KG: what's the proposal here?
  • MM: add a new limit to GPU limits struct representing max number of draws that a render pass can contain. I don't know whether that includes bundles or not. Base value could be super high - nobody would run into it. Reason for this - without knowing this, app wouldn't be able to create an indirect command buffer / dispatch. The indirection step that has to happen before the render pass has to know how many things it'll be validating. Placing a limit on this allows us to leave enough space in that validation code.
  • KN: not sure that normatively validating this is necessary. If you put that many draws in a render pass - already well into territory of, things will break down regardless of this limit. May run OOM, or when you execute it you'll cause a GPU hang. Having normative limit doesn't seem to make much sense.
  • MM: think would be OK if spec mentioned say 1000 - tell authors, don't go crazy. Telling them what they should expect to work.
  • KN: if we tell authors that, it won't work. Will break down for other reasons.
  • MM: someone brought up the idea of a GPU hang. My counter argument - industry's moving in a direction of interruptible GPUs. Individual primitive/pixel can be interrupted. Window server won't hang if you have big render pass.
  • KR: we're shipping on current hardware. Eminently possible to wedge the GPU with expensive fragment shader and little geometry. Can we handle this with non-normative text?
  • KG: even if we end up not needing this limit, having it (wish didn't have to validate it) would be fine.
  • CW: why validate it when you want to give a number larger than 1000 (people will do more than 1000 draw calls) - 1 million may be OOM / timeout. What kind of useful number can you give? Your command buffer's too big. Taking 1 GB to send these commands to the GPU.
  • KG: I'm willing to compromise on this implementation-wise. I can just not care, but if it makes other impls happy, that's great.
  • KN: you'll still have to validate it.
  • KG: that's fine.
  • MM: reason I'm pushing on this - this would be an architectural limit rather than in-practice, de-facto limit. Think arch limits should be listed, rather than fuzzy thing.
  • MS: one place I see where you'll push 1 million - renderbundles with indirect draw calls, or multi-draw-indirect in the future. Easy to exceed 1 million at that point.
  • CW: 1 million draw calls?
  • MS: yes, with indirect draws, or packaging up those in render bundles. Can easily exceed 1 million, because you're not sending much to the GPU each time.
  • KN: in response to Myles about this being an architectural limit - it is, but it can be arbitrarily large. Can set it more or less to whatever you want in your impl. You can set your limit to higher than the point where things break down - nobody will hit it because other things will break first. That's why you don't need it to be a normative limit.
  • MM: I reached the opposite conclusion. Since you can set it arbitrarily - plugging it into limits, and requesting different ones if you want higher limits - this could be the first one authors raise. I'm going to use multi-draw-indirect, will use up to 10 million draw calls, raise the limit. Limits facility seems a good place to do this.
  • KN: that makes sense to me. Making it a limit doesn't make sense - wouldn't report this in the adapter. Then maybe should go in device descriptor. Then, why not put in render pass descriptor?
  • MM: makes sense to me.
  • KN: second part, default value is infinity. Set it internally to some really large value. I know that probably doesn't satisfy what you want.
  • MM: if author doesn't set this, can't validate draw commands. What should default value be if author doesn't specify?
  • CW: could have a counter. So many other things happening at draw time, could add a counter.
  • MM: each render pass could have its pre-reserved validation phase. Bigger render passes, bigger validation space.
  • KR: how can we move this forward?
  • CW: we can have the limit in Chromium. +1 validation thing. People will point at it, WebGPU has extra overhead.
  • KG: it's fine. Think we just need a PR adding this.
  • MS: From perspective of adding WebGPU support to an existing engine, we have no way of knowing this before a render pass. Can only sort of make a guess.
  • KG: just choosing at what point we throw away render pass we've been building, and create a new one with more stuff around it.
  • CW: Can you set the render pass store op late in Metal?
  • MM: No, it’s part of the descriptor.
  • CW: had been hoping we could just split render pass - tiled hardware has to flush the render pass for example, we can do that too.
  • MM: right.
  • CW: I guess we can take the limit. For your case Michael someone has to take a guess somewhere - guess it'll be you, you can say 10 million, nobody will do more than that number of draw calls.
  • MM: is this a limit or something in render pass descriptor?
  • CW: think we said render pass descriptor. (Sketch after this list.)
  • KN: think so.
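
A hypothetical sketch of the per-pass value resolved above. The member name `maxDrawCount` and the example number are illustrative only; the meeting settled on putting the value in the render pass descriptor but not on a name or default. Assumes `device` and a renderable `colorTexture` already exist.

```ts
const encoder = device.createCommandEncoder();
const pass = encoder.beginRenderPass({
  colorAttachments: [{
    view: colorTexture.createView(),
    loadOp: "clear",
    storeOp: "store",
    clearValue: { r: 0, g: 0, b: 0, a: 1 },
  }],
  // Upper bound on the number of draws (direct or indirect) in this pass, so the
  // implementation can reserve validation space up front. An engine that cannot
  // know this in advance picks a generous guess, e.g. 10 million, per MS/CW above.
  maxDrawCount: 10_000_000,
});
pass.end();
```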

GPURenderBundleEncoderDescriptor needs maxCommandCount if it is to be implementable on MTLIndirectCommandBuffer #2612

  • MM: easier - render bundles are opaque. "Bake" step - finish(). There, because it's opaque - will want to "optimize" the indirect command buffer (can compact commands). Think it's reasonable to concatenate a bunch of indirect command buffers into one big one. After you bake it, have one big optimized indirect command buffer. Before that, have vector of indirect command buffers.
  • MM: downside- can't concatenate indirect cmdbufs on the CPU. finish() is cpu-side. Don't concat the indirect cmdbufs together until first use after baked. Whichever queue it's used on first - issue those concat commands on, just before cmdbuf gets submitted. Just so we can call GPU function to concat buffers together. Don't have CPU side function to do it.
  • MM: little complex, but think it's OK - can close this.
  • KN: can you just issue in finish()?
  • MM: don't have a queue.
  • CW: use default one?
  • MM: not great to do so. Issue on other queue while being baked on first queue? Confusing.
  • CW: OK. Happy to close this one.

constrain attribute alignment value to divide device alignment limits #2505

  • CW: in WGSL you can specify almost-arbitrary alignment for types. Uniform buffer is a structure with 1 KB alignment, for example. Later we validate that the binding offset alignment is 256 bytes. Issue at the intersection of API/WGSL. (See the sketch after this list.)
  • CW: Kai proposed it's fine if the two don't match - alignment from WGSL shouldn't spill into the API.
  • KN: reread the conversation here. Would like confirmation, but - no reason alignment spec'd in WGSL have to bubble out to address alignments at top level (once resolved into memory pointers). Only thing that's an actual limitation - some types require certain alignments, but those are fixed. Highest alignment required by a type is 16 bytes. As long as this alignment limit's more than 16 bytes - don't think there's a conflict between WGSL, API and limit. Could issue warnings - didn't intend this - but maybe you did intend to. Don't think it's necessary. Don't think we really have this problem where requesting lower limits would break existing code. Think we can just allow the limit to be lower.
  • CW: what about adding normative text to the adapter's min storage buffer alignment, saying this min alignment has to be at least as big as the max alignment for built-in types in WGSL?
  • KN: we should.
  • CW: hearing not caring too much - can resolve this issue.
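
A minimal sketch of KN's point above, assuming a `device` already exists: an @align in WGSL only affects layout inside the shader, while the API-side check that matters is minUniformBufferOffsetAlignment on binding offsets, so the two don't have to match.

```ts
const module = device.createShaderModule({
  code: /* wgsl */ `
    struct Params {
      // 16 is the largest alignment any WGSL host-shareable built-in type requires.
      @align(16) color : vec4<f32>,
    }
    @group(0) @binding(0) var<uniform> params : Params;

    @fragment fn main() -> @location(0) vec4<f32> { return params.color; }
  `,
});

// Binding offsets are validated against this limit (256 by default), regardless
// of any @align values written in the shader.
console.log(device.limits.minUniformBufferOffsetAlignment);
```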

texture_2d_depth for loading depth values might be unnecessary? #2094

  • CW: on Metal we can do this.
  • MM: recap: in shading language Metal has depth_texture type. Spec says, you can't attach any depth texture to a type that's not depth_texture. But, Metal's the only SL that has this requirement. De facto true that if you have a depth texture, you can sample from it. Contrary to the spec, but de facto true. Should WGSL have a depth texture type?
  • MM: if you allow depth textures to be bound to non-depth-texture types, every non-depth-texture operation needs to work. We know sampling works. Not the only operation that color textures have on them. Can also gather. Think that's it. Other things like getting dimensions, but shouldn't be a problem.
  • MM: months ago did some research. With gather, you can get the w component - a depth texture doesn't have this. Well defined for non-depth textures, but not for depth textures. "Gather w component from depth texture" - tested on 2 devices, pretty different results, the values didn't match my intuition. (See the sketch after this list.)
  • MM: is that OK? Garbage coming out from gather operation? Just say it in the WGSL spec? Gather from depth texture bound to color type, you get garbage? Not sure if it's safe though. Could texture unit return stuff from registers before? Not a lot of confidence this would actually be safe. But no evidence it wouldn't be.
  • CW: great summary, thank you.
  • CW: do you know if - maybe ask on the issue - are texture swizzles possible?
  • MM: swizzles don't solve this - any texture compatible with texture swizzles can't be used as render target. Depth textures useless if can't be used as render target.
  • KN: wouldn't it be two views, one with swizzle applied you use to read, and another used to render?
  • MM: no, it's the texture itself.
  • KN: I don't fully understand but that's not important.
  • KR: think it'd be fine to leave the w gather values undefined. Had similar issue with depth textures in WebGL - only guaranteed channel is R, and it's worked fine.
  • KN: iirc we also have a workaround, could recompile at use time. Pipeline would know it's a depth texture bound as color channel.
  • MM: options: 1) type in SL. 2) no type in SL, but annotation in BGL. Valuable because we know at pipeline creation time. 3) neither of those.
  • KN: oh right, we're discussing 3). workaround only works in option 2).
  • MM: right, deciding between 2) and 3).
  • CW: maybe revisit this. Ideally option 3) is best. As Ken said, web has evidence that this is safe, so probably just do that. If you don't mind offline could you send Metal doc that you can't swizzle renderable texture?
  • MM: don't know if it's written down. Wrote a test program, it failed validation.
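
A WGSL illustration of the gather question MM raised above (a sketch of the situation, not a resolution; assumes a `device` already exists). Note the WGSL type is spelled texture_depth_2d. With the dedicated depth type, gather takes no component argument; if the depth type were dropped and a depth texture were bound as texture_2d<f32>, gathering component 3 (w) would read a channel the texture doesn't have, which is the undefined case discussed above.

```ts
const module = device.createShaderModule({
  code: /* wgsl */ `
    @group(0) @binding(0) var depthTex : texture_depth_2d;
    @group(0) @binding(1) var colorTex : texture_2d<f32>;
    @group(0) @binding(2) var samp : sampler;

    @fragment fn main(@location(0) uv : vec2<f32>) -> @location(0) vec4<f32> {
      // Depth textures: gather always returns the single depth channel.
      let d = textureGather(depthTex, samp, uv);
      // Color textures: gathering the w component (index 3) is well defined, but
      // would be unspecified if a depth texture were bound to colorTex instead.
      let w = textureGather(3, colorTex, samp, uv);
      return d + w;
    }
  `,
});
```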

Agenda for next meeting

  • Next meeting is APAC. Will ask APAC folks if they have agenda items.
  • This item (#2094), since we didn't finish it.
  • Please put other items here, will use for the next meeting.
  • Create a shared registry for vendor/architecture values that WebGPU may return. (#2990)