Minutes 2021 07 26

GPU Web 2021-07-26

Chair: Corentin

Scribe: Ken / Corentin / Kai / others

Location: Google Meet

Tentative agenda

CTS update
Rename SHADER_READ back to SAMPLED #1963
Add opaque region or isOpaque hint #1871
Stencil reads are S,S,S,S or S,0,0,1. #1978
Add multi-draw-indirect feature. #1949
(stretch) Request for a GPUCommandEncoder fillBuffer operation #1968 also #28
Agenda for next meeting

Attendance

Apple
- Myles C. Maxfield
Google
- Austin Eng
- Corentin Wallez
- Kai Ninomiya
- Ken Russell
- Shrek Shao
Microsoft
- Damyan Pepper
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
Michael Shannon
Mehmet Oguz Derin
Timo de Kort

CTS update

KN: Had a lot of tests contributions, but in the tasks we track, there's just a TON of tests to do. Would be great to get help.
JG: Would be great to have trello, or list somewhere?
KN: will send the Dawn bug tracker list - gives us everything in 1 bucket.
- https://bugs.chromium.org/p/dawn/issues/list?q=Type%3DTask-CTS&can=2
CW: it's through the CTS that we prove that the API is implementable.
KN: have gotten a lot of good results so far.

Administrivia

CW: not that many issues on current spec features for today's meeting. If you think of issues you have that aren't addressed, please put them on the agenda for next meeting.
CW: people are running into some issues in the spec who really know their stuff. See below.

Rename SHADER_READ back to SAMPLED #1963

CW: had GpuTextureUsage.SAMPLED, got changed to SHADER_READ - has the problem where you can probably read storage textures, but SHADER_READ doesn't let you do that. And you can't texture all the SAMPLED textures
- Could be SHADER_RESOURCE, like SRV in D3D
TEXTURE_BINDING and STORAGE_BINDING?
KN: DM had idea to name usages after how you use them in the API rather than in the shading language. E.g. in BGL we have TEXTURE and STORAGE_TEXTURE. Also works out well with the wording. This usage lets you use it as a texture binding, etc.
DM: second mini-suggestion: storage textures are more advanced usage of textures than regular textures. Just need to disambiguate uses a little more. Rename to SHADER_STORAGE? Kai suggested, cool idea.
DM: I like the BINDING approach slightly more. Wanted to tell the group both options.
KN: second-to-last comment on the thread, right now.
MS: does this share same namespace as buffer usage?
KN: no.
JG: +1 on the evaluation slightly to the BINDING approach. Shader Resource is another cool idea since many of our restrictions come from D3D SRVs.
MM: no comment.
CW: let's go with BINDING. A bit bigger, but that's OK.

Add opaque region or isOpaque hint #1871

KN: most recent discussion was with JG. Background:
KN: here, we could probably add a mode "This is opaque data". Page asserts that to us. If alpha=all-1.0, get correct results. Otherwise will get incorrect rendering results, like you can get with premultipliedAlpha.
KN: we could take that semantic and expand it here./
KN: had previous idea: do the clear of the alpha channel to 1.0 inside a render pass so it's not as costly.
KN: should talk about that solution as well.
KN: an idea that DM had, haven't worked through completely. New StoreOP, "Present", and at end of render pass it clears or otherwise sets the alpha channel appropriately. And at the end you can't use the resource any more - removes access to the texture from the app. Like the way it happens at the end of the frame. Have to make sure further command buffers don't use the texture. But lets us put the clears inside the render passes. Android has this issue. OS needs us to clear the alpha channel.
MM: any way we could know the app's behaving correctly so we don't need to inject this extra code?
CW: one idea from long ago, have special texture format just for swap chains that's RGBX. Then we'd set the alpha writemask to 0 always.
MM: yes, that's what I was remembering.
JG: that's a potential. Many users ending up using the alpha channel almost as a scratch buffer leave it with non-1.0 values. Using it for blending, etc. RGBX wouldn't work.
KN: example case of this: used to be a technique to render front-to-back - need alpha channel in render buffer. Then use alpha test to reduce overdraw. Need the intermediate data for blending.
MM: thinking about JG's example - can't attach render target as shader resource, so if there's frame where you need intermediate data, you'd need 2 textures anyway?
JG: the data's used for blending.
KN: that data's read from the render target implicitly by the hardware. Render targets are not output-only in render passes.
JG: that's the trick. Don't need any aliasing / reuse if you're doing blending. Just mess with the alpha channel.
KR: In WebGL the applications rely on the alpha being 1.0 even in the middle of the frame. Tried to hack that semantic to make it cheaper. Broke the applications. On Intel GPUs at least, there was a large performance hit to masking alpha. Clearing the alpha with a quad seemed to work correctly. If applications use the alpha in the middle of the pass, then on some platforms it will appear the alpha channel is 1.0, or it might vary.
KR: Do we want to have to have platform differences between RGB vs RGBA?
KN: Unlikely to want to allow that.
MM: Concern that RGBX on top of RGBA needs to inject alpha = 1 in the shaders.
DM: think the StoreOp idea is worth discussing. Seems orthogonal to isOpaque.
KN: lets us do it without the undefined rendering result. Instead of assertOpaque, can have "definitelyOpaque".
DM: already have that.
KN: right, semantic we have now - clear alpha channel before present - this plus the StoreOp are solutions to get the alpha channel cleared. One gives the user a way to clear the alpha channel without extra cost.
DM: it's not free. Won't solve the solution of zero-overhead.
MM: there's no solution that's portable and has 0 overhead.
DM: it was said that WebGL has this nonportable behavior with premultipliedAlpha:true.
JG: when we run into bugs like this, show-through of things underneath the browser, these are classified as security issues. Can't leave them undefined.
KN: with premultiliedAlpha, you get correct result or washed-out result. If you assert values are opaque, there's a final compositing result, assumes things are opaque, just blits them and now you have a transparent window.
CW: 0-overhead solution :) : we have RGBX textures, sets alpha mask to 0. Force RGBX textures to be used with LoadOp_Clear, alpha set to 1 / 0 / whatever. Use the ClearLoadOp to force the alpha to 1. The pass never writes the alpha. Has some cost on Intel, because you use the writemask, but the alpha's cleared once and never touched afterward.
DM: same cost that Ken discovered was huge?
JG: maybe add a shader epilog, test a uniform, write alpha 1.0 if set?
CW: we already do similar things in chromium.
KN: can only do this if the blend mode's not writing alpha<1.0.
CW: can validate this with RGBX.
JG: and make it behave-as-if. Write my color value into the alpha value. No, we'll write 1.0 instead into alpha.
MM: isn't there simple solution - WebGPU canvases with opaque:true turn into 2 compositor layers? A layer behind it that's opaque?
JG: maybe.
KN: when you get to that kind of solution - could say, on platform basis, if it's a problem to have this undefined compositing result, clear the alpha channel. Gets into platform-specific behaviors though.
CW: compositor trick helps with security, but not with content not showing up when alpha=0.0.
JG: might be acceptable, in comparison to premultipliedAlpha solution. If doing manual compositing, helps to say we don't need blend modes for this one.
MM: the opaque flag's part of WebGPU swapchain. Can define its semantics so you can't see through it to other divs behind it. Up to us. That's the extent of the guarantees.
CW: would alter HTML spec, but we have the power to do that.
JG: yes, we can.
JG: sort of think we should implement based on what we have, and see what happens later on in the spec/impl process. Might not need any change. We have ideas about solutions for various problems.
MM: do need to figure this out before we ship, if we need a way to say textures are opaque. Need to know need for new StoreOp before we ship.
JG: post-MVP?
KN: we have a way to say "composite this as opaque"; just need an alpha clear.
MM: result of this discussion might create new API authors are required to use.
CW: JG's point is we need impl experience.
MM: understood. We can table this for 2 months.
CW: happy to do that, get more experience.
DM: are you saying we need more benchmarking data? We have tried this.
JG: what's there left to talk about?
CW: like to measure the cost on Intel and lower-power GPUs.
KN: maybe with right blend modes, using APIs with renderpasses in them, the writemask problem won't be present any more. We have pipelines and render passes.
MM: need to enumerate options.
CW: need to write things down. Have steered away from isOpaque hint.
KN: I can write those options down.
1. present one, draw a quad setting alpha=1.
1. render pass one, force a clear.
1. extra opaque compositing layer behind.
1. don't do anything.
(RGBX with Clear LoadOp ? Is that in the above list?)
MM: little unfortunate to require epilogue in every shader just to handle this flag.
CW: don't have epilogue in all cases. Clear LoadOp avoids this.
KN: sets writemask in pipeline.
JG: epilogue was an idea about getting around writemasks being slow. No way epilog's slower than writemask.
MM: so that's another option, we try to do it ourselves in the shader.
KN: Filed gpuweb/gpuweb#1988

Stencil reads are S,S,S,S or S,0,0,1. #1978

JG: coming from #1198, discussion both here and WGSL. Ended up with consensus: second, third and fourth components in stencil samples would be somewhat-undefined. But think that we can enforce that they're one of these two things.
MM: how can we enforce one of these two?
JG: all existing hardware does one of these two.
MM: ah, I understand.
JG: think it's helpful to say, it's one of these two, even if we don't specify that impls will produce exactly one of these two. But tightening specs is nice, would be good to mention this is what we expect.
MM: can we make it non-normative?
JG: if we want.
CW: can we make it a preferred option? Impls should sample as (s, 0, 0, 1), but may sample as (s, s, s, s)?
JG: think that falls down because people will write code making the wrong assumption, and they'll say it's a wrong impl.
CW: reason I say this - the only place we've seen (s, s, s, s) is on some Metal machines. Last meeting Myles said he raised a bug, so maybe it'll be (s, 0, 0, 1) in the future.
MM: found it on every AMD device on a Mac. Also, if you asked AMD engr who wrote this part of their driver, they probably would say that s,s,s,s is totally valid, etc.
CW: agree.
JG: I'm also hoping - in the future drivers update and everyone gets one of these behaviors. Then we could update our spec.
MM: agree. If we get to point where all devices return same thing - makes sense for our spec to say it - but not until then.
JG: if we enforce one of these two normatively, can ensure the situation doesn't get any worse. Implementers run WebGL conformance in their CI systems, and they might start running WebGPU tests upstream, so would prevent things from getting worse.
MM: if you were writing WebGPU app - what different things would you write if the spec said one or the other thing?
JG: probably nothing. But they might assume values are certain things.
CW: just wanted to apply some pressure to fix this in future versions of Metal. So we don't have to spend this group's time on it.
KN: only an impact on Metal. Other APIs give us s,0,0,1.
MS: we do this with other things. Can imagine user checking which one it is at load, and then using 1-2 different shaders. Vs. if you say it's always one of these. Then they'd gen only 1 shader.
KN: if you're thinking about this explicitly, just say val.rrrr in your shader
MM: so purpose of this is to help authors, and if we're not helping authors then we shoudn't do it. So suggest non-normative.
CW: let's just drop it.
JG: ok to drop.
CW: we'll just say, the first component is "s".

Add multi-draw-indirect feature. #1949

CW: how do we decide to incorporate such advanced features?
JG: for example new canvas 2d features - worth / not worth prototyping? Implementers say whether the thing'll be useful at some point? One way to start to approach it. And at another point, may have more data.
CW: so would live in PR form for some time, until we have impls/benchmarks?
JG: potentially, yes. Or if someone wanted to do something with branches.
MM: we may want to turn this type of feature into some sort of larger feature in the future. For features that are smaller now, may need to make sure we have backward compatibility - needs design thought. Or, if too hard - postpone until we can think of the full feature.
CW: makes sense. This one - self-contained, doesn't have many interactions. Device-generated command. Self-sufficient. Don't think there's design overload with it.
MS: I see this as a stopgap measure until we have device-generated commands.
MM: just want to make sure we don't have two separate features that deliver the same goal.
CW: we just discussed the process now, not sure we have time to discuss the feature.
MM: nobody saying they don't want it, so let's discuss next week.
CW: slight concern about the compute shader validation for this.
MS: after I factored out into new multi-draw methods as requested, it doesn't make as much sense to have firstInstance-non-zero as part of the same feature. It does affect the DrawIndirect and DrawIndexedIndirect.
CW: so we have to separate the two features.
MS: don't have to. Esp. on Metal, may make sense. firstInstance is required for multidraw to be useful. But it can be useful on its own, esp. if user's trying to polyfill with renderbundles on archs that don't support multi-draw.
CW: firstInstance is one of those features we need only because some HW doesn't support it.

(stretch) Request for a GPUCommandEncoder fillBuffer operation #1968 also #28

CW: could be useful, but might not need for MVP.
MM: could be easy, if you limit to what's common across all APIs.
CW: on Metal, don't need anything - on Vulkan, you need COPY_DEST. On D3D12 you need storage capability.
MM: so we could write that in the spec.
CW: if you need all 3?
MM: yes, if you need it.
DM: also a wart about clear unordered thing. Requires descriptor to be in CPU-visible heap rather than GPU-visible. There are some impl details. We're not using it right now. Copying from 0-filled buffer. We'd be comfortable with just storage cap.
CW: like memset-0.
MM: memset-0 would be useful, and easier to spec/impl.
CW: impls need this any way for 0-init of resources.

Agenda for next meeting

multi-draw-indirect
fillBuffer
isOpaque hint? RGBX? Or do we want to wait couple more months for impl experience and benchmarks?
- Said we'd wait.
- MM: if no new comments on issue, don't put on agenda.
KR: Still hoping there will be contributions to the CTS. Any even simple test?
- DM: we're behind, so we'd need to get to the point of running the CTS.
- CW: hoping that once you're caught up you can contribute more to CTS.
- KN: I'm fully expecting many test bugs to be caught once you're running it.
CW: won't be there next week, but still discussion between empty and "null" bind group layout, and how you unbind stuff from the pipeline? will be a pain point.
- wart that we need to fix sooner rather than later.
- MM: we discussed that 4 years ago if I remember correctly. But no idea if the consensus we reached will apply to today's spec.
- CW: we didn't have a spec to record the decision in.
- DM: fix can be 1 line in the spec. BGL is the same, etc. Please do fix it. We want it too.
- CW: maybe I'll make a PR but won't be there to defend it.
- MM: there are other design options. "unbind" this bind group from this slot.
DM: problems with default queue - #1977 .

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2021 07 26

GPU Web 2021-07-26

Tentative agenda

Attendance

CTS update

Administrivia

Rename SHADER_READ back to SAMPLED #1963

Add opaque region or isOpaque hint #1871

Stencil reads are S,S,S,S or S,0,0,1. #1978

Add multi-draw-indirect feature. #1949

(stretch) Request for a GPUCommandEncoder fillBuffer operation #1968 also #28

Agenda for next meeting

Clone this wiki locally