Minutes 2022 05 25

GPU Web 2022-05-25

Chair: Corentin

Scribe: Ken / others

Location: Google Meet

Tentative agenda

Administrivia
CTS Update
“Tacit Resolution” Queue
WebGPURenderingContext should support preserveDrawingBuffer: true #2743
TAG feedback on WebGPU #2927
Internationalization Metadata for Human-readable Messages #2780
Should we constrain the location of user input-output stage variables WGSL #1962
maxBuffersPlusVertexBuffersForVertexStage limit #2749
Missing depth24unorm format? #2732
Agenda for next meeting

Attendance

Apple
- Myles C. Maxfield
Google
- Austin Eng
- Brandon Jones
- Corentin Wallez
- Gregg Tavares
- Kai Ninomiya
- Ken Russell
- Stephen White
Microsoft
- Jesse Natalie
Mozilla
- Jim Blandy
- Kelsey Gilbert
Mehmet Oguz Derin
Michael Shannon
Russell Campbell

Administrivia

CW: TAG review is done!
- They're happy!!!
- Woo hoo!!!!
CW: we'll discuss more later - three things:
- Would be nice if there were more options to requestAdapter like "high VRAM", "I don't want to jank the page"
- "This device is using a lot of power" - surface this to the user
- Not permissions, but figure out a way to balance that
- TAG review for WebGPU was not working - the process right now doesn't work for large specs like WebGPU. What could we do differently?
BJ: thought the third question was inward-facing for the TAG.
CW: agree. We can have some valid input though.
(KR: Maybe the process could have been improved by having a WebGPU team member or members meet real-time with the TAG during their review?)
MM: did we open issues for the first 2 topics?
CW: we have an open issue for the whole TAG review.
MM: should file bug about giving feedback if using lot of VRAM etc.
CW: agree.

CTS Update

KN: not much activity this specific week, but lot of ongoing work on tests
Talked with Jim/Kelsey about integrating testing in Firefox
Similar situation to WebKit - need to generate separate HTML files
- Something to consider if we can make that easier
MM: if not just us - sounds worth the group discussing / thinking about.
KN: have you already written that HTML-generation code?
MM: sort of. Generated HTML file for every .spec.js. Not granular enough. Some tests so large they timeout on Debug bots.
- Kai already has this code for generating variants. I need to reimplement that to work with individual HTML files.
- KN: think we can put a build step in the CTS that takes a template HTML file and makes many copies of it. Each one pastes in a query at the top, and runs things how that template file should run them.
- MM: makes sense. Needs to be browser-specific.
- KN: yes.
- MM: normally when we run a test it waits for DOMContentLoaded. Since this has a redirect, has to be more asynchronous. Template HTML file has to have "waitForExplicitDoneSignal" - specific to each browser's test harness.
- KN: what's the navigation for?
- MM: running the actual test. Setting the href is going to the CTS html.
- KN: don't think we have to do that. Can just run it in the page.
- KG: our test harness for WebGL runs it in an iframe and postMessages out to the container.
- KN: think we can do this without iframe or redirect.
- MM: sounds good to me.
- KN: should be easy.
CW: WebGPU CTS is great! Finding lots of bugs in drivers.
We can work around some, but need to figure out what to do about bugs that are infeasible for implementations to work around.
KR: When we ported over the dEQP test suite in JS, we found that some subcases failed on some hardware and skipped these. Over time driver got fixed and workarounds were added but not everything. What we want is skip as few subcases as possible, for a specific format or something like that. Skip targeted subcases instead of whole tests.
MM: sounds reasonable to me. Bug reporting pipeline - as I understand it - all companies have a way to report bugs. Because this group's public, sort of as far as we can go advertising publicly how bug reporting should happen. Companies can have private conversations though. Should reuse those private bug reporting channels.
KR: That's absolutely the intent, but bugs take a long time to get fixed, so to not block progress, we should check in skips.
MM: sounds right to me.
CW: let's discuss specifics after giving this more thought. (Adding skip list support to the harness)

“Tacit Resolution” Queue

Add reflection for container objects. #2919
- CW: adds descriptor arguments of GPUBuffer, GPUQuerySet, GPUTexture. Can query these back. Except view formats, I was lazy and there was some normalization that would need to happen.
- These are containers, probably will pass these into middleware. Shape of them should be easily queryable. How to zero, how to generate mipmaps.
- Surprised this was tacit resolution!
- MM: this came up 2 years ago? Argument was that an author wants to remember the descriptor info, we should add an expando property? I was arguing for this PR at the time.
- CW: last info on wiki - 6 months ago, agreed to discuss for V1.
- KG: different issues, linked together.
- KN: think we may have discussed this. Doubt we were committal about this. Think it's OK to make a decision now.
- KG: vaguely remember we discussed this in Redmond…
- CW: so should we discuss / not discuss this?
- MM: definitely should discuss. Convenience vs. memory use. Memory use probably small. Convenience - if people are asking for it, it's convenient.
- CW: it's on tacit resolution. If you disagree with this enough, please make a comment on the PR.
- KG: the one for parents?
- CW: no progress on that yet.
- KG: Issue #1497, potentially adding parent links - more complex, introduces cycles, marked maybe-V1.
- CW: can discuss that one later.
Relax the null buffer check for zero-attribute vertex buffers in "valid to draw with GPURenderCommandsMixin" #2864
- CW: already landed.
Empty vertex buffer slots should be skipped for OOB validation in draw calls #2929
- CW: this is basically keeping the spec coherent. One place where it's required to have the buffers, but only if the slot's non-null.
- KN: marking this editorial.

WebGPURenderingContext should support preserveDrawingBuffer: true #2743

15 min timebox (~12:35 ending)
3 ways forward based on Kelsey's feedback
KN: (1) preference - do what this PR says. Expect people to implement it. Allow reading back the canvas after it's been presented, before you start writing to it again.
(2) land this in the spec, have an implicit expectation that not everyone will implement it just yet - don't need to do it to reach parity with WebGL
(3) not add it yet. Spec will say, if you do the readback in this gap you get a blank image - but we'll change it later. Nobody should rely on a blank image, so should be OK to change.
KN: would prefer to get this fully specified now.
MM: are we all in agreement that reading out of the texture after presented, but before written to, is legal, and that we (the CG) are willing to implement it?
KG: in problematic cases it's just a deoptimization that we can revisit optimizing in the future. Sounds fine to me. Does mean that we can't do obviously most efficient thing everywhere.
MM: what's that efficient thing?
KG: after presentation, no longer have even read access to the buffer. Think this is fine. Separately, think there's a desire for a canvas.snapshot, but that can be a different thing.
MM: that would have the same behavior?
KG: yes, would be the last rendered frame.
MM: that's fine with us. (the idea that we can read from a presented texture)
KG: I'll propose the snapshot thing separately, has been an idea of mine for a while.
KG: +2 to #2905.
CW: great. If everyone's happy with this, we can mark it closed and that's the last of the canvas issues! 🥳🎉

Internationalization Metadata for Human-readable Messages #2780

CW: discussed with Francois Beaufort. Internationalizing all error messages is infeasible. What if we add "localizable" to the device? Applies to all the errors.
- You could use the English message and prefix it with the localized text. "I don't have a localized message for this, sorry."
- Probably OK, and don't need to have localization everywhere.
- MM: +1.
- KG: what's the result you're proposing?
- CW: Add localization info to the device, or navigator.gpu. This is the only thing we add to the spec.
- KG: read-only?
- CW: Yes. This is to say - the messages I give you will have this direction, etc. If not, the messages will tell you, this one's a bit different.
- MM: if user changes browser language, future devices can use the new language.
- KG: just wondering if this is configurable after device creation.
- MM: think that's an antipattern.
- KG: this is a fine direction.
- CW: MOD, opinions?
- MOD: doesn't make much sense to me, but maybe comment on this approach in the issue - if they deem it acceptable, fine with me.
- CW: I'll take an AI to comment on the issue.
- KG: would like localizable string type with annotations. Having it return a whole new level of object is the thing I like least about it and the single thing that turns me off from this approach.
- MM: the point of the String class is that it works correctly everywhere. If it existed and the entire web platform used it everywhere - yes, that'd be the right answer.
- KG: just a reason I don't like the best practices as a non-expert in the area. This is a compromise solution.
- CW: I'll answer the issue, probably next week, and see what they say next week about that simpler approach.

TAG feedback on WebGPU #2927

CW: what do we want to spin off as separate issues for V1?
KG: Yes, filed:
Agreed to postpone - not actionable at the moment.

Should we constrain the location of user input-output stage variables WGSL #1962

CW: turned into very detailed discussion. We have a proposal.
KN: made giant spreadsheet with Jiawei's help and his really detailed investigation.
- Delimits counting in various APIs.
- Some platforms have a limit - for example D3D has 32 outputs max from VS. Position output counts against it, but have to have that, so we subsume it into the limit. Our counting rules become simpler because we don't have to count vertex position.
- (1) what to subsume into those limits.
- (2) what doesn't get subsumed & still needs to be counted to conform to platforms' limits.
KN: last column talks about how to count vertex outputs / fragment inputs per platform.
KN: counting rules here are very simple. Few minor customizations. Could raise it by 1 if we counted things differently on some platforms, at detriment of other platforms. Pretty straightforward rules that don't result in too much pessimization.
KN: difference between 30 & 31 interstage variables hopefully not a big deal for people. Hopefully people not going that high - lot of interpolation to do.
KN: with the papering over done in last proposal - things become fairly uniform. Things cap out at 120 components, 30 interstage variables. Represents hardware pretty well.
KN: open q's:
- Right now - we have 2 limits, one for variables, one for components. Each applied twice, to vertex outputs & fragment inputs.
- In principle, outputs/inputs don't have to be counted separately. Vertex outputs not consumed by fragment inputs? Don't count, removed.
- Assumes we do that optimization or are on an API that guarantees it's done. For example, Metal counts the varyings after pipeline construction. We could do the same, but have to make sure the logic works on all backends consistently - not due to different optimizations on different platforms.
KR: recommend doing this at the shader level.
KN: actually not even considering static uses, just what's declared. VS declares it outputs more than FS takes as input - think we allow this. If backend API would hit a limit counting vertex outputs, our impl has to figure out how to remove the unused vertex outputs before handing to the driver. Probably not super difficult - if we can see which ones aren't used, have to eliminate them from the generated code. Not sure how many platforms we'd have to do that on. Think necessary on D3D at least.
MM: push back on "heroic" - our compilers already have to match VS/FS outputs/inputs. Question isn't that we have to do analysis we wouldn't have to do otherwise - we're doing it in another place.
KN: my point too - we don't have to do static use analysis. Just challenging at codegen.
CW: TBH it's not that useful to not count interstage variables that can be optimized out. If driver doesn't do the optimization then we have to do code generation at pipeline creation. Also a non-breaking change to improve it later. Go with the thing that's more constraining now. If people yell, can relax it in the future.
KN: yes. Forgot to mention pipeline creation-time codegen. Anyone want to go through proposal's details? Later? Any comments now?
KG: so proposal is - ?
KN: how we count against the limits in our API, and as a result, how you compute the limits in the backends (that's impl detail). Proposal's how we do counting to see if limits are exceeded.
MM: most imp't part - there are 2 limits, one for variables, one for components. We're happy about that. Details about counting - not very important to us. We think there should be these two different limits.
CW: seems general idea's good. Details if you care are on the issue. How about tacit resolution for this?
General agreement.

maxBuffersPlusVertexBuffersForVertexStage limit #2749

MM: in Metal, VBOs and UBOs / SSBOs are not differentiated. You just have buffers.
MM: it's just the total sum of buffers available to the vertex shader.
MM: in WebGPU we do have different concepts. Right now, in the limits struct, there's limit on # vertex buffers, and UBOs/SSBOs that can be bound. Since no distinction in Metal, harmful because - would like to say, you can use 30 VBOs if you want as long as you don't use more than 2 UBOs. Pessimizing - no good either.
MM: proposal here - add new limit, don't change existing ones. Sum of vertex buffers plus "other" buffers.
MM: to know if it's valid to run the draw call / compile the pipeline, have to do all the validation from today, and one more validation step to ensure the sum of these is under the limit. Precedence for this in validation of thread group sizes.
KN: all for this, think we should add this limit - just need to figure out the base value.
CW: it's the sum of all the limits.
MM: from impl perspective - Metal, you'd set # vertex buffers to 31, non-vertex buffers to 31, and sum of them to 31. Non-Metal, set VB limit to whatever, non-VB limit to whatever, and limit of sum of them to the sum of the two.
CW: gives us more leeway. Little cost - just an additional limit. Pipeline compilation, so can add as many limits as we want there. More flexibility.
KG: sounds fine. Tolerable addition.
KR: "maxCombinedBuffersForVertexStage"? Think precedence in earlier APIs.
MM: good idea. Not sure what the specific formulation would actually be. Interacts with argument buffers in Metal. Just hoping for general direction forward.
KN: sounds great.

Missing depth24unorm format? #2732

CW: we have the depth24unorm_stencil8 format. Aspect is depth24unorm+.
CW: copying from buffer to depth24unorm adds padding bytes where you'd imagine the stencil to go. Copying from - contents of the stencil byte are undefined in the APIs.
CW: Nice if we had such a format. Copyable into buffers. This issue tries to figure out how to do this.
CW: if nontrivial - maybe can move the whole depth24unorm_stencil8 extension to post-v1.
KN: think we could add the format, but not make it copyable yet. Make it copyable later.
KG: what's value of that instead of depth24unorm_stencil8?
MM: that's what we do in Metal - no depth24unorm without stencil8.
KG: because of that, and because one of the comments was - if we sign up to do the copies, we'd have to use compute shader to do them precisely/securely - also the comment says users can do this already. Feels like a post-V1 polish thing. Would make some things slightly nicer, but livable without this. One less compute shader we'll have to write.
MM: what does D3D12 do? Do they have this format?
KG: stencil must be 0 for copies.
CW: D3D12 says, if you copy from buffer to texture, you must copy in 0s into the stencil byte. "depth24x8"?
MM: so they have the format but have rules.
CW: reason this comes up - spec's in terms of aspect-specific formats. This format doesn't have depth-specific aspect. All of it, or just the depth part. Have to fix up the spec to make this happen.
MM: just implemented this - didn't run into this problem. Surprised there's something missing.
CW: run into this in the CTS. depth24unorm_stencil8 is copyable into buffers. That fails because we try to use it as depth24+ depth-specific aspect and that's (not allowed?)
KN: something we should change in the spec. Define what depth24unorm is without adding it to the format list. That's editorial. Non-editorial: should there be a format, and should it be copyable. I don't think we need the format, and don't need to make it copyable.
CW: issue - when you copy texture->buffer need to specify an aspect. Constraint coming from Metal. Blit options when you do the copy.
KN: can spec the aspect. depth-only, stencil-only.
CW: then the bytes aren't defined.
KN: but we don't allow that copy regardless. Point's to allow the copy?
CW: point of the sized formats is that you can do the copy.
CW: let's work more offline.
MM: what would help me is a test program.
KN: want to think about this more too.

Agenda for next meeting

CW: great progress this time.
CW: WebKit implementation issues? Streaming impl, max command counts, render bundles?
MM: SGTM

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2022 05 25

GPU Web 2022-05-25

Tentative agenda

Attendance

Administrivia

CTS Update

“Tacit Resolution” Queue

WebGPURenderingContext should support preserveDrawingBuffer: true #2743

Internationalization Metadata for Human-readable Messages #2780

TAG feedback on WebGPU #2927

Should we constrain the location of user input-output stage variables WGSL #1962

maxBuffersPlusVertexBuffersForVertexStage limit #2749

Missing depth24unorm format? #2732

Agenda for next meeting

Clone this wiki locally