GPU Web 2020 08 24

Chair: Dean (Corentin is out)

Scribe: Ken

Location: Google Meet

Tentative agenda

Post-land discussion of Internal usage flags and rules #994
PR burndown
- Add some restrictions about resource usage tracking/validation #972 (Yunchao, Dzmitry)
- New binding type for multisampled textures #1005 (Dzmitry)
- Tweak requiredBytesInCopy, make bytesPerRow/rowsPerImage optional #1014 (Kai)
- Relax the command encoding validation scope #1021 (Dzmitry)
Needs discussion burndown
- Add depth16unorm to WebGPU SPEC #508
- Add "depth32float_stencil8" extension for GPUTextureFormat #755
- Multisample Coverage on Metal #959
Agenda for next meeting

Attendance

Apple
- Dean Jackson
Google
- Brandon Jones
- James Darpinian
- Kai Ninomiya
- Ken Russell
Intel
- Yunchao He
Kings Distributed Systems
- Daniel Desjardins
- Wes Garland
Microsoft
- Damyan Pepper
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
Mehmet Oguz Derin
Timo de Kort
Michael Shannon

Post-land discussion of Internal usage flags and rules #994

DM: historically, had simple model: each resource has list of usages it supports. These are read-only usages, or writable usages. Allow only a combination of read-only and a single writable usage at a given time.
DM: as we progressed, realized this model doesn’t work. Storage usage, e.g., can be read-only storage, or writable storage.
2 ways to fix:
- Expose read-only as separate flag for user. Have to specify that it supports being read-only storage or write-only storage. Decided we didn’t want to go there.
- or:
- internal list of usages for validation / synchronization is different from usages that user provides. There is a relation between them that we define strictly.
That’s part of what the PR did. List those internal usages.
Another part: idea that we have read-only and writable usages - “main usage rule” - also fell apart. Some of usages can be used multiple times - like storage writing can be done multiple times in the same render pass - but some only once, like output attachments (only once, not multiple times).
Went back, thought how to make list of rules that would be sound w.r.t these limitations.
Defining just a few combinations that are allowed. That’s part of what this PR does
STORAGE_READ and STORAGE_WRITE allowed together in the same scope.
More usages on the same sub-resource - doesn’t fall in to that usage pattern, and is disallowed.
There are 3 groups allowed today. After my PR landed, Yunchao merged 2 of my groups together, so now there are only 2 lists of usages which we validate.
JG: basically - this happened, and this is the new state of things. Just trying to catch everyone up to the top-of-tree philosophy.
DM: the anchor in the spec is “compatible usage list”. Currently lists 5 classes of usages. The PR in flight will reduce it to 4.
RC: are there any compromises in the various APIs that would make this usage scheme difficult to implement?
DM: good question. If anything the new scheme is less of a compromise. Previous one didn’t allow read-only storage to come together with writable storage. Reduces one of the compromises.
DJ: no outstanding work for this aside from implementations and CTS?
DM: correct.

PR burndown

Add some restrictions about resource usage tracking/validation #972 (Yunchao, Dzmitry)
- YH: same question basically. In this PR, add aspect like depth/stencil to distinguish sub-resources. Previously, had array layers, mip levels. This PR adds aspect.
- YH: also combined two read-only usages into one.
- YH: also clarifies situations - i.e., visibility=0 (or visibility=compute in render, etc) bindings should be tracked. Unused bindings should be tracked. If we don’t track these usages, might be difficult. If developer has some conflict for these usages, can give a warning, and they can correct their code. We could not track these invisible bindings - but if we have to filter this stuff out it’s difficult. If we do this tracking and report this stuff, it’s just warnings to the developer.
- DJ: Dzmitry has already reviewed this. Has asked Kai. Does anyone want to do another pass or accept as-is and handle cleanup comments later?
- KN: probably can’t look again today. It’s small, so OK to land now, or could wait a little bit.
- JG: what does “tracked at state setting time” mean? Reads like a description of how it’s implemented, and not what we’re implementing.
- YH: we say “setBindGroup”, then we track this. Maybe you call setBindGroup on the same index soon. We don’t delete the previous tracking.
- JG: talking about what is/isn’t tracked sounds like an impl detail.
- YH: yes, it is a little bit. Maybe we can say, inside a usage scope …
- JG: I’m reading this and not understanding what’s allowed / disallowed.
- KN: setBindGroup(0, something) and setBindGroup(0, somethingElse), we will validate the usages in the first setBindGroup even though it wasn’t used for anything. It’s tracked, and later we validate that the usage list is a compatible usage list.
- JG: “tracked” says we’re watching it but doesn’t say what we’re doing with it.
- YH: if it conflicts with another usage we report an error.
- DM: it is specified. This chunk is non-normative. Already spec’d in setBindGroup.
- KN: what we’re talking about isn’t contained in this PR. in setBindGroup it says, track all the resources in the tracker, or similar. Not fully fleshed out yet, but in setBindGroup it says, we track these resources. Later we look at the tracker and ask whether it’s valid. Is every sub-resource in one of these valid usage lists? Non-normative thing is just saying, the result of the rules written here and throughout spec is this. As a reader, hard to look at entire spec and understand it. This non-normative note describes the practical behavior.
- JG: a bit difficult because it relies on me understanding more about various implementations than I might. Understand why you think it’s useful, but worried that it’s non-normative and doesn’t just rely on implicit stuff elsewhere in spec, but also how you expect it to be implemented.
- KN: the spec’s pretty explicit about how these things are tracked. setBindGroup picks up all sub-resources, inserts them in usage list according to these rules, then later take the list and (the validation rule says) that it follows various usages. Sounds like an impl, but the spec writes it in this way, like an implementation.
- JG: “tracking” doesn’t mean anything by itself - it’s done for a particular thing. If it’s added to a list that’s validated later, we should say that rather than “it’s tracked”. Also worried about how actionable that is. Is this useful for readers of the spec, or would it be better to say “we have to re-validate if you do these things”?
- KN: it doesn’t cause revalidation. If you do something invalid, it’s still invalid even if you overwrite it afterwards. I’m sure this could be clearer. I’m sure this could be written in a way more explicitly tied to how it’s spec’d.
- DJ: when you mean “this” you don’t just mean this PR, but how it’s spec’d overall?
- KN: yes.
- DM: wanted ot note - this is added to the resource usage section. Right after it there’s a synchronization section that makes statements about similar things. However, I do think that Yunchao’s PR adds more value to that because it says more explicitly things about visibility. Maybe the section we have right now has to be expanded, rather than adding another section to a different area of the spec.
- DJ: I suggest we still accept this PR and file an issue to do cleanups.
- JG: if other people think it’s good, good enough for me. Doesn’t make immediate sense to me though.
- DM: maybe we should cut that note out of the PR, land it, and it’ll only affect the aspect part of resource definition. Then discuss the rest at the editor’s meeting.
- DJ / YH: sounds good.
New binding type for multisampled textures #1005 (Dzmitry)
- DM: used to have MULTISAMPLED flag, said it only affected sampled textures. Recently I raised concern on WGSL side, multisampled textures need to be their own type. Could do the same thing on the API side, no issues. Just remove MULTISAMPLED flag, add another variant to texture type.
- JG: that’s required anyway because the sampled types are different on our backends.
- DM: doesn’t change any requirements at all. Just changing API side to match WGSL better.
- KN: without this change we’d look at the multisampled flag to figure out what to do on the shader side.
- JG: I like making the validation tuples smaller.
- DM: and the switches larger. :)
- DJ: OK, rebased and merged - eventually.
- DM: will do.
Tweak requiredBytesInCopy, make bytesPerRow/rowsPerImage optional #1014 (Kai)
- JG: interesting that a 0x2x0 copy might need bytes. Is that what I’m reading? Need stride of bytes for first row?
- KN: kind of weird, but makes more sense because it removes special rule from the validation. Previously had this rule, if w,h,d==0, it’s OK, ignore anything else. Don’t think it makes sense to have that special case. It’s rare, and in principle, if you have a height of 2 regardless of how much stuff you’re trying to copy, you’re looking at a range of bytes even if you don’t read them.
- JG: like in C++, if you have 2 rows, the pointers to them both have to be valid. OK to remove special case if people don’t need it.
- KN: previously we had discussion, if you don’t use bytesPerRow because w,h,d==1, then - previously they had to be min 256. Do we ignore that? Say they have to be 0 or at least 256? Gets rid of that weirdness, says it can be undefined. Removes 0 being potentially special here.
- KN: aligns with other changes we’ve made in the spec to use “undefined” in places where the value isn’t used. In this case, we just allow it to be undefined if you’re not going to use it.
- DJ: SGTM. Anyone object?
- (no)
- DJ: accepted.
Relax the command encoding validation scope #1021 (Dzmitry)
- DM: we had a concern about whether it’d be difficult for impl to defer errors, but turned out to be OK. Our impl does validation at command encoder level. Unlike Dawn, which has it at the command encoder finish time (mostly). We’re fine with the current spec, and it’s consistent between browsers.
- KN: the key thing here that makes this the right way to do this is that it lets you do all your encodings on the content side. E.g., inject some code into the VM which transforms encoding calls to write bytes into a buffer, because there are no side-effects. Can’t raise errors in the error scope by encoding commands. Allows more flexibility in the impl. Useful.
- DM: technically, the PR didn’t mean to restrict that capability.
- KN: right. Also think this should be consistent - matters for portability that the error reporting be consistent. Edge case, but apps could easily, accidentally, write their code assuming it’ll behave one way.

Needs discussion burndown

Add depth16unorm to WebGPU SPEC #508
- KN: this was quite old, we closed it, and I reopened at some point. Thing to discuss: this is a format required everywhere that we might ship WebGPU except macOS 10.11. Q is: should we leave it out just for the sake of macOS 10.11? Very old. Was the last Mac version / HW that came out ~10 years ago. The only HW that can’t update past this restriction is that old. Don’t know what the answer is here. There’s also some iOS hardware affected but it doesn’t get major OS releases anymore.
- DJ: from Apple’s perspective, we won’t ship Safari to that build, so I’m happy to drop support for 10.11 and leave support for this in the spec. Suggest to other browser vendors that they shouldn’t support that.
- KN:
- JG: there’s some version of macOS people are staying on because it’s the last version supporting 32-bit binaries. What was that?
- DJ: that was 10.13 or 10.14. 10.15 is out now. 11.0 is what’s coming.
- KN: 10.14 still supported them.
- DJ: you might still support 10.14 but if Apple ships an update to Safari for that OS version it won’t include WebGPU.
- JG: with that in mind, does anyone object to having unorm16 depth in core?
- DJ: I don’t think so.
- JG: want to say WebRender uses it in some cases, actually.
- KN: it’s nice to say you have a depth format smaller than 32-bit. On some devices, doubling depth memory footprint is a problem.
- JG: less used for 3D interactive stuff, but for 2.5D stuff, it’s useful.
- KN: yes, useful for model viewers though not immersive environments.
- JG: approved.
Add "depth32float_stencil8" extension for GPUTextureFormat #755
- KN: this would be an extension - can’t support it in core. Fairly widely supported. Has imp’t properties. Higher depth precision if you need both depth/stencil. Can also copy out of it, and between it and depth32float. Think we always intended to add this as an extension. No reason to not do it. Don’t know if impls will do it super soon, but also should be fairly simple.
- RC: for copying this format to buffers, what are the rules? D3D12 says …
- KN: don’t know the answer. Suspect so based only on what we learned when looking at copies for other depth formats. In D3D12, if you copy a D32FS8 - depth aspect, you get a packed buffer of 32-bit floats, not 100% sure.
- RC: I was wondering more about stencil part, if there’s X24 part there.
- KN: we might use this to impl D24+S8. Would indeed pack the stencil buffer.
- RC: my only point - as long as it behaves the same, no objection.
- JG: for example, if you send it 1’s in the X component.
- RC: or you read it back and try to use the buffer in a compute shader. What will be there? Or you send stuff between CPU and GPU, etc.
- JG: if we do depth clamping where we have to clamp inputs between [0..1] or [-1..1], we can definitely do zeroing of unused / reserved bits.
- KN: we don’t allow copying into float depth formats. Undefined behavior. Have to clamp. Were going to disallow. Only copy out of D32F.
- JG: how do we upload data?
- KN: don’t think that’s super common. Upload to R32F, or write into depth buffer using shader.
- DP (or MS?): that’s what most people do.
- KN: think we will want to revisit idea at some point, but for now, we plan to disallow it.
- JG: if plan is to let user write compute shader, that’s great. Generally like idea of exposing this. Generally interesting, because do we have a D32S8 format? Or just D24+S8?
- KN: Just D24+S8. This would be the D32S8 format.
- JG: would implicitly, maybe, expose what D24+S8 was. Odds are that D24+ is actually D32.
- MS: it actually always is, if you’re on desktop. Only not if you’re on Android device. Currently relying on that behavior.
- KN: a few weeks ago, were going to add memory footprint query for various formats. That doesn’t tell you the precision of D24+S8, but when we add that, maybe should expose that info so apps can say they require a specific amount of depth precision.
- JG: I’d propose strawman - for similar reason you might want D32S8, add D24S8. If you want to do it yourself you can, but like in WebGL, we added DEPTH_STENCIL which gets you the best reasonable precision.
- KN: D24Unorm is of somewhat more limited usefulness. Main use case for D32 is higher precision. If you don’t need that then D24Unorm gives you copyability over D24+. Useful to some people.
- JG: that’s my point. I like to know what I’m working with. Has fairly slim benefits - you can copy it - but maybe that’s useful.
- KN: I do think we should have the concrete formats as extensions.
- MS: there are cases with OIT algorithms where for comparison purposes you need to know the exact format of the depth buffer.
- JG: because you have to match that with a non-depth-buffer?
- MS: yes, because you have to compare with a texture that’s not a depth buffer.
- DJ: is iOS always D32S8?
- KN: I believe so.
- MS: at least on Dawn it is.
- JG: it is the way we impl it. It’s always planar. Separate depth/stencil. Unusual.
- KN: all Metal GPU family Apple category has D32FS8 but no D24UnormS8 at all.
- JG: great, I think this is needs-specification now. I’ll take it - writing the extensions.
Multisample Coverage on Metal #959
- DM: in MS there are different states interfering with each other. 1) alpha-to-coverage. 2) sample mask.
- DM: we figured that in Metal you don’t have the sample mask as the pipeline state, so you can do this in a shader. Output the coverage mask from the shader. But Metal doesnt’ let you do alpha-to-coverage if you touch the coverage output.
- Two choices>
  - 1. can’t set both alpha-to-coverage and sample mask on the pipeline state.
  - 1. Our Metal impls would have to implement alpha-to-coverage themselves.
- JG: I like the former.
- DM: and we can always add the latter later.
- DM: a2c is a platform-dependent thing, could always implement it in a shader stub yourself.
- DJ: even if it’s a bug in Metal, have to handle the case where it can’t be done. I agree with the first step.
- DM: will you take an AI to see if it’s a bug on your side?
- DJ: yes.

Agenda for next meeting

JG: primitive restart behavior - #1002.
- KN: just needs spec. Agreed on behavior, just didn’t write it down.
- JG: could you write it down in the issue quickly?
- KN: yes.
#1001 also has discussion.
- DM: this is needs-specification, doesn’t need discussion.
#962 ordering of Promise resolution?
- KN: don’t think it needs discussion.
JG: just multi-queue.
KN: no promises, but might be able to get to adapter / device initialization stuff. If get to it by deadline, it’ll be on the agenda.
DM: maybe skip a meeting given agenda’s so tiny?
DJ: could someone close #1003? Reusable command encoders.
Skip next meeting?
JG: we could show up next week and only plan to touch base if it’s not too much overhead. Short meeting, go home early?
DJ: let’s do that, and see if there’s stuff on the agenda as well. If nothing, can just skip for two weeks.
JG: we can cancel the meeting by the homework deadline (this Thursday).
DM: I’d like to cancel it in favor of a WGSL meeting - lot to discuss there.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU Web 2020 08 24

Tentative agenda

Attendance

Post-land discussion of Internal usage flags and rules #994

PR burndown

Needs discussion burndown

Agenda for next meeting

Clone this wiki locally