
GPU Web 2020 06 15


So as to not be anonymous animals, the doc is shared for writing with the Google accounts that are invited to the meeting. If you can’t edit, let [email protected] know. Also be sure to be in “Edit” mode and not “Suggest” mode.

Chair: Corentin

Scribe: Austin / Ken

Location: Google Meet

Tentative agenda

  • Copying of depth24plus formats #652 #789
  • Texture view format reinterpretation #744
  • Core stencil8 format #306
  • Anisotropic clamping for SamplerDescriptor #696
  • Case for optional GPUBindGroupLayoutEntry entries #851
  • PR burndown
  • Agenda for next meeting

Attendance

WIP: this is the list of all the people invited to the meeting. The people who were seen in the meeting are in bold:

  • Apple
    • Dean Jackson
    • Fil Pizlo
    • Justin Fan
    • Myles C. Maxfield
    • Robin Morisset
    • Theresa O'Connor
  • Google
    • Austin Eng
    • Brandon Jones
    • Corentin Wallez
    • Dan Sinclair
    • David Neto
    • Idan Raiter
    • James Darpinian
    • John Kessenich
    • Kai Ninomiya
    • Ken Russell
    • Shrek Shao
    • Ryan Harrisson
  • Intel
    • Brandon Jones
    • Bryan Bernhart
    • Yunchao He
  • Microsoft
    • Chas Boyd
    • Damyan Pepper
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Joshua Groves
  • Matijs Toonen
  • Mehmet Oguz Derin
  • Pelle Johnsen
  • Timo de Kort

Administrative items

  • CW: Does the virtual F2F date (June 22-23-24, with WebGL on Thursday the 25th) still work for everyone?
    • DJ: There might be 30-minute stretches when some of the Apple people can’t be there, but it should be OK. I don’t know if it’ll all be at the same time.
    • CW: Worth planning around?
    • MM: Should make sure certain topics aren’t during certain time periods.
  • CW: Please brainstorm agenda items on the F2F minutes document.
    • CW: We should put items for WGSL as well
  • CW: W3C people are still eager to push for the charter
    • DJ: Apple is the last one to say yes. Is everyone else okay?
    • CW: everyone else is OK with it.
    • DJ: I'll do this this week.

Texture view format reinterpretation #744

  • CW: We said we would have investigations of what the classes are. DP provided them for D3D12. I believe Metal is the same. Has anyone looked at that?
  • MM: didn't have time to look at this, sorry. Next week is the F2F, so I can take the AI to do the analysis by then.
  • CW: sounds good.

Copying of depth24plus formats #652 #789

  • CW: copying of depth24plus formats
    • Messy because it's split among many discussions in 2 issues
    • From shared investigation: copying of depth is always possible between two depth textures, and always possible from texture to buffer, but Vulkan has a strange restriction: copies from buffer to texture must have values clamped between 0 and 1. So we can't support this in WebGPU.
  • KN: depends on strategy. Haven't decided whether we'll do magic when we copy depth24plus formats. If so, we could do it during copies from buffer to depth24+ formats. Maybe not for the depth32 format. But if we're treating it as depth24unorm, it might be good to have that magic.
  • DM: we'll have the magic for which path?
  • KN: if we have magic when copying to/from depth24 formats, then we might also want it for copying into depth24+ formats with out-of-range values. Same copy. More magic in an already-magic path isn't bad. depth24+ is a special case, deserves a little magic.
  • CW: in Vk, can you copy 2.0 into a depth32f texture?
  • KN: not sure.
  • CW: concern: if we spec something where clamping happens magically, would be unfortunate if we have to do it for depth32 textures where we don't need it.
  • AE: think you need VK_EXT_depth_range_unrestricted for depth32f in this case
  • KN: don't think that extension's limited to particular formats. Not about copying; the extension's about the actual values in the depth buffer being outside [0..1]. Copying to/from depth24unorm on Vk, you get 32-bit FP values in your depth buffer. There's a separate question about what happens when you do a copy into a format that can't support values outside [0..1].
  • JG: this gets to the "how much magic" question. The maximally portable thing is to always quantize into unorm24 and then unquantize it back into float, which will do the clipping. That's what we should do if people are relying on the behavior. If they write 1.5 into the depth buffer, they get 1.5 and it acts like 1.0; and if you put it into a 24-bit depth buffer, does it saturate? Clip? Modulus? etc.
  • CW: think we can guarantee it always clamps, except for buffer-to-texture. The question's how much magic we want there for clamping vs. consistency.
  • KN: if you render into it it's clamped.
  • MM: Is it well defined what happens if you try to copy 1.5 into a depth texture on unextended Vulkan?
  • CW: no.
  • JG: I suspect it just works.
  • MM: in unextended Vk? That's the point of the extension.
  • JG: I read the extension as defining what happens on the API side of things.
  • AE: no, there's additional language in the copying aspects of the API
  • JG: that's not in the extension man page.
  • KN: the extension man page doesn't always have everything.
  • MM: Seems like if we’re going to support floating point depth formats in unextended Vulkan, whatever browser is using Vulkan will need to do some sort of clamping
  • CW: for gl_FragDepth in the fragment output, we can always guarantee it's clamped. For gl_Position it's guaranteed because of clip space. Easy to clamp gl_FragDepth. Copying into the thing's more difficult.
  • MM: I’m a little unclear because if all browsers will need to do something special for copying into a depth texture, then it’s a sunk cost.
  • KN: we're talking partly about whether to have these copies at all.
  • CW: if you want to be able to copy from buffer->texture, because of these Vk constraints, WebGPU has to define some buffer clamping restriction. Then to be conformant every impl of WebGPU needs to do the clamping, otherwise not portable.
  • KN: also the Vk extension doesn't change anything. You can still go outside the range, but we still have to clamp.
  • MM: use case where someone wants to copy from buffer to depth texture?
  • CW: Uploading statically generated mipmaps. A game's bake step generates shadow maps.
  • MM: do we want to support this use case?
  • KN: Does that have to be in a depth format texture?
  • DM: Not in Metal.
  • AE: it's unspecified in Vulkan.
  • KN: so you need a depth format texture and you have to put it into it. You can do this in a shader - doing a manual blit into a depth texture from whatever source you want. The question is whether we require this.
  • MM: What happens if you bind a depth texture as a UAV and try to write into it in a compute shader?
  • JG: would have to clamp.
  • KN: not sure if you can do that.
  • CW: pretty sure you can.
  • JG: should just be an image.
  • MM: if we want to support this use case, we have to inject clamping code, and if we don't, we don't.
  • CW: we do want to support this use case for people who want to do copies, but don't want to clamp everywhere, so they can do the blit themselves. No magic on our side, and when we can guarantee that there's no depth24+ format, we can say you can copy.
  • MM: I think that would be ok as long as it wasn’t much slower than doing the clamped copy command.
  • KN: think we'd do the same thing in our impl. Probably wouldn't be slower. Taking that approach means no copies into depth24+ formats.
  • CW: from buffer.
  • AE: no copies into depth formats from buffer.
  • MM: thought it was only FP depth formats. How do you write > 1.0 into an integral format?
  • CW: the buffer can contain anything in float32. Copy to depth/stencil or depth texture. Question is what happens outside of 0..1.
  • KN: if we had depth24unorm format, what would copies to/from it look like? Would data be depth24x8, or float? Think the answer's float, but don't remember where source of that info is.
  • JG: it's like a float mapped to fixed-point.
  • KN: when you do a copy to/from what format do you get?
  • JG: I expect you get the bits.
  • KN: think you can only copy one aspect. d24s8 is packed; you can only ask for depth. Do you get depth24x8 or depth32? Thought someone figured out you get depth32.
  • DM: I remember the same - has to be float. Also: if user does this it's not as performant as us doing this. They need staging copy to upload a buffer, and another buffer that's GPU-local from which shader loads values. Extra buffer, extra copy.
  • MM: on Metal, we'd probably not even use render pass, but compute pass, and just write into depth buffer.
  • KN: right. Same issue either way.
  • CW: on Metal, think you'd get worse compression if depth is UAV.
  • KN: I'd expect render pass to do better. But both functionally work, have same issue.
  • CW: being able to copy is important, but you're loading static depth shadowmaps - that's the only use case I know of. Does it matter if you have to do the blit yourself so we can avoid adding a lot of magic?
  • KN: we can add the magic later if we need it.
  • DM: Think it’s a problem if the application starts hitching with an extra copy. But yes, we can always allow it later.
  • KN: so this means, no copying into the depth channel of depth24+ formats?
  • JG: what about depth24 to depth24?
  • CW: that should work.
  • AE: also not into depth32f.
  • KN: I meant from buffers into the depth aspect of depth24+.
  • CW: so have to determine what copies are allowed, including stencil aspect. This is progress, can do more investigations offline into the other aspects.
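
Below is a minimal sketch of the "do the blit yourself" path discussed above, written against the WebGPU/WGSL syntax that stabilized after this meeting. It assumes an existing GPUDevice `device`, an r32float source texture `src` holding the baked depth values (TEXTURE_BINDING usage), and a same-sized depth24plus destination `dst` (RENDER_ATTACHMENT usage); all names and the pipeline setup are illustrative, not from the minutes. Writing through @builtin(frag_depth) relies on the [0, 1] clamping of fragment depth output that the group says it can always guarantee.

```ts
// Copy float depth values (one f32 per texel, already uploaded into `src`)
// into the depth aspect of `dst` by rendering a fullscreen triangle that
// writes frag_depth. Out-of-range values are clamped by fixed function.
const shader = device.createShaderModule({
  code: /* wgsl */ `
    @group(0) @binding(0) var src : texture_2d<f32>;

    @vertex
    fn vs(@builtin(vertex_index) i : u32) -> @builtin(position) vec4f {
      // Fullscreen triangle.
      var p = array<vec2f, 3>(vec2f(-1, -3), vec2f(-1, 1), vec2f(3, 1));
      return vec4f(p[i], 0, 1);
    }

    @fragment
    fn fs(@builtin(position) pos : vec4f) -> @builtin(frag_depth) f32 {
      return textureLoad(src, vec2i(pos.xy), 0).r;
    }
  `,
});

const pipeline = device.createRenderPipeline({
  layout: 'auto',
  vertex: { module: shader, entryPoint: 'vs' },
  fragment: { module: shader, entryPoint: 'fs', targets: [] },
  depthStencil: {
    format: 'depth24plus',
    depthWriteEnabled: true,
    depthCompare: 'always', // overwrite every texel
  },
});

const encoder = device.createCommandEncoder();
const pass = encoder.beginRenderPass({
  colorAttachments: [],
  depthStencilAttachment: {
    view: dst.createView(),
    depthLoadOp: 'clear',
    depthClearValue: 0,
    depthStoreOp: 'store',
  },
});
pass.setPipeline(pipeline);
pass.setBindGroup(0, device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [{ binding: 0, resource: src.createView() }],
}));
pass.draw(3);
pass.end();
device.queue.submit([encoder.finish()]);
```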

Core stencil8 format #306

  • CW: We have depth32float. Should we have pure stencil8 format without depth?
  • KN: there's lots of cases where this would have to be emulated, but should be easy emulation.
  • CW: this benefits some systems, but on others you think you have 1 byte per texel but instead have 4 or 5.
  • KN: sometimes would prefer something else if you know S8 is going to be 5 or 8 bytes.
  • MM: proposal I made last time: surface this info on the device.
  • KN: yes.
  • KN: maybe we've already figured this one out.
  • CW: any concern about where this would expose fingerprinting we don't want to fake? Whether a device has this is something that couldn't be lied about.
  • KN: this is only used for heuristics. Possible for the browser to lie if necessary.
  • MM: wouldn't all D3D devices have this set to 0, all Metal devices have this as 1, what about Android?
  • KN: don't think it's a concern.
  • JG: the same for huge classes of devices. You'll get rasterization differences that are more of a fingerprinting aspect.
  • MM: site already knows if you're Win/iOS/macOS/etc.
  • JG: or, it thinks it knows. :)
  • DM: are we saying whether S8 is available, or a query about its size?
  • KN: proposal is: have a format, name TBD, that supplies S8 to the app. May have size of 1, 4, 5, 8 bytes.
  • DM: where would it have the 5 or 8 bytes size?
  • KN: not sure if there are devices that have only D32S8.
  • JG: are you saying where in the API? Or which devices?
  • DM: which platforms? Thought it was just 1 or 4.
  • KN: don't know offhand.
  • CW: could be 5 on Vk devices with only D32S8.
  • JG: are there really devices that only support that? Surprising.
  • DM: does Vk have S8?
  • KN: anything that has D32S8 is likely to have S8. Might not be implementing it as planar format.
  • JG: usually coplanar format. If just surfacing number of bits / bytes it takes, can just return most correct values we know of.
  • KN: and we can round down 5 or 8 to 4, even if it's lying.
  • MM: Do we want to do this for all texture formats?
  • JG: If there are others not like that, then maybe.
  • KN: We have depth24plus formats, but it’s less of a problem because they won’t see a 4x increase in memory usage. It’ll be a smaller increase. It could also be determined by measuring the precision of the depth buffer anyway, so maybe we should just expose it.
  • RC: is there a case where we'd want to just expose D24S8 and let them emulate the case where they just want stencil?
  • DM: The problem with this is it would tell us they want depth, and we may have to give them depth32f-stencil8 which is much more than they wanted.
  • KN: that'd be 5-8x the memory usage.
  • RC: are there platforms that just have stencil?
  • MM: Metal has that. Surfacing the size of each pixel for every format is probably misleading for authors because they’ll get the same answer for all but two of the formats. We should probably do this only for the ones that vary.
  • CW: agree.
  • DM: for other formats we'd have to spec block size as part of spec so we know how copies work.
  • CW: exactly, but it's always the same.
  • KN: think we should have it for the depth24+ formats, so only for those, S8, and XS8 (whatever you call this).
  • CW: seems we're mostly in agreement to have S8 plus this information about expected size of format. Kai can you make a PR for this?
  • KN: yes.
  • MM: this can be defining formats that don't exist.
  • JG: That’s why there’s X0+Stencil8 which is the grossest name I’ve seen. Or Stencil8ish.
  • MM: that's also misleading. It's not a stencil format with 8-ish bits. It's a stencil format with a backpack that might have more stuff in it.

Anisotropic clamping for SamplerDescriptor #696

  • DM: We have an extension that doesn’t do anything. Was wondering if there’s something we need that’s more than a single field for anisotropic-filtering.
  • CW: does anisotropic filtering require different GPU filtering mode?
  • DM: no.
  • CW: so just max anisotropy in the sampler?
  • DM: yes.
  • CW: let's just do it.
  • DM: Will make a PR.
  • CW: I think the patent for anisotropic texture filtering expired, so maybe we can guarantee its presence for WebGPU.
  • MM: maybe any device which doesn't have it shouldn't be compatible with WebGPU.
  • JG: could do an extension that's supported essentially everywhere.
  • MM: what would be the point?
  • JG: they wouldn't have to support it if they think it's still encumbered. This is something that should come up in TAG review. (or legal review)
  • CW: there's a decent number of ARM devices that don't have anisotropic filtering on Vulkan (20%).
  • JG: are these real devices we'd target?
  • CW: maybe.
  • JG: ok, that's fair. We don't know.
  • MM: this is an extension that isn't available everywhere then.
  • CW: maybe not.
  • MM: there's a clear path forward then.
  • CW: We’d need to get field data about the presence of some stuff on Android. FF too. And we should figure out if we can assume anisotropic filtering.
  • JG: when you request the extension, do we tell you what the max anisotropy is? Clamp to an impl-dependent value? Then we can say that there are some devices with 0 max anisotropy. Won't really get porting issues - things'll look a little different but won't be too bad if someone expects 16x and doesn't get it.
  • DM: think it's 16 max anisotropy when it's supported. So could make it a limit instead of an extension.
  • JG: Maybe not a terrible way, for something we expect to be almost everywhere.
  • MM: Not optional in Metal, and values are up to 16 everywhere.
  • JG: Do we know if the sampling pattern is implementation defined?
  • CW: yes. Anisotropic filtering sampling patterns are very much impl defined.
  • JG: So theoretically we could say we support it, and the sampling pattern is all on top of each other...
  • MM: a little misleading.
  • JG: yes, but is it worth doing it as an extension and only supporting it in certain places?
  • MM: Think the question comes down to whether the devices that don’t support it are hardware we legitimately want to support. If so, then we have a world where there are two types. Or, we could just not really care and they can try but not get real anisotropy.
  • KN: for something like this, where it's exclusively used to improve visual fidelity, it'd probably be OK to just make it not work.
  • MM: How would you test it?
  • JG: can test it against isotropic filtering at different mip levels.
  • KN: or have a query?
  • JG: this is something where we might be able to add the extension for it after MVP. Clamp, don't validate, you can offer an anisotropic clamping hint, we'll clamp to what the impl has. If people really want to know when they have a device without it, we can raise the extension flag.
  • CW: a bit worried because devices that don't have anisotropic filtering will be less known. Obscure Android devices. Will people notice it?
  • KN: was going to raise same concern.
  • JG: If the app developer, like the big ones, gets bug reports that their old Android device looks worse, they should then surface that to us, and we can tell them that they don’t have anisotropic filtering. If all they’re going to do is just run without it, then there isn’t much point in surfacing the extension.
  • MM: the bug report of "my floor textures look bad", there's nothing we can do to make them look better.
  • CW: You could trade off more resolution for..
  • KN: there's something Unity could do though. Unity could do multiple texture samples and do their own filtering. Probably something the app wants to know.
  • JG: So if they get this bug report, and they want to do something about it, then ideally they’ll tell us. We could bring it up with them proactively.
  • MM: I could be convinced that we should let authors supply whatever value they want, we clamp to the internal limit, and we expose that internal limit.
  • CW: Maybe we could say it’s either 1 or at least 16.
  • MM: for fingerprinting we'd probably want to say 1 or exactly 16.
  • JG: probably, but not sure we want to spec that.
  • MM: guess that can go into the "bucketing devices" issue.
  • KN: don't think I've ever seen a piece of hardware exposing >16 for this.
  • DM: I will take the AI to make a PR for this.
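
For reference, the single-field shape being discussed is essentially what later landed in GPUSamplerDescriptor: a maxAnisotropy value that defaults to 1, requires all three filters to be 'linear' when greater than 1, and is clamped by the implementation to whatever it actually supports. A minimal sketch using today's API:

```ts
// Request up to 16x anisotropic filtering; the implementation clamps to its
// own maximum, and (per the discussion above) a device without support could
// simply behave as if the value were 1.
const sampler = device.createSampler({
  magFilter: 'linear',
  minFilter: 'linear',
  mipmapFilter: 'linear', // maxAnisotropy > 1 requires all filters to be 'linear'
  maxAnisotropy: 16,      // defaults to 1 (isotropic) when omitted
});
```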

Case for optional GPUBindGroupLayoutEntry entries #851

PR burndown

  • Introduce GPUTextureBinding.readOnlyDepthStencil #859 / Read-only GPUTextureView #860
    • DM: there's a problem with read-only depth/stencil. Bind group can't have the shader_read_only layout. That's the layout we've been using for all sampled texture bind groups. I investigated possible solutions. One would be, spec whether read-only at bind group creation level. Or, at view creation level - higher and earlier. My preference is to spec it at the view creation level. createView of texture, read-only view. That matches what you'd do in D3D12. Determines whether need SRV/UAV or just SRV. In Vk, would tell impl to use depth_stencil_readonly_optimal_layout. Removed from BeginRenderPass, not required in createBindGroup, but single boolean in texture view descriptor, defaulting to false.
    • MM: I’m confused.
    • CW: During the whole render pass, the layout of the texture has to be constant for Vulkan. When you create a bind group, we need to know the layout of the texture. So it means that depending on whether depth stencil texture is used as readonly or not, the layout in the bind group might change. This is unfortunate. DM is suggesting a restriction to say a view is readonly, and we will know exactly what its layout is at creation.
    • MM: I believe that using image_layout_general is slower, but wish I had numbers for how much slower.
    • CW: trying to think if we have some benchmarks.
    • MM: Should be fairly straightforward to do so.
    • CW: think we can probably use one of ARM's benchmarks. They have a bunch of Vulkan samples.
    • MM: then just change the layout and run it again?
    • CW: yes.
    • KN: this is general enough that somebody's probably already done it before.
    • MM: that'd be fine.
    • DM: specification at view creation's probably better anyway. Would reduce number of underlying objects. Also better API than what's in the spec.
    • CW: think so too.
    • KN: specifically because of D3D I think it's an improvement.
    • MM: only bringing this up because it's not an improvement on Metal. I'd like to see some numbers indicating the motivation for it.
    • CW: we can produce some numbers.
    • KN: should we close #859?
    • CW: yes.
    • DM: I guess so. If we are waiting for benchmarks, I'd wait for those benchmarks before closing.
    • KN: would that influence taking #859 though?
    • DM: if nobody would champion #859 then we can close it.
    • KN: we can always reopen it if we think it's a mistake. But right now don't think we want to pursue it.
  • Document compute pass creation #846
    • CW: let's work offline.
  • Documented render pass rasterization state methods #850
    • CW: let's work on this offline.
  • Add gpu texture format supported by copyImageBitmapToTexture destination #849
    • CW: Let’s review this offline.
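
A sketch of the shape DM describes for #860: a single boolean on the texture view descriptor, defaulting to false, assuming `shadowMap` is an existing depth texture. This is the proposal under discussion, not shipped API; the `readOnly` field below is hypothetical (the spec as eventually published instead expresses this per pass via depthReadOnly/stencilReadOnly on the render pass depth-stencil attachment).

```ts
// Proposal sketch only — `readOnly` is a hypothetical field from the #860
// discussion, not part of GPUTextureViewDescriptor. Declaring read-only-ness
// at view creation lets a Vulkan backend pick
// VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL for bind groups and render
// passes up front instead of falling back to the GENERAL layout.
const shadowView = shadowMap.createView({
  aspect: 'depth-only',
  readOnly: true,
} as any);
```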

Agenda for next meeting

  • Let's just dump topics into the agenda for the virtual F2F. Aside from the benchmark and a couple of PRs, is there homework people would like us collectively to do?
  • MM: I've got a bunch of homework to do.
  • CW: and we've got the benchmark.
  • CW: if anyone thinks of something, send it to the group and hopefully we can get to it by next week.