GPU Web 2023‐08‐16

GPU Web 2023-08-16 Atlantic-time

Note that unless stated otherwise this is a GPU for the Web community group meeting and not a working group meeting.

Chair: CW

Scribe:

Location: Google Meet

Tentative agenda

Administrivia
CTS Update
“Tacit Resolution” Queue
ErrorScopes are not useful for CommandEncoder methods #4269
Extended brightness range rendering #4239 (please read beforehand)
WebGPU Compatibility Mode #4266** **(please read beforehand)
(stretch) Triage milestone 2 (tag things in milestone 2 you'd like to get to soon!)
Agenda for next meeting

Attendance

WIP, it is the list of all the people invited to the meeting. In bold the people that have been seen in the meeting:

Apple
- Dan Glastonbury
- **Mike Wyrzykowski **
- Myles C. Maxfield
Cocos
- Huabin Ling
- Zeqiang Li
- Zhenglong Zhou
Google
- Austin Eng
- Ben Clayton
- Brandon Jones
- Chris Cameron
- Corentin Wallez
- Dan Sinclair
- David Neto
- Gregg Tavares
- Kai Ninomiya
- Ken Russell
- Loko Kung
- Shannon Woods
- Shrek Shao
- Stephen White
- Ryan Harrison
Intel
- Brandon Jones
- Bryan Bernhart
- Hao Li
- Jiajia Qin
- Jiawei Shao
- Jie Chen
- Shaobo Yan
- Yang Gu
- Yunchao He
- Zhaoming Jiang
Microsoft
- Jesse Natalie
- Rafael Cintron
Mozilla
- Erich Gubler
- Jim Blandy
- Kelsey Gilbert
- Teodor Tanasoaia
Nvidia
- Anders Leino
- Thomas Meier
Snap
- Corvyn Kusuma
- Dhritiman Sagar
- Sumant Hanumante
Unity
- Brendan Duncan
- Dave Shreiner
Connor Fitzgerald
Francois Daoust
Mehmet Oguz Derin
Michael Shannon
Rob Conde
Russell Campbell

Administrivia

Nothing today

CTS Update

KN: Will start on the work I promised next week-ish.
KN: Work happening on WGSL and compat.
KN: currently, it reuses the same device for all tests as much as possible. Considering throwing away the device after a certain number of tests, to clean up allocated objects more eagerly. Let us know if that would be useful to you and we can prioritize it.
MM: we're more interested in the limited buckets than that right now.

“Tacit Resolution” Queue

Empty!

ErrorScopes are not useful for CommandEncoder methods #4269

KG: push/popErrorScope don't work for anything on CommandEncoder aside from the validation that happens during finish(). That's roughly half the methods. We have push/popDebugGroup, can get some debugging, but not super hermetic.
- Esp. with debug groups - maybe not something we need to fix right away - but would be nice if you could use tools across the whole API.
MM: Apple has the same desire, improving ergonomics makes sense. Nobody would want the current semantics.
CW: multiple concerns. Discussed a while ago. Reason for current behavior: 1) wanted spec to allow for client-side recording of commands; build whole cmdbuf, and send in one shot to GPU process. push/popErrorScope makes it more difficult because would need to encode those. 2) right now, as soon as error in CmdEncoder, entire encoder becomes invalid. Convenient: if you fail setBindGroup and then draw, can just drop the draw on the floor. Prevents cascading errors.
MM: can we have both? Keep current behavior (errors invalidate the encoder), and have push/popErrorScope work at command granularity?
GT: what's an example of an error you can know before you submit?
CW: setBindGroup of index 1000.
MM: if we were to implement this we wouldn't do this in the web process. push/pop would go into cmd stream, and if GPU process fails validation, send message back to web process - resolve promise, etc.
KN: that's how our cmd stream works today. Could do this, but would limit what we could do in the future (client-side cmd encoding - encoding commands, instead of being sent to GPU process, are queued up on client side and sent to GPU process to build 1 cmdbuf all at once. Make cmd encoding calls cheap client-side.
KG: we do this in our WebGL impl in FF. Draw calls, uniform changes - we encode client-side, to give you good validation and glGetError stuff.
MM: would lean on popErrorScope returning Promise - can still enqueue commands. Resolve promise during their validation.
CW: that works, and is implementable. More complicated. Gains don't seem that important. Have heaps of CTS/test code relying on current behavior. Would be a large pain to change the behavior now. Not for WebGPU devs, but massive pain for us with the validation tests in the CTS.
KG: think it's too early to say we don't want to make ergonomic changes based on the CTS. Not proposing changes, but think we should make better.
KN: if we make a change I think we need to do immediately. There are a non-zero number of applications that would be affected. I don't personally think it's worth it.
MM: if hypothetically speaking the CTS tests were magically updated, would that change your opinion?
KN: no, I'm more worried about web compat.
JB: I agree with Kai, shouldn't change it right now.
LK: possible for this to be an extension on the command encoder? Immediate vs. deferred validation errors?
KN: possible.
MM: could be a compromise solution.
CW: could be a device creation option in the future.
GT: pushErrorScope only exists on device. If you want it on Encoder, would need it there.
KN: right now error scopes are a global state. Many things funnel into them. Can get encoder command errors in it if you encode something after closing them. Maybe the device-global state was a mistake
CW: I'd like to suggest no change, I don't think it matters enough.
KG: I hear you, would like this to be better, but Moz's position is that we don't care right now.
JB: you have error/debug groups now. Q is, in addition to that, error groups can be used for individual … KG, you want this, it'd be easier for developers, don't think we need to put it off if we say that we want it. Have lot of work to do, but don't think this is significant.
GT: another idea - I wrote webgl-lint. Entirely client side. Checks errors for you, gives you better messaging. Doesn't have to be in the lower-level API.
KG: thre were concerns about perf aspects. I think there's no compelling perf reason for things to be this way.
KR: I don't agree with that statement.
KG: we do this in WebGL today.
CW: WebGL has sync error semantics. Would have to do synchronous walk over error scopes
KG: if you architect things differently, wouldn't need to be a perf concern.
TBD: offline discussion.

Extended brightness range rendering #4239 (please read beforehand)

MM: lot of discussion has been what gets clamped where in what colorspace in the display pipeline. For Apple, that's out of the browser - system compositor does it. Probably true for most browsers. If compositor decides it wants to change its tone mapping algorithm it can just do that. Nothing API-exposed about it. Since this is non-observable by web content, don't think we should discuss it. Limit the discussion to the single boolean, "limit to SDR or not".
KG: don't think we can discuss the API change without discussing the broader topic, including color matching with WebGL and other things in the page.
CC: imp't that everyone agreeing to the change agrees in large part about what should happen. But color matching should be prescribed where it can be. Have an example, primaries of monitor are deep, and content was this - this is what you should expect to see. Assume non-normative, informative, maybe not even in the spec. Everyone should feel comfortable with what's going on there. Yes to both of you. Specifics, though, sit outside the spec.
MM: we won't normatively spec how tone mapping works - done by other libraries. To understand the display pipeline, need to discuss how it works, and that's OK. But don't want to say this color is clamped to this value in this colorspace.
CC: nice if we could have - this will color match this other HDR thing. Few examples where that's possible. You have enough headroom to display HDR image and you write the same pixels into a canvas - think they should display the same.
KN: concern about deferring everything to display compositor is: we do have to do something with content which pulls values that are HDR, but we don't want to display as HDR. Some definition of what we do in that case. If we say HDR content is SDR, isn't it undefined?
CC: it should color match with other SDR content. If luminance is greater -...
KN: if we have a float16 buffer containing (10, 0, 0), and HDR's off - what do you see? And can we guarantee how that color matches with other SDR content?
CC: no, think anything outside SDR range won't color match with SDR.
MM: agree with CC. Think that out of range content should be transformed however it needs to be.
KN: sounds fine. We already have premultipliedAlpha where if A < (R, G, B) then the color's illegal.
RC: on Windows the OS compositor never tonemaps for you. They think tonemapping's an art and the app needs to be responsible for it. Also it's common for some first-party apps, they have some WebGL thing with one colorspace/gamma, and 2D canvas on top, then HTML on top. Don't know what we should say in the spec, but that should show up and look good in all browsers.
CC: maybe a language difference between MM and RC; this mode's "extended", if the monitor can display it, it will. No guarantees if it's outside the monitor's caps. Tone mapping is clamping the value into a reasonable range for the monitor.
MM: basically agree
KG: I commented on the issue a few weeks ago - my comment still stands. Ambiguity between - trying to mix HDR and pushing outside the nominal gamut of colorspace. E.g. brighter vs. deeper red.
CC: with this ability, there should be a safe range you're known to be in. Is dynamic range standard or hi? Maybe you can query some quantized version of what the monitor can represent. Then you have a safe zone you can work in. And we have a media query for gamut.
KG: specific concern: display larger than either of two color spaces you're trying to match. E.g. sRGB canvas and P3 canvas, on monitor that's ??? P3. 10x / 2X HDR. Take those two things, try to make "P3 red" in sRGB - get >1 numbers. When you display that color - unclear if that …
CC: think I understand the Q. If you specify…
KN: same discussion as on Github issue.
KG: my example still stands.
KN: think no ambiguity.
CC: if I'm HDR buffer, I specify sRGB, and have some values lining up with P3 red - I think that should always match P3 red regardless of what your monitor can display. Regardless of how things are clamped. Because that's an SDR color- it behaves the same as all SDR colors, even if you requested extended range. Extended range won't get you that extra gamut unless you already had it.
KN: reason for non-ambiguity - the negative components.
KG: you can choose colors that don't have negative components.
CC: don't think it's obvious, but - if you have values >1 in sRGB, e.g. (1.001, 0, 0), it's still in the P3 gamut - still an SDR color.
KN: color.io converter breaks down in some cases.
CW: in interest of time, move on to Compat? Or wrap up with next steps?
KG: next step, me, Kai and Chris, and anyone else interested, should discuss.
MM: would like to be invited.
KG will set up this meeting.

WebGPU Compatibility Mode #4266 (please read beforehand)

SW: take 2. Couple updates from last week. meta-discussion: is everyone on board with compatibility mode being in the spec. There was a question about whether D3D10 should be in the supported spec: its compute restrictions are overly restrictive.
SW: 2 edits to the proposal. One - needed to restrict limits on texture views at bind time rather than creation time. Otherwise couldn't render to slices of 2D arrays, for example. And copies between compressed textures don't work.
MM: we did another review of this.
MM: important to start by saying - increasing the reach of WebGPU is good for everybody. We like the goal. The more devices that can run WebGPU, the more authors will target it.
MM: from Apple's standpoint, the win of this compat work is allowing people to not have to write their app twice, once for WebGL and once for WebGPU. One audience - cares about perf. They can write their app twice. Another group, interested in using WebGPU but don't want to have to write app twice, and maybe WebGPU's reach isn't broad enough, so will use WebGL. nobody wants that. Those are the people we're trying to convert over.
MM: main thrust of compat work from our standpoint is compat, not perf. Of course it's possible to construct such a badly performing impl that it wouldn't work. Main thrust - make WebGPU work, full stop, not necessarily get every last bit of perf.
MM: GL famously will run compilations whenever it wants to. One of the reasons the new APIs exist. If GL under the hood runs compilations under the hood, then we should be able, too - when running on GL. (On Metal/D3D12/Vk, have more perf guarantees.)
MM: if you accept that premise, being able to run compilation later than createPipelineStateObject gets you a lot of mileage. Looking at big list of validation, most if not all could be alleviated if we could compile later. Example: lack of texture views. In WebGPU, possible to have tex viewed as both 2D array and cube map. In WGSL we know which accesses come from which resources . Compiling late, we could recompile the program at binding time. Change the program's code. At the place which accesses the cube map view, emulate those operations in software. Will have perf impact, true. HOwever the goal isn't to get perfect perf, it's to get the API to work.
MM: to close: the reason I give this speech is that we're worried that adding this mode will fragment WebGPU usage. Having two WebGPUs is not a good design going forward. Also, the reason spending so much time - Vk/D3D12/Metal will be the way of the future, and OpenGL/D3D11 will decrease over time. Creating this mode which needs to be maintained forever, the burden outweighs the benefit.
MM: of the remaining problems that can't be solved with late compilation - think it's worth researching how to handle them.
SW: thanks for your in-depth thoughts on this.
CW: you mentioned devs that care most about perf would be OK supporting both WebGL and WebGPU, so compat mode doesn't need to be as performant as WebGL. With small amounts of restrictions mentioned here, the compat mode would be much more performant than WebGL, and that's a win for developers. If your premise is that this mode can be slower than WebGL, then I understand your point of view.
KR: Goes beyond late binding, certain operations that are incompatible with OpenGL, no practical way to emulate. Lot of texture copying behind the scenes to emulate those things. Very expensive, bandwidth intensive. Have carefully thought through - though have we done profiling?
- SW: No
KR: Our team’s experience that we want to avoid those large data copies behind the scenes as much as possible
MM: agree with you there. In this doc it was helpful to list alternatives considered. We agree that doing a full texture copy would probably be unacceptable in most cases - when you attach a texture to a shader, the shader usually won't read all the texture. So a full texture copy - much more bandwidth than the shader would use. But for the items listed in this doc, we think we have ways to polyfill all of the APIs without full texture copies. Think I should write those ideas up.
SW: would be great - or craft another doc.
MM: proposal we're making - WebGPU compat would support the full WebGPU API without restrictions, and if necessary, some way for the browser to tell the website you're running in compat mode, you may be surprised which calls are slow/fast. We don't have all the answers yet - think there are 2 items in this list that are more science projects.
CW: if we can find ways to minimize API differences, esp. efficiently, seems fine. Concerned about the claim that we don't care about performance. For adoption, people want to use WebGPU, but only if they get the reach and performance. Won't use it if it doesn't perform on X% of devices they care about. Compat was also important. That's why compat mode is optional - browsers don't need to implement it. A compat mode app in one browser will work the same way on other browsers that don't implement compat mode. Compat mode is designed to be unshipped.
MW: can we get both compat & perf by making these things warnings instead of restrictions?
SW: assuming we can polyfill everything? Could consider it then. If there were perf cliffs we would want to notify developers. Interested in some of the tougher cases. Compressed texture readbacks, cube map arrays - seems like a lot. And what restrictions does that impose on sizes of uniforms? Will need to use uniforms for passing data.
KR: Wanted to ask SW if all of the implementation options we considered internally were published in this GitHub issue.
SW: Yes.
MM: another thing: on fragmentation: one thing that could help is to enforce some of these compat-mode restrictions in real WebGPU, too. It is consistent with the idea of not fragmenting the API. Make WebGPU a little less expressive to make it able to run on OpenGL today. Only reasonable to do for non-commonly-used features.
CW: we should come back to this soon. Maybe figure out other alternatives for polyfilling to see if some more of the restrictions can be lifted.
KR: Want to confirm we haven’t come to a resolution yet. And underscore the fact that it’s designed to be unshipped.
CW: Correct, more information will help reach a better compromise.
MM: we want to be constructive, not destructive. We believe that more reach for WebGPU is good for the ecosystem.
(Out of time)

(stretch) Triage milestone 2 (tag things in milestone 2 you'd like to get to soon!)

Agenda for next meeting

Please add agenda items.
Corentin out next week.
Agenda
- Milestones have been renumbered and defined according to the tentative discussion from Aug 9. Milestones
  - KN: (1) Do the names and definitions sound good? (2) I think we’re going to need to do a full re-triage exercise but if we aren’t getting through that soon then the editors volunteer to finish triaging the now-deprecated “polish post-v1” milestone into milestones 0/1/2 based on past discussion.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU Web 2023‐08‐16

GPU Web 2023-08-16 Atlantic-time

Tentative agenda

Attendance

Administrivia

CTS Update

“Tacit Resolution” Queue

ErrorScopes are not useful for CommandEncoder methods #4269

Extended brightness range rendering #4239 (please read beforehand)

WebGPU Compatibility Mode #4266 (please read beforehand)

(stretch) Triage milestone 2 (tag things in milestone 2 you'd like to get to soon!)

Agenda for next meeting

Clone this wiki locally

GPU Web 2023‐08‐16

GPU Web 2023-08-16 Atlantic-time

Tentative agenda

Attendance

Administrivia

CTS Update

“Tacit Resolution” Queue

ErrorScopes are not useful for CommandEncoder methods #4269

Extended brightness range rendering #4239 (please read beforehand)

WebGPU Compatibility Mode #4266** **(please read beforehand)

(stretch) Triage milestone 2 (tag things in milestone 2 you'd like to get to soon!)

Agenda for next meeting

Clone this wiki locally

WebGPU Compatibility Mode #4266 (please read beforehand)