WGPU Web 2022-03-16

Chair: KG

Scribe: KR

Location: Google Meet

Previous meeting: Agenda / Minutes for GPU Web meeting 2022-03-09

Tentative agenda

  • “Tacit Resolution” Queue
  • [Process] Requirements for additional functionality #2424
  • Allow unmasked adapter info fields to be requested individually. #2635
  • Make rg11b10ufloat RENDER_ATTACHMENT:yes and resolve:yes. #2659
  • Streaming implementations and indirect draws/dispatches #2189
  • Cannot upload/download to UMA storage buffer without an unnecessary copy and unnecessary memory use #2388
  • mapSync on Workers - and possibly on the main thread · Issue #2217

Attendance

  • Apple
    • Dean Jackson
    • Myles C. Maxfield
    • Robin Morisset
  • Google
    • Austin Eng
    • Ben Clayton
    • Brandon Jones
    • Corentin Wallez
    • David Neto
    • Gregg Tavares
    • Kai Ninomiya
    • Ken Russell
    • Loko Kung
    • Ryan Harrison
    • Shannon Woods
    • Shrek Shao
  • Intel
    • Brandon Jones
    • Bryan Bernhart
    • Hao Li
    • Jiajia Qin
    • Jiawei Shao
    • Jie Chen
    • Shaobo Yan
    • Yang Gu
    • Yunchao He
    • Zhaoming Jiang
  • Kings Distributed Systems
    • Daniel Desjardins
    • Hamada Gasmallah
    • Wes Garland
  • Microsoft
    • Jesse Natalie
    • Rafael Cintron
  • Mozilla
    • Jim Blandy
    • Kelsey Gilbert
  • Unity
    • Brendan Duncan
    • Dave Shreiner
  • Andrew Varga
  • Connor Fitzgerald
  • Dzmitry Malyshau
  • Eduardo H.P. Souza
  • Francois Daoust
  • Francois Guthmann
  • Jeremy Sachs
  • Joshua Groves
  • Matijs Toonen
  • Mehmet Oguz Derin
  • Michael Shannon
  • Nathan Johnson
  • Pelle Johnsen
  • Rick Battagline
  • Rob Conde
  • Timo de Kort

Administrivia

CTS Update

  • KN: Our team's been reworking how we run parts of the CTS.
    • Lot of little things that add up and make running it difficult. Wrote reply to Myles about things we're working on resolving.
  • KN: Fair amount of development. This week our team's doing a CTS hackathon, focusing on our CTS tasks. Pretty good progress.
  • KN: Made headway on the stuff I wanted to work on in the CTS.
  • KN: Still a huge task list; lots of stuff to do.
  • MM: thanks for that email - very helpful.
  • KN: you're welcome - I know it's a lot - hope it helps you run the suite with less pain.

“Tacit Resolution” Queue

  • (none)
  • MM: queue descriptor - I pushed this - any concerns?
  • MM: right now there's no way of making queues, but there's a default queue. Without a queue descriptor, there's no way to describe the default queue to the browser. Made a descriptor object, and made the device descriptor take one. Can pass in the label of the default queue.
  • MM: think this will work well with future ability to add more queues.
  • KG: seems fine. Can always change it later.
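
For reference, a minimal sketch of what the merged queue descriptor enables, per the discussion above: passing a label for the default queue through the device descriptor. The exact member names (defaultQueue, GPUQueueDescriptor) are taken from the discussion and should be treated as assumptions of this sketch, not settled spec text.

```js
// Sketch: labeling the default queue through the device descriptor.
// The `defaultQueue` member (a GPUQueueDescriptor) follows the PR
// discussed above.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice({
  label: 'main-device',
  defaultQueue: { label: 'default-queue' },
});
console.log(device.queue.label); // 'default-queue'
```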

[Process] Requirements for additional functionality #2424

  • KN: apologies, didn't solicit input from rest of team. But I did myself catch up on the thread and reply.
  • KN: had some minor disagreements, but I don't think there were blocking disagreements with the content in the PR.
  • CW: the doc here is guiding principles, but also non-binding on the community/working groups. It's to help people understand what we're doing and specifying. Exceptions can be brought up to the group. It doesn't need to be perfect for us to land it.
    • Only really binding thing in this group is the charter, which was ratified.
  • MM: purpose of doc - if we get into a situation where we think we need to modify it, we'd need to think & discuss. Can make PRs against this doc like anything else.
  • KN: pretty sure we're happy with it.
  • MM: any last minute changes?
  • CW: press merge button; if we see anything we'll make a PR.
  • KN: agree, don't need to re-review all text.
  • Actually, it is already merged.

Make rg11b10ufloat RENDER_ATTACHMENT:yes and resolve:yes. #2659

  • CW: my understanding - there's some sort of emulation on Adreno 500 and below. Either emulating copies, or allocating temporary render targets to render into and then recompressing.
  • CW: So far, we've avoided doing this with textures. This one's special, though.
  • CW: (Also, we'll have feature levels in the future - e.g. float32 formats not filterable for some reason. We might have float32 blendable, this other thing renderable, etc.)
  • CW: If we make these features, we don't need as much implementation work for V1.
  • MM: general question of what should be polyfilled and what shouldn't - if the browser doesn't polyfill it, what should the web app do instead? If the app would just use fatter textures, that indicates we could do that for them. If they'd need to rewrite the application…
  • MM: app can use small or big texture. Imagine both are possible. Which one to use, if their data fit into both of them? How to choose which to use? If they're under memory pressure, for example. We can do a better job for them than the web app.
  • KN: last bit I don't agree with. If the data fits into both, you'll always want to choose the smaller one. The app would say: if this is available I'll use it, otherwise I'll use this other one (see the sketch at the end of this section). Not a complex decision. Think it's OK for this to be a feature.
  • KN: this one specifically - what do we do when it's a really simple thing that we don't have because of e.g. Adreno 500? This one happens to be polyfillable, so some appeal to doing that.
  • KN: also possibility that we drop Adreno 500 from core. Not sure if we should, but it's getting old.
  • KG: one position - if we have to do this polyfill, it's not exciting to do before V1. We're producing a V1 spec / feature level - we don't necessarily have to have Adreno 500 as part of the first release. (Browsers can add support for Adreno 500 in subsequent releases.)
  • MS: one example - only time we use wider float16 is if the packed format isn't available, or on debug builds.
  • MM: because debug tools work better with wider textures?
  • MS: that was a decision made before I joined. :) not sure why.
  • CW: thanks, MS - good data point. Even if we can support this feature in core, something in the back of our minds is still WebGPU-Compat - a subset, not complete core WebGPU. We might not implement core WebGPU on Adreno 500 then, and remove this packed format from the features there.
  • MM: sounds like, eventually, this polyfill could be written. And then it would allow Adreno 500 to support core. We have some precedent with depth24+ format to use wider textures when platform doesn't have support for the smaller version.
  • CW: w.r.t. not supporting core WebGPU - there are many things you can polyfill - how much effort do you want to go to? What performance do you want? This format might be too much. w.r.t. depth24+ - it's explicitly specified as 23+ bits of mantissa.
  • MM: if we change spec language would that alleviate concerns?
  • KG: are you suggesting RG11B10F+?
  • CW: no. :) depth24+ formats are wonky - lots of strange limitations. Even just copies, if we had this format, would have to be removed or explicitly emulated in the spec.
  • KN: think we could add an RG11B10+ format that isn't copyable. Not sure it satisfies the needs. The reason depth24+ is the way it is: some platforms have one format, some the other - there's nothing every platform has. Nothing's forcing us to have the "+" option here like we had to with depth.
  • KG: goal here - floating the idea of having this in core. If we can't do it - that's fine. Think we're close to being able to. Comes down to, how much do we want to ship V1 on Adreno 500?
  • MM: I think the next step is to have member companies figure out how important this addition to the standard is, vis-à-vis Adreno 500 and the V1 release.
  • CW: we'll look at that. (and others can, too, if they're interested.) We'll come up with compat thing later anyway, and will definitely include Adreno 500 there.
  • MS: if you do drop Adreno 500, then firstInstanceIndirect wouldn't need to be a feature, either.
  • KN: we can do that query against GPUInfo, see if it looks the same.
  • MM: can it be polyfilled later? Here, this narrow format can be polyfilled later.
  • CW: unlikely to ever do that.
  • KG: think we should take a week and discuss internally.
  • MM: also - idea of this compat spec isn't something this group has a resolution on.
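
A sketch of the feature-detection pattern KN describes above, assuming the capability were exposed as an optional feature. The feature name 'rg11b10ufloat-renderable' and the rgba16float fallback are illustrative assumptions of this sketch, not resolved spec:

```js
// Sketch: prefer the packed format when the (hypothetical here) feature
// is available; otherwise fall back to a wider format.
const adapter = await navigator.gpu.requestAdapter();
const hasPacked = adapter.features.has('rg11b10ufloat-renderable');
const device = await adapter.requestDevice({
  requiredFeatures: hasPacked ? ['rg11b10ufloat-renderable'] : [],
});
const format = hasPacked ? 'rg11b10ufloat' : 'rgba16float';
const target = device.createTexture({
  size: [1024, 1024],
  format,
  usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.TEXTURE_BINDING,
});
```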

Streaming implementations and indirect draws/dispatches #2189

  • Myles has been writing lots of WebGPU patches instead of thinking about this; can we defer a week?
  • KG: we support you writing a lot of patches.

Cannot upload/download to UMA storage buffer without an unnecessary copy and unnecessary memory use #2388

  • Myles has been writing lots of WebGPU patches instead of thinking about this; can we defer a week?
  • CW: UMA storage buffers - I made a proposal a couple of lines long. We can have a UMA extension - enable map() for writing on buffers with READ_ONLY usage. And vice versa.
  • CW: If later we have read-only ArrayBuffers we can have other functionality.
  • CW: would be nice if we were able to say: you can have any usage and it just works - but that's not possible on D3D, and isn't the best thing to do. Lots of complexity, e.g. with discrete GPUs and also cross-process. You can have a memory "thing" which spans all 3 items - GPU, GPU process, renderer process. Also need consistent behavior on all systems. Writes have to be visible to JS for readable buffers. Complicated.
  • CW: that's why I think only way for proper UMA support is via an extension.
  • MM: would this extension also be present on discrete cards? And the extension would say your writes might not be present if you read from it?
  • CW: no, behavior should be consistent always, regardless of extension being enabled. That's the main goal.
  • MM: so app needs: if (uma) { … } else { … }?
  • CW: yes. The app can get the best behavior on UMA and desktop - there are cases today where you can take the optimal path: buffer mapped at creation, or updating a buffer in pipelined fashion during GPU execution. (See the sketch at the end of this section.)
  • CW: case not handled: big buffer, need to change data after creation, but not always used by the GPU. Don't know when apps would do this. Useful to think of UMA extension because it helps that case. We should already be pretty optimal in most cases though.
  • MM: think the argument makes sense. Not 100% sure I agree. The first statement about 2 buffer upload mechanisms is false, though - there's a third, mappedAtCreation. That would work when streaming data from CPU to GPU. Not the other way around, though.
  • MM: backward direction is definitely less common. Need to do more research.
  • MM: other thing - we should try to describe somewhere that mappedAtCreation's expected to be more performant than creating buffer and mapping it.
  • CW: should be in non-normative text at least. Brandon made a best practices doc on uploading data with WebGPU. writeBuffer - but mappedAtCreation's pretty good, too.
  • BJ: that doc's in flux - please suggest improvements.
  • MM: committed to our repo?
  • BJ: not yet. Not a good time.
  • MM: link to it please?
  • BJ: will do.
  • CW: think everyone wants to make UMA work amazingly well. But it's amazingly hard while keeping consistent behavior from the JS side, keeping D3D constraints in mind, single source for GPUBuffer, etc. There are optimizations you'll want to do in the browser later, too. Happy to discuss details with people. Wish we had a better story for UMA, but I can't find one.
  • MM: believe you, just don't think we should say it's impossible.
  • CW: also happy to discuss offline more. Maybe in office hours.
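
For reference, a sketch of the upload paths discussed above, assuming `data` is a Uint8Array. The mappedAtCreation path is spec'd today; the if (uma) branch is hypothetical - the 'uma' feature name and the relaxed usage combination are assumptions standing in for CW's proposed extension:

```js
// Spec'd today: creation-time upload via mappedAtCreation.
const vertices = device.createBuffer({
  size: data.byteLength,
  usage: GPUBufferUsage.VERTEX,
  mappedAtCreation: true,
});
new Uint8Array(vertices.getMappedRange()).set(data);
vertices.unmap();

// Hypothetical shape of the proposed UMA extension: the app branches.
let storageBuf;
if (device.features.has('uma')) { // assumed feature name
  // On unified memory: map a storage buffer directly (assumed relaxed usage).
  storageBuf = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.MAP_WRITE,
  });
} else {
  // Portable path: writeBuffer into a device-local storage buffer.
  storageBuf = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(storageBuf, 0, data);
}
```

The explicit branch is what MM's question above points at: the app opts into the UMA path itself rather than the browser choosing silently.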

mapSync on Workers - and possibly on the main thread · Issue #2217

  • BJ: wrote something. Kai and I discussed mapSync. A point that's come up multiple times: you can already do this with clever combinations of writing to the canvas + existing mechanisms like toDataURL. I tried it. Wrote a comment. Effectively polyfilled mapSync.
  • BJ: Jim Blandy's reaction is the appropriate one. :)
  • BJ: you give the buffer a MAP_READ_SYNC usage; the polyfill replaces it with STORAGE. Calling the mapSync function uses a shader to write the storage buffer's contents to a canvas, toDataURL it, upload that to a WebGL2 texture, and readPixels it back. That was the path with the smallest memory usage/copies - still at least 3x the memory usage, probably 4 copies of the data. But it works! Looks/acts like mapAsync, but synchronous.
  • BJ: at least 4x slower, maybe more.
  • BJ: overall it's a bad idea. Hope nobody uses it.
  • BJ: but we've had feedback from multiple places, including internal partners - code they're porting simply requires sync readbacks, or would take so much rearchitecting to not use it, not worth their time.
  • BJ: a distinct disadvantage compared to WebGL, which has readPixels and getBufferSubData. A reason people would stay with WebGL when they'd rather move to WebGPU-based code.
  • BJ: think we should consider this, if for no other reason - make sure we represent a strict superset of what WebGL offers.
  • MM: q: on D3D12, where staging buffers are required - how do they do sync readback?
  • KG: you can do it through staging buffers.
  • MM: don't you need to schedule a copy from real buffer to staging buffer? Different timeline?
  • BJ: to be clear - we have that requirement too. You can only mapRead a staging buffer, and you must have copied into it earlier. No compute shaders on staging buffers. Would have had to do the copy to the staging buffer before mapSync. Appears to be doable synchronously: wait for that operation to finish, then surface the data. No different from readPixels in WebGL.
  • MS: to answer a different way - we use persistently mapped buffers and GPU fences to wait.
  • MM: so you'd need sync map & sync wait for queue completion?
  • KG: the sync map is the wait.
  • MS: you don't need both.
  • KG: you can make one with the other. Think it goes both ways. Enqueue copy to staging; insert fence; wait for fence to complete.
  • MM: I get it now.
  • BJ: we already do it. Otherwise my polyfill wouldn't be possible.
  • KG: there's a lot of nuance to this. Part of the reason we haven't run into this with getBufferSubData is that WebGL2 adoption's low. Have seen readPixels happening; it does cause problems. It's difficult to do this thing because you're not supposed to, which dissuades people from doing so. Interesting point, though - that's too principled a position to stand by as a blocker. You might need to do this sometimes; just please don't abuse it.
  • BJ: one thing - we do see readPixels in WebGL, and it can be a perf problem. But doing the right thing in WebGL with buffer readbacks wasn't possible until WebGL2, and even in WebGL2 async buffer readbacks are difficult - annoying, non-intuitive. Here, the async path's obvious (see the sketch at the end of this section). No concerns pushing that in every way; think people will use it. Hopefully we can present the async path as far more attractive than the sync one. But we still run into some subset of uses where it doesn't work for them.
  • MM: agree with Kelsey, things we don't want authors to do should be hard. Even if it's possible for authors to do the hard thing, if they have to write hard code to do so, that's intended. Don't think having mapSync next to mapAsync will be good - it's convenient, people will call it. Think having to jump through APIs is a good thing.
  • BJ: think there's a spectrum - making it hard, needing crafty incantations - not sure I'd want something sitting right next to mapAsync. No qualms making it yell at the console on first use: "This is going to be slow." Open to many options to make it harder to reach for. But at the end of the day, if you still need it, you don't want it to take 4x the time, memory, and copies of a native implementation.
  • CW: to make a note - we should separate mapSync on the main thread from mapSync in workers. Seems OK to have sync APIs in workers. Easier to decide we want it in workers? Having it on the main thread introduces jank in animations, or worse.
  • BJ: in general yes - politically easier to put blocking APIs in workers. Have heard some pushback on that in some forums. With Kai have heard - don't want to split the API if we can avoid it. If there's the will to make it work in a worker and not on main - worth a try. May not solve all use cases. There are plenty of cases where it's difficult to do a particular kind of work on a worker. May need keyboard/mouse, poll from video - things that don't work on workers.
  • KR: Wanted to advocate for splitting this out to something only available on workers. If this group can come to agreement - there are other synchronous APIs on workers. There have been many discussions with people, especially TensorFlow.js and their users, where they just can't achieve what they want on WebGPU. And they get a lot of benefits from WebGPU and would like to use it. If we can agree to put this on DedicatedWorkerGlobalScope, then people can experiment with it and we'll have more input on supporting this generally.
  • MM: agree splitting the convo into 2 halves is valuable. Cost/benefit tradeoff is different on workers than main thread. Hope we aren't the first API to add a synchronous function that takes many frames. Hard to get past TAG. But, on workers there are pros and cons.
  • MS: concerns about adding this. If you do add mapSync, what will happen is that in Wasm, people - including myself, maybe - will use it to get around the fact that we can't yield, and that mapAsync can't return within a frame. It will be used a lot more than you intended, to get around another issue with WebGPU in Wasm.
  • MM: what's the other issue?
  • MS: Essentially - other APIs can map, then query whether the map's completed. The reason that's an issue: you can't do it in Wasm because you aren't yielding. None of the buffer maps resolve until the next frame unless you proxy things. If mapSync's available, it's an easy but terrible workaround. We'll use it because mapAsync never resolves in the same frame.
  • KN: that's a really good point. Thanks for raising it.
  • GT: trying to understand the guiding principle. You can block JS by iterating through a million-element array, etc. Why is this required to be async and other things not? JS's new findLastIndex can block if you have an expensive callback.
  • MM: think the answer is - because the website's written the code that takes a long time to execute. Here, you're blocking on another process. In JS you're not.
  • GT: the other process is an implementation detail. There are single-process browsers; there are SW WebGPU impls. Trying here to find out why this has to be async.
  • CW: think the difference is: some things are algorithmic work you have to do in JS right now; others are I/O where we don't know how long it will take. Maybe you wrote the WebGPU code and know how long it will take on the GPU. Major difference: when you need to do DOM transformations, you need to do them now - GPU operations are inherently async. People want sync mapping because we're used to it in native, and in WebGL! But people don't want to change their code. It's a friction point. There's nothing blocking the developer from updating their code. When using JS to do DOM updates, you need to do them for this frame, so you can't wait. That's the difference: JS has to do the update right now; in the other case it's developer convenience.
  • MS: there is a case where it's not possible at all to refactor - we have that, and the other Brandon mentioned they have the same issue: public APIs we can't change without breaking everyone's code.
  • CW: in terms of web platform - it's still your code. The code that runs in the browser. All under the developer's control. Agree, amount of friction's large - but it's still friction.
  • KR: TF.js feedback is they just can’t get done what they need to get done. Orders of magnitude slower than what they can get with WebGL.
  • KG: Hope we [could resolve that]
  • MM: Can you link that?
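
For contrast with the sync-readback discussion above, the "obvious" async path BJ refers to: copy into a MAP_READ staging buffer, submit, then await the map. This is standard WebGPU API, sketched with `device`, `gpuBuffer`, and `size` assumed defined:

```js
// Async readback: stage, submit, await the map.
const staging = device.createBuffer({
  size,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(gpuBuffer, 0, staging, 0, size);
device.queue.submit([encoder.finish()]);
// Resolves on a later task - per MS's point above, never within the same
// frame for non-yielding Wasm unless calls are proxied.
await staging.mapAsync(GPUMapMode.READ);
const bytes = new Uint8Array(staging.getMappedRange()).slice();
staging.unmap();
```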

Agenda for next meeting

  • Next meeting: Wednesday March 23 noon-1pm (America/Los_Angeles)