Minutes 2022 10 26

GPU Web 2022-10-26

Chair: CW

Scribe: KR

Location: Google Meet

Tentative agenda

Administrivia
CTS Update
“Tacit Resolution” Queue
mapAsync(WRITE) zero-fills the relevant region of the buffer #2926
Permissions Policy for WebGPU and for unmaskHints #3483 (Kai detailed the possibilities)
Shader stage input-output subsetting rules #2038 (new info from the Chromium implementation)
How do we specify the timing of external texture expiration? #3324
Agenda for next meeting

Attendance

Apple
- Myles C. Maxfield
Google
- Austin Eng
- Brandon Jones
- Corentin Wallez
- Ken Russell
Microsoft
- Rafael Cintron
Mozilla
- Erich Gubler
- Jim Blandy
- Kelsey Gilbert

Administrivia

APAC meeting next week! And for the next N weeks.
MM: final schedule?
CW: 2 APAC, one EMEA, I think.

CTS Update

CW: lot of emails on CTS.
KR: Contract with Igalia still ongoing with an increased focus on operation tests. We're hoping to have our teams work on more tests as other work ramps down. Writing associated tests for new work like interface matching between shaders.
CW: also tests for new builtins like float16 pack/unpack.

“Tacit Resolution” Queue

Empty!

mapAsync(WRITE) zero-fills the relevant region of the buffer #2926

MM: gave it a read-through and some thoughts. Not confident I can cover it - there are a bunch of configurations / tradeoffs. If group wants to discuss I can, but I'm not ready to drive the conversation.
CW: think it's important we find resolution. This is complex part of the API. Changes will take time to mature. If we make a change, browsers will implement, and find problems with various areas. Similar to how buffer mapping sat a while, and Jim, while implementing, found various aspects of multiple promise resolution.
KR: can AE speak to this?
AE: I wrote this comment covering this topic.
MM: couple thoughts. Part 1, should there be an extension? I'm not sure about opt-in vs. any other form I described. Second one, big motivator for my proposal was - Austin made a good point that many configs will be triple-mapped. We expect all Apple Silicon & Intel to be. For the non-triple-mapped buffers, this has a big memory (...) - think it's important to give those consideration. WebGPU needs to be good on all devices.
CW: this is opt-in? You said "How's this different from the opt-in you suggested?"
MM: I suggested a couple options - boolean where the browser tells the website, you might not want to use mappable buffers for your own computations, like storage buffers. App would need two code paths. Other option - a "bake" step, app can opt in and call in for increased performance.
KG: perf to increase, or mem usage to decrease?
MM: bake step - performance.
KG: something user can't do themselves? Or ergonomic thing?
MM: bake step would cause the buffers to be flushed or GPU-GPU copy. Not super sure.
CW: speaking for Austin - diff between your opt-in and Austin's opt-in is, if it's a browser extension, browser can not expose the functionality on HW where it's inefficient - like discrete GPUs. Mappable storage buffers - really slow.
MM: trying to balance this against unextended WebGPU which has bad memory usage without the obligation to do anything.
CW: would it satisfy this worry about increased mem pressure if WebGPU had the UMA extension in the spec today?
MM: stated many times before - looking for a solution that's not an extension at all. Single code path that doesn't have 2-3x memory usage.
KG: this is inherent complexity. There are inherent tradeoffs. Something has to give. If you want something in a different place, there are different disadvantages. There isn’t a perfect solution here.
CW: how the memory system's structured is far-reaching architectural decision of GPUs. Hard to make a one-size-fits-all solution. Not doable to have something working great on UMA and great on discrete and no disadvantage. Spec today's skewed toward discrete, because we don't have a UMA extension. Would be worried that if we baked in the UMA extension, or making UMA most optimally usable, would have large perf footguns on discrete. Right now spec increases mem traffic/pressure on UMA. But this is cheaper than having giant compute shader go through PCI-E every time it needs to access a storage buffer. One - making transfer operations slower. Other - making every GPU memory access slower for that resource. Scary - I develop on M1 Mac, then deploy on NVIDIA GPU and it's slower, because it goes through PCI-E for every mem transaction. This is why the tradeoff's leaning more toward what we have in the spec now, compared to the other side. If we find a solution that has a better tradeoff, or more balanced, keeps the same properties for discrete GPUs while leaving iGPUs alone, that's fine. Worry really is about memory transactions over PCI-E bus.
MM: don't think I can contribute more now. Let's talk more next week. I'll take an AI to do more formal review/analysis.
CW: I'll take an AI - I have another proposal, will think about it.

Permissions Policy for WebGPU and for unmaskHints #3483

Kai put together list of permissions that could be passed to iframes that we should discuss.
KG: do we need this for V1?
CW: if we don't have it, then to make future changes backward compatible, we must disallow any usage of WebGPU in iframes - is my understanding.
KG: my general feeling - I wouldn't have real concerns using WebGPU with software backend in iframes. Still functional. Backwards incompatible perf change, perhaps - with right deprecation strategy, think that's survivable. Functionally, we have turned this off for WebGL when we started tightening it down. More in background you are, you can't use powerPreference:high-performance. I still intend to look at a mode where iframes use software. I'm not concerned about the WebGL backward compatibility aspect that much. I'm OK pushing this post V1.
KR: About software, Chromium doesn't (and likely won't) ship Swiftshader on Android. So there's no software fallback. That's why we should probably give it some thought now. Kai already enumerated the ways to use WebGPU in an iframe, so would be lightweight doing now. Otherwise backwards compatibility problems will happen in the future, and WebGPU in iframes won't be exposed at all.
KG: think it's more nuanced - but OK. We do things on mobile - we don't autoplay sound by default, for example.
CW: to refresh - this is only for third-party iframes? Same-origin, would work?
KG: I'm less concerned about that, but we should spec for same & different origins.
CW: do same-origin iframes inherit same privileges?
KG: think so but not sure. Consider subdomains on same origin.
CW: think Twitter's Sketchfab previews. Minimal change in WebGPU V1 to allow this? Maybe, you can pass down webgpu-restricted capability? Only base device, no limits, no extensions. Allows some embedding of WebGPU, can figure out the rest later. Point of restricted mode is that there's no significant fingerprinting concern.
KG: good places to start - ability to use WebGPU, sounds good. Hardware acceleration - would like to see. I'm not that concerned about unmaskHints, could add that later. Think these are fine.
KR: What it turns into would be new tokens in the iframe sandbox attribute. Maybe there's a team (security?) that needs to review any change to that in the HTML spec.
KG: probably. Not against having an actual, concrete proposal.
CW: next step - make a PR on one or more of these attributes. This group can say yes/no whether we want it, then go to HTML spec and decouple from V1.
KG: is it important to have this done by V1 or not?
KR: Maybe we should ask our HTML spec mentors how hard this would be.
KG: guess we should just ask.
CW: next step, ask our HTML spec mentors how hard this would be. Probably something semi-normative, defined in WebGPU spec referencing forthcoming things in the HTML spec.

Shader stage input-output subsetting rules #2038

CW: came to conclusion - there's a way to make this work in D3D. After bunch of investigation, Shrek found it's not possible if you're using fxc. You can't control the types of the outputs. You have problems matching vec4<u32> with vec4<f32>. If fragment shader doesn't use an attribute - you don't know what to put in there to match the vertex shader.
CW: what we're resorting to - doing shader generation at pipeline creation time.
CW: in the future, a browser that can control its DXBC / DXIL output could do this, and fix up a tiny thing at the end.
KG: difficulty seeing the concrete problem.
CW: VS outputs location 0, u32; location 1, f32. Transform suggested earlier in the issue - turn that into outputting vec4<u32> and vec4<f32>. Then have fragment shader that uses location 1, f32. That's it. What do you output as the I/O interface in D3D for this? Suggestion was, have vec4<f32> for both in fragment shader.
KG: you don't know the formats then?
CW: not at shader compilation time. Only pipeline compilation time.
MM: does D3D return an error in this situation?
CW: yes.
CW: we've pushed for I/O subsetting, but don't see this changing for D3D any time soon. Generate your own DXBC/DXIL - you can fix this up at the very end during pipeline compilation. Probably OK in the long term.
KG: conceivably compile shader optimistically? Fail later, have to recompile?
CW: that could be a way. If you produce your own DXIL/DXBC, you could generate most of the shader, and emit the shader I/O at the very end. We should keep the I/O subsetting thing, because all the other APIs do it.
KG: If D3D12 says we might do it in the future, I'd like to do it in the future too maybe. D3D12 saying they might want to do this isn't a hard requirement.
KR: The point is that if we keep WebGPU's current subsetting rules, we can implement them today, just slightly less efficient. There are multiple paths to getting things more efficient: using DXC instead of FXC, or starting to use DXBC, or …
KG: D3D12 wanting to do this in the future isn't a good reason to do this today. There might be other reasons to want to do this today.
MM: you don't know it failed until you try to create a pipeline. What you're proposing would involve compiling twice in the failure case.
KG: what are we concretely talking about here?
CW: group was discussing this. WebGPU had perfect matching of interfaces. D3D12 can only do prefix matching. Then we convinced ourselves that we could implement the subsetting rule - the best we can hope for. Tried implementing that via the planned strategy - ran into issues. Still think we can keep the subsetting rules, so let's try to keep it. But because we made the spec decision based on a flawed assumption, we're coming back to tell you.
MM: thanks, sounds like violent agreement.
KG: let's keep it.
MM: so Chrome's impl strategy - compile shaders during pipeline creation?
CW: yes. That's what we've been doing for a while for many reasons.
RC: so with D3D12 will you get different WebGPU errors at different times than other platforms?
CW: no. For example, register allocation failures might come at pipeline creation time. These can come at random times anyway; it's OK.
CW: OK, let's close this again.
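
The subsetting rule and the D3D difficulty discussed above can be illustrated with a small sketch. This is a simplified model and not Chromium's implementation: the `StageVariable` type and the functions `checkSubset` and `padFragmentInputs` are hypothetical names introduced here, and interpolation attributes and builtins are ignored. It uses CW's example (vertex shader outputs location 0 as u32 and location 1 as f32; fragment shader reads only location 1) to show why the padded fragment interface can only be computed once both stages are known, i.e. at pipeline creation.

```typescript
// Hypothetical, simplified description of one user-defined stage I/O variable.
interface StageVariable {
  location: number;
  type: string; // WGSL type name, e.g. "u32" or "f32"
}

// Subsetting rule (simplified): every fragment input needs a vertex output at the
// same location with the same type. Unused vertex outputs are allowed.
function checkSubset(
  vertexOutputs: StageVariable[],
  fragmentInputs: StageVariable[],
): string[] {
  const byLocation = new Map<number, StageVariable>();
  for (const out of vertexOutputs) byLocation.set(out.location, out);

  const errors: string[] = [];
  for (const input of fragmentInputs) {
    const out = byLocation.get(input.location);
    if (out === undefined) {
      errors.push(`location ${input.location}: no matching vertex output`);
    } else if (out.type !== input.type) {
      errors.push(`location ${input.location}: ${out.type} vs ${input.type}`);
    }
  }
  return errors;
}

// Assumed model of a backend that only accepts prefix matching: the fragment
// interface is padded with the vertex output types for every location it skips.
// The types of the skipped locations come from the vertex shader, so this step
// needs both stages. Assumes checkSubset() returned no errors.
function padFragmentInputs(
  vertexOutputs: StageVariable[],
  fragmentInputs: StageVariable[],
): StageVariable[] {
  const declared = new Map<number, StageVariable>();
  let maxUsed = -1;
  for (const input of fragmentInputs) {
    declared.set(input.location, input);
    maxUsed = Math.max(maxUsed, input.location);
  }
  return [...vertexOutputs]
    .sort((a, b) => a.location - b.location)
    .filter((out) => out.location <= maxUsed)
    .map((out) => declared.get(out.location) ?? { location: out.location, type: out.type });
}

// CW's example from the discussion.
const exampleVertexOutputs: StageVariable[] = [
  { location: 0, type: "u32" },
  { location: 1, type: "f32" },
];
const exampleFragmentInputs: StageVariable[] = [{ location: 1, type: "f32" }];
const examplePadded = padFragmentInputs(exampleVertexOutputs, exampleFragmentInputs);
```

Here `examplePadded` contains an entry for location 0 with type u32, which the fragment shader alone could not have known.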

How do we specify the timing of external texture expiration? #3324

CW: we talked about canvas texture expiry last week. Similar problem with external texture expiration. We might want to defer this until Kai's here - but whatever solution we come to for canvas texture expiration, we can use for external texture expiration.
KG: that's what I'd prefer too - same rules for both. [specifically, I believe that the canvas texture would satisfy our needs for video expiry, and because it would satisfy and we have a path forward there, we should just do the same thing for videos]
CW: do authors have to re-import external textures every frame?
BJ: no - primary difference between this and canvas texture expiry - with canvas, getCurrentTexture, want it to expire within a given time. With these, want to check for expiry at reasonable, predictable interval. Kai's original spec - main problem was checking too frequently - at end of every task. Here - if we go with what we do for canvas, whenever you create an external texture, we'll create a similar expiration task with high priority. Then reschedule it for another check, until it's expired. At that point, we won't schedule another check until you've imported the external texture again. Maybe not what Kai had in mind but I think it's the logical conclusion.
KG: so keep external textures alive longer than canvases?
BJ: yes. But only if the actual frame associated with the external texture is still alive.
KG: you want to spec it that way? We might not expire it immediately, but getExternalTexture again might give you the same frame back.
BJ: if the video hasn't advanced, sure.
CW: sure? Maybe can import same video frame on multiple devices. Get different GPUExternalTexture objects.
KG: yes, but - if you have a device, you call createTextureExternal and getCurrentTexture, and you encode a draw of the video onto the canvas, then return - what I'm proposing is that both of those get collected at about the same time. Not useful to keep the external texture alive longer? Guess so that the app can know it's the same texture.
BJ: yes. Main guarantee we want - once you've checked something, it'll live for the course of the task. Texture won't disappear for X amount of time. Other thing - giving good general sense of when the texture expiration happens is more of an impl detail. But if it deals with the guarantee of how long the texture will last, good to keep in spec.
MM: what is the check we're discussing here?
KG: if a new texture frame's available from the video.
KG: think we should consider - is there a new video frame ready - separate from, is this external texture still valid?
BJ: that's something I wanted to ask about. In most video decoders, does decoding a new frame tend to invalidate / discard memory from previous frames, or do we keep multiple frames alive?
KG: we keep a few frames alive. More than that, we deadlock.
BJ: then we should keep guarantees about how long this frame stays alive separate from, is a new frame available. Can have basic stuff - if there's a new frame available, we'll flip a boolean and tell you you can fetch a new frame. But won't revoke the old one necessarily.
MM: from author's POV - author can tell frame rate of video? But don't know the phase - if they're in rAF, don't know if they're in the middle of the 1/24 of a second, or at the end. Authors will have to write - when code runs in a task, check expired attribute. If they have to check that every task anyway, don't see the benefit of having the texture last longer than a task.
KG: concretely - what I'm proposing is that there's still a way to know it's frame 173 from the video. Calling createTextureExternal will tell me it's different from the previous one. Have to request it again even if it's the same.
KR: The reason to do this is to save work. E.g. if video runs at 15fps, don’t want to force the app to do more work every frame.
KR: I think we should use requestVideoFrameCallback, since that’s the way to tell whether a new frame is ready.
KG: saying you should use requestVideoFrameCallback to decide if you want to repaint, is compatible with early external texture video discard.
KR: I think we should allow external textures to live across multiple tasks.
CW: at spec level - textures are only expired at task boundaries. If in microtask, you know it won't expire. But, implementations should not expire external textures until a new frame's ready. Apps should be able to reuse external textures across frames.
KG: I'm saying we should expire external textures.
BJ: even if the frame's still valid?
KG: yes.
MM: the reason I prefer what Corentin just said is that exporting an external texture might be a copy (strange color space, etc.). Would be bad if we required apps to do this every frame.
KG: agree that would be bad. My proposal's contingent on that not being true. I'm encouraging authors to have the naive thing - each frame, I want to draw this video - want them to be encouraged to re-request the video, so that task timing issues don't cause texture expiry in a way the app didn't expect.
MM: so you're worried about the case where app doesn't check expired boolean, but - on my dev machine, frames get expired every 3 rAFs, I implement a counter to refresh the external texture every 3 frames, and it breaks on someone else's machine.
CW: I volunteer a warning in that case. If they use an expired external texture without looking at expired attribute, you did something bad.
KG: not satisfied with that solution. Other browser has to issue a good enough warning that my browser's covered.
BJ: Myles' scenario - developer keeps some manual frame counter - more work than checking the expired flag. I don't see any other scenario where you can do a pattern that's less work than checking the expired boolean, and have it working reliably.
KG: think we should think more about this in the group.
CW: I'll think of a warning.
KG: concerned that canvas / video have such different semantics. Would like people who thought a different way on the previous topic to consider the contrast.
CW: we'll discuss this more at the next meeting. On Chrome side, think we can propose something that'll alleviate some of the concerns here.
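
The "check the expired flag each task" pattern that BJ and MM describe can be sketched as below. This is only an illustration under stated assumptions: `ExternalTextureLike` and `ExternalTextureCache` are hypothetical names, the `expired` flag stands in for the attribute discussed in this meeting, and the import call is passed in as a callback so the sketch does not depend on any particular API shape.

```typescript
// Hypothetical stand-in for the external texture object discussed in the minutes.
// Only the `expired` flag matters for this sketch.
interface ExternalTextureLike {
  readonly expired: boolean;
}

// Keeps the last imported texture and re-imports only once it is reported expired,
// so a slow video does not force a new import on every rendered frame.
class ExternalTextureCache<T extends ExternalTextureLike> {
  private current: T | undefined = undefined;
  private readonly importTexture: () => T;
  importCount = 0; // exposed so callers can observe how often an import happened

  constructor(importTexture: () => T) {
    this.importTexture = importTexture;
  }

  // Call once per task, before encoding the draw that samples the video.
  get(): T {
    if (this.current === undefined || this.current.expired) {
      this.current = this.importTexture();
      this.importCount++;
    }
    return this.current;
  }
}
```

An app would call `get()` at the start of each frame instead of keeping its own frame counter, which is the fragile pattern MM describes. Under KG's proposal (expire at about the same time as the canvas texture), the same code would still work but would re-import on every frame.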

Agenda for next meeting

APAC timed meeting.
