GPU Web 2022 02 16

GPU Web 2022-02-16

Chair: KG

Scribe:

Location: Google Meet

Tentative agenda

Administrivia
CTS Update
“Tacit Resolution” Queue
Discuss the WebGPU Adapter Identifier explainer before reaching out to the PING.
texture_2d_depth for loading depth values might be unnecessary? #2094
Expose WGSL errors in the API #2308
Consider removing GPUPipelineLayout #2550
Limit for the maximum buffer size #1371
Why are bind group layout fields spelled differently from their WGSL counterparts? #2469
mapSync on Workers - and possibly on the main thread #2217
Streaming implementations and indirect draws/dispatches #2189
Cannot upload/download to UMA storage buffer without an unnecessary copy and unnecessary memory use #2388
Agenda for next meeting

Attendance

WIP, it is the list of all the people invited to the meeting. In bold the people that have been seen in the meeting:

Apple
- Dean Jackson
- Myles C. Maxfield
- Robin Morisset
Google
- Austin Eng
- Ben Clayton
- Brandon Jones
- Corentin Wallez
- David Neto
- Gregg Tavares
- Kai Ninomiya
- Ken Russell
- Loko Kung
- Shannon Woods
- Shrek Shao
- Ryan Harrisson
Intel
- Brandon Jones
- Bryan Bernhart
- Hao Li
- Jiajia Qin
- Jiawei Shao
- Jie Chen
- Shaobo Yan
- Yang Gu
- Yunchao He
- Zhaoming Jiang
Kings Distributed Systems
- Daniel Desjardins
- Hamada Gasmallah
- Wes Garland
Microsoft
- Jesse Natalie
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
- Jim Blandy
- Kelsey Gilbert
Unity
- Brendan Duncan
- Dave Shreiner
Andrew Varga
Connor Fitzgerald
Eduardo H.P. Souza
Francois Daoust
Francois Guthmann
Jeremy Sachs
Joshua Groves
Matijs Toonen
Mehmet Oguz Derin
Michael Shannon
Nathan Johnson
Pelle Johnsen
Rick Battagline
Rob Conde
Timo de Kort

Administrivia

Corentin out for a few weeks
Dzmitry moving on - thank you on all counts!

CTS Update

KN: our team's working on the CTS a bit. Did 4-week review. Didn't make as much progress as we'd like, aim to focus more on it in next 4 weeks. Intel team collaborating.
KN: we've been getting some tests done though. Need more help. Lot of testing work to do. Big list of tasks - all available - anyone can take one.
KG: CTS is a big blocker for us collectively.
KN: in Chromium's shipping timeline - want all the core stuff in the CTS done. Good coverage. Running on Mali/Adreno to make sure everything we spec works on those mobile platforms. Important, blocks our shipment. We are also investing a lot in it.
MM: are these the only blocking GPUs?
KN: we test on all the desktop GPUs already. We can do Apple Mac GPUs too - those are on the list too. We don't do iOS though - at least not in Chrome.

“Tacit Resolution” Queue

Extend lifetime of GPUExternalTexture imported from video element #2302
- KN: Importing external textures from video elements
- Conservatively spec'ed initially. Textures would expire quickly.
- Devs need to re-import the same frame multiple times. Looks expensive, even though we'd cache inside the browser.
- Extends the lifetime in a well-defined way - until frame's no longer current.
- Expires texture at very specific time in frame's lifetime.
- Think this is a fairly safe change, and better constraint than what we had previously. Should work better for programming against.
- Does add a boolean attribute to GPUExternalTexture saying when it's expired. Transitions at a well-defined point.
- KG: this is at some microtask boundary?
- KN: yes. Right before requestAnimationFrame, or if you have requestVideoFrameCallback - it's right before that.
- MM: if I import a video frame, and next line does the same thing - guaranteed to get the same frame?
- KN: I don't think so. Not 100% sure. Videos behave asynchronously, change behind the scenes.
- MM: what happens in WebGL if you texImage2D twice in a row?
- KG / KN: not guaranteed to get the same frame. Same guarantees here.
- KG: if you had 1000 fps video for example you could theoretically get two different frames.
- BJ: "expires" flag and requestVideoFrame loop aren't necessarily ticking for each frame in this case.
- KN: if you imported twice you could get 2 different objects that expire at 2 different times.
- MM: memory use would be proportional to frame rate of video if you imported a bunch of times.
- KG: potentially.
- BJ: could you observe in a lightweight way that you did get a different object back? Guaranteed to get same JS object back if frame didn't change?
- KN: pretty sure you get a new JS object.
- BJ: imagining blog post where someone tries to determine frame rate of video by repeatedly re-fetching frames.
- Discussion about frame rates, time stamps of frames.
- KG: texture uploads from video are naive in WebGL - tried to provide additional information like width, height, timestamp - didn't end up finishing those proposals.
- KN: and WebCodecs came along while implementing this, and VideoFrame handles this case.
- KN: concerns with landing this?
- MM: no - just think some more design work may be needed to ensure the tutorial Brandon hypothesized doesn't get written.
- MM: one more question - there's requestVideoFrame - will these handles expire at the next 60 Hz rAF or the next frame of the video they came from?
- KN: the latter. Doesn't expire until after the video frame isn't current any more. At next boundary when video advances to next frame - the GPUExternalTexture will expire. Last and current external textures' lifetimes may overlap slightly.
- KG: think we're good to land this.
- MM: this is good, you can write a naive requestVideoFrame loop and it'll work.
- KN: yes. Can drive rendering both by video frame rate and animation frame rate. Works well in both cases.

Discuss the WebGPU Adapter Identifier explainer before reaching out to the PING.

MM: still blocked on me, still haven't done it. Not fun, keep putting it off. I'll try to do it by the end of the day. I apologize.
KG: I need to make a pass through it too.
MM: I realize the deadline for this task isn't June, but taking into account testing time, PING discussion time, etc.
BJ: anything you can share about the direction(s) you're thinking about? Let us better prepare, what API shapes would be workable?
MM: would prefer to not answer this yet - think people may be concerned
BJ: understand the concerns. Still any text would be appreciated.

texture_2d_depth for loading depth values might be unnecessary? #2094

MM: idea here - think we had consensus on this months ago but didn't write resolution - only one popular human writable shading language has a type for depth textures. Others don't. When transliterating shaders, source shader likely won't have types for depth textures. Ease-of-use concern. Want to produce WGSL, but source shader doesn't' have depth texture type.
MM: general consensus - allow color texture types in the shader to be bound to depth textures, can then sample from them. In API, continue to have bind point be depth_texture bind point.
MM: 1) In WGSL-to-Metal you'll have info about what are depth textures at the API level. Can produce valid MSL.
1. Not needed when translating to HLSL.
1. Doesn't delete the depth_texture type from WGSL; you just don't need to use it when binding a depth texture. Methods on depth texture types are different than color texture types. Don't have to use the type if you don't need the extra methods.
MM: think this satisfies everyone's concerns.
DM: it was the gather operations where we stopped. DOn't know how Metal will handle gather + stencil textures.
MM: stencil textures are just textures. Wrote test app, they do work.
DM: they also work on depth textures seen as depth32 textures?
MM: don't remember.
DM: depth textures also have combined formats. Shader does the gather - still need to do it as a gather on a depth+stencil format.
KG: so you want to make sure this works by law rather than de facto?
MM: it's valid ot bind stencil texture to color attachment.
KG: for depth though. You're not supposed to use those as color. Should we double check, or are you pretty confident?
MM: I think we should assume it works.
KN: concern with stencil - not that it's not allowed - but that G,B color channels might not be defined.
MM: it's not defined. I ran test on different hardware, they produced different results. Don't remember 100%.
KN: if depth works the same way - legal, but not defined results for G,B,A channels - we need to at pipeline creation time transform these in some way so that we hide the undefined results.
KG: same thing we need to do for stencil.
DM: we don't have a stencil binding type.
KG: thought we decided that.
KN: right.
MM: not a security concern. "sample this" is something the compiler can see, and expect all the results are defined.
KN: wanted to ensure it's legal with depth so we can define it the same way.
MM: we should do that.
DM: so - do we have to have a depth binding used for all the depth textures? or can i use f32 binding type?
KG: if just sampling - can use f32 type. depth comparisons -> need depth binding type.
KN: that's distinction between options (2) and (3). can the mismatch be between the pipeline and the shader? or texture and pipeline?
DM: that question has been in the air for months. If pipeline knows about depth - we can polyfill everything.
KG: so this would require pipeline time compilation on Metal.
MM: would require producing MSL when you have the pipeline layout. Already true.
KG: should we just ditch depth and see how far we get?
KN: option (2) then. Fine choice; concern that it's still annoying. Have to get pipeline layout correct.
MM: Kai's comment from Dec 15 2021 (?). OK. DM's asking whether option (3) is acceptable, should follow up on that.
KN: I'm happy to do option (2) and consider (3) post-V1.
KG: I thought we were focusing on (3).
DM: thought we had all the info we need.
MM: info missing - if you write MSL program that doesn't use depth texture type, and bind depth texture to it, and call gather() - what happens? I don't know.
KG: we suspect it behaves like an R32F texture.
MM: not necessarily. Program I described, if you bind stencil texture - behaves differently.
KN: only on the other channels?
MM: yes.
KN: I strongly suspect depth would be the same. Don't know whether it's legal / spec'd as not legal.
MM: spec's clear here. Depth texture requires depth type in shader. Option (3)'s against Metal's spec.
KN: I agree
KG: but we know it works.
MM: we know it works for sample operations. No evidence that gather() operations de facto work. The combination is rare.
DM: concern - if we decide option (3) is fine, all the work going into option (2) is wasted.
MM: will anyone implement this in next couple weeks?
DM: option (3) doesn't require impl - just allowing it.
KN: if you can come back with an answer in a couple of weeks, we can leave this open.

Expose WGSL errors in the API #2308

KN: needs a proposal. Nothing to discuss.

Consider removing GPUPipelineLayout #2550

KN: in editor's meeting - concerns about getBindGroupLayout. (1) rather than automatic pipeline layouts, automatic bind group layouts. At that point GPUPipelineLayout object becomes pointless. These are orthogonal. Can do one without the other.
GPUPipelineLayouts - value objects. No internal state. Typically represent an object in the API. We can do value-to-API-object cache keyed on (…)
It's an object in the API used to pre-bake a value. Not super necessary.
BGLs are similar, but used more widely. Passed to pipelines, bind groups. Pipeline layouts mostly only passed in pipeline creation, can be passed by value there.
MM: what's the benefit?
KN: getting rid of boilerplate of creating pipeline layout so you can immediately use it. Can do things with pipeline layout description - e.g. partial. Spec one for one BGL, leave one undefined, let default algorithm to it for me. Make automatically gen'd pipeline layouts finer grained.
MM: could keep pipeline object and say if you want to create one of these you have to fully fill it out. Could supply either it, or array of bind groups.
KN: yes. That's why they're orthogonal. Original idea was to do that. Pass array of bind groups rather than pipeline layout object.
MM: fine with me.
KG: sounds like consensus is we should try this.
MM: sure.
KN: primary concern - making changes at this date where arguably not necessary. This is definitely not necessary.
BJ: implicit pipeline layouts have been consistent source of developer complaint. Confusing. We need better communicated errors. All-or-nothing thing right now. This would make them much more useful. Could share same frame/camera uniforms across all pipelines, and have pipeline only implicitly fill out material variables they need. Not necessary, but useful change addressing most complaints about API I've heard.
KG: needs-spec then.
KN: even if we decide we want to put off finer-grained implicit layouts - this makes it cleaner to add that later. That is a bigger change. This one is smaller.
MM: point of implicit layouts was to make devs' lives easier. If they're adding complexity we have to do something here.
BJ: they're useful in certain narrow scenarios now. This will make them more generally useful.
MM: OK.

Limit for the maximum buffer size #1371

KG: sounds like there might be consensus to add this. Do people need more time?
MM: this is the question - expose this limit to website, or define a new kind of error?
MM: second one's better from fingerprinting POV. Sounds like Kai's not sure whether it's sufficient.
KN: I'm not concerned about fingerprinting the limit. Impls have to make sure it exposes 0 add'l bits of information. We'd probably have ~2 buckets for this - 1 GB, and some big number.
MM: there'd be at least 3 buckets off the top of my head.
KG: think more about this?
DM: was there an argument - on devices where this is low, like 256 MB - you're likely to run into real OOM?
MM: no. Not about memory on device - about limits. App on iPhone - you only get a fraction of the device memory.
DM: I'm saying - we have iPhone 6 with 256 MB, iPhone 8 with 700 MB - fair fraction of their total memory. Like half.
MM: jetsam limit for apps has been ~700 MB. Different than device memory.
DM: that's independent of memory amount and this limit too?
MM: yes, different thing. Device memory; amt of memory app can use (historically >700 MB); maximum buffer size. The reason latter exists - in Metal, no concept of binding. Attach buffer to shader - just attach buffer. Other APIs - there's a max binding size, but Metal doesn't have bindings. We express the limit as max buffer size. Imagine pointers only had certain number of bits in them on the GPU.
DM: do you suggest doing anything different than OOM?
MM: yes, you aren't OOM in this case. Imagine GPU pointers were 4 bits wide.
DM: I'm talking about the third limit.
MM: confused. (a) memory on device. (b) memory you can allocate before you're killed. © max amt of memory you can allocate in one buffer.
(b) causes the app to be killed.
KN: it is OOM. if you release resources you can handle it.
MM: historically we don't - just clean things up inside the browser.
KG: for this limit we're talking about ©. is there a max useful size for buffers? thinking about exposing this in device limits. That's one way to do this. Other way proposed - we don't tell you the limit, just try making a buffer, if too big we'll fail.
GT: I don't understand the fingerprinting issue. You can bisect the buffer sizes.
KN: and there are only a few limits.
KR: I can't imagine this taking more than a few hundred ms. Think we should just tell developers the limit.
KN: also Metal has a few of these small limits. Think we could just say the limit is 747 MB.
KG: for this issue can we get consensus on this being a limit? Want to think about it more?
MM: let me do more investigation to understand how many buckets there would be.
KN: we don't have to expose all possible buckets. We could do 747 MB as the base, and a couple of larger buckets. It's useful to know what HW landscape looks like.
MM: if every device has wildly different limit that indicates buckets won't work.
MS: in vulkan it's always 2 GB or greater.
GT: why is this limit special? Are the others bucketed?
KN: we want to bucket the other limits. Actually bucket into device categories, then return the same limits for classes of buckets.
GT: is there a database for devices? What about big textures but small buffer limits?
KG: it's a concern - we see families of devices incompatible in these ways when they're also incompatible in other ways. Might have templates for e.g. Intel iGPUs; dGPUs; etc.
KN: open research problem. Also open question about whether there are tiers.
KR: can we make a deadline for making a decision? Got to start driving some of these smaller discussions to conclusion
MM: Kai's discussion about bucketing the whole struct was convincing. Think we can proceed this way for now. Want to do some research on this, so will come back to see if these limits are close or are a big spread.
KN: defining the buckets will be interesting.
MM: something we (Apple) would like to work with the CG on. Want the buckets to be widely agreed upon - if not normative
KN: also part of the open question
KG: ties into questions for the PING.

Why are bind group layout fields spelled differently from their WGSL counterparts? #2469

Already closed!

mapSync on Workers - and possibly on the main thread #2217

Streaming implementations and indirect draws/dispatches #2189

Cannot upload/download to UMA storage buffer without an unnecessary copy and unnecessary memory use #2388

Agenda for next meeting

Feb23
Topics not covered this week + more
[Process] Extensions directory in gpuweb/gpuweb #2301
[Process] Requirements for additional functionality #2424

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU Web 2022 02 16

GPU Web 2022-02-16

Tentative agenda

Attendance

Administrivia

CTS Update

“Tacit Resolution” Queue

Discuss the WebGPU Adapter Identifier explainer before reaching out to the PING.

texture_2d_depth for loading depth values might be unnecessary? #2094

Expose WGSL errors in the API #2308

Consider removing GPUPipelineLayout #2550

Limit for the maximum buffer size #1371

Why are bind group layout fields spelled differently from their WGSL counterparts? #2469

mapSync on Workers - and possibly on the main thread #2217

Streaming implementations and indirect draws/dispatches #2189

Cannot upload/download to UMA storage buffer without an unnecessary copy and unnecessary memory use #2388

Agenda for next meeting

Clone this wiki locally