Minutes 2022 08 31

GPU Web 2022-08-31

Chair: Corentin

Scribe: Ken / others

Location: Google Meet

Tentative agenda

Administrivia
CTS Update
“Tacit Resolution” Queue
Specify that texture_external sampling functions clamp coords to [0, 1] #2766
Colorspace conversions for GPUExternalTexture are arbitrarily complex #3293
Make error reporting unordered #3362
Multiple components cannot effectively share a GPUDevice because it is stateful #3250
Agenda for next meeting

Attendance

Apple
- Myles C. Maxfield
Google
- Austin Eng
- Brandon Jones
- Corentin Wallez
- Gregg Tavares
- Kai Ninomiya
- Ken Russell
Microsoft
- Rafael Cintron
Mozilla
- Jim Blandy

Administrivia

CW: Next week is the APAC meeting.
MM: TPAC?
We'll not go to TPAC this time, and carry the meeting schedule as usual.
CW: made new editors admins of the gpuweb org. Dzmitry is still an owner, want to talk with him before removing him.
KN: What about cross-functional reviews? I18n and others.
- KN: Not much to say about i18n, would like it to get solved independently solved (the spec issue we have open about error strings)
  - MM: does Oguz need help?
  - KN: we're blocked on people from W3C's i18n group.
  - MM: Oguz should ping them?
  - KN: we've pinged them on the Github issue. Maybe need to ping more people.
  - AI: cwallez add WebGPU to i18n agenda.
- CW: Security review, have been pinging them monthly but no response. But given the browser engines have stringent security reviews already I’m not expecting anything new. I told them it would be nice to have their review before CR but if we don’t then we don’t plan to block on them.
- CW: Privacy review, issue about fingerprinting vectors which we should probably talk about in this meeting again soon.

CTS Update

KN: asked Loko to go through spec changes marked as needing CTS changes, get bugs filed so we have tasks tracked
- Lot of these are already done
- Planning targeted operation tests soon. Make sure this sampler field gets passed through and has appropriate behavior, for example. May ask Gyuyoung from Igalia to work on this.
KN: Ryan's working on f16 stuff, wants f16 library in the tests. Asking Francois about pulling in a third-party open-source library for this.

“Tacit Resolution” Queue

Add the "internal" error type #3123
- BJ: Back and forth between editors and the group.
- With editors, initially wanted to reduce number of errors we're reporting
- Kelsey wants to distinguish OOMs by looking at object
- Editors thought not worth consolidating given that
- Now adding a filter for it
- Current proposal allows distinguishing these
- BJ: my assumption is that this design will address Kelsey's feedback.
- MM: three error types now?
- BJ: yes. Validation, internal, and OOM.
- JB: think that should be fine. Kelsey should confirm.
- BJ: have a few other things in flight that assume the internal error will be there. Hope to land it soon, after rebasing.
(mentioned at end, no time to discuss) Add maxBufferSize limit #3366
- KN: maxBufferSize is 256 MiB.

Specify that texture_external sampling functions clamp coords to [0, 1] #2766

MM: few weeks ago came up with spectrum of different options. Brainstormed possible ways forward and commented on the issue.
- If we're going to have a function for external textures where the name of the function says we ignore parts of the parameters, that's more acceptable if it also worked for regular textures.
- Would still be a bit unhappy about that, but would be a way forward.
KR: How is this different from the proposal to have textureSample for regular textures and external textures (just clamping for external texture).
MM: where we ended up last week - the regular texture function for external textures would be renamed. One conclusion we have agreement on - if we had a sampling function that had this clamping behavior, that function should not be named textureSample, but a name indicative of the clamping behavior.
- KN: agree.
MM: then we'd have two different functions, sample, and sampleClamped01 or similar. First for regular textures, second for external textures. Compromise - fill out the cartesian product of these things more. More acceptable if we had a textureClamped01 that worked for regular textures in addition to an overload that does it for external textures.
CW: easy to implement with small transform, but - what's the rationale for filling out the cartesign product more? Consistency?
MM: avoid the problem of - state without doing this, we have a function where user passes data into the function which encompasses addressing modes and the function ignores it and does something different. More developer-friendly - we have a bunch of sampling functions, and a bunch of sampling functions that override the repeat modes.
KR: Thinking this is baking the address mode in the name of the function. The naming is already split between external_texture and regular texture. So that external texture only supports clamping for now but in the future maybe there is an extension for other wrapping modes. Would be unfortunate to have a separate function for each wrapping mode. More extensible / forward looking functionality.
MM: Another thought, today we have static analysis of shaders for float filtering. Another compromise - in shading language we don't add any restrictions. In API, like you can't use some textures with some samplers - you can't use external textures with samplers that have non-clamping address modes.
CW: in BindGroupLayoutSamplerEntry (?) where we say it's a filtering vs. non-filtering sampler, we also have - it's a non-wrapping / clamping sampler.
KN: we talked about this. Gregg proposed this in July. Had a short discussion on the issue. Problem - it doesn't pave a clear path for adding repeat modes in the future. Works fine as a way of restricting use of the API, but doesn't give us a way to add address modes in the future - have to get address mode into the shader somehow, doesn't give us a way to do that.
MM: in future add'l addressing modes would presumably be an extension? If you're using a sampler with non-clamped address mode with external texture, it takes up more space in the bind group.
KN: the BGL doesn't know if you're using them together or not. Only the pipeline does.
CW: that's fine, the pipeline could define that.
KN: BGL needs to know whether it needs another entry or not.
KN: thkn compromise we started with today - add'l textureSample for non-external txextures that also does clamping - think if it's palatable to you then that's the best path forward.
MM: how would you implement that?
KN: we'd inject a function to the generated code that clamps the address and then calls textureSample.
KR: can you comment on splitting it that way vs. textureSample / textureSampleExternal?
KN: we don't think people are going to request address modes for external texture. Think they'll write clamp / mod / etc. in their shader. Don't want to take a hit now for the possibility that we'll add this in the future. Adding an SamplerExternal object is costly, needs more data going into the shader.
Discussion about ExternalSampler, and the cost of this vs. regular samplers.
Discussion about passing in address mode separately.
KN: or, we could add external samplers in the future.
CW: compromise you suggested, Myles, seems the most ideal. We get more of the regularity, not that expensive to implement in browsers. Right now we clamp, and in the future we still have options. Only thing that's hard - adding sampler descriptor to external texture, because that would change the cost in the BGL.
MM: think there's some confusion about whether extensions can break programs that are correct without the extension.
KN: they shouldn't. We decided that because if you have some library and want it to be able to write against core WebGPU, it should work with apps that use an extended device. We have discussed this a long time ago.
MM: ~4 years ago there was a discussion about manual barriers. Concluded auto barriers were fine because later we could add support for manual barriers. Only way this could work - when you enable extension, original program wouldn't work.
KN: it was that on the CommandEncoder you could request manual barriers, requiring extension to be enabled.
MM: something on BGL could enable this.
KN: lot of options. All can be done in the future.
CW: this means that - having textureClamped01 right now basically doesn't limit the panel of options in the future it we want to extend it later. Kai and I believe that it might never be needed, hopefully.
BJ: to avoid introducing clunky names - is there a way we can do an overload where you must use this form of the function, and have to pass an explicit wrap mode?
MM: Kai proposed that 27 days ago.
KN: we don't have an enum feature in WGSL, but it's possible.
BJ: do we have booleans?
MM: prefer no booleans.
KN: there are 3 possible address modes at least.
MM: instead of address mode in name of function, have that be a parameter? Benefit, more elegant and extensible.
CW: OK. Let's figure out a name and move forward.
MM: my proposed name: suffix "Clamped01".
KN: let's figure out the name afterward.
CW: awesome, glad this is resolved.
KN: glad you proposed this.
CW: thank you Myles.

Colorspace conversions for GPUExternalTexture are arbitrarily complex #3293

MM: Please defer again :( sorry
MM: blocked on discussions with our team, and research. Seems video formats have different ways to describe colorspaces; popularity of different methods varies.
KN: in principle this shouldn't matter. Using 3D LUT should let you make approx representation of any colorspace conversion. Whatever the video does, should be able to approx with a LUT. Should never have case of a video where you can't implement.
KN: in our case we fall back to extra copy, but LUT gives a way to not do that.
MM: prefer to postpone discussion, not prepared yet.
KN: when you talk with your team - want to make sure they agree with that conclusion, if possible. If they do, popularity of different ways of doing things isn't relevant.

Make error reporting unordered #3362 / Make lost devices appear to function as much as possible #1629

CW: #3362 landed?
KN: agreed on making it unordered. Split out extra thing.
KN: discussion here: from editors' meeting: should there be an order between device lost promise resolving, and, in this case, ErrorScope promises resolving? There are other Promises we should discuss ordering of. Or does it not matter? We didn't come up with a good reason why it matters. Most likely, we don't put a constraint on it. Can constrain it later if we need to.
CW: think reason why it shouldn't matter: as a rule we're trying to make lost devices work, so we don't break app flow. Lost has to happen after all promises - shield applications, we tell you about it later. But, lost devices shouldn't break the app flow anyway.
KN: several things in APi where that's not true. Esp. async stuff, e.g. buffer mapping. Spec doesn't say what happens if you're in middle of createPipelineAsync and device is lost. Bigger discussion.
BJ: overall desired approach: "harm reduction". Lost device should cause as little harm at JS level as possible. Want people to be able to respond to it. If relying on event ordering - probaby not the right approach. If order matters, you require users to have a specific order of operations.
CW: any concerns with leaving things unordered?
KN: also, moving forward with harm reduction strategy as we flesh things out.
Consensus: move forward with unordered.
MM: also satisfies Ken's desire if you have an async compilation and you have a command which errors immediately, the error can be raised immediately.

Multiple components cannot effectively share a GPUDevice because it is stateful #3250

KN: at some point would like us to have a resolution of this problem. Would like apps to be able to do async work on this thread while they're doing rendering, without ErrorScopes being shuffled around because of async work.
KN: not needed for V1 because - it'll be a bug if you have nondeterministic ErrorScopes, can avoid by not awaiting in middle of ErrorScope. Probably some utility in the future for providing something like this.
KN: what path paved for being able to do this in the future.
KN: conclusion was - not written in this issue - the only thing we have to do now is something we have to do anyway with the GpuDevice - we can allow you to have multiple ErrorScopes at the same time for different parts of the app at the same time.
KN: fork a GpuDevice - one ErrorScope for each one. Can run concurrently without collisions.
KN: will need multiple ErrorScope states per device anyway, assuming we share device among multiple threads for multithreading. It's a small extension.
KN: proposal now - do a little thing to the prototype chain, but almost not a change - uncaptured error event target has reasonable behavior.
KN: in the future, with multiple threads- probably dont' want an uncaptured error handler on each one. Which error goes to which side? Or maybe on both sides. Want to set up prototype chain so we can make that decision later, not now. Adding extra interface on prototype chain is JS-observable, so breaking change - make it for V1, figure out everything else later.
MM: in long term - forking a device? - given a root device, any resources created by any derivative devices would be passable to any devices? But not a distinct device tree (for a different GPU?)
KN: yes.
MM: gets confusing. There should be a different name of the root device, doesn't make sense to have a whole tree. Root device, derivative device. Root device shouldn't be stateless. Root thing would be "DeviceProvider" or something.
KN: yes.
MM: "Resource" namespace?
KN: considered that. Adds unnecessary API for people who don't care about it. Similar to default queue.
MM: today we have normal adapters/devices. In future we'd add a new property to a device to get its top-level root device object from it?
KN: we could do that. Essentially bikeshedding.
GT: do you need access to the root device?
KN: requestDevice'd give you the derived device. To get another, ask it for its parent to get another one.
BJ: important thing - appears to be paths from where we are now to future we want that don't involve changing the API shape significantly right now. Desirable, given that we want to ship in the near future.
BJ: unless anyone has a reason to think we can't do a multithreaded future given our API shape, think we should stick with it.
GT: saying, I get the fork when I get the device.
KN: think I understand what you're saying, we can discuss it later. Don't need to do for V1.
MM: think what you're saying - "root device" would be adapter? Don't think we want to be able to share resources created by 2 separate devices coming from the same adapter.
CW: think I'm hearing 3 different things. Root device, getting device multiple times, … . Kai's point - want to do something about this in the future, but there are multiple solutions. Only thing we need to do is separate the stateful part of GpuDevice to its own JS prototype.
MM: would like to see this proposal written down.
KN: we can talk about that specific proposal. Will be close to editorial.
KR: how about discussing it in next week's Editors meeting, and maybe don't need to bring it back here?
JB: if we allow multiple devices to share resource namespace - is there value in queues being a separate concept in the spec?
CW: my understanding - they're not different devices. They're proxies. They have one piece of state that's different (the ErrorScope). Everything else is the same device.
MM: concrete example: 2 different …'s on the web page - want ordering, second one renders on top of the first.
JB: so multiple devices, all sharing same queue, to allow content to express ordering of operations across set of devices?
KN: essentially.
JB: all sibling devices would return the same queue.
CW: what we call GpuDevice today becomes "GpuDeviceAndErrorScope". Like a tuple. GpuDevice is the same among all of them.
KN: only difference is the error scope stack. Queues also need different ErrorScope stacks. 2 GpuQueue objects too. Each has associated ErrorScope. Underlying, there's one device, one queue, two ErrorScope stacks.
JB: series of ops on 2 different queues are still ordered by JS call ordering.
KN: yes. Including across multiple threads, still TBD.
JB: then the queue doesn't contribute anything? Methods on queue -> device, already have a way for sibling operations to be ordered?
KN: we have plans to add multi-queue in the future.
MM: counter proposal here, make change now taking all state of GpuDevice and put it in its own object. Have GpuDevice own that.
KN: proposal is essentially that, but it's an internal slot you can't see.
CW: more complicated than this. Have to revisit next week.
KN: Editors will revisit doing prototype chain. If no problem, not painting ourselves into a corner, won't need to revisit here.

Misunderstanding of external texture expiration frequency #3324

KN: take out of tacit resolution queue. Needs to be on main agenda at some point.

Agenda for next meeting

APAC meeting. Will ask what attendees want. CW: not sure I can be there.
Colorspace conversions for GPUExternalTexture are arbitrarily complex #3293
- MM: we can talk about that next week, optimistically.
KN: one thing from tacit resolution - max buffer size is 256 MB, #3366
- If anyone has a problem with that, please raise it for next week.
Add HTMLVideoElement to copyExternalImageToTexture #3410
(Eventually) Privacy review feedback

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2022 08 31

GPU Web 2022-08-31

Tentative agenda

Attendance

Administrivia

CTS Update

“Tacit Resolution” Queue

Specify that texture_external sampling functions clamp coords to [0, 1] #2766

Colorspace conversions for GPUExternalTexture are arbitrarily complex #3293

Make error reporting unordered #3362 / Make lost devices appear to function as much as possible #1629

Multiple components cannot effectively share a GPUDevice because it is stateful #3250

Misunderstanding of external texture expiration frequency #3324

Agenda for next meeting

Clone this wiki locally