Minutes 2022 06 22

GPU Web 2022-06-22

Chair: KG

Scribe: KG / KR / others

Location: Google Meet

Tentative agenda

  • Administrivia
  • CTS Update
  • “Tacit Resolution” Queue
  • maxColorAttachments not sufficient? #2965
  • Specify that texture_external sampling functions clamp coords to [0, 1] #2766
  • Make GPUDevice.destroy() a queued operation #2599
  • Expose WGSL errors in the API #2308
  • Consider making ShaderModule compilation errors special #2119
  • Agenda for next meeting

Attendance

  • Google
    • Brandon Jones
    • Gregg Tavares
    • Kai Ninomiya
    • Ken Russell
    • Loko Kung
    • Stephen White
  • Mozilla
    • Jim Blandy
    • Kelsey Gilbert
  • Mehmet Oguz Derin
  • Michael Shannon

Administrivia

CTS Update

  • KN: Looking into infra issues to make it faster to run the whole suite. Want to identify the longest-running tests so that, once we have that data, we can make them run faster.
  • KN: I’m happy to help get CTS running on other peoples’ infra.

“Tacit Resolution” Queue

  • Empty??? :D

maxColorAttachments not sufficient? #2965

  • KN: limit in Metal feature set tables that Gregg found again. Had an issue about this for a while.
    • Tables don't describe how it works. Myles figured it out by looking at validation layer's code.
    • Complicated. Layout of attachments in tile memory. Some padding as well.
    • At what LOD do we want to describe these? Can't assume e.g. 128 byte alignment.
    • Encode Metal's exact layout algorithm in our spec?
  • KG: don't need to match exactly. Don't leave too much on the table by pessimizing.
  • KN: will need some layout algorithm - probably not complicated.
  • KG: at worst, probably, formats have both a size and an alignment/padding, which becomes part of the format table.
  • KN: probably spec complexity, but not complexity for users.
  • KG: size per sample?
  • KN: both size per sample, and size per tile limit. Size per tile = max threadgroup allocation, same as compute shaders. Assuming smallest tile, 16x16, the limits line up with each other - but don't know if we can subsume them into one, probably too much of a pessimization.
  • KN: might add two new limits. Not sure if we should add to threadgroup memory allocation - constraint doesn't exist on other APIs. Maybe 1, maybe 2 new limits. [See the sketch after this list for the flavor of such a check.]
  • KN: found some Imagination docs - you shouldn't go over tile size, but if you do, it spills into main memory and will work. Then could have consistent validation rules across backends. Our limits might prevent you from spilling out of tile memory. But Vulkan drivers won't tell us if we spilled. A little bit making something up.
  • KG: it's safe. How pessimal is it? If not too bad vs. Metal, probably fine. Think we should try imposing Metal limit on everyone.
  • KN: do we make a separate limit for this? Or use compute shared memory limit for this, which is what Metal does?
  • KG: probably best to try the thing we don't know is possible: use the same limit, and if we run into a system where that doesn't work well, change it.
  • KN: not easy.
  • KG: could also say it's a different limit. That's fine.
  • KN: true. Concern w/making them the same: can't determine whether spilling's occurred on device. Can't verify correctness.
  • KG: think that's OK.
  • JB: reason multisampling affects the tile when it's not the main render target - resolution occurs at a point where tiles are resolved. Correct?
  • KN: yes. Don't know underlying limitation of Myles' #1. For #2, there's tile memory - have to do all working operations there. Written out to memory from there. #1, I don't know what it's about. Maybe the amount of memory they can feed out of each individual shader core?
  • KG: the stuff that has to go through blending pipeline?
  • JB: why would that have a size limit?
  • KG: because hardware can't take arbitrary amount.
  • KN: … Myles mentions there are exceptions to the rule "multisampling doesn't apply to the first limit" - because of sample-rate shading? Not sure. Not too simple, but hopefully Myles can share more details, and we can craft a limit around it.
  • KG: rough consensus to have 2 new limits?
  • KN: I think we should try designing that and see where it takes us. Will hopefully come back to this group to discuss more later.
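
To make the limit discussion concrete, here is a minimal sketch of the kind of per-sample byte-size check such a limit would imply. The limit name and value, the per-format byte costs, and the function name below are illustrative assumptions for this sketch, not anything the group has specified.

```ts
// Hypothetical per-sample byte costs for a few color formats. These numbers
// and the limit below are assumptions for illustration, not spec'd values.
const bytesPerSample: Record<string, number> = {
  "r8unorm": 1,
  "rgba8unorm": 4,
  "rgba16float": 8,
  "rgba32float": 16,
};

const assumedMaxColorAttachmentBytesPerSample = 32; // hypothetical limit

// Naive packing: sum the per-sample cost of each attachment. A real layout
// algorithm might also round each attachment up to its alignment, as the
// Metal tile layout discussed above apparently does.
function fitsPerSampleLimit(formats: string[]): boolean {
  const total = formats.reduce((sum, f) => sum + bytesPerSample[f], 0);
  return total <= assumedMaxColorAttachmentBytesPerSample;
}

// Four rgba16float attachments: 4 * 8 = 32 bytes per sample - at the limit.
console.log(fitsPerSampleLimit(Array(4).fill("rgba16float"))); // true
```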

Specify that texture_external sampling functions clamp coords to [0, 1] #2766

  • KN: discussed at length already.
  • KN: slight variation. Propose: do what Corentin suggested - spec that these always clamp to [0, 1]. Ignore the address mode in the sampler. And, possibly V1, possibly post-V1, add a new member to the external texture descriptor defaulting to clamp. Don't have to roll the texture+sampler together, which isn't ideal (option 3) - would end up using more binding slots.
  • KN: I think this is good - future-proof, and no strong opinion on whether we add address modes now.
  • KG: having just "clamp" seems MVP to me - but sounds reasonable. People can also do this themselves.
  • KN: agree, implementing address mode yourself isn't that complicated. Can get the dimensions of the texture and do it yourself. Similar to what we'd do internally. Maybe won't need it. [See the sketch after this list.]
  • KN: Myles wanted flexibility here.
  • KG: let's re-discuss next week.
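
As a sketch of the "do it yourself" point above: if sampling always clamps to [0, 1], an app that wants repeat addressing can wrap the coordinate itself before sampling. This uses textureSampleBaseClampToEdge, the clamping external-texture builtin in current WGSL (exact builtin naming was still in flux at the time); the `device` variable and binding layout are illustrative.

```ts
// Assumes a GPUDevice `device` is in scope; bindings are illustrative.
const module = device.createShaderModule({
  code: /* wgsl */ `
    @group(0) @binding(0) var t : texture_external;
    @group(0) @binding(1) var s : sampler;

    fn sampleRepeat(coord : vec2<f32>) -> vec4<f32> {
      // fract() wraps any coordinate into [0, 1), emulating address-mode
      // "repeat" in the shader; the built-in clamp then never kicks in.
      return textureSampleBaseClampToEdge(t, s, fract(coord));
    }
  `,
});
```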

Make GPUDevice.destroy() a queued operation #2599

  • KN: thinking about post-V1. Only makes invalid programs valid. Proposal was: destroying the device doesn't jump ahead of already-issued work. Submits, maps, render-to-canvas will complete.
  • KN: don't think this is critical. But would clean up a confusing area about ordering.
  • KG: how observable is this?
  • KN: observable if you e.g. render to canvas, then destroy the device. Different impls might have different behaviors. A weird ordering issue.
  • KG: if you destroy device, is visual output guaranteed?
  • KN: not really spec'd now. De facto now, no. Spec doesn't say enough to guarantee it. Impl could go either way.
  • RC: Is destroying normal textures a queued operation? If so, this should be queued too.
  • KN: destroy w.r.t. spec is all about validation. Also true of device.destroy(). If you go through spec & steps - this is probably already true. Not really the intent though. Will take editorial work to figure out how it fits into spec. At high level - if we think this is a good idea, we should decide this and take care of it in editing.
  • KN: our impl for example - end of frame, we transfer a resource from WebGPU to compositor for display. If you did submit, then destroy the device, then end-of-frame happens after that - is the thing dead, and you can't get resource over to compositor? Or allocated outside device, so it's still alive? This ambiguity is why it could be different in different browsers.
  • KG: to me - dependent on destroying a device. Does anything show up on screen? If we don't say anything about that, this doesn't matter. Even if we say this waits - still unspecified after you finish waiting.
  • KN: for that situation I think we should have tighter specification here. No mention of that now.
  • KG: different question - if you destroy device, then call canvas.toDataURL() - do you get anything? Do we say, you don't have a device any more so you get nothing? Maybe if you destroy the device, we take one final snapshot of the canvas, even if just in an ImageData. That'd give us some resolution here. Either you asked for device to be lost, and you still have canvas output - or you have device lost, and the canvas was unrecoverable. Could "try" in the cooperative case. "Implementations should preserve the final contents of the canvas rendering context"?
  • KN: makes sense. Also question of: if you render frame, wait for end of frame, then you destroy device - does canvas disappear, or leave the image behind? Spec currently should leave the image there, think that's the correct behavior. Given that, think we should extend that behavior here. Destroy before present has similar behavior.
  • KN: device loss trickier, can happen any time. Need to think about presentation interactions.
  • KG: think I am more convinced that GPUDevice.destroy() waits for stuff to happen before it finishes. [See the sketch after this list.]
  • KN: spec-wise with 3 timelines - destroy() goes to device timeline to execute. Do we wait for queue timeline before destroying device, or cut things off? Ordering violation between timelines if so. But we don't have good explanation of getting data out of queue timeline.
  • KG: is this device.destroy(), or device.release()? Don't just knock things down for no reason. Think we really want release(), not destroy().
  • KN: if you destroy a texture while something in queue timeline, things don't blow up.
  • JB: making destroy() the default means that some impls will do it now, and some impls will wait. Useful to have a cancel() operation? Shouldn't be the default. Default should be to wait for things to complete. No race conditions.
  • KN: makes sense.
  • KG: say we're moving that way?
  • KN: sgtm. We'll figure out details in editing.
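
A minimal sketch of the ordering in question, with illustrative names. Under the proposal, the submit below would still complete even if destroy() immediately followed it; today, an app can get the "wait, then tear down" behavior portably by joining the queue timeline explicitly first.

```ts
async function presentLastFrameAndDestroy(
  device: GPUDevice,
  commands: GPUCommandBuffer, // e.g. commands that render to the canvas
) {
  device.queue.submit([commands]);
  // Explicitly wait for already-submitted work before tearing down, rather
  // than relying on destroy() being ordered after it.
  await device.queue.onSubmittedWorkDone();
  device.destroy();
}
```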

Expose WGSL errors in the API #2308

  • KN: WGSL has uncategorized errors. Examples: shaders too complex, exceeding impl capabilities, defect in WebGPU impl. Most common example: couldn't allocate registers for shader. Probably in pipeline creation, but can also happen in createShaderModule.
  • KN: how should these be exposed? Call them validation errors though they're not strictly so? Find new way to expose them? Silently give you a broken object?
  • KN: if something goes wrong in createShaderModule, do we expose it there? Or throw an error in createPipeline later? Don't have a good example.
  • KG: expression too complicated?
  • BJ: lot of this - how do we expect users to react to this? Probably large number of developers paying attention to validation errors during dev, but not capturing that on shader itself. They'll assume they built it, it works, and if it runs on a device where it's too complicated, they won't be prepared for that. Message deep in the spec won't change that for them. Validation error vs. not won't matter. It would matter in Shadertoys. Would prefer if Shadertoy/WebGPU can surface this. More likely they'll have caught it & react. Calling it validation error is probably the most foolproof way to go.
  • KN: Corentin had a similar point. Sites generally won't react to uncategorized errors. Making it look like a validation error is fine. My counterpoint - I think it's realistic to respond to uncategorized errors. Maybe hits them on some devices, but want to use this shader normally. (Optimized in some way?) Know you can respond to the error by falling back to a simpler shader. Might figure this out from adapter info. [See the sketch after this list.]
  • BJ: why isn't validation error suitable for this?
  • KN: no reason really. Main thought - have a design invariant now, you don't need to watch for validation errors in production. This would break that assumption. Probably not a huge deal, but a small concern.
  • BJ: theoretically saying, this is more like OOM. Some device limit reached that we don't know about.
  • GT: how's this different from a limit?
  • BJ: you know the limits beforehand. Ideally you know all the validation errors that could've happened beforehand. Maybe this should be OOM? Maybe OOM should be tweaked slightly? Saying a "system resource" error?
  • KR: GL OOM was not recoverable, would push back on reusing that concept.
  • KN: WebGPU OOMs are recoverable. Unrecoverable ones lose the device.
  • KG: also in FF WebGL OOMs are mostly recoverable.
  • BJ: another reason to rename WebGPU's OOM.
  • KG: we could create a different name for it, but putting it in that category would be OK with me.
  • KN: we could expose separate field, "allocation failed", "pipeline creation failed", etc.
  • BJ: we've already given OOMs a message field. Could put these error messages in it. Not sure we need to make this a special case.
  • KN: makes sense. Think we should categorize this as OOM.
  • RC: how will impls determine if OOM is recoverable or not?
  • KN: if you can't determine it, you lose the device.
  • KN: don't know the details. For example: sub-allocating out of a buffer - under your control. At least Vk/D3D give us ways to do fallible allocation. Not catastrophic failure.
  • RC: for OOM in general: trying to allocate 1 GB texture maybe would fail, be recoverable. But for cases where multiple web pages running at same time - sum total's close to the limits - and you allocate a small texture which fails, isn't something you'll try to recover from. OOMs won't always be recoverable by the developer.
  • KN: maybe we give you OOM if nothing's broken. Give you device loss if things might be broken. Up to the app whether you're able to free up resources & try again.
  • KG: example from WebGL: buffer allocation, esp. index buffer allocation. We truncate buffer, return OOM. You can try the allocation again. We do all the validation that we should.
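
A sketch of the application-side response discussed above: wrap pipeline creation in error scopes and fall back to a simpler shader variant if creation fails. Whether such failures should surface as "validation" or "out-of-memory" is exactly the open question, so this sketch listens for both; the function and descriptor names are illustrative.

```ts
async function createPipelineWithFallback(
  device: GPUDevice,
  complexDesc: GPURenderPipelineDescriptor,
  simpleDesc: GPURenderPipelineDescriptor,
): Promise<GPURenderPipeline> {
  device.pushErrorScope("validation");
  device.pushErrorScope("out-of-memory");
  const pipeline = device.createRenderPipeline(complexDesc);
  const oom = await device.popErrorScope();        // scopes pop in LIFO order
  const validation = await device.popErrorScope();
  if (oom || validation) {
    // Shader was too complex for this device (e.g. register allocation
    // failed): retry with the simpler variant.
    return device.createRenderPipeline(simpleDesc);
  }
  return pipeline;
}
```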

Consider making ShaderModule compilation errors special #2119

  • KG: nice diagram that's still up to date.
  • KN: basic question - difficult to discuss - original issue was that createShaderModule is mainly doing CPU work. Bunch of compilation/translation. Model of device timeline is that everything's well ordered. If you naively do what the spec says, you'll put compilation work on device timeline. In practice, want to put this on another thread. Don't want it to block execution of other things that don't depend on it.
  • KN: reason this is spec problem - ErrorScopes are defined on strictly-ordered device timelines. If not for those, could split them off onto other threads. Error reporting gets in the way.
  • KN: proposal I came up with: keep the total ordering of stuff on device timeline and errors reported. Impl is responsible for reporting the correct error. Impl would have a way of collecting errors and reporting them. Naively, think this should be possible. I'm not a seasoned multithreaded developer, but can think of ways you could do this. Not 100% confident. Would make things simpler for the spec. Impl can do things as long as you can't tell.
  • KG: that's already true. Think it's doable. Array of Promises? Have to wait for them all to complete, then give back the first error in the list? [See the sketch after this list.]
  • KN: that's what I have in mind.
  • KR: ShaderModule creation, like it or not, is special. On some platforms it delegates immediately to a platform API, sometimes tying it up for a while (seconds?). Tying things up for seconds would be bad. E.g. in WebGL we had problems where as soon as you queried any program/shader info, bam, you get hangs. I think it’s safer to make it special.
  • KN: Original idea was not to check. We already have another way to do this. Once you use the SM to create a pipeline, then you’re joining back to the thread that does the pipeline creation. If we want to generalize all the way through createShaderModule, we should probably do it for createPipeline also.
  • KN: If you do the common pattern of creating SMs and then creating the pipeline, I don’t want to prevent our optimization opportunity there.
  • BJ: Not entirely sure why moving off the main thread means we can’t get messages out of it. If I take an error scope, wrap around three commands, it feels like conceptually I’m asking for a Promise.all() for those three, where I want them ordered within the scope, but not so much with how they are ordered more globally.
  • KN: I agree.
  • GT: Is there a reason we have to return in order?
  • KN: The spec says we do, but it probably isn’t necessary. We do want an order between popScope, so that e.g. different promises are still ordered reasonably.
  • GT: But compared to fetch, they can totally come back in any order.
  • JB: fetches are considered independent. Reason for reporting first error is people avoiding cascades.
  • KR: think it won't be a big deal if we allow the compilation errors to float to the end of the error scope. Esp., given experience from WebGL where the parallel shader compilation was so fragile.
  • KG: think this is "as-if" - agree with Kai that we ought to be able to guarantee ordering. CTS tests would also have to be written carefully to cover only one error. The error flags coming out of WebGL - supposed to get the first error in the command stream that happened, even if they came back in a different order. Similar to Promise.all() instead of Promise.any().
  • KN: same here where we only need to give you the first one. Don't have to save off all results.
  • BJ: push back on making CTS writing harder. Would need to only wrap one validation error in an ErrorScope. Two independent commands failing - that's on me. Matters in CTS, not in the general case. createTexture, then createView - if I got view creation error before texture validation error, impl did something wrong. This is where we should resolve first command first.
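
A sketch of the "array of Promises" idea from the discussion: an implementation could let compilations run on other threads, then, when a scope is popped, wait for all of them (Promise.all semantics, not Promise.any) and report the first error in the order the commands were issued. This is illustrative implementation pseudocode assuming WebGPU's TypeScript types, not proposed API surface.

```ts
async function resolvePoppedScope(
  pending: Promise<GPUError | null>[], // one entry per command, in issue order
): Promise<GPUError | null> {
  // Wait for every operation in the scope to settle...
  const results = await Promise.all(pending);
  // ...then surface the first error in command-issue order, even if the
  // underlying compilations finished out of order.
  return results.find((e) => e !== null) ?? null;
}
```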

Agenda for next meeting

  • Next week, same time, same place.