
Minutes 2021 10 04


GPU Web 2021-10-04

Chair: Corentin

Scribe: Ken

Location: Google Meet

Tentative agenda

  • CTS update
  • Milestone triage (timebox 15m) #2073
  • Implement "depth_clamp" on D3D12 #2100
    • Replace clampDepth with disableDepthClip #2139
  • Require a render pipeline flag to enable sample-rate shading #2133
  • Which corner is the origin of a copyExternalImageToTexture? #2110
  • (stretch) Consider making ShaderModule compilation errors special #2119
  • (stretch) Refactor API for load/store OP and read-only attachments #2131
  • Agenda for next meeting

Attendance

  • Apple
    • Myles C. Maxfield
  • Google
    • Corentin Wallez
    • Kai Ninomiya
    • Ken Russell
    • Shrek Shao
  • Microsoft
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Francois Guthmann
  • Michael Shannon

CTS update

  • KN: worked on depth clipping/clamping tests to understand behavior, so we can spec it.
  • KN: Also some shader testing going on. Sarah M from the Tint team has been working on keeping tests and WGSL spec in sync. Has a tool that - for builtins, etc. - generates test plan - list of tests that need to be implemented for each builtin. Keep up-to-date between spec and test. Can run a tool on the newest spec, then know what changes are needed in tests, to know where the gaps are. Pretty cool.
  • KN: Ben Clayton has written Node bindings for Dawn/WebGPU, and is running CTS against that vs. Chrome. Found a bunch of test bugs, fixing them. Motivation: allowing his/our team to develop CTS against impl more quickly, instead of against Chrome, which is much larger than their components.
  • DM: no updates, but: how'd you discover more failures in the CTS?
  • CW: these were all things we suppressed because they were known to be wrong in the CTS but we never got to fixing them. Ben did.

Administrivia

  • CW: I have conflict on Monday evenings. Been 4 years (?!) we've had this meeting on Mondays. OK moving it to a different day?
  • JG: diff day or diff time?
  • CW: I could do one hour later but diff day would be ideal.
  • CW: maybe another Doodle poll with bunch of timeslots, morning-ish Pacific time? Maybe little afternoon? Evening for EMEA.
  • CW: also, maybe rescheduling WGSL meeting, if ppl interested in that. Or don't do it for WGSL.
  • JG: we can see if there's appetite to change it.
  • CW: let me know what result is tomorrow, then we can send Doodle to group.

Milestone triage (timebox 15m) #2073

  • CW: only 5 open issues with no milestones assigned.
  • CW: #150 (old one). Perf benefits of buffer/texture usage.
    • CW: think texture usage has been clear in the past. Buffer usage stable, can we just close?
    • MM: I would like to do this - i.e. come up with a program in D3D/Vulkan that shows perf differences if you supply different usages for buffers. If the answer is, I'm unable to do it, there's an indication we shouldn't have this flag.
    • CW: in drivers - for example Intel - there was a bug, and they capped vertex usage to be 1 GB. Different memory heap for that reason. There's some buffer usages that aren't super useful - copy_src and dest are basically free. Stuff like map() needs to stay because it's a different memory heap. Hasn't been too much of a burden to developers to provide the list of usages.
    • MM: if there's no benefit it's only cost. The issue describes the benefit - if we're unable to describe it, we shouldn't have it.
    • KN: perf's not the only reason for usages. There are limitations on usages for mappable buffers, which make synchronization more straightforward: if something's MAP_WRITE it can't be written from the GPU, and if it's not MAP_WRITE it can't be written from the CPU (see the sketch after this list).
    • MM: we don't have to solve this issue in this meeting.
    • KN: perf analysis won't tell the whole story. There's more to usages than just perf.
    • MM: makes sense. I'd prefer if this issue's not closed, I think there's more work to do. Possible resolution of this issue would affect first version of the API.
    • CW: OK, tag for v1.
  • #1842. Writable storage buffer bindings don't alias.
    • CW: think this is V1. discussed in different groups, definitely need to figure out for that milestone.
  • #2134 allowing null entries in bind groups.
    • CW: Jasper found that when you have a BGL that might contain many things but the shader only statically/dynamically uses a part of it, it's a pain to have to specify all the resources. Nothing gets null in the BGL. Similar to WebGL.
    • DM: is this something they can do themselves?
    • CW: it is.
    • DM: if multiple ISVs complain, we can revisit.
    • CW: mark post-v1 then.
  • #2151
    • KN: more of a question than an issue. Can close, re-discuss if more questions.
    • DM: convert issue to discussion.
  • #2160
    • CW: we discussed this a lot, and right now there's no way to list all available adapters.
    • MM: I can write a description of why we don't want to do this.
    • KN: thanks.
  • #2073 again
    • CW: finished triage of all open issues. Now need to triage post-V1 issues.
    • MS: one that we duped into #354. Will that be in one of these other ones? Workers, multithreading?
    • KN: it's in V1.
  • CW: can start triaging.
  • KN: like to only look at things we didn't just put into post-V1.
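
Below is a minimal sketch of the mappable-buffer point raised above, assuming an already-created `device`. It only illustrates the usage restriction KN describes (MAP_WRITE implies the GPU never writes the buffer), not any particular resolution of #150.

```js
// A MAP_WRITE upload buffer may only also carry COPY_SRC, so the GPU can
// never write to it and CPU/GPU synchronization stays straightforward.
const uploadBuffer = device.createBuffer({
  size: 256,
  usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
});

await uploadBuffer.mapAsync(GPUMapMode.WRITE);
new Float32Array(uploadBuffer.getMappedRange()).set([1, 2, 3, 4]);
uploadBuffer.unmap();

// A GPU-writable usage such as STORAGE cannot be combined with MAP_WRITE;
// doing so is a validation error at createBuffer time.
```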

Implement "depth_clamp" on D3D12 #2100

  • Replace clampDepth with disableDepthClip #2139
  • KN: did a bunch of testing/reading. D3D12 always clamps, regardless of whether clipping's enabled. No control, have to do it.
  • KN: Metal, despite naming of enum, always clamps on desktop. Apple chips, don't know. Can get same behavior by doing clamping in shader.
  • KN: Vulkan: have to clamp in the shader - it never clamps on its own.
  • MM: was confused. Like to enumerate all degrees of freedom.
  • MM: 1) seems to be distinction between clipping/clamping - what's the diff?
  • KN: there are 2 things can be on/off. Clipping: when HW does clipping stage & clips edges off to fit onscreen for raster, can also clip if the tris go outside depth bounds.
  • CW: that's what you get in games when you get too close to objects.
  • MM: for a particular sample that gets clipped, behavior is sample disappears?
  • KN: primitives, not samples, get clipped. Take in primitives, output different set of tris.
  • CW: I think that's a GL-ism. Would expect clipping per-fragment, non-observable. Same thing.
  • KN: model is, it happens to the primitive itself.
  • MM: if there's a sample covered by unclipped tris but not by clipped ones, it ceases to exist? No write to attachments?
  • KN: yes, doesn't get rasterized at all.
  • MM: and clamping?
  • KN: happens after fragment shader, after it runs. Different point in pipeline. Takes output of rasterizer, clamps to viewport bounds before it goes into the depth test.
  • CW: no clipping happens. Fragments always in the viewport box.
  • MM: apparent behavior - sample that was far away from camera still exists, just moved closer to camera.
  • KN: these aren't alternatives - two separate toggles. Both use the viewport bounds. Configuration's conflated in Metal, Vk and OpenGL.
  • DM: clipping doesn't use viewport bounds. Just -1..1 or 0..1.
  • KN: 0..1 corresponds to viewport. 0..1 compressed into viewport range.
  • CW: confusion comes from: if clamping's enabled, it's like clipping's disabled?
  • KN: no. Clipping comes first, clamping comes later. In principle, all 4 configurations can exist, but no API supports that.
  • MM:
  • KN: don't think there's an option to clamp depth before fragment shader.
  • MM: that's what I meant by observable.
  • KN: pipeline diverges at that point. Many things unobservable.
  • DM: confusing - we call these "clipping" and "clamping" - these aren't stages. If I write to depth in frag shader, clamping happens "after" fragment shader. But it gets a frag coord with a clamped depth?
  • KN: no. Z is unclamped at that point, as input to fragment shader. Consistently. I tested that.
  • DM: that's good.
  • CW: based on your extensive testing, what's recommendation?
  • KN: the PR that I opened. Might be another change too. Right now we have an optional feature that lets you enable depth clamping. But because of D3D restrictions, depth clamping's always enabled; the only configurable thing is depth clipping. Not a core feature in Vk or Metal, so we have to emulate it to enforce consistent behavior. Change the optional feature to make it configure depth clipping the way D3D12 describes it. Then have to update the spec to say that depth clamping always occurs (see the sketch after this list).
  • CW: so feature is disableDepthClip = false?
  • DM: we can bikeshed the naming in the editors' meeting. Now we know the solution space and the changes needed to the spec. In an hour, we can figure out the exact name.
  • MM: all APIs have the Z component unmodified?
  • KN: yes. My tests tested the input value to the fragment shader, and I got consistent results.
  • MS: can't think of a reason you'd need clipping in the first place. Not sure it matters to have that feature.
  • CW: clipping will be on by default?
  • MS: can't think of a reason where you need to turn it off.
  • CW: you need it for shadow maps. Need to put things at far Z-plane rather than clipping them.
  • KN: most hardware does have support for turning off depth clipping.
  • MS: thought you said it didn't.
  • KN: widespread supported feature. Also on Metal.
  • CW: as soon as we have a WebGPU feature level it'll be present.
  • MS: ok, thought you said Apple doesn't have it.
  • KN: it's an OS version thing, not a hardware thing.
  • CW: think everyone's OK with the direction. Can bikeshed names of feature later.
  • MM: need to think more deeply about this. Two major requirements are (1) implementable everywhere, (2) if the WebGPU author doesn't use frag_depth inside the shader, we don't need to.
  • KN: both should be true. Be great if someone who understood this in the backend APIs that our analysis is correct.
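
A hedged sketch of the direction discussed above. The names used here (a "depth-clip-control" feature and an `unclippedDepth` primitive flag) are placeholders, since the group explicitly deferred naming; the substance is that depth clamping always happens, and the optional feature only controls whether depth clipping can be disabled.

```js
const adapter = await navigator.gpu.requestAdapter();
const canDisableClip = adapter.features.has('depth-clip-control');
const device = await adapter.requestDevice({
  requiredFeatures: canDisableClip ? ['depth-clip-control'] : [],
});

const shaders = device.createShaderModule({
  code: `
    @vertex fn vs(@builtin(vertex_index) i : u32) -> @builtin(position) vec4<f32> {
      // Triangle pushed past the far plane to exercise clip/clamp behavior.
      var p = array<vec2<f32>, 3>(
        vec2<f32>(-1.0, -1.0), vec2<f32>(3.0, -1.0), vec2<f32>(-1.0, 3.0));
      return vec4<f32>(p[i], 1.5, 1.0);
    }
    @fragment fn fs() -> @location(0) vec4<f32> { return vec4<f32>(1.0); }
  `,
});

const pipeline = device.createRenderPipeline({
  layout: 'auto',
  vertex: { module: shaders, entryPoint: 'vs' },
  fragment: { module: shaders, entryPoint: 'fs', targets: [{ format: 'bgra8unorm' }] },
  primitive: {
    topology: 'triangle-list',
    // Shadow-map style rendering: keep primitives beyond the far plane
    // instead of clipping them; their depth is clamped to the viewport range.
    unclippedDepth: canDisableClip,
  },
});
```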

Require a render pipeline flag to enable sample-rate shading #2133

  • CW: right now, if you use sampleIndex or samplePosition in the fragment shader, the backend APIs will automatically turn on sample-rate shading - the shader is evaluated at all samples of multisample buffers (see the sketch after this list).
  • CW: do we want explicit flag for that, or not?
  • DM: I don't think it gives us anything. At most, you can describe your intent, then have validation confirm it. But fully redundant. Also D3D12 doesn't have that, so we probably shouldn't either.
  • MM: think I agree. Concerned that someone removes static use of sample mask, but pipeline state says it should still be an error.
  • KN: wouldn't be a flag that enables sample rate shading, but a flag that allows its use. That wouldn't be an error.
  • MM: makes me think of GL_ENABLE bits, confusing.
  • KN: you'll get an error though. Wouldn't just not happen.
  • CW: extra boolean on pipeline creation - are you sure you want sample rate shading?
  • MM: kind of agree with that.
  • KN: also sample position, some others, not so clear they'd enable sample-rate shading. Just a safeguard.
  • MM: if nobody's fighting for this we should just close this.
  • CW: OK.
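
A small sketch of the behavior described above, assuming an already-created `device` and current WGSL attribute syntax: statically using the `sample_index` builtin is what opts the fragment stage into per-sample evaluation; per the discussion, no extra pipeline flag is involved.

```js
const module = device.createShaderModule({
  code: `
    @fragment
    fn main(@builtin(sample_index) sampleIndex : u32) -> @location(0) vec4<f32> {
      // Runs once per covered sample rather than once per fragment.
      let shade = f32(sampleIndex) / 4.0;
      return vec4<f32>(shade, shade, shade, 1.0);
    }
  `,
});
```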

Which corner is the origin of a copyExternalImageToTexture? #2110

  • CW: question mostly comes from WebGL.
  • KR: Discussed this internally. Main problem is maybe a future interop API where you could import WebGL textures in WebGPU.
  • KN: Don't think we should discuss this
  • KR: Then the only API that has an origin option is ImageBitmap, as a concession to WebGL. If all you can do is copyExternalImageToTexture from a WebGL canvas in WebGPU, then you can do the flip during the copy.
  • KN: Wasn't sure if there was going to be a cost.
  • KR: The canvas backbuffers are special surfaces that can be imported. No extra cost. There is going to be a blit anyway. So we don't need the API and can keep things simple.
  • JG: theoretical cost. As opposed to being able to use copySubResource, which I don't think you can use transforms on. You can in BlitFramebuffer. But not CopySubResource. There is a difference there. There's a possibility of a very fast path, 1-copy that's faster than a draw-blit. How important is that? I'm not sure.
  • CW: if we find that becomes really important, we can add an orientation flag. But I'd be surprised that a draw-blit would be that much more expensive than copySubResource.
  • JG: it's a program state transition - separate render pass. Tentatively, I don't think it's worth trying to enforce that we have a vertical flip. We can figure it out later.
  • CW: agree, think we can figure this out later. We already have the code to flip in Chromium, it's trivial to flip.
  • JG: not concerned about the code. Or the API side so much.
  • KR: Sounds good to keep it simple for now.
  • JG: what's our orientation for textures then?
  • KN: when we copy out of an image source, top-left is considered the first pixel, regardless of whether it's a WebGL canvas (see the sketch after this list).
  • CW: the source you put in the call, if you put it on the page in the most trivial way possible, the top-left is what becomes the first pixel.
  • JG: so texel (0,0) will be what was in the top-left.
  • KN: it's a vertical flip from WebGL/OpenGL things.
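
A sketch of the agreed orientation, assuming `device`, `canvas`, and `dstTexture` already exist: the top-left of the external source becomes texel (0, 0) of the destination.

```js
device.queue.copyExternalImageToTexture(
  { source: canvas, origin: { x: 0, y: 0 } },      // (0, 0) = top-left of the image
  { texture: dstTexture, origin: { x: 0, y: 0 } },
  { width: canvas.width, height: canvas.height },
);
// Texel (0, 0) of dstTexture now holds what was the top-left of the canvas,
// i.e. a vertical flip relative to WebGL's bottom-left convention.
```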

(stretch) Consider making ShaderModule compilation errors special #2119

  • CW: while implementing async pipeline creation, you still have to create ShaderModules synchronously. If you have a lot of shaders, that costs a lot of WGSL parsing/validation on the main GPU thread, because of how ErrorScopes work right now (they have to be handled on the main GPU thread).
  • CW: trying to change a little bit how ShaderModule errors are reported, so there isn't this issue.
  • MM: the ErrorScopes infra has pushErrorScope, with no return value, and popErrorScope, which returns a Promise - so why can't compilation be done async? Promise resolved when compilation's complete? And if createShaderModule & pipeline, errors get queued up? (See the sketch after this list.)
  • CW: would work for ErrorScope. (1) extra latency for errors. (2) we have a well-defined ordering of Promises right now. Not only ErrorScope promises, but also device promises would have to be delayed until ShaderModule's done. Maybe fine, but…
  • MM: so if pushErrorScope, createShaderModule, popErrorScope, then map buffer - buffer promise resolved later?
  • CW: not exactly. createPipelineAsync has its own ordering. Unsubmitted work on queue - their own timeline. Then Device timeline. push/popErrorScope, uncaptured error, device lost (?). ShaderModule compilation info?
  • KN: that's about it. not that many. only about 5 promises.
  • MM: so what has to be postponed?
  • CW: here, essentially if you do pushErrorScope, long stuff, popErrorScope - promise A. Then uncaptured listener - promise B. Promise B has to wait for promise A before it can fire callback. Maybe it's fine. Maybe we say there's no real ordering of things on the device.
  • MM: so the draw's not using your new module? Totally unrelated draw?
  • CW: yes.
  • MM: so we're adding a data dependency where there doesn't need to be one.
  • CW: yes. would be nice if error were surfaced in the ShaderModule. But it breaks the model for all other objects. DM also had idea of async factories.
  • DM: essentially timelines. If user could manage those timelines on which errors are sequenced, they can achieve what they need. Main concern with your proposal - it makes createShaderModule very special, even though there's nothing about it which needs to be. Errors for buffer creation need to be on separate timelines too?
  • KN: I think the reason it's for createShaderModule is that it's the main place where … simplest solution.
  • CW: you can createShaderModule on a worker. Implement multi-worker WebGPU. "Simple."
  • KN: would like to have slightly better design that removes this issue entirely. Error sinks, essentially.
  • DM: can we come back to this when we have workers?
  • KN: right now we have error sinks, and the only way to have control over them is to use workers. It's a workaround.
  • KR: It would be nice to not require the user application to use a worker for async compilation. This was a problem in WebGL. Synchronization with the shader compiler would be nice. If it meant just sending the error back a different way, or reordering the way promises are resolved. Don't think applications will rely on the ordering.
  • KN: I suppose we could lift the ordering restriction on ErrorScope resolution, but a little afraid this could result in weird app behaviors.
  • MM: pushErrorScope, createShaderModule, popErrorScope, then bad draw in second error scope - this says these 2 Promises can be resolved in any order?
  • KN: if they're sibling scopes, yes - if parent/child, then no.
  • MM: in that situation there'll still be an app code path where you have to queue up these errors. So if you have to do that I don't see the point.
  • KR: they probably wouldn't be parent/child in the first place.
  • MM: what's the motivation for this in the first place?
  • CW: right now, shader module creation happens synchronously in the GPU process, because we might use it for a pipeline created synchronously immediately afterward. Would be nice if ShaderModule creation can be offloaded to a different thread, so if pipelines created from them are all async, everything's async. We could do this automatically. ShaderModules' creation always async. If needed sync, then steal them. But if you promote something that looks sync to be async, you run into issues with ErrorScopes.
  • MM: there's just two kinds of errors. One has no data, the other has a string. Don't see the issue with queueing.
  • CW: if we add more ordering with other types of Promises in the future that can be unrelated. We had this async ErrorScope thing implemented couple of years ago. Not pretty, but was fine. More about "other" device promises that have to be ordered.
  • KR: let's discuss further next time - this was a stretch goal for this meeting anyway.
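
A sketch of the ErrorScope pattern MM describes above, assuming an already-created `device` and a trivial placeholder shader. It shows where the ordering concern arises if resolving `popErrorScope()` has to wait for compilation to finish.

```js
device.pushErrorScope('validation');
const module = device.createShaderModule({
  code: `@compute @workgroup_size(1) fn main() {}`,
});
const error = await device.popErrorScope();
if (error) {
  console.warn('Shader module creation failed:', error.message);
}
// The concern discussed above is ordering: if this popErrorScope Promise is
// delayed until compilation completes, other device-timeline promises
// (uncapturederror events, later error scopes, etc.) may have to wait on it.
```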

(stretch) Refactor API for load/store OP and read-only attachments #2131

Agenda for next meeting

  • Two "stretch" agenda items.
  • #2131 refactor maybe can be discussed offline?
    • KN: ideally.
    • MM: I don't think I have opinions about that.
  • CW: other issues? Other than that, list of V1 stuff.
  • DM: issue about timestamps? (AI: CW: please link to issue number)
    • MM: did we resolve that?
    • CW: think we resolved. Timestamp attachments for both passes + … command encoder.
    • DM: consider removing execution time?
    • CW: happy to resolve this.
    • MM: what about Metal devices that don't have anything else?
    • CW: less API, and what about Vulkan devices that don't have timestamps?
    • KN: and D3D.
    • CW: we can discuss next time, or on issue.