Minutes 2021 08 23

Corentin Wallez edited this page Aug 27, 2021 · 1 revision

GPU Web 2021-08-23

Chair: Corentin

Scribe: Ken / Corentin / others (add your name here)

Location: Google Meet

Tentative agenda

  • CTS update
  • Do pipeline blend states need to be validated against the pipeline render targets? #2025
  • Data flow stream subsetting #2038
  • Dealing with holes in the pipeline layout. #2043
  • Binding a zero-sized range of a vertex buffer #2044
  • timestamp-query is unimplementable on TBDR architectures #2046
  • Agenda for next meeting

Attendance

  • Apple
    • Myles C. Maxfield
  • Google
    • Ben Clayton
    • Brandon Jones
    • Corentin Wallez
    • Gregg Tavares
    • Kai Ninomiya
    • Ken Russell
    • Loko Kung
  • Microsoft
    • Damyan Pepper
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
  • Francois Guthmann
  • Michael Shannon
  • Rick Battagline

CTS update

  • KN: Shrek's working on reftest coverage. Would be useful for you to run these manually in your browser. Haven't done different formats yet.
  • KN: contributions welcome!

Do pipeline blend states need to be validated against the pipeline render targets? #2025

  • CW: this one's specifically about the alpha.
  • KN: summarized last week. Q: when blend state reads dest alpha, what happens when dest format has no alpha channel?
  • KN: Rafael & Myles confirmed this is well defined in Metal and D3D12. Haven't checked Vulkan yet. Doesn't seem likely Vulkan will be super different. Mobile archs can be different though.
  • KN: think the way forward is to assume this is correct, and do something about it in the future if we need to.
  • CW: think it's safe to say the Vulkan WG will amend the spec to say this was the case the whole time. Otherwise we can change the blend state to put 1.0 as the dest alpha.
  • MM: if there's a device that doesn't adhere to this it probably has small enough market share that it won't have to get WebGPU.
  • KN: also unlikely this will happen because WebGPU doesn't have RGB formats, only R and RG. Unlikely you'd use dest alpha with these.
  • CW: let's call this consensus.
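
CW's fallback above (treat destination alpha as 1.0 when the target has no alpha channel) could be sketched as a rewrite of the blend factors. This is only an illustration: the helper names are hypothetical, the factor and operation names are assumed to match the WebGPU GPUBlendFactor and GPUBlendOperation enums, and "src-alpha-saturated" (which also reads destination alpha) is deliberately left untouched.

```typescript
// Subset of blend factor names, assumed to match the WebGPU GPUBlendFactor enum.
type BlendFactor =
  | "zero" | "one"
  | "src" | "one-minus-src"
  | "src-alpha" | "one-minus-src-alpha"
  | "dst" | "one-minus-dst"
  | "dst-alpha" | "one-minus-dst-alpha"
  | "src-alpha-saturated";

interface BlendComponent {
  operation: "add" | "subtract" | "reverse-subtract" | "min" | "max";
  srcFactor: BlendFactor;
  dstFactor: BlendFactor;
}

// Hypothetical helper: rewrite a factor as if destination alpha were 1.0.
function factorWithOpaqueDst(factor: BlendFactor): BlendFactor {
  switch (factor) {
    case "dst-alpha":
      return "one"; // dst.a is treated as 1
    case "one-minus-dst-alpha":
      return "zero"; // 1 - dst.a is treated as 0
    default:
      return factor; // other factors are passed through unchanged
  }
}

// Hypothetical helper: apply the rewrite to both factors of a blend component.
function componentWithOpaqueDst(c: BlendComponent): BlendComponent {
  return {
    operation: c.operation,
    srcFactor: factorWithOpaqueDst(c.srcFactor),
    dstFactor: factorWithOpaqueDst(c.dstFactor),
  };
}
```

An implementation would only need something like this on a backend that turned out not to follow the Metal / D3D12 behavior; the consensus above is to assume no such rewrite is needed.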

Data flow stream subsetting #2038

  • MM: followup from last week. Trying to determine if vertex shader can use subset of vertex attributes, or if fragment shader can use subset of vertex shader's outputs. Metal allows both, would be good if both supported.
  • DM: we support the first case but not the second one. I don't see new data to make us revisit this decision. Restriction may be annoying to OpenGL users, but natural for people coming from HLSL. Allows us to move forward with DXIL in the future.
  • MM: we're more strict than HLSL.
  • DM: in what way? You mean the ordering?
  • MM: HLSL does allow subsetting, just there are rules. We don't allow subsetting between VS and FS.
  • DM: have to declare varyings in same order. Can cut off the tail of varyings. Don't know of anyone relying on this. Can revisit if useful, but want to hear from people using HLSL.
  • MM: why is this hard to implement?
  • DM: today when we create HLSL at pipeline creation time it's not hard to implement. But want to make DXIL bytecode at shader module creation, and don't know how the interface will match at that point. Need to not have gaps. Wouldn't be able to create it at ShaderModule this way.
  • MM: or we could add a validation rule. Data structure the VS emits and FS consumes have to have validation - match somehow.
  • DM: that's what we have now. The set of outputs and builtins have to be the same.
  • MM: we could relax that. Instead of data structures being equal, one has to be a prefix of the other.
  • CW: DM mentioned problem with SV_Position?
  • DM: yes. It counts as internal varying register in D3D. Can have matching locations for user defined varyings...but if you have SV_Position mismatch it doesn't link.
  • BC: we ran into this in Tint. SV_Position's always moved to the last element of the structure.
  • MM: can we put this in validation rules? Fragment has to consume prefix of Vertex's emission, and VS always has to produce position? If structures aren't identical then FS has to consume position.
  • DM: we aren't comparing structures anywhere. They can be entirely different. We have a way to sort them for HLSL output. This isn't the concern for this issue. It's DXIL output at shader module creation, and we haven't started working on this yet.
  • MM: sounds like some subsetting is possible even if we want to emit DXIL.
  • CW: in HLSL you have the problem of where to put SV_Position. Assume you don't put it in the middle. Assume you put it at the start of VS. Then all fragment shaders have to put SV_Position at the beginning even if you don't use it. Has a cost. If you put it at the end of VS…
  • MM: only a problem if frag doesn't consume it.
  • CW: No, if FS consumes SV_Position then it has to consume all varyings.
  • MM: not every FS uses SV_Position.
  • DM: today they don't have to consume SV_Position. Only user specified varyings have to match. They can already benefit from this under current rules.
  • MM: today if VS emits 2 varyings and FS only wants one, can you write that in WebGPU?
  • CW: no
  • MM: that's why I'm asking for it.
  • CW: you'd be happy with
  • MM: if you consume SV_Position in FS, then structures have to match, but if you don't, the structures don't have to.
  • CW: and, only the prefix of the structure has to match?
  • MM: yes.
  • CW: that's complicated. Don't know who can take advantage of that. I fail to see how ppl take advantage of this in HLSL, maybe I'm missing something.
  • MM: this is a language feature that all 3 backend APIs support, feels natural to support it.
  • CW: TBH I think we'd be OK with just compiling shaders at pipeline creation. But that's our impl.
  • DM: it is true that this case is supported. HLSL's restriction is portable to the other APIs. Formulating the rules seems complicated. Not clear who'll take advantage of it. If formulated in good way - no reason to not accept it.
  • MM: so should I make a PR?
  • CW: slightly concerned about this. What about not putting it in the spec and seeing if people ask for it? This is adding complicated validation.
  • MM: not sure it's that complicated.
  • CW: developers will look at it and be concerned. Similar to strip index format. There's a good reason for it. Non-obvious though. Think maybe we should delay.
  • MM: think you and I have different conceptions about how complicated the PR will be. If it's a complicated thing requiring lots of thought, it's an argument against it.
  • CW: it's the arbitrariness of it. Convinced it's only 2 if-statements in the spec. Feels arbitrary to have this limitation.
  • MM: think that level of arbitrariness is acceptable to support this feature that all backends support.
  • KR: Concern (from experience with OpenGL) is about adding this ordering rule to the spec, vs. allowing any subsetting or disallowing all subsetting. Let's keep the spec simpler instead of adding this constraint semi-arbitrarily and keeping it forever.
  • MM: one thing I can do - ask Metal team if people use this feature. If they say no I agree with what Ken said.
  • CW: I'd be interested in asking early adopters of WebGPU about this - like Babylon.js folks are getting a lot of stuff running on WebGPU.
  • DM: not sure if Metal developers would give you the right answers. Their semantics are much more free than what we're considering here.
  • (FG async: always match them between VS and FS personally)
  • (MS async: always match them except maybe debug)
  • MM: two questions. How often is SV_Position used in FS? And how many people do subsetting? If #1 is 'all the time' it doesn't help us in WebGPU. If #2 is 'never' it isn't an argument for what I'm proposing.
  • DM: you'd ask them if subsetting's done in HLSL semantics, not Metal semantics? We're not considering Metal's subsetting semantics.
  • MM: I'll see what they say. People who produce Metal code often do it cross-platform.
  • CW: let's do that. We'll ask people to see how useful this is. We can have the PR if you want. Let's
  • KR: Get feedback from the Chromium Origin Trial, see if there are complaints.
  • MM: we can do that...I don't think we should not do the other things though.
  • CW: we should reach out directly. People have to renew their Origin Trial key; they're explicitly asked for feedback on the spec.
  • MM: I'll make a PR, we'll all talk to other people, and let's come back in a few weeks.
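
The relaxed rule MM describes above (fragment inputs must be a prefix of vertex outputs, with an exact match required when the fragment shader reads the position builtin) can be sketched as a small validation function. This is a reading of the discussion, not spec text: the `Varying` type and `validateInterStage` are hypothetical names, and comparing varyings by location and type string is an assumption.

```typescript
// One user-defined inter-stage variable (hypothetical representation).
interface Varying {
  location: number;
  type: string; // e.g. "vec4<f32>"
}

function sameVarying(a: Varying, b: Varying): boolean {
  return a.location === b.location && a.type === b.type;
}

// Returns null when the interface is accepted, or an error message otherwise.
// Both arrays are assumed to be in declaration order.
function validateInterStage(
  vsOutputs: Varying[],
  fsInputs: Varying[],
  fsReadsPosition: boolean
): string | null {
  if (fsInputs.length > vsOutputs.length) {
    return "fragment stage reads more varyings than the vertex stage writes";
  }
  // Prefix check: every fragment input must match the vertex output at the same index.
  const isPrefix = fsInputs.every((input, i) => {
    const output = vsOutputs[i];
    return output !== undefined && sameVarying(output, input);
  });
  if (!isPrefix) {
    return "fragment inputs are not a prefix of the vertex outputs";
  }
  // When the fragment stage reads position, no trailing vertex outputs may be dropped.
  if (fsReadsPosition && fsInputs.length !== vsOutputs.length) {
    return "fragment stage reads position, so the varyings must match exactly";
  }
  return null;
}
```

Under the rules as they stood at this meeting, only the exact-match case would be accepted; the prefix branch is the part under discussion.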

Dealing with holes in the pipeline layout. #2043

  • CW: various bind groups are for data that varies at differing frequencies. What if you have no model data? No view data? Create empty BGL. device.createBindGroupLayout with no entries, then a BG with 0 entries, then bind it. Awkward. Should we make an allowance for binding a null bindgroup? Or pass null to say "there's a hole here"?
  • CW: this is developer friendliness, maybe we should wait for someone to complain before we add more complexity to the spec.
  • DM: this isn't allowed on Vulkan. Anyone writing to that spec won't expect this to work.
  • MM: arguments against? More code we have to write in browsers?
  • CW: might need an extra branch in draw call validation, or magic when someone binds the empty bind group. Equivalent to binding null. Slight complexity.
  • FG: weird that you have to bind an empty bind group. Used to have dynamic indices; switched back to static indices, and now i bind an empty bind group when needed.
  • CW: how annoying / difficult was it to figure this out?
  • FG: you have to know that it'll bite you. First implemented thing to be fully dynamic and squash the holes. Then had to go back. It was fine, not that hard.
  • DM: don't understand the case. Maybe need more details in issue. You have different logic layers and each one wants its own entry in pipeline layout? Each one defines its own BGL and BG. Done.
  • CW: if you have 3 layers and you only used 0th and 2nd, then first one's the empty BGL.
  • CW: FG can you add more comments into the issue?
  • FG: yes, will figure out the starting point of it.
  • CW: defer this for another week then.
  • MM: not clear what cost would be. Would be useful to have more details. Runtime perf cost? Implementability? Time to shipment? etc.
  • CW: will do.
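
The workaround CW and FG describe (an empty bind group layout plus an empty bind group to fill an unused slot) might look roughly like the sketch below. `DeviceLike`, `createEmptyBinding`, and `fillHoles` are hypothetical names; the interface is a minimal structural stand-in for the two GPUDevice methods involved, so that the sketch does not depend on WebGPU type definitions or a browser.

```typescript
// Minimal stand-in for the parts of GPUDevice used here (assumption: only
// empty entry lists are ever passed, hence never[]).
interface DeviceLike<Layout, Group> {
  createBindGroupLayout(desc: { entries: never[] }): Layout;
  createBindGroup(desc: { layout: Layout; entries: never[] }): Group;
}

// Hypothetical helper: create the empty layout and the empty bind group once.
function createEmptyBinding<Layout, Group>(
  device: DeviceLike<Layout, Group>
): { layout: Layout; group: Group } {
  const layout = device.createBindGroupLayout({ entries: [] });
  const group = device.createBindGroup({ layout, entries: [] });
  return { layout, group };
}

// Hypothetical helper: replace unused slots (null) with the empty layout so
// that the remaining bind group indices stay static.
function fillHoles<Layout>(slots: (Layout | null)[], emptyLayout: Layout): Layout[] {
  return slots.map((slot) => (slot === null ? emptyLayout : slot));
}
```

With three layers where only the 0th and 2nd are used, the filled list would be passed as bindGroupLayouts to createPipelineLayout, and the empty group bound with setBindGroup(1, ...) before drawing.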

Binding a zero-sized range of a vertex buffer #2044

  • AE: currently spec lets you bind 0-sized range of vertex buffer. Works in our impl, except for Vulkan where there's a validation layer error if you bind 0-sized range at very end of buffer. Not allowed. Either: disallow this specific situation, or paper over it in implementations. Or disallow 0-sized vertex buffer bindings completely.
  • AE: we currently disallow these for storage and uniform buffers, but for different reason. Need at least 1 element in runtime sized arrays in shader.
  • AE: I'm in favor of papering over it in the impl.
  • MM: at time developer asks browser to do this we know they'll fall into it. So we'll have a singleton buffer around with 4 bytes in it?
  • AE: yes, that's one way. Or, technically, could make all Vulkan buffers 1 byte larger if they're vertex.
  • CW: we're finding various cases where we need extra padding in buffers. D3D, need 256 byte alignment. Others need 4 byte alignment. Already have facility to have extra padding at end of buffer and zeroing it out. Might as well add 1 byte to every Vk vert/index buffer.
  • MM: sounds good.
  • DM: fair to say zero-sized binding has no use? Can't draw anything with 0-sized vertex buffer. Maybe better to skip whole thing?
  • KN: you can make 0-sized draw calls and they won't do anything. In the past we said if you generate data and it happens to be 0-sized it'd be good if it worked.
  • MM: we said we'd do this for indirect draws. Set count to 0. Sounds similar to what developers would want to do.
  • CW: we run into issues on some Metal devices - empty dispatch crashed the GPU, but empty dispatch indirect worked. Empty copies caught by Metal debug device. It's nice when the runtime just works, up to the limit values.
  • DM: you suggest to make the buffers longer and guarantee the extra bytes are 0?
  • AE: yes. You have to make the buffers longer anyway for 4 byte alignment. vkFillBuffer requires offset/size 4 byte aligned.
  • DM: also have branch in setVertexBuffer for empty range and use 1-byte range?
  • CW: no, that's empty range that's not at end of buffer. That's allowed.
  • MM: check in Vk is that the 0-sized part has to be less than the size of the buffer?
  • CW / KN: yes.
  • DM: it is a weird Vk validation rule. I have no objection to this.
  • CW: sounds like consensus then.
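
The padding approach agreed above can be illustrated with a little arithmetic. The sketch assumes the Vulkan rule discussed here (a vertex buffer binding offset must be strictly less than the buffer's size) and combines CW's extra byte with the 4-byte alignment AE mentions; both function names are hypothetical, and real implementations may pad differently.

```typescript
// Assumed Vulkan-style check: the binding offset must be strictly less than
// the size of the underlying allocation.
function offsetIsBindable(offset: number, allocatedSize: number): boolean {
  return offset < allocatedSize;
}

// Hypothetical allocation policy: add one extra byte, then round up to a
// multiple of 4 so that fills and clears stay 4-byte aligned.
function paddedAllocationSize(requestedSize: number): number {
  const withExtraByte = requestedSize + 1;
  return Math.ceil(withExtraByte / 4) * 4;
}
```

For a 16-byte vertex buffer, binding the empty range at offset 16 fails the check against an unpadded 16-byte allocation but passes against the padded 20-byte one, so WebGPU can keep allowing zero-sized bindings at the end of a buffer.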

timestamp-query is unimplementable on TBDR architectures #2046

  • MM: unfortunately don't have good solution / proposal. Extension we have "timestamp query" adds 3 functions. Each, behavior is: app calls this function, when it's executed, the current timestamp's written to a buffer. 3 functions are inside compute pass, inside render pass, or outside any pass. Reasonable.
  • MM: except: considering the one inside the render pass: that concept that app says "draw draw draw", then tell me what time it is - TBRs execute all draw calls from a pass together. No place at which one draw call is complete inside a pass and the next one hasn't started.
  • MM: Metal solution's complicated. Dialog between application & device. Wish we could do better.
  • KN: pretty sure we were aware of this years ago when we first discussed this topic. Mostly aware of it since then. We decided - hoist WriteTimestamp to begin/end of render pass on archs that can't support it. Get garbage data, but also obvious that it's garbage.
  • MM: thought about that. Think that's antisocial. We're exposing an API that says you can do this operation but it can't. Only on some systems. If you say - ask for time, do a draw, ask for time - the times might float together to begin/end. Your draw took 0 time. Nobody wants that.
  • GT: can this be an extension not advertised on unsupported platforms?
  • MM: it is an extension. All iPhones won't be able to support it. They support the philosophical feature.
  • GT: then maybe you should separate those extensions.
  • MM: think that's reasonable, we could proceed that way. Wish we could do it with one extension. Not totally against it, but think we could improve upon it. iPhones & M1 chip - you can ask for what time it is, but not between adjacent draw calls. Ask at beginning/end of vertex/fragment processing of render passes.
  • GT: think there's no way to make these identical for developers. Think they need to check for what platform they're on.
  • MM: when ppl make web pages they make them on computers, not tablets/phones. Would be unfortunate if they don't get the extension they want.
  • GT: better than failing in an ambiguous way. Debug for a couple of days.
  • MM: not proposing we change semantics to have current time queries float around.
  • CW: that'd work - keeping extension but removing it from render pass. Not that many people timing individual draw calls. Instead passes. Almost all GPUs are tile based today, so not sure how well timestamp queries work. Remember you could ask for begin/end of draw call of render pass of your tiler. We could take advantage of this. Maybe it's only CPU data. Floating things around, making draw calls 0-time, seems developer-hostile.
  • CW: think we can remove this.
  • MM: text on Metal - kind of queries the iPhones and M1s support - you can ask for times at beginning/end of render pass. Also can't do it around individual dispatch calls in compute pass. Has to be around entire compute pass.
  • CW: is there concern about splitting the compute pass?
  • MM: think it'd be novel. Not sure there's another place we have to split compute passes.
  • MS: if you provided an API that worked differently based on tile-based or not - I'd use that to optimize for tile-based deferred rendering or not. So you might as well tell me when it's on that architecture so I can handle things differently.
  • DM: goal of our group's to converge on API implementable on target platforms. Apple phones are a major platform. Can we adjust the API to be compatible? Is it enough to say, you can do timestamps within compute/render passes? Would that cover the target hardware base we need?
  • MM: how would developer interact with these flags?
  • DM: exposed in adapter caps in some way, like limits are.
  • MM: don't change API here, but have extra bits that developer must query to figure out how to use existing API? I'm worried about the case where the developer doesn't check these bits, and they'll get validation errors.
  • KN: question / proposal. For rendering, can we take writeTimestamp out of render pass, and have extra fields in render pass descriptor like "write certain time when the render pass runs"? And on non-tile architectures, beginning of fragment floats to top, end of vertex floats to bottom?
  • CW: you specify the stage in vulkan. but we could do that on d3d12.
  • MS: I don't want to capture the setup at the beginning. If I can get just the draw, that'd be enough for me.
  • MM: explain more?
  • MS: wouldn't want to bound the whole setup of the render pass. Pipeline setup, etc. As long as I can get just the draws - want to start that timer after setting up pipeline/bind groups / etc. I don't care if the draws are lumped together, but don't want them combined with the pipeline setup.
  • MM: don't know if that's implementable on Metal.
  • CW: here you're not asking about asking individual draw calls. Just the setup you want to avoid?
  • MS: yes, for perf eval, want to separate the two. Right now you can because you can interleave the timing commands.
  • CW: right now if you put timestamp after SetPipeline in WebGPU it comes after that. Impls and driver use dirty bits to figure out what state to apply. If you do setBindGroup; writeTimestamp; draw the setBindGroup will be after the timestamp.
  • MS: OK, maybe doesn't matter then.
  • KN: feel setPipeline / setBindGroup are instructing encoder, not GPU. Expect these to not generate GPU commands at all.
  • MM: on many devices, just setting a pointer. What would help - if in your (MS's) app you ran on desktop where you can distinguish between individual commands. If 30% is setup and 60% draw calls - would be useful. But if 1% / 99% , distinguishing doesn't make much sense.
  • MS: will have ppl coming from OpenGL / WebGL - we think differently - assume that the old presentations from NVIDIA still apply, where setPipeline / setBindGroup are very expensive. Ppl will think we need to perf test that, do time queries around it to see what it's doing. If not that much, we need docs to convince people it's not.
  • KN: I wonder about that. My understanding of pipeline switches in GL is that they're expensive on CPU. If you issued timestamp, then did a lot of pipeline setup work and then another timestamp, assume those time deltas would effectively be 0.
  • MM: I'd expect the same thing.
  • CW: think we're getting to a resolution here.
  • MM: when I opened this issue I didn't make a proposal. How about I make a real API proposal, can do by next week.
  • CW: can you have writeVertexStart, etc. in RenderPassDescriptor, or should Kai do that?
  • MM: if I were to write a proposal it would be to add extra things to RenderPassDescriptor. Would all be declarative.
  • KN: SGTM
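
To make the declarative direction above concrete, here is a rough sketch of timestamp writes declared on a pass descriptor rather than recorded between draws. No proposal existed at the time of this meeting, so every name here (`PassTimestampWrite`, `RenderPassTimingDesc`, `validateTimestampWrites`, and all fields) is hypothetical, as is the choice of validation checks.

```typescript
// Hypothetical: a timestamp can only be requested at a pass boundary.
type PassTimestampLocation = "beginning" | "end";

interface PassTimestampWrite {
  queryIndex: number; // slot in the query set that receives the timestamp
  location: PassTimestampLocation;
}

// Hypothetical slice of a render pass descriptor carrying the declarations.
interface RenderPassTimingDesc {
  queryCount: number; // number of slots in the query set
  timestampWrites: PassTimestampWrite[];
}

// Returns a list of validation errors; an empty list means the declaration is accepted.
function validateTimestampWrites(desc: RenderPassTimingDesc): string[] {
  const errors: string[] = [];
  const seenIndices: number[] = [];
  for (const write of desc.timestampWrites) {
    if (write.queryIndex < 0 || write.queryIndex >= desc.queryCount) {
      errors.push("query index " + write.queryIndex + " is out of range");
    }
    if (seenIndices.indexOf(write.queryIndex) !== -1) {
      errors.push("query index " + write.queryIndex + " is written more than once");
    }
    seenIndices.push(write.queryIndex);
  }
  return errors;
}
```

Because the only expressible locations are pass boundaries, a tiler never has to place a timestamp between two adjacent draw calls, which is the case MM describes as unimplementable.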

Agenda for next meeting

  • For the week after: Do pipeline blend states need to be validated against the pipeline render targets? #2025
  • Dealing with holes in the pipeline layout. #2043
  • timestamp-query is unimplementable on TBDR architectures #2046
  • Whatever else comes up.
  • If old issues / proposals you feel haven't had enough attention - please put them here. Surface them in the project, but there's a lot of stale ForMeeting stuff on project.
  • MM: will we try to do some burndown?
  • CW: yes. We need to go through them systematically. Let's discuss this next time too.