Minutes 2019 06 10

Agenda / Minutes for GPU Web meeting 2019-06-10

Chair: Corentin

Scribe: Ken + Austin

Location: Google Meet

TL;DR

AI: CW to call for spec editors, DJ to call for comments on the charter.
GPURenderBundles
- Concern that JS perf shouldn’t drive API decisions but bundles still needed for other uses.
- Suggestion that bundles could contain any kind of command, but agreement to discuss that as a follow-up.
- Agreement to merge the PR with a restricted set of commands.
Nooping invalid indirect draws
- Suggestion to add nooping draws to list of robust buffer access behaviors to help with GPU-based validation of indirect dispatches.
- Concern that this extends robust buffer access a lot.
- Discussion about programmable vertex pulling to emulate robust buffer access in the shader.
Shading language ingesting
- Concerns that WHLSL was started 1.5+ years ago and still shows no result while SPIRV works well today. Apple says it’s getting close to something and asks for a little more time.
- Concerns about SPIRV are size of WASM compilers and unknown performance pitfalls when translating to MSL or HLSL (but no specific pitfall was mentioned).

Tentative agenda

GPURenderBundles
PR Burndown
Shading Language Ingesting
Indirect operations + noopping invalid draws
Select the date and time slot for the next F2F?
Agenda for next meeting

Attendance

Apple
- Dean Jackson
- Justin Fan
- Myles C. Maxfield
Google
- Austin Eng
- Corentin Wallez
- David Neto
- Idan Raiter
- Kai Ninomiya
- Ken Russell
- Ryan Harrisson
Intel
- Yunchao He
Microsoft
- Chas Boyd
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
Joshua Groves
Mehmet Oguz Derin

Administration

CW: We need to make a call for specification editor. I’ll do it.
CW: I think we want to gather suggestions for three weeks and then decide.
CW: progress with WG formation?
DJ: think we should just pick a date to say people are happy with the draft charter. 2-3 weeks out? Have an action to fix one thing in the charter from the F2F. 2 weeks after that?
CW: how about fix that and then send out call for requests on the charter.
DM: one question from charter is: who are we making this API for? Maybe we can re-clarify in the charter.
CW: this is coming up because there are several discussions about making the API more approachable if we did various things. To resolve that discussion we can decide who our customers are more precisely.

GPURenderBundles

CW: At F2F we agreed on reduced versions of render bundles so you can only set resources and do draw/dispatch.
CW: still some concerns about how this will map to D3D12. No discussion about what might need to change. RC, do you have anything you’d like to raise?
RC: Are we agreeing that we’re going to declare bankruptcy on Javascript + JIT === native code?
CW: that’s not this group’s call to make.
RC: Seems clear that in Chromium the Javascript overhead is too large for people to make command lists from scratch every time. If that’s the case, then a bundle or some kind of replay is unavoidable.
CW: seems clear we need a way to record some commands in a render pass.
KN: even under the assumption that JS+JIT is arbitrarily fast, we still have the fundamental overhead of going across the process boundary, and valiation. Separate problems that can’t be eliminated as easily.
CW: And validation
RC: agree. If we are going to have this reusability, I worry having the restrictions will make them use unnatural constructs. Should we have an additional facility where we can record and replay anything into these bundles?
CW: we think this is a good idea but would like this to be a separate PR.
MM: How about we don’t call it bundle because it’s not a bundle and call it something else, like a “command list” and later, we can make a hardware-accelerated one.
CW: would be a “render command buffer” or “render command list”?
MM: yes, maybe that or a “recording” bit.
CW: OK. Just need to make sure it’s inside a render pass. Fine with us too.
RC: to clarify, this would have no restrictions like the ones in the current PR?
MM: No restrictions. I think it’s okay if we come up with an API that doesn’t work similarly on all the platforms. The once concern is portable performance.
DM: will there be a way to tackle that later? If we allow certain things we can’t take them away later.
MM: It would be a separate API concept. Separate software + hardware command lists.
CW: my intuition: even the software form would likely get accelerated by some implementations using bundles or other hardware concepts. Just a matter of optimization at that point.
DM: think we’re going slightly in the wrong direction here, because instead of mapping to hardware, we’re mapping to JavaScript. People expect to be closer to the hardware than JS.
CW: if we’re talking about gpu-rs details. If we have bundles, every time there’s a command not supported in a D3D12 bundle, you can split the bundle. So a WebGPU command list would be a list of D3D12 bundles.
DM: Then with all of the various platforms APIs, the users have no idea how many hardware bundles are under the hood and they have no idea what the performance characteristics are.
CW: perf characteristics of GPU render bundles would be to be faster to execute the bundle than re-record the command list. Always weird cases like D3D12 descriptors. WebGPU API isn’t in the business of specifying the 1:1 mapping between WGPU and D3D12 calls.
DM: People would want to have an idea of how it works to use it efficiently. If we allow everything, you would see articles popping up informing you what to use/avoid if you’re targeting Vulkan, etc.
CW: we’ll have differences between Dawn and wgpu-rs and WebKit. These articles are bound to happen because we have differences.
DM: don’t you want to have an option to use the low level primitives? Or will we never end up using them?
CW: I’m not clear that Dawn will ever use them (low-level bundles).
DM: ok...so from your API standpoint it doesn’t matter how closely we stick to the lower level concepts?
CW: Sure, but my point is more that WebGPU API can’t guarantee that GPURenderBundles will translate 1:1 to D3D12 bundles because that’s an implementation choice. Right now I’m not convinced we’re ever going to do it.
DM: do we have enough evidence to say that this isn’t going to be more efficient? The conversation with Rafael is exciting - thank you for it. When recording at the driver level, there might be a perf advantage in some situations, but too late to change the API to enable that sort of efficient recording.
CW: That’s why we discussed we start with a restricted version and then can talk about extending it. We start with a version that can translate 1:1 to D3D12 and then we can open another discussion to talk about extending it.
DM: maybe I misunderstood. Though the conversation was going toward being able to record all commands into it.
CW: that’s what MM suggested and we’re OK with it because you can just split the bundle recording if you see a command that’s not supported in the bundles.
DJ: Myles wants to make sure that if this is coming up just because of perf issue in Chrome, we should at least do analysis in another browser first.
CW: this is not specific to Chrome. Bundles are important (offline in the pull request) for multithreaded recording of things inside render passes. Imp’t so app doesn’t have to re-walk the scene graph, do redundant validation again, etc. Not an implementation detail of Chrome.
RC: plus, there’s serialization / deserialization from renderer to GPU process which I understand that all browsers will be doing eventually. For D3D12, bundles were intended as lightweight reusable concepts. But on the whole they’re a performance loss because they’re too restrictive. This is why I worry about making these too restrictive in WebGPU. Devs will do awkward gymnastics and find they didn’t work as well as they could if they were unrestricted.
CW: So, this could be an argument for saying we need an unrestricted version and when implementing WebGPU on D3D12, it’s a pessimization to use bundles, so you can trade CPU performance for emulating with command lists.
DM: not sure if RC’s quote about devs getting lower performance would apply to us. We have very specific uses of D3D12. We will never use the data static, for example. For us the perf expectations should not be based on what the native apps get. We would like to profile it and get actual data.
CW: ok. My understanding: Apple’s OK with restricted or unrestricted. MSFT would prefer unrestricted. Moz would prefer restricted to translate better to D3D12 bundles. How about we merge the PR and then consider unrestricting the render bundles in a separate PR?
DM: right. Doesn’t seem to be opposition to some sort of reusability inside the render pass. We can start with what’s in the PR.
RC: Seems fine to me, so long as the follow-up.. what would be the criteria for accepting it? Will DM try D3D12 bundles and report back?
DM: that would be ideal, yes.
CW: ok, let’s do that. If you have comments on the PR please make them, then I’ll merge it and put up another PR discussing unrestricting the render bundles.

Indirect operations + noopping invalid draws

KN: We want to make the list of optional behaviors for OOB accesses to include draw calls that access OOB data in a vertex buffer could be no-oped instead of clamped. This allows checking validity and then canceling it instead of duplicating vertex buffer with the clamped values inside. This could be worked around by doing vertex pulling and modifying the shaders such that incoming data is clamped in the shader. However, I’d rather allow more behaviors now than restrict it further.
DM: my first reaction: seems quite unsound that this behavior is different for vertex buffers than other buffers. In particular “may” no-op the draw call.
KN: Robust buffer access already says it may do some set of things. This is just a little more drastic.
DM: do we agree that the robust buffer access as it’s defined in the Vulkan spec is the way to do safe buffer access
CW: That’s the ideal version. For example in Metal in uniform buffers, there’s no way to implement robust buffer access easily. It’s much easier to simply produce an error.
KR: There is no way to emulate the vertex hardware. You have to duplicate the index buffer and clamp the values.
KN: There was a suggestion that if you do vertex pulling you still hit the vertex hardware, but I have no idea how that works.
DM: I don’t think it’s fixed function for the most part. It’s the code compiled by the pipeline layout. Don’t think there’ll be a huge perf difference iff the vertex cache is used. That’s the main reason Ken mentioned to not do this.
KR: The article I read with regards to vertex pulling, used the texturing hardware to store vertex data (OpenGL). Other APIs have SSBOs. I don’t know completely the different options to produce the vertices.
KN: You can use an SSBO in the vertex shader and dereference everything yourself.
CW: another concern is uniform buffer access when we don’t have robust buffer access because of the layout rules. Can’t always dynamically index and clamp. Something Myles raised as well. For uniform buffers it’d be best to produce an error if they’re too small. Either queue submit time or another time. Can’t practically do clamping in the shader.
DM: you can’t clamp in the shader but can figure it out at command recording time?
CW: yes.
DM: ok, so that’s your argument for discarding the whole draw call?
CW: Yes, because we know in some cases we’d like to give the option to produce errors. I think we need to structure the discussion and have a document that explores all cases. We’ll do that for next week.
DM: I’ll try to clarify whether vertex cache is hit for vertex pulling.
MM: There’s a difference for clamping uniform and vertex buffers. You can clamp uniform buffers in a shader, but vertex buffers happen before the shader even runs.
KN: Found a benchmark, haven’t looked at it yet.
- https://github.com/nlguillemot/ProgrammablePulling
KR: It seems there is a way to transform the vertex shader to feed in all the vertex data as a SSBO and dereference into that as an index buffer, but this changes the bindings significantly and consumes a lot of binding slots.
MM: Yes, we shouldn’t do that. It’s bad.
DM: well, I wasn’t thinking about binding the index buffer as a shader input. Shader would use vertex ID which would enable caching. Follow-up I mentioned in case people didn’t read it, it would require us to put vertex attribute definitions in the pipeline layout, which enables us to build binding layout properly. We would have to put it in the PipelineLayout.
CW: Yes, that’d be unfortunate.
MM: Isn’t it in there?
CW: No it’s in the RenderPipelineDescriptor
DM: it’s not completely unfortunate. In Metal it would allow proper fixup of the binding slots, when Indirect Argument Buffers are not supported (since Metal would need both vertex buffers and bindings to co-exist in this case).
CW: action items: DM to clarify vertex cache. Idan/Kai, clarify all sources of robust buffer access and how it can be emulated in the shader, as supporting materials.

Shading Language Ingesting

CW: not sure who requested this. Jeff? Not present.
DM: not sure what input Jeff wanted to bring aside from what we had.
DJ: did he just raise the topic? Not more specific than that?
CW: I think his assumption was that he would be here today. Pretty sure he just proposed shading language ingesting.
DN: when we started these discussions we had stated our points of view, and why. My view: WHLSL is a nice ideal with high-minded principes. After working with that spec, trying to write a compiler to SPIR-V from it, and running into areas where there were divergences between grammar and spec and implementation. And in working with programming model, most of the corner cases were discovered via SPIR-V, and no corresponding issues in WHLSL spec. Found that my initial suspicion was that WHLSL wasn’t getting the complexities of parallel GPU execution. Nice serial language, but not ready for GPU execution. Is an “extended subset” of HLSL (e.g. JavaScript vs JScript). No ecosystem, no users at the moment. When we abandoned writing a compiler for it, asked, what do users need? They’re coming from WebGL and need a GLSL compiler. Would have liked WHLSL to be better than it is atm, and think there’s a triathlon to be run and is not suitable at this time. Don’t have high hopes that it will land well - just not mature.
CW: the conclusion from this on Google’s side is that we see the discussions with SPIR-V and WHLSL started 1.5 years ago, left the door open as a group for a long time, but we think the only direction that works is SPIR-V.
DJ: We’ve been doing a lot of work on WHLSL the past couple of weeks. I respect your position. I would ask to hold off for a couple of weeks before a final conclusion. Maybe it won’t change anything. I don’t expect we’ll have run or biked, but we’ll have more to say.
CW: that’s what we’ve been saying for quite some time now. Are we able to set any deadlines for this? We’ve been deferring this discussion for a long time.
DJ: I guess, but at the same time, not much progress has occurred and it hasn’t changed a lot of our original concerns about the security model or the desire to have a human-readable language.
CW: I think the security aspects have been answered. They’re a property of the implementation of the language and the programming model has been ensconced in the WebGPU execution environment for SPIR-V.
MM: I think David outlined a bunch of arguments. Many of them boil down to the fact that SPIR-V has a longer history. That’s true; we can’t debate that. Everyone knew that going in, and I don’t think anything has really changed. You’re right that at some point we’ll have to make a determination. Right now we’re very close to getting an implementation into Safari’s developer previews. When developers are able to use it, we’ll have a lot more data to see its merits. Right now we have a lot more data with one language. We’re asking for a fair comparison.
RC: from my side I haven’t seen the size question be addressed. Babylon.js people continue to raise size as a concern. Asking them to ship 500-700K compiler is too much; costs their customers money.
CW: To clarify, this is a general constraint with Babylon.
RC: the times I’ve talked with them they continue to raise it as a concern.
RC: second thing: has anyone made a mapping to see how well SPIR-V maps to DXIL or MSL? If they don’t map very well then it could put some backends at a disadvantage.
CW: there are already dozens of games taking SPIR-V to Metal. DXIL vs. SPIR-V; pretty sure David’s team knows how to do this.
DN: we’ve built a SPIR-V backend to the DXC compiler. Starts from the AST. Both SSA, infinite registers, strongly typed. Haven’t tried to do SPIR-V to DXIL or vice versa. Assume most people will go thru spirv-cross path. That’s what MoltenVk does to bring Valve’s catalog to Mac by pretending it’s Vulkan. Has demonstrated good perf.
DM: As to how efficient it translates to Metal, clearly there’s a lot of Steam games that use this translation path. The translation was developed under the assumption we only have SPIR-V as an input. There might be ways to do this if you were writing into MSL directly. The question is open and AI available to investigate what we could do better with MSL features in a direct translation.
KN: think it’s clear that we can not invent something new. We’ve already tried to do that with WHLSL and our argument is: it’s too complicated to do in a reasonable amount of time for this spec. But if there are capabilities in MSL that we want to expose that we can impl on other platforms and want to expose, I’d be more concerned about how well the original intent of the shader is carried through from source language to SPIRV to whatever. We don’t have the time to build up a new ecosystem; it has to be GLSL, HLSL, SPIR-V, etc.
MM: Isn’t a safe profile of SPIR-V something unique?
KN: we have the vast majority of an ecosystem already. It’s a very small step from Vulkan SPIR-V to WebGPU SPIR-V.
MM: our claim is that it is not a small step.
CW: Ryan has been working on adding WebGPU-specific validation constraints to make safe SPIR-V. There’s a very small number of things that need to be done.
KN: I don’t want to say it’s trivial but it’s much, much simpler than creating a new ecosystem.
RC: One of the goals of this whole endeavor is to make an API that will work well on all of the underlying APIs. But with WebGL, there were problems with specific slowdowns on D3D, etc. So if going from SPIR-V to DXIL is very slow or awkward, then I think we need to revisit that.
CW: real games are using spirv-cross today. To go to HLSL, MSL, etc. Game devs care a lot about perf, and if there were huge perf gaps they’d have found them by now.
MM: RC just listed some things that the safe spirv profile will have to do to prove that it’s ready. There are clearly some things WHLSL will have to to do prove it’s ready.
CW: Can you list the things? I didn’t understand if there was anything to prove.
DN: concern about: size of tools on the web page (my team working on), good mapping on non-Vulkan platforms (mac, ios, and dx12).
CW: we already have perf proof; game devs are using this today and they wouldn’t be if perf was a concern.
MM: Yes, they’re using it for their specific application.
DN: MoltenVK is the tool for the Mac and iOS side.
CW: DxVk uses Spirv-Cross as well.
RC: out of curiosity how many games does dxvk ship with?
CB: are these perf competitive or are they proofs of concept?
DN: In press releases, I’ve seen Valve very happy with MoltenVk on MacOS. They say it exceeds OpenGL. I don’t have information for DxVk.
CW: the point being, they’re selling software for real money.
DM: I was just about to bring the same point as Chas. Point was to make something run with least effort on macOS, not reach max perf.
MM: Also comparing Metal to OpenGL is a difficult comparison to make.
CW: I don’t think we can prove that SPIR-V can be the same performance for all shaders. It’s not tractable to prove for every shader.
RC: then what’s going to happen when we find an inefficient mapping that leads to a perf problem?
CW: That’s going to be the case for every shading language. For any language, there’re unknowns, and we can’t assert the performance on every kind of workload compared to other languages. We can do tests on shadertoy, etc. but we can’t do it for every possible shader.
RC: but if we do quantify for a large portion of shaders that there is a problem, then we have to solve the problem.
CW: agree, but want to limit the number of Ph.D’s we mint in this group.

PR Burndown

Add read-only storage buffer bindings #303
Set initial frontFace value to "ccw" #313
Change GPUObjectBase to a mixin #315

Agenda for next meeting

PR burndown.
No-op’ing draw calls again. DM may be less accessible next week because of all-hands.
Shading language again?
- MM: aiming for next Safari tech preview. Need more than 2 weeks. Can get back on a date by the end of the day.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2019 06 10

Agenda / Minutes for GPU Web meeting 2019-06-10

TL;DR

Tentative agenda

Attendance

Administration

GPURenderBundles

Indirect operations + noopping invalid draws

Shading Language Ingesting

PR Burndown

Agenda for next meeting

Clone this wiki locally