Minutes 2019 06 17

GPU Web 2019-06-17

Chair: Dean (while Corentin was out)

Scribe: Ken

Location: Google Meet

TL;DR

AI: Kai to run the test of no-opping on iOS, and report back.
No-opping invalid calls discussion
- Three difficult solutions. More investigation needed.
Spec editors - call for nominations is open. Volunteer as tribute.
Bitmask naming - still to decide
Structures / sequences for GPUColor and friends - accept the merging solution
Add read-only storage buffers - agree to accept the proposal, but would like to see more explanation of validation for dynamic in the specification

Tentative agenda

Nooping invalid drawcalls
PR burndown
Agenda for next meeting

Attendance

Apple
- Dean Jackson
- Justin Fan
- Myles C. Maxfield
Google
- Idan Raiter
- Kai Ninomiya
- Ken Russell
- Shrek Shao
- Ryan Harrisson
Intel
- Yunchao He
Microsoft
- Rafael Cintron
Mozilla
- Jeff Gilbert
Mehmet Oguz Derin
Timo de Kort
Dayman Pepper

Nooping invalid drawcalls #320

KN: the plan was to continue the discussion from last week.
IR: we believe the only place we’ll likely have to do this is on Metal, because the other platforms sufficiently support robust buffer access.
IR: Test:
- A modified version of Apple’s Hello Triangle from https://developer.apple.com/metal/sample-code/
- 500,000 triangles were rendered with position and color. One method used vertex attributes (push), and the other used a buffer to pull position / color based on vertex_id. FPS was measured using xcode.
IR: Results:
- We ran the test on Nvidia / AMD / Intel and found no difference in FPS. This indicates that pulling works about just as well as pushing. However, we didn’t test mobile devices yet. Benchmarks on Nvidia with OpenGL from https://www.reddit.com/r/opengl/comments/6mj2uz/opengl_programmable_pulling_benchmark/ suggest it may be very slightly slower.
IR: this indicates that vertex pulling may be OK to do and that we wouldn’t need to no-op invalid draw calls on Metal. We’re open to not doing it but would like to see if having the option would still be good or not.
MM: thought older Vulkan versions didn’t require robust buffer access?
KN: you don’t have to turn it on but think that you’re required to support it.
- Vulkan 1.0: All Vulkan graphics implementations must support the following features: robustBufferAccess. All other features defined in the Specification are optional.
JG: think it might be that it’s not required but has 100% activation.
MM: curious to know what similar study on iPhones would look like.
KN: us too. Plan to look.
MM: just one attribute or a lot of attributes?
IR: just one attribute.
KN: we also tried a bunch of attributes, but all within the same buffer. We also tried the Reddit link in the minutes.
MM: if I remember correctly it’s published at AMD that vertex pushing is implemented by augmenting the shader. So I guess running a study on iPhones would be important to proceeding.
KN: if we do find vertex pulling is significantly slower on iPhones then would you like to allow no-op’ing?
MM: instead of clamping?
KN: yes.
MM: I think no-op’ing is better than clamping. You either need to reset the clamped values, or reset them later.
KN: we’d have to make a shadow copy. Can’t modify the buffer. If we don’t find on iOS that vertex pulling is a problem, is that enough reason to disallow no-op’ing in the spec? That still seems like a strong move to me.
MM: from the data I think you could make that conclusion, but it’s concerning because every 3D graphics API has all had this facility.
JG: which facility?
KN: modern graphics APIs all allow vertex input rather than requiring you to do pulling.
KR: NVIDIA is saying we shouldn’t go off the beaten path here. It’s inevitable that we’ll find some GPU that hasn’t implemented robust buffer access correctly, as we’ll have to implement a workaround. Thus I think we do need to allow no-oping.
JG: the beaten path is to rely on robust buffer access behavior and the details of how it works. Allowing no-op’ing out-of-bounds draw calls is different than that. Concerned about it being a portability issue with code that writes out-of-bounds. Order of magnitude more of a correctness issue.
KR: We’re talking about implementing RBA on platforms that don’t support it. To do that you have to implement pulling if you don’t allow no-oping. We’ve allowed that in WebGL.
JG: in WebGL, we started out with not no-oping but generating an error. A stronger guarantee. Most impl’s still do that on most hardware. FF still does. Would like to rely more on robust buffer access, but today we don’t on most systems. Average person developing sees their draw calls not go through at all. On systems with robust buffer access, they get a slight speed boost from us not having to check. The opposite would be true for WebGPU because we’re not becoming more lax, we’re saying some systems are more restrictive. But most systems will rely on robust buffer access. Doing the draw calls with all their side effects - if we don’t always not do the draw calls, the average developer will set themselves up for a bad surprise when they try to run on a system that no-ops.
KN: I agree it’s a concern and I don’t like that.
JG: have been some concerns about some perf aspects. Main concern: correctness is still there even if some draw calls go out of bounds. If we’re able to efficiently detect and slow-path bad draw calls, probably workable.
KN: was going to bring that up. At end of discussion of #320, talked about it a little bit. Probably possible to do segment tree, check whether draw call would be valid, and if not insert the shadow copy, clamp and re-issue with shadow copy. Tricky to do well because naively it requires a round trip and without the round trip would be very very complicated. Would need ring buffer impl staying entirely on GPU. Generate GPU calls as argument buffers or indirect command buffers - not sure whether that’s possible. That impl is interesting to me but I don’t feel comfortable prescribing it. More comfortable prescribing vertex pulling.
MM: if the index buffer is built on the GPU are we suggesting downloading to CPU to do the checks or run a compute shader to modify the indices?
KN: we’d have some scratch space we can use as temporaries. If the index buffer will go OOB, make copy into scratch space, rewrite all draw calls to use that scratch space instead of the index buffer. Either change calls on the CPU side or use indirect command buffers to do so.
MM: any time a shader attaches a buffer that could be used as an index buffer, and you feed that into a compute shader, then you have to feed it into another compute shader to do the detection?
KN: have to do at start of render pass because during render pass the index buffer can’t change.
MM: is that part of the spec?
KN: I think it’s written somewhere that buffers can’t be used in both vertex or index input, and as writable usages.
MM: what about different pipeline states treating index buffer as shorts, and some as 32 bits?
KN: range of valid values in the index buffer depends on pipeline, and on the currently bound vertex buffers, and on arguments to draw call. Have to potentially do the shadow copy for every draw call in the render pass. Hoist, do the modifications, then draw.
MM: if we no-op’ed then we’d have to do some of that analysis?
KN: yes.
MM: that sounds preferable to shadow copies.
JG: only preferable if you want to optimize for exec speed for draw calls that go out-of-bounds.
KN: and for making impl a lot less complicated.
JG: it doesn’t sound that much less complicated. Instead of writing a boolean, writing an index buffer.
KN: if you don’t do it for all draw calls but only for those that might go out of bounds, it becomes a lot more complicated.
MM: agree that populating these in the middle of a render pass becomes really complicated.
JG: but we’re not in that situation. Can be done at the start of a render pass.
KN: It does have to be done in the middle of a command encoder.
MM: our impl doesn’t buffer up commands. That means each encoder would have to be a command buffer so we can generate an interstitial command buffer and stick it at the beginning of a render pass.
KN: the no-op’ing impl is non-trivial. The clamping impl on top of that is more complicated.
JG: one of them is required unless you can decide to do vertex pulling.
KR: but the no-op’ing is simpler.
JG: not sure about that.
IR / KN: it was pretty simple for Dispatch. If it’s > 64 KB then it’s invalid. Also we just make a copy every time. The amount of data in a Dispatch is 3 ints, not an entire index buffer.
MM: what is the criteria we’re using to decide? Implementabilty?
KN: Given the results of experiments, it seems likely that our impl will use vertex pulling. Don’t want to limit ourselves right now.
MM: One impl might use vertex pulling. Should some impls be allowed to not? Not sure which one we’d like to use. Apple would like the flexibility of choosing among these. If we decide to go with vertex pulling we can revisit this.
KN: Maybe this could be added to Metal.
DJ: have you exercised robust buffer access in Vulkan impls on mobile? Likelihood that they actually work?
KN: don’t know yet.
JG: sounds like we need some tests.
KN: some Vulkan folks might know. Not sure how well Vulkan CTS tests it.
MM: say I make a phone in my spare time. Can I make it run Android but not pass the CTS?
KN: a shipping Android phone that supports Vulkan should pass all the Vulkan CTS tests at the time it qualified. But the CTS was incomplete at various points in time.
MM: do we have a way forward here? Sounds like flexibility is a good idea for now and if it turns out nobody uses the flexibility we can remove it? And we’re going to write tests for this?
JG: I’d like to settle on robust buffer access but we don’t have tests yet.
KN: working on it. :)
DJ: next steps?
KN: more investigation. Would like to run tests on iOS. If we do use vertex pulling then it would be OK to table this and not formalize it yet, I suppose.
MM: and the question of behavior on Vulkan? Clamping, or can it no-op draw calls?
JG: anything from inside the buffer; 0; 1.
MM: but not allowed to no-op.
KN: only defines robust buffer access behavior but it can’t no-op.
JG: could clamp all out-of-bounds indices to 0. In-bounds vertices should draw correctly.
DJ: anything else to say to summarize? Or come back to this in N weeks?
JG: we can table this for now. Further investigation and prototyping needed.
KN: had a lot of good discussions so far, and we know more than when we started.

Spec Editors

DJ: CW had a call for editors, we were going to discuss again in 3 weeks.
MM: made thread, called for nominations, we’re going to nominate some people soon.
MD: will there be an opportunity for people not spec editors to contribute to e.g. build scripts, tooling, internationalization, etc.? I could devote time to tooling.
JG: yes!
DJ: go for it! Anyone can submit a pull request who’s a member of the group. Whether directly to the spec (in case editor would merge it), or etc. If you have any experience in translation, tooling, etc. you should feel free to submit pull requests.
MM: we’d love any help you’re willing to give.
MD: great.
JG: in general spec editors appreciate any help you can offer.
MD: do we have a format we’ll write the spec in?
JG: Bikeshed will be our source language. Compiled into an actual HTML document.
MD: OK.
DJ: you can write HTML but also Markdown. The spec has the IDL in it but needs the actual text and rules we’ve discussed over the past six months. Have to be actually put into the spec. Even if it were one sentence we’d decided it would be great.

PR Burndown

Bitmask naming #292 / #318
- DJ: one approved, alternative also approved.
- JG: just need to normalize on something. Least contentious way is to copy plurality of what source API does, which is to use FLAG_ prefix. I’ll update my PR with that rationale and hopefully we can come down one way or the other.
- DJ: JG will update #318 to use FLAGS_. Then decide between the two, or choose #318?
- JG: we should decide between the two.
Structures / sequences for GPUColor and friends #299 / #319
- KN: in my proposal the IDL would be less clear but we can express that it’s for floats - 3 or 4 floats, for example.
- MM: right now all 4 values are required? If you drop alpha it’s no longer a color?
- KN: to preserve that behavior I’d write text saying that this sequence must have a length of 4.
- DJ: and be numbers?
- KN: the IDL does say that.
- MM: the writeability is easier but the intent is less clear.
- RC: no strong opinion but spec should say that r is first, g is second, etc.
- MM: some people might want to write BGRA colors. If it’s an array that info is less clear.
- RC: some people like height / width / depth.
- MM: Rafael has made me lean toward keeping it the way it was.
- DJ: keeping the dictionary?
- MM: Yes.
- JG: is this out of concern that people will write colors in BGRA order?
- MM: just for clarity. Harder to understand which fields go where.
- JG: I agree in principle but RGBA is what’s well specified everywhere.
- MM: in CSS anyway the color function is named RGB / RGBA.
- KN: was going to point that out too.
- JG: think that if you’re confused about this, this is not the right API for you.
- MM: really?
- KN: if you’re trying to clear a BGRA texture that might be where it’s less clear.
- DJ: don’t think it becomes less clear in that case. Because it’s BGRA then the R goes into the B, etc. Think there’s not much decision between these. But think we should make it now.
- KN: I think we should say, this array is in RGBA order. For BGRA textures, their R and B channels would be skipped.
- JG: agree.
- DJ: any strong objections?
- (no)
- KN: DM didn’t like the sequence thing for what it’s worth.
- DJ: I think we accept it and move on for now.
- KN: I can insert a comment saying “this is RGBA” into #299.
- DJ: thought Jeff was going to write some spec text along with it. Then we’ll merge it.
- JG: so we close #299 and go with union types from #319?
- JG: the thing we need to decide is, accept both dictionaries and sequences, go with sequences, or leave things as is.
- DJ: I thought we were going to accept just sequences.
- RC: I thought we were going to accept both.
- KR: twice the number of concepts?
- JG: so we merge #319?
- ...more discussion…
- RC: will we always require 4 colors in the sequence version?
- JG: designed to be pure syntactic sugar for instantiating a dict.
- RC: what about just clearing a red texture?
- JG: still have to specify all 4 colors.
- MM: WebAssembly will likely want sequences.
- KN: don’t think it will make a difference. Have to convert their ArrayBuffer memory anyway.
- DJ: let’s just move on with the merging.
Read-only storage buffers #303
- KN: the question is whether we want to have a flag for it instead of a bunch of differently named things.
- RC: why does the API need to know whether something’s dynamic or not?
- KN: it changes how we interact with the native APIs.
- MM: if a buffer isn’t dynamic, when you put it in a bind group, its offset / length are fixed forever. Otherwise you can choose the view into the buffer.
- KN: you need to know this when you create the bind group.
- MM: in Metal we need to know to reserve some extra space to stuff these values in.
- RC: so if you tried to do dynamic offsetting it’d stop you?
- MM: yes.
- …
- DJ: it would be nice to see some text like that in the proposal.
- RC: agree.
- DJ: right now it just says what the dynamic … is and not how to validate it. Anyone against the pull request in general? If not then suggest we’ll accept it, but would still like more validation / summaries.
- RC: agree with that. Was wondering about both dynamic offsetting and validation. Sometimes we agree on things but it doesn’t make it into the IDL.
- MM: will StorageBuffer and ROStorageBuffer be the same type?
- KN: this will change the bind group layout. When compile Pipeline w/BindGroup and shader module, RO bindings would be RO in the shader as well.
- MM: do we have to augment the shader’s type system? So when you write into a buffer you have to know what type it is?
- KN: think so. GLSL already has this concept.
- MM: HLSL too.
- KN: think we’re OK on the type side then.
- MM: didn’t we come to a resolution?
- DJ: yes, we’ll take it with more wording on the validation.

Agenda for next meeting

Shading languages? Apple wanted 2 weeks, also Corentin will be out.
- MM: we didn’t ask for 2 weeks. Were slated for Safari Tech Preview in 2+ weeks, need some time after that to gather feedback.
- KN: Shader discussion was supposed to be in 4 weeks.
- DJ: OK, not next week then.

PR burndown

Indirect commands #317
Set initial frontFace value to “ccw” #313
Add slot into VertexBuffer #314
Make sampler descriptor optional #327
Mixins and splitting device #315 / #316
Expose "GPUDevice : EventTarget" to worker #329

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2019 06 17

GPU Web 2019-06-17

TL;DR

Tentative agenda

Attendance

Nooping invalid drawcalls #320

Spec Editors

PR Burndown

Agenda for next meeting

Clone this wiki locally