Minutes 2019 09 16

GPU Web 2019-09-16

Chair: Dzmitry

Scribe: Ken, Austin

Location: Google Meet

TL;DR

Please think of potential topics for the F2F and add them to the agenda!
Process for CTS
- Spec contributions should now come with a test plan, as part of the PR summary
  - Basically “notes to implementer” (and usually the spec writer should also write the tests, so “notes to self”)
  - Also: test everything that anyone considered ambiguous when reviewing the spec
- We’ll add a column to the spec tracker kanban, which is “specced but needs tests” - look at the PR for the notes.
#311 out-of-bounds clamping may require additional draw-call validation
- Resolved to return default instead of clamping access, when needed
  - Pending input from Myles, who opened the issue
#320 allow discarding out of bounds indexed draw calls
- Helpful for Metal, extremely helpful for Vulkan drivers with bad RBAB implementations.
- Resolved to allow it. We can/should tighten it later on if implementation experience really shows we don’t need it.
  - Pending input from Myles
#430 error scope callback order
- Child scopes should always resolve before their parents.
- Sibling scopes may resolve in any order, though in practice browsers are likely to resolve them in a reasonable order if there’s no reason they should be out of order.
  - Out-of-orderness shouldn’t hurt debugging too much because error scopes are not meant to be a debugging tool.
  - It’s actually hard to spec that they are in order: it causes unnecessary synchronization of things that happen on multiple queues.
    - (KN: tldr-writer’s note: I said it would cause implicit semaphores in the meeting, but I just realized I don’t think that’s true.)
#421 offsets in setVertexBuffers and setBindGroup
- setBindGroup should allow dynamic offsets to be passed in as Uint32Array. This is fine since Vulkan requires dynamic offsets to be uint32 anyway.
  - Other alternatives aren’t great, like prebaked a bindgroup+offsets object.
  - Resolution: allow it, consider requiring it.
- Both VB options: setVertexBuffer several times, or setVertexBuffers with array or Uint32Array of offsets, are theoretically optimizable.
  - We should at least allow Uint32Array offsets, perhaps require it (but that disallows 64 bit offsets).
  - Continue on GitHub.

Tentative agenda

Brainstorm for the F2F
Process for CTS contributions
Out of bounds clamping #311
Error Scope Callback Order #430
Default offsets to setVertexBuffers #421
PR Burndown
Agenda for next meeting

Attendance

Apple
- Justin Fan
Google
- Austin Eng
- Kai Ninomiya
- Ken Russell
- Shrek Shao
- Ryan Harrisson
Intel
- Yunchao He
Microsoft
- Chas Boyd
- Damyan Pepper
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
Mehmet Oguz Derin

Brainstorm for the F2F

Agenda in progress
- Brainstorm from last week
AE: Multi-queue?
- JG: yes.
KN: swap chains. Want to add one about getting rid of swap chain & doing things with ImageBitmaps, raise that topic again. Will open a PR for that this week.
DM: so we need to port topics from the last concall to the agenda.
JG: be liberal about adding topics; we can always decide not to discuss topics.

Process for CTS contributions

KN: complicated part is relationship between CTS and spec. So far we have some tests in. Francois from Google’s Paris office has been writing a bunch. I have some work to do still on CTS but it doesn’t block writing tests. We need to start maintaining a list; as we write spec text, add to a test list “test that thing, test these corner cases”. Maintaining this sounds like a pain; don’t want to have a really complicated relationship with the spec. Best to track this outside the Github repo somehow. As a Project maybe? Every time we make a spec PR you’re supposed to add something to the Project. Whoever implements this makes sure the corner cases the spec writer came up with are covered.
KR: I think the person writing the spec text should at least start writing the tests. The person writing the spec has all of the context in their head. I would encourage writing them at the same time.
KN: makes sense. Think we should try to get things in the spec more quickly. Even if they write the text and add something to the to-do list, and then they themselves write them, that seems fine. Agree that people should probably write their own tests.
JG: Layer this? Tag on issues / PR if a piece of spec text still needs tests, etc.
KN: might work.
DM: we could have that flag. So, first step, you write a spec PR? Then you have a list in the body of the PR? You merge it, then you have list of test tasks in Project? Then you document the tests? Three-step process?
JG: I suppose. Fundamentally have to do the spec change first. Don’t know whether we need the middle step. First have the spec change and what it’ll look like. Merge PR, mark as Test-Needed. Ideally author of PR or person most crisp on it will write tests. Ideally we don’t need to enumerate all of the things we’re going to test, because the person who wrote the spec text will write the tests.
KR: Speaking from experience, sometimes when we have spec changes, if it’s more than about a month or two we’ve forgotten all the corner cases. So having someplace to write this down is essential - no opinion on where. But should write down the thoughts since the spec text can be subtle - extensions have complex interactions with core, etc.
JG: some of those interactions would be useful as non-normative (or normative) spec text. “Here are the errors where this doesn’t work”. Then tests fall out. Tests usually cover more things than in the spec. Don’t want to overburden us with process. There will be times when you don’t need to write it down. There are times we don’t do that but those are our responsibility to remember what to test.
KN: Needs to be notes to yourself to write the test -- whatever you think you will need.
JG: there will prob be requests in spec discussion for clarification. We don’t need to test those things but should.
DM: we could use the same board for spec changes and tests. Column to the right of the spec changes. Things are discussed; things merged but requiring tests; and things that have tests. Could expand on the right implementations but could be tricky.
KN: having a column in the spec project makes sense.
DM: that would require us to move project from gpuweb repo to group, then add the column there.
KN: what would we have to move?
DM: would need to move the spec up to the organization level.

Out-of-bounds clamping may require additional draw-call validation #311

DM: no comments in 3 months.
KN: I think there were comments on this topic. In another thread? Yes. On #320.
(... discussion on #320, below)
KN: OK. That’s #320. We have #311 as well. Little more complicated. Myles opened this. To do out-of-bounds clamping you need at least enough data to clamp. Having not even 1-length data is a problem.
JF: Myles called this out as being pretty important. Might be a good idea to send him email with any conclusions we come to today. He’s active on email though at TPAC.
KN: #320 is pretty orthogonal I think. Still no resolution on #311.
DM: the size of the Foo struct shouldn’t matter here. Should be clamping address into the buffer, not arbitrary indices shader is using.
KN: is that possible though? If you’re using clamping, you have to have a single read always be able to succeed. Maybe the read is not that big at least in SPIR-V.
DM: won’t it always be the word? 32 bits?
KN: I think so. Thinking about Metal impl, and how this is converted to MSL, I have a hard time thinking about how you’d do that clamping. You’d translate the struct read into a struct read in Metal, and couldn’t do word-by-word clamping. Struct read would have to become word-by-word reads.
DM: isn’t this an artificial problem? Can always declare a local Foo instance, fill with zeros, and return it.
KN: I’m not sure. Not sure why clamping is necessary here.
JG: #311 is talking about - you’ve defined out-of-bounds elsewhere, in the robust buffer access part of the spec.
KN: Robin said, you could … (return a temporary / default?)
KN: do we have more concerns? Or should we just return a default value?
DM: unless they can’t do that, that seems like the reasonable path.
KN: I’ll record this and we’ll follow up with them.

Allow discarding out of bounds indexed draw calls #320

DM: I’m lost at where we are. Eliminate vertex buffers? Sophisticated tracking Google proposes? Others?
KN: on impls where it’s necessary, i.e., Metal, have to transform all vertex input into vertex pulling. Impl has to do it.
DM: b/c of lack of robust buffer access?
KN: yes. it’s hard. Not sure how hard, but pretty hard.
DM: at same time it won’t be the same impl for all backends. For Metal it’s easier to add a few buffers being bound, where in Vulkan you have to add descriptor set and have them bound there. Metal impl would be specific to Metal.
KN: that’s one of the reasons this came up again. To impl robust buffer access on Metal, can transform everything into vertex pulling. Doesn’t break things. On Vulkan where you need descriptor sets it has hidden costs. By reopening this and saying if draw call is invalid you can skip it, opens you up to other implementations. Like, do bounds checks on indices and cancel draw call. Seems it makes sense for now to allow that in the spec, and later if we find we’re not using it, take it out.
RC: allow what?
KN: allow skipping invalid draw calls.
RC: so make validation optional? Use robust buffer access on some platforms, if it exists
KN: yes. Rules as previously agreed upon were: everything behaves as though there’s robust buffer access. This one goes beyond that and says the whole draw call might be cancelled. Doesn’t say the draw call becomes undefined, but that you can skip it. That lets us do the same kinds of things we do in WebGL today: check the bounds of the index array and error out.
RC: believe for WebGL impls can also use robust buffer access. Optional there as well.
KN: yes. We’re talking about allowing both.
RC: I’d be in favor of that. Use robust buffer access where available, but if not available, maybe skip draw call.
JG: Sort of fine with that. I have the feeling we’ll eventually not use the escape hatch. All implementations will drift toward the more performant version. Doing vertex pulling on Metal seems like it would be more performant than checking the bounds in any other way.
KN: on Metal I agree. Think we’ll likely converge on doing vertex pulling. Not sure how long it’ll take. Will take some time to implement. Most concerned with Vulkan or D3D12 impl with untrusted robust buffer access behavior. Becomes really complicated to transform things this way on these APIs; cache descriptor sets on the fly, etc., at draw call time to make it work.
DM: why would we want to use vertex pulling on Vk/D3D12?
JG: where we don’t trust robust buffer access.
DM: Can we have investigation / confirmation about the way robust buffer access works for vertex buffers, specifically, on Vulkan and D3D12. Concerned whether it works as designed. robust buffer access says you get a value in buffer or 0 or 1. Would it be the case in D3D12 if you sample outside a buffer and get something else?
KN: are you talking about the impls or the specs?
DM: investigation into how the native APIs treat that.
RC: for vertex buffers D3D12 does have robust buffer access. If you fetch outside the vertex buffer you get 0. There are conformance tests for drivers that make sure it works correctly.
DM: does it respect the whole range in the buffer or just the offset? In Metal you just specify the offset but not the range.
RC: things in the root constant buffer aren’t checked. But everything else is.
DM: vertex buffers are neither root constants nor descriptors.
RC: yes, they’re separate. Runtime has enough info at bind time to know;
DP: at bind time it isn’t given a max size, just an offset. (CORRECTION -- SEE BELOW) Will clamp you to reading within the resource. If you have a buffer and using it as a vertex buffer, no clamping on that.
DM: that’s my concern. You’d get a value that’s not in the buffer as the WebGPU user sees it.
KN: that’s a problem. In order to use any part of a resource you have to init the whole resource.
DM: not just init. User may place multiple things in the same buffer.
KN: we could loosen requirements of robust buffer access to give you any data that you own. Not great but at least possible. In impls, wouldn’t be able to create giant resource and sub-allocate it without clearing it. Even if spec is changed you’d still have this initialization problem.
DP: this is a security thing. I think the way buffers in D3D12, app can’t read stuff it won’t be able to read, but can still read stuff it put there.
DM: conclusion: we need to init that uninit space we use in buffers we sub-allocate from. We need to loosen definition of robust buffer access if we want to use it in D3D12. And maybe we need to look more into investigation.
DP: I misread the page I was reading. You do spec a size of the buffer you’re reading from, and it will clamp to that. You spec offset, size, stride.
KN: ok, interesting. In Vulkan it binds to the buffer and not the entire resource. Spec-wise it seems fine. The concern is still with poor implementations of robust buffer access.
DM: regarding discarding the whole draw call: think that apps may ship with just a few indices out of range, and they’ll work on some browsers but break on others.
KN: right.
KR: Pretty sure I mentioned this on previous calls, but this hasn’t been an issue in WebGL. These apps are buggy even if they have only a few indices out of range, and they can’t expect it to work. Someone will eventually try it out on a platform where it doesn’t work and it will get fixed. Opening up the possibility [of no-opping] prevents potentially traumatic transformations in the browser.
JG: fine with that.
AE: maybe there can be a debug extension that throws out the bad draw calls on all backends.
JG: if the user app uses vertex pulling because they don’t want to use our vertex input, they don’t have this problem.
DM: I agree. I find Ken’s reasoning to be convincing. Don’t think we should resolve the point because we’re missing a few imp’t people here, but think we can restart the conversation based on our current point.
(... back to #311)

Error Scope Callback Order #430

DM: seems non-controversial. Missing inputs from non-Google people. I commented on it 40 minutes ago.
AE: basically question is, since error scopes are defined to only resolve after ops inside them are completed, if you do something like: push error scope, queue submit, pop error scope, push error scope, do nothing, pop -- can the second one resolve first, because the first one is waiting for the queue submit to complete?
DM: parent/child relationship?
AE: sibling.
DM: trying to understand argument for not restricting the order. App can respond more quickly. Are you talking about completion of operations, or errors returning sooner?
AE: either. Say you want to do fallible allocation. Allocate really big buffer, then pop it. Push error scope, do stuff, pop it. Want to check no validation errors. Those can happen in an arbitrary order.
DM: downside is, if you have a local piece of code: push / pop / check errors, if order is undefined, they can get errors from anywhere else. You can’t isolate pieces of code checking for errors.
JG: you’d need your own happensBefore / happensAfter logic because impl wouldn’t do it for you.
RC: I can see this potentially useful for doing long shader compiles. You want to compile 4-5 long shaders / pipeline states. Impl can farm out to multiple threads. Smallest could come back first, etc. Independent of order submitted.
KN: agree. I think shader compile is most imp’t use case.
RC: when I read this I was confused until I remembered we were using errorScope to determine when things began / ended.
KN: it’s confusing for everyone.
DM: RC you’re arguing for defined order?
RC: having it undefined could be useful. Could solve pipeline compilation thing another way. Shaders can take a long time to compile. Not just talking about fxc, HLSL, etc. Even given things at level of SPIR-V can take a while.
KN: register allocation is hard.
DM: this is absolutely the thing we need to be able to control and parallelize. I didn’t realize we were controlling that with ErrorScope.
AE: not really controlling it. You can wrap it in a None ErrorScope to see when it completes.
KN: if you don’t check completion then you stall at some point.
DM: can you catch errors happening in a specific scope?
KN: yes.
DM: so you can schedule pipeline creation, wait for messages on specific scope for it.
AE: yes.
DM: OK. I’m fine with that then.
KN: alternative: tell apps they need to do their cheap ones (pipelines) first. Not that bad. Should be consistent. But no real reason for this.
RC: seems kind of weird. Things dependent on each other; general validation, “you had a validation error”. Besides pipeline creation, what would be out of order?
KN: probably just this. Maybe with allocations. Unlikely.
AE: with multi-queue, things could happen in different orders.
KN: we don’t want the ErrorScopes to make implicit semaphores between queues.
DM: makes sense.
KN: as long as, in practice, things are usually in order unless there’s a good reason for them not to be, it’ll be fine. As long as the user isn’t getting errors back in random orders. Doesn’t need to be strictly specified.
DM: we’ll keep room for feedback between MVP and V1 on this one. Seems OK to go with the suggestion on this one.
RC: good idea for developers to tag their promise callbacks.
KN: at same time, these aren’t designed for debugging. Might use these if you’re trying to detect errors on users’ machines. Hopefully relying on browser’s debugging tools.
DM: for debugging, you’d have synchronous errors?
KN: thinking a checkbox for WebGPU errors, like WebGL errors today. Would need checkboxes “WebGPU error caught by scope”, “not caught”. Think more useful.
RC: agree.
DM: sounds like no objections. Let’s move on.

Default offsets to setVertexBuffers #421

KN: AE was talking about this and SetBindGroup.
AE: SetBindGroup with dynamic offsets -> offsets should be 32-bit. Vulkan restriction. Resolves all the concerns about converting arrays to 64-bit arrays being slow.
KN: makes sense. Another option with bind groups: could have either baked object or dictionary, encapsulates that state in more well-laid-out way. BindGroup, some structure that defines dynamic offsets. If you had a baked object or some sort of interface object, create this thing for this bind group, make SetDynamicOffset calls. Make it easier to store that thing off somewhere. Easier for us to get rid of overhead.
AE: couldn’t you also have a constructor which wraps a dictionary mapping binding to offset?
KN: would also work. But Uint32Array is probably way to go.
DM: there’s always room for making interface <-> dictionary if people find it useful. Specifying offsets from user’s standpoint is not something that needs solving. Fairly straightforward. We don’t expect many offsets to be used. Maybe 1 or 2. Is there a question about how SetVertexBuffers should work?
KN: SetVertexBuffers currently takes sequence of buffers and sequence of 64-bit integer offsets. This is more straightforward than bind groups but still hard to solve. Can have 2 sequences, say they have to match each other. Or, have some dictionary somewhere, buffer + size pair. This is allocating during encoding time which is no good. Or have just single SetVertexBuffer. Kind of makes sense to me.
DM: I’d like that too.
KN: don’t know if there were other alternatives.
DM: there’s an unresolved thread about the overhead. Jeff asked questions from Ken about multi-draw in WebGL.
AE: I commented. It’s faster in multi-draw because we’re using Uint32Arrays. We just pass the pointer into C++. If you use a normal array it does type conversion which is slow if it’s a normal JS array.
KN: array is really an object, have to do a bunch of calls to extract from it.
JG: there are optimizations you could do. The one that was faster just did the array accesses in JavaScript. The array dereferences still happen.
AE: it’s not the array dereferences. When you pass the JS array to the thing, the C++ code will allocate some vector, it’ll do type conversions for each one to go into the vector.
KN: b/c JS array is an object and we access it at a fundamental level in the C++ code, we fetch values out of it very generically. It’s costly.
JG: my thought is: that’s an optimization opportunity. Under the hood it’s definitely a sequential array.
KN: internally in the engine it’s true, but exposing that out through the bindigns, which go through the public V8 interface, it’s tough.
KR: The path that’s expensive is going from C++ into the VM. There is no opportunity to optimize. The many calls back and forth are expensive. We should spec the API to avoid (in hot paths) JS allocation, array unpacking, etc. Would make sense to allow offsets to be passed as Uint32Array. For setVertexBuffer vs setVertexBuffers, it’s possible the latter could be optimized.
AE: think you could but that’s the big project we haven’t done.
KN: I think both (setVertexBuffer or setVertexBuffers) could be theoretically optimized pretty well.
DM: doesn’t sound like we came to a conclusion about setVertexBuffer vs. Buffers.
KN: think we should keep what we have but also allow Uint32Array.
KR: I think you should mandate Uint32Array.
AE: you can have offsets larger than 32-bit.
DM: we’ll continue offline from there.

Agenda for next meeting

DM: think about what to talk about at F2F!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2019 09 16

GPU Web 2019-09-16

TL;DR

Tentative agenda

Attendance

Brainstorm for the F2F

Process for CTS contributions

Out-of-bounds clamping may require additional draw-call validation #311

Allow discarding out of bounds indexed draw calls #320

Error Scope Callback Order #430

Default offsets to setVertexBuffers #421

Agenda for next meeting

Clone this wiki locally