Minutes 2018 03 14

GPU Web 2018-03-14

Chair: Corentin

Scribe: Ken

Location: Google Hangout

Minutes from last meeting

TL;DR

No meeting during GDC
Advisory mailing list was created [email protected]
Sharing our investigations with Vulkan Portability
- Vulkan Portability will have an open-source repo soon; it would be nice to share investigations. Concern that splitting investigation and design will be a burden on WebGPU.
- Vulkan Portability can look at our investigation but we don’t change our workflow. Revisit this later.
WebWorkers
- For MVP have command buffers be transferable so that they can be recorded on multiple threads and aggregated on one thread to be sent to a queue.
- For later consider secondary command buffer / bundles and interaction with Metal’s ParallelRenderCommandEncoder.
- To record command buffers in parallel, at least immutable objects like buffer views and texture views need to be shareable.
- Want to figure out the nuances of creating objects on multiple workers
- RAF and commit() in workers written about in OffscreenCanvasAnimation.md. Stuck in TAG review, please help!
Memory barriers
- Vulkan barriers are too powerful, D3D12 barriers seem to be the upper bound for WebGPU in terms of explicitness
- D3D11 on D3D12 hasn’t seen explicit barriers being a bottleneck yet.

Tentative agenda

WebWorkers
(stretch) memory barriers
Agenda for next meeting

Attendance

Apple
- Dean Jackson
- Myles C. Maxfield
Google
- Corentin Wallez
- James Darpinian
- Ken Russell
Microsoft
- Chas Boyd
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
- Markus Siglreithmaier
Yandex
- Kirill Dmitrenko
Tyler Larson

CLA for Working Group

DJ: Talked to the lawyers, Microsoft is figuring out the process for things. We should share the CLA with the rest of the group. Basically BSD 3 + Apache2 as a CLA. Some formality to share it with the W3C so they can use it with other groups.
JG: Don’t think there will be objections but would like to see it.

Sharing investigations with the Vulkan Portability TSG

DM: How can we share the investigations we have made that aren’t biased towards any API with other people like Vulkan portability. Not clear where it could live.
DJ: if it’s hosted in the GPU group then it’s completely visible and open to everyone in VkPortability group without restrictions. Vice versa is not the case. Would have to join an Adopters’ program (is there one?) for the Vulkan Portability group.
DJ: should check with their lawyers about that.
DJ: would be happy to have all the investigations in WebGPU and we should share those investigations with others (and they can contribute, too)
JG: easy to share investigations and statements of facts. When we start sharing designs it’s tricky.
DM: you’re happy keeping the investigations in the gpuweb repo only? And not happy with them being elsewhere?
CW: we already have a forum where we’re doing the investigations.
DM: not the optimal shape. Don’t think issues are the best way to record our investigations.
CW: regarding the forums: you want a better split between pure statements of facts, and designs which might contain IP?
DM: don’t think any of our designs will contain IP. Not going into details of VK driver implementations.
JG: it is “Design” to figure out how to multiplex across different constraints. Investigations like “we can’t do this here / on this platform” are fine. Sounds like we’re OK with investigations happening in the gpuweb repo. Separate topic: how to coalesce these so they’re easier to read instead of being spread across issues.
DJ: agree. We have plenty of options, like a wiki, etc.
DM: yes, there are two sides: (1) the shape of the API and (2) where it’s hosted. The shape can be Markdown files, with pull requests made against them. Can’t have pull requests against issues.
JG: that works well for statements of fact. Once we merge the pull request in, we’re saying “this is how it is”.
CW: the Markdown file does contain “Design”.
CW: there are plenty of issues which have statements of fact and “designs”. If you want to port these to Markdown please feel free. Not sure how much we should cater to VkPortability. Nice to have interactions with groups in the same space, but we have very different goals. The investigations we’re doing are not that difficult to reproduce or extract. Not sure what we can get from them (VkPortability). Looking at those investigations, there are DM’s investigations, which are good, and some others, which are not that useful to this WG.
DM: even the stuff I wrote, I had to duplicate because VkPortability is closed and gpuweb is open. Would like to have a single source. FYI, VkPortability repo will also be open and not require any Adopters’ program.
DJ: confused. You’re saying, you have an issue duplicating your work because you have to duplicate between gpuweb and VkPortability? Is there something you can say in VkPortability that you can’t say here?
DM: there’s no restriction on what’s been said in VkPortability, saying it here in gpuweb. It’s just an artificial barrier between the groups.
DJ: my feeling: you should contribute to this group, and VkPortability can use the work. Only an issue if you do something in VkPortability that you can’t contribute to this group – don’t know what that would be.
CW: would be a burden to have separate documents for investigations and design. Better to be able to refer to each other one.
MM: some members here are not members of any Khronos group other than WebGL. The investigations that lead to our decisions definitely need to be present in this community group.
JG: the question is, is there a place where everyone can access and pull investigations. Maybe it’s here. Maybe we just use this group.
CW: this is the common place for us.
MM: why are we worried about the parts another group might take from us?
CW: making it easier to let them understand which parts are investigation and which parts are design.
DM: don’t find it easy. A lot of small differences, details. Things that neither NXT nor gfx-rs have implemented. If you consider the depth of the complexity, there will be a lot of shared knowledge both groups could contribute to. A big part of what I contributed to gpuweb was meant to be shared in the first place. The reason I contributed to gpuweb was that it was open at the time. That’s not going to be the only open alternative any more when VkPortability opens up.
DM: I did gather the responses from this group so I realize there is no desire to keep it anywhere else than gpuweb.
CW: if you find something easy for us to work with, we’re happy to try to adapt. But it seems difficult enough to split investigations from designs that I think we probably don’t want to do that in this group.

Advisory mailing list

DJ: got an email list set up: [email protected] . This is a place where we could invite SW / HW vendors to contribute, keeping them separate from other investigations / designs. Anyone can join. There’s a link on the side of the front page of the github repo.
CW: thanks, will point our vendor partners at it.

WebWorkers

CW: hopefully everyone agrees that we want to support multithreaded access to the WebGPU API from JavaScript. Want to figure out the minimum set of things we want to support in parallel.
CW: hope everyone agrees: we want to record command buffers in parallel.
MM: yes, if possible.
MM: two conceptual levels. 1) allowing two threads to write into a single command buffer. 2) shipping command buffers between threads.
CW: if you look at all the graphics APIs, it looks like for command buffer recording, e.g. Vulkan doesn’t support recording multiple command buffers simultaneously into the same command pool, because the command pool isn’t thread safe.
MM: Metal has a separate object, ParallelCommandEncoder. Vends objects that can be passed to other threads. Couple of ways to add parallelism.
CW: when they record to the same CommandEncoder: is the order of things well defined?
MM: order is well defined. All of one child’s CommandEncoder is executed by the GPU before all of another child’s CommandEncoder.
CW: sort of like “#include” in a Metal RenderEncoder?
MM: pretty much yes.
MM: API is: CreateSubCommandEncoder. Can receive recording calls in whatever order. Execute determines which runs first.
DM: this is like recording simultaneous command buffers and having a lock upon submit()?
CW: in Metal you can only encode entire RenderPasses at once. This lets you record different parts of a RenderPass on different threads.
MM: right.
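
The ordering rule discussed above (recording calls may arrive in any order, but all of one child's commands run before all of the next child's) can be illustrated with a small model. This is only a sketch: `ParallelEncoderModel`, `SubEncoder`, and the string commands are hypothetical names, not Metal's API, and it assumes execution order follows the order in which the children were created.

```typescript
// Hypothetical model for illustration; these are not Metal's real types.
type Command = string;

class SubEncoder {
  readonly commands: Command[] = [];

  record(cmd: Command): void {
    this.commands.push(cmd);
  }
}

class ParallelEncoderModel {
  private readonly children: SubEncoder[] = [];

  // Assumption: execution order follows the order in which children are created.
  createSubEncoder(): SubEncoder {
    const child = new SubEncoder();
    this.children.push(child);
    return child;
  }

  // Flatten to the order the GPU would see: all of child 0, then all of child 1, ...
  finish(): Command[] {
    const ordered: Command[] = [];
    for (const child of this.children) {
      for (const cmd of child.commands) {
        ordered.push(cmd);
      }
    }
    return ordered;
  }
}

const parallel = new ParallelEncoderModel();
const childA = parallel.createSubEncoder();
const childB = parallel.createSubEncoder();

// Recording calls arrive interleaved, as they might from two threads.
childB.record("draw B1");
childA.record("draw A1");
childB.record("draw B2");
childA.record("draw A2");

const executionOrder = parallel.finish();
```

In this model the interleaving of `record` calls has no effect on `executionOrder`; only the creation order of the children matters.
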
CW: that’s an interesting problem. In my mental model I was thinking, you can record any command buffer in parallel, and then submit it to the queue. At that point, because of Metal’s limitations, can’t do command buffer inclusion as easily, because we wouldn’t know the order to put it in the ParallelCommandEncoder.
MM: there’s also passing CommandBuffers over. There are two models in Metal.
JG: what’s the second model again?
MM: one model: you have two CommandBuffers. One thread owns one, writes into it, and then the same for the other thread.
MM: Also, ParallelCommandEncoder. Two threads can write into it and their order is defined.
CW: sounds like secondary command buffers / bundles in D3D12.
DM: does seem like secondary command buffers could emulate this Metal behavior.
MM: maybe don’t include both models in the MVP, to start.
CW: in the MVP, think model (1) is the best: create command buffers in parallel, but command buffers have to contain compute/blit/render pass. Can not do CBs in parallel inside the passes.
MM: sounds like a good start.
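
A minimal sketch of model (1) as described above: command buffers are recorded independently, each contains only whole passes, and the order of submission decides the order of execution. All names here (`CommandBufferModel`, `QueueModel`, `recordPass`) are hypothetical and are not a proposed WebGPU API; the recording runs on a single thread to keep the example self-contained.

```typescript
// Hypothetical types for illustration; not a proposed WebGPU API.
type PassKind = "render" | "compute" | "blit";

interface Pass {
  kind: PassKind;
  commands: string[];
}

// A command buffer holds only whole passes, so a pass is never split across recorders.
class CommandBufferModel {
  readonly label: string;
  readonly passes: Pass[] = [];

  constructor(label: string) {
    this.label = label;
  }

  recordPass(kind: PassKind, commands: string[]): this {
    this.passes.push({ kind, commands: commands.slice() });
    return this;
  }
}

class QueueModel {
  readonly executed: string[] = [];

  // Submission order alone decides execution order across command buffers.
  submit(buffers: CommandBufferModel[]): void {
    for (const cb of buffers) {
      for (const pass of cb.passes) {
        for (const cmd of pass.commands) {
          this.executed.push(`${cb.label}/${pass.kind}/${cmd}`);
        }
      }
    }
  }
}

// Each of these could be recorded on a different worker and then transferred.
const shadowCb = new CommandBufferModel("shadow").recordPass("render", ["drawCasters"]);
const sceneCb = new CommandBufferModel("scene")
  .recordPass("compute", ["cullObjects"])
  .recordPass("render", ["drawScene"]);

const queueModel = new QueueModel();
queueModel.submit([shadowCb, sceneCb]);
```

Because passes are never split, the aggregating thread only needs to choose the order of whole command buffers.
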
JG: at the API level if we make CBs Transferable, then we can build them on a worker.
MM: in what we’re designing, there’s only one thing that can be sent between threads, which is CommandBuffer.
CW: yes, but you need access to other resources like Pipelines, Textures, etc. Maybe you share your base vertex data.
CW: in these APIs most objects are immutable. Seems we could use this to our advantage.
KR: No way to share JS objects between isolates.
JG: if you created a Buffer, and the “WebGPU” object was an immutable handle to that buffer, if you cloned that handle, you could talk about the buffer from multiple threads.
CW: that sounds like it would work. Only issue is if you have non-const methods on these objects. For example, you have a buffer in one worker, and map it on another worker.
JG: it’s not shared mutable JS state.
MM: maybe too handwavy, but: if one worker wants to use any of these resources then they’d have to have created them. Also need a Device?
JG: would need a destination. Not just the Inputs, but also the outputs.
CW: the number of objects that are mutated is tiny: Buffers, Textures. But BufferView, TextureView never change. In MVP, BufferView and TextureView could be cloneable, so you could send them to workers to modify them.
JG / CW: you can map the buffer.
JG: one thread would succeed, one would reject the mapping attempt.
CW: what happens when you submit GPU work and the buffer’s mapped?
JG: it’s mapped and you can’t read or write to it from the API side.
CW: concerned because of the validation code for transitions in NXT.
CW: but BufferViews and TextureViews are safe. (CBVs and SRVs.)
RC: so these are safe even though they point to data that’s being changed?
CW: yes, the queue essentially synchronizes everything sent to the GPU.
RC: seems fine if we allow immutable things to be shared. But once we allow mutable things like Buffers to be shared, seems problematic.
CW: are you concerned about the contents of a Buffer being mutable, or the Buffer itself?
RC: about mutability. Buffer submitted to queue, then another thread attempts to map.
JG: this would be rejected. JS trying to map it, vs. graphics API actually use it. Have to ensure you can’t map it while it’s enqueued for use, and can’t enqueue it while it’s mapped.
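
The exclusion rule JG describes (no mapping while enqueued, no enqueueing while mapped, and only one successful mapping at a time) can be sketched as a small state check. `BufferModel` and its method names are hypothetical, and the sketch is single-threaded; a real implementation shared across workers would need atomic state changes or a lock.

```typescript
// Hypothetical model; method names are assumptions made for this sketch.
class BufferModel {
  private mapped = false;
  private pendingGpuUses = 0;

  // A mapping attempt is rejected if the buffer is already mapped or in use by the GPU.
  tryMap(): boolean {
    if (this.mapped || this.pendingGpuUses > 0) return false;
    this.mapped = true;
    return true;
  }

  unmap(): void {
    this.mapped = false;
  }

  // Enqueueing is rejected while the buffer is mapped.
  tryEnqueue(): boolean {
    if (this.mapped) return false;
    this.pendingGpuUses += 1;
    return true;
  }

  // Called when one piece of GPU work that uses the buffer has completed.
  onGpuUseComplete(): void {
    if (this.pendingGpuUses > 0) this.pendingGpuUses -= 1;
  }
}

const vertexData = new BufferModel();
const results: boolean[] = [];

results.push(vertexData.tryMap());      // buffer was idle
results.push(vertexData.tryMap());      // second mapper is rejected
results.push(vertexData.tryEnqueue());  // cannot enqueue while mapped
vertexData.unmap();
results.push(vertexData.tryEnqueue());  // fine once unmapped
results.push(vertexData.tryMap());      // cannot map while enqueued
vertexData.onGpuUseComplete();
results.push(vertexData.tryMap());      // idle again, mapping succeeds
```

As long as threads do not contend on the same buffer, each check touches only that buffer's own state.
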
RC: doesn’t that imply a lot of locks, reducing the potential multithreading?
JG: you do want to try to minimize locks. As long as you don’t contend on a resource, there’s no big problem.
CW: that leads to another question. Do we want to allow creation of resources on workers?
JG: why would we not want to?
CW: takes a global lock on the device.
MM: the more I look at this: assume we handle the mapping case. In that situation, you could share all your resources across all the different threads. Update resource, submit to queue, everything’s great. Either submitting to queue requires sending the CB between threads, or making the queue itself thread safe. Maybe objects are created on main thread and then shared.
CW: think this would work and be safe. Sure that people would complain about not being able to create resources on a worker.
MM: solution for web developers would be to PostTask to main thread and get the result back.
JG: it’s a lot more fragile than saying “create me a buffer”.
MM: do have to worry about lots of threads contending.
KR: There are use cases where the entire main loop of the app runs in WASM in a worker and doesn’t return to the browser. With WebGL in a worker, everything can be done in a Worker and commit() used to SwapBuffers.
MM: very concerning that the browser would never get any CPU time in this scenario.
JG: you’d block on something else, like commit(). You just never yield out of your call stack.
JG: seems it would be premature to restrict object creation to the main thread. Should be rare, seems OK if it grabs a lock.
DM: up to the implementation on how to lock. We don’t have to implement it by locking the whole device structure. Maybe if one thread makes a Buffer and another a Framebuffer, maybe they don’t contend. Would like to be able to create objects from different threads.
RC: OK with creating things from multiple threads. But mutable things I think should be owned by one thread at a time. The queue itself, if we share it between threads, is a place where we should take a lock. Only one thread at a time can change anything.
JG: think (immutable) views get us a lot of that. Not actually binding or manipulating the texture object.
CW: but you can’t create views from views. We could allow that, but not something being done right now. Sure that there will be tricky things.
MM: Rafael’s right: this object is mutable right now so it can’t be shared. Uploads/downloads: atomic switch “this object is owned by the GPU” vs. “by the CPU” gives you a notification that this is a mutable resource now, and can’t be shared. Think the current approach handles this.
CW: to summarize:
- Want to be able to create commands in parallel.
- Follow Metal’s first option for doing parallelism – have to have entire compute, render, etc. pass.
- Ideally share things between threads as views.
- Want to figure out the nuances of object creation in workers.
JG: sounds like the right direction.
RC: so are we going to allow mapping of the same resource by multiple threads at the same time?
JG: No no.
MM: think this is important because of view semantics. If the object’s owned by the GPU then all accesses are owned by the queue. Don’t think we’ve solved the buffer mapping question yet.
CW: correct.
RC: what’s the status of various browsers of RequestAnimationFrame in workers?
KR: talked to [email protected] about it, he’s written an OffscreenCanvasAnimation.md here:
- https://github.com/junov/OffscreenCanvasAnimation/blob/master/OffscreenCanvasAnimation.md
KR: there was some pushback from the W3C TAG on the offscreen canvas proposal on commit() from workers:
- https://github.com/w3ctag/design-reviews/issues/141
KR: Need to explain to the TAG that these use cases are mandatory for porting large native applications to the Web. Difficult discussion, could use involvement of more people. Jeff Gilbert wrote a couple of examples a while back:
- https://github.com/jdashg/misc/tree/master/webgl-from-worker
- https://github.com/jdashg/misc/blob/master/webgl-one-to-many/index.html
KR: Justin’s proposal allows multiple implementations of requestAnimationFrame. There is also a proposal to get a promise / block to know when to produce a new frame and commit(). Should work more on this before we can figure out swapchains in WebGPU. WebGL could be a testbed for these investigations.
DJ: agree. Also enough of us have offscreen WebGL contexts working that we can do prototypes.
RC: my reason for asking was just to get the state of the union. So to summarize, there’s stuff Justin’s implemented that is still in discussion with the W3C TAG, so isn’t finalized?
KR: Yes.
RC: so the thing under discussion is rAF in workers, or something else?
KR: The TAG is concerned about the complexity of the design. Concerned about the commit() API. They were thinking of centralizing the RAF API, and it would mandate returning control to the browser. That can’t work with a lot of game engines out there.
JG: I’m a big fan of early commit(), which we’ve talked about in the WebGL WG before.
CW: good to know that if we want to work on this, we should work with Justin Novosad.
MM: would like Ken to point to resources / links.

Memory barriers

CW: MM, any progress on implicit memory barriers?
MM: not yet.
CW: feedback from Unity/Roblox: Vulkan barriers are too powerful. This is also feedback we got from HW vendors. Apps get Vulkan barriers wrong.
CW: was going to suggest, the D3D12 model of barriers is the best – the most explicit gpuweb can get, without subjecting developers to the same pain they have in Vulkan. And developers seem to like D3D12 barriers – or at least live with them.
MM: “all developers”?
RC: did these HW vendors say they got the D3D12 barriers right?
CW: they said that D3D12 barriers encompass most of what you can do on their hardware.
MM: so the discussion is now D3D12 style barriers compared to Metal style (no barriers)?
CW: that’s what I’m proposing.
DM: also, whether we have a start/end to the barrier.
CW: yes, D3D12 requires app to specify where resource is coming from as well as where it’s going to. NXT for example doesn’t have that. Difference between “from/to” to “no barriers” – a spectrum in the design space.
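
To make the “from/to” end of that spectrum concrete, here is a sketch of how an explicit transition could be validated against tracked state. The usage names and `ExplicitBarrierValidator` are hypothetical; they are loosely inspired by the idea of resource states and are not D3D12's or NXT's actual API.

```typescript
// Hypothetical usage names; not taken from any real API.
type Usage = "transferDst" | "vertex" | "sampled" | "renderTarget";

class ExplicitBarrierValidator {
  private current: { [resource: string]: Usage } = {};

  register(resource: string, initial: Usage): void {
    this.current[resource] = initial;
  }

  // "from/to" style: the app states both ends, and validation checks the "from" side.
  // Returns null when the barrier is valid, otherwise an error message.
  transition(resource: string, from: Usage, to: Usage): string | null {
    const actual = this.current[resource];
    if (actual === undefined) return `unknown resource ${resource}`;
    if (actual !== from) return `${resource} is in ${actual}, not ${from}`;
    this.current[resource] = to;
    return null;
  }
}

const validator = new ExplicitBarrierValidator();
validator.register("texture0", "transferDst");

// Correct barrier: the declared "from" matches the tracked state.
const okBarrier = validator.transition("texture0", "transferDst", "sampled");

// Wrong barrier: the app still believes the texture is a transfer destination.
const staleBarrier = validator.transition("texture0", "transferDst", "renderTarget");
```

Dropping the `from` argument moves the design one step toward the “no barriers” end: the tracker already knows the current state, so the app would only name the destination.
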
JG: wonder how much of this is how well Vulkan barriers are validated. If they’re not validated it’ll be easy to mess them up. We have to validate whatever we do so in our system they’ll be validated. In the validation we have to do anyways, is it more viable to do Vulkan style barriers? Or make them tractable?
CW: validation would prevent apps from shipping with wrong barriers. The HW vendors said that the game vendors decided to ignore errors. “It works on NVIDIA / AMD so ship it”. Validation would help there.
JG: for WebGL, we get an order of magnitude more error responses than original GL did. They tell you that you did something wrong and usually what it is.
MM: better than reporting something wrong is to prevent it from happening in the first place.
DM: benefit of being explicit: Can start the barrier earlier. Give driver more freedom to transition resource earlier. Doing it implicitly requires command buffer encoding to be deferred. Adds latency and late validation.
MM: this is an argument made before. Would help to understand this argument if we had perf numbers between explicit vs. implicit barriers. Haven’t done any analysis.
JG: hard when there isn’t hardware that supports both.
MM: no, a system where the app author puts in their own barriers, and one system where the backend does this.
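
The second kind of system, where the backend inserts the barriers, can be sketched as a pass over an already recorded command list. Everything here is hypothetical (`insertImplicitBarriers`, the usage names, the string output), and the sketch assumes the whole list is available before barriers are placed, which matches the deferred encoding DM mentions.

```typescript
// Hypothetical usage names and command shape, for illustration only.
type Usage = "transferDst" | "vertex" | "sampled" | "renderTarget";

interface UseCommand {
  resource: string;
  usage: Usage;
  label: string;
}

// Walk the recorded commands and emit a barrier whenever a resource changes usage.
function insertImplicitBarriers(
  initial: { [resource: string]: Usage },
  commands: UseCommand[],
): string[] {
  const current: { [resource: string]: Usage } = {};
  for (const key of Object.keys(initial)) {
    current[key] = initial[key];
  }

  const out: string[] = [];
  for (const cmd of commands) {
    const before = current[cmd.resource];
    if (before !== undefined && before !== cmd.usage) {
      out.push(`barrier ${cmd.resource}: ${before} -> ${cmd.usage}`);
    }
    current[cmd.resource] = cmd.usage;
    out.push(cmd.label);
  }
  return out;
}

const stream = insertImplicitBarriers({ tex: "transferDst" }, [
  { resource: "tex", usage: "transferDst", label: "copy into tex" },
  { resource: "tex", usage: "sampled", label: "draw sampling tex" },
  { resource: "tex", usage: "sampled", label: "draw sampling tex again" },
]);
```

The app never writes a barrier in this model; the trade-off raised in the discussion is that the backend can only place the barrier immediately before the command that needs it.
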
JG: so this is like if we implemented OpenGL on top of Vulkan, etc.
RC: someone writes the code to do the barriers themselves. How much better would the perf be if a developer on top of our API, knowing everything, could schedule their barriers themselves?
MM: yes, that’s what I’m asking for.
CB: we have an emulation layer D3D11 -> D3D12, and perf diffs are due to other aspects of the API, not barriers.
MM: would be interesting if you were able to conclude that and produce results to the group.
CB: can do. Might take a month.
RC: if we do handle barriers, we have to eliminate all areas in e.g. D3D12 where we can’t automatically do transitions, etc. Would have to add things to the API so that it can figure out to transition.
CW: we probably won’t have bindless in version 1.0 because of Metal 1.0 and Vulkan 1.0, so it’ll be easier to enumerate everything needing transitions.
MM: agree.

Agenda for next meeting

Google probably won’t attend next meeting.
DJ: was going to take next week off but could chair.
MM: we should skip next week.
CW: we have two weeks to freshen buffer mapping proposals.
JG: also running Web IDL through IDL parser in Mozilla. Will have updates to the IDL to make it run.
