
Minutes 2018 08 01


GPU Web 2018-08-01

Chair: Corentin

Scribe: Ken

Location: Google Meet

TL;DR

  • Some of us will try to have a breakout session about WebGPU at the TPAC plenary day
  • Passes discussion (issue 64):
    • Proposal was extended to have render and compute inherit from programmable
    • Concern and discussion around adding API complexity without demonstrable gains
    • Concern and discussion around discarding the command buffer concept common to all APIs
    • Metal command buffers cannot be reused but can be recorded/submitted in any order.
    • Need to figure out the reusability story.
  • Render target aliasing:
    • Defer post-MVP

Tentative agenda

  • Passes
  • Render target aliasing
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
    • Myles C. Maxfield
    • Thomas Denney
  • Google
    • Corentin Wallez
    • Dan Sinclair
    • Kai Ninomiya
    • Ken Russell
  • Intel
    • Yunchao He
  • Microsoft
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Yandex
    • Kirill Dmitrenko

Administrative Items

Meeting times

  • CW: We’ll send a “doodle” that allows us to vote on times.
  • DM: Can we propose an option that has alternate times week to week, to match Ken’s schedule?
  • CW: Or a different day?
  • CW: Anyway, I’ll send a doodle. We won’t wait for Mountain View Google to join.

WebGPU at TPAC

  • CW: Michael Smith has asked us to have a session on the plenary day of TPAC. This seems like a good idea. Myles and I will definitely attend.
  • CW: The goal is to show other W3C members what is happening in WebGPU.

License Agreement

  • CW: My understanding is that we will have some more information from Intel soon. We could technically go ahead now, but I think we should wait.
  • DJ: If Intel will give us more data in the next few weeks, then let’s just wait.

Passes

  • Issue 64 https://github.com/gpuweb/gpuweb/issues/64
  • DM: tried to come up with a precise position
    • Instead of having passes inside command buffers, have just two types of passes, one for rendering and one for everything else
    • KN had concern about secondary command buffers
    • Advantage: base class for compute and render, deal with resource group bindings
    • Extending semantics of compute pass with different sync; e.g. your UAVs won’t be synced
    • At cost of API complexity.
    • I think it’s worth it. Like the extended proposal. Hope other parties made themselves familiar with Issue 64.
  • RC: what sync issues were you planning on improving in the future with this?
  • DM: The sync issues are the flag suggested earlier: the programmer being able to tell us that UAV syncs aren’t needed inside a compute pass. There’s future room for expansion if we want to provide this kind of flag.
  • CW: There were concerns about adding complexity to the API that isn’t needed yet. I like the original point that the D3D API allows compute / copies in the same spot in the command buffer. Even in D3D right now you can do render + compute at the same time.
  • RC: everyone agrees we need them for render passes. But unless demonstrable gain, don’t think it’s worth the API complexity at this point. Esp since if we’re going to split things into passes, that also restricts the API.
  • CW: Reducing complexity is always good. Having different types of passes as different types of objects can be seen as adding complexity, but at the same time it simplifies the API: inside a render pass, any call you are allowed to make is valid. It gives you more type safety and makes things simpler for developers as well. That said, having passes like this is significantly different from any other native API right now.
  • DM: in Metal it’s not that much different. A bunch of passes together.
  • CW: In Metal you have a command buffer and “borrow” the CB by creating an encoder; the encoder is a “view” through which you encode into the CB.
  • MM: there’s one encoder active at a given time. Can get the “active” encoder from a CB.
  • CW: Advantage of this in Metal is that the encoder can write directly to gpu-visible memory, avoiding extra copies, extra allocators
  • DM: Instead of an allocator per pass, you could give each pass the same allocator and it would be equivalent. I don’t think the CB in Metal gives you much. You can split work into multiple passes that run on multiple areas of the hardware, largely independently. It’s not an unsound concept, but it’s not strictly required in Metal.
  • CW: in Metal, with subsequent encoders, you know which order they go. In WebGPU you summon passes out of thin air so you don’t know the order / command buffer / queue they’ll go in.
  • DM: aren’t we committing our passes for execution?
  • CW: we are but at the time you record the pass you don’t know where it’s going to go. In Metal you know where it’s going to go. (never mind)
  • MM: trying to understand. Sounds like 2 pieces? Work pass / queue pass, and no command buffers, only passes?
  • CW: Yes. So in a way you have two programmable pass types (compute, render) and one fixed-function one (copy). All can be created out of thin air. What is submitted to a queue is not a command buffer but a sequence of passes. Nice because it gets multithreaded recording for free, but there are open questions like how we do secondary command buffers, etc. (A rough sketch of this shape appears at the end of this section.)
  • JG: seems weird to me to take CBs – all APIs have this concept – and discard it.
  • DM: Not clear what they give us. From a familiarity standpoint it is weird to discard them, but starting from the ground up it’s not clear why we need them.
  • CW: in Vulkan, when creating CB, specify queue it goes to? I think. If you create passes out of thin air there’s no such notion.
  • KR: There was a concern a while ago that we wanted to record commands as the WebGPU encoding calls are made (latency for VR ...). Is it a concern if we summon passes out of thin air?
  • DM: Same problem with command buffers if we had them. If we had multiple queues then the user would still have to specify one. For an implementation on a single queue, Vulkan can put different passes into different CBs. Passes in Vulkan are fairly light, and we also don’t expect thousands of passes.
  • JG: CBs are the nodes in the execution graph. If we discard CBs from the API, execution ordering falls to passes. Look at prior art.
  • CW: dependency graph in Vk spec is a fancy way to say that across command buffers, you have to do pipeline barriers correctly.
  • JG: CB submission is dependent on semaphores and their completion.
  • CW: in D3D I think you don’t have that option
  • RC: is there a use case for CBs where someone might want to record a CB and submit it multiple times across multiple frames?
  • CW: you can submit the same pass multiple times
  • RC: so submit 10 passes?
  • CW: yes, CBs are a bundling abstraction
  • MM: concern, in Metal there is per-command buffer work. Need to do experiments to understand whether creating lots of them behind the scenes will add unacceptable overhead. Need to be able to come back to this later.
  • CW: question about Metal command buffers, can you reuse them? MM: Don’t know.
    • No; “After a command buffer has been committed for execution, the only valid operations on the command buffer are to wait for it to be scheduled or completed (using synchronous calls or handler blocks) and to check the status of the command buffer execution.”
  • CW: Can a command buffer work with more than one queue? MM: Don’t know but would guess they are queue-specific.
  • DM: As far as I know, until you enqueue the CB it’s not linked to the queue in any way. You can encode multiple CBs in parallel. Good concern that MM raised about per-CB overhead; I haven’t seen it in Metal traces except in callbacks.
  • KN: don’t have to record a CB per pass. Can coalesce passes.
  • CW: That works if we encode at submit time, but if we encode while the passes are being recorded it might not be possible.
  • KN: if you want to record earlier (record the backend CB before Queue.submit() / on the fly while the CB is being built), this design does not quite work for similar reasons
  • MM: Metal public docs say – can’t reschedule a command buffer after it’s submitted.
  • CW: so you can record multiple ones and hold on to them but can’t reuse them. Thanks.
  • CW: lot to think about. Think about until next week?
  • MM: yes, want to do more research.
  • DM: Might need to expand the scope of this question to the reusability of command buffers. If we provide secondary CBs / bundles, maybe passes should be one-time recorded. If you can reuse pieces then there’s no need to reuse CBs.
  • JG: though probably faster to do so. Reusing existing prevalidated work is faster than reissuing.
  • CW: DM was saying that if we have a concept of work we can reuse that’s enough.
  • CW: Important to get this right for the MVP; it’s a core concept of the API. Can still change after MVP, but would be good to have it right by then.
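
For concreteness, here is a minimal sketch of the pass-centric shape discussed above, written as hypothetical TypeScript interfaces. None of these names (ProgrammablePassEncoder, Device.beginRenderPass, a Queue.submit that takes passes, etc.) are agreed API; they are stand-ins that only illustrate the proposal in Issue 64: render and compute inherit from a programmable base, passes are recorded independently of any command buffer, and a submission is a sequence of passes.

```ts
// Opaque handles; details elided. All names are hypothetical.
interface Pass {}
interface BindGroup {}
interface RenderPipeline {}
interface ComputePipeline {}
interface RenderPassDescriptor {}

// Shared base for the two programmable pass types; resource bind-group
// binding lives here ("render and compute inherit from programmable").
interface ProgrammablePassEncoder {
  setBindGroup(index: number, bindGroup: BindGroup): void;
  endPass(): Pass; // finish recording; the result is what gets submitted
}

interface RenderPassEncoder extends ProgrammablePassEncoder {
  setPipeline(pipeline: RenderPipeline): void;
  draw(vertexCount: number, instanceCount: number,
       firstVertex: number, firstInstance: number): void;
}

interface ComputePassEncoder extends ProgrammablePassEncoder {
  setPipeline(pipeline: ComputePipeline): void;
  dispatch(x: number, y: number, z: number): void;
}

interface Device {
  // Passes are created "out of thin air": no command buffer, and no
  // queue association at record time.
  beginRenderPass(descriptor: RenderPassDescriptor): RenderPassEncoder;
  beginComputePass(): ComputePassEncoder;
}

interface Queue {
  // A submission is an ordered sequence of passes, not command buffers.
  submit(passes: Pass[]): void;
}
```

Under this shape, multithreaded recording falls out naturally (each thread records its own passes), and the reusability question becomes whether a finished Pass may appear in more than one submit() call, which is exactly the story flagged above as still needing to be figured out.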

Render target aliasing

  • https://github.com/gpuweb/gpuweb/issues/63
  • CW: have people looked at this since the last meeting?
  • (...silence...)
  • CW: On our side we think RT aliasing is not important enough to care about for the MVP. Mostly orthogonal to the rest of the API. Probably worth doing for V1. Can defer.
    • In terms of solutions, there’s the aliased heap idea (I put a comment about it on Issue 63). An aliased heap can contain only one resource at a time (one resource, or another one, etc.), not “sets”. (See the sketch at the end of this section.)
    • Would be super simple and have most of the benefits of a more complex solution.
  • MM: last part makes sense to us. Having simple sets of resources that can all alias is a lot easier and gets us most of the way there. Much harder to describe a graph of which resources can alias other ones.
  • CW: Solution “C” is about the same as the one that DM mentioned, where you need a really good allocator. The sweet spot would be something around solution “B”.
  • MM: we’re OK with solutions “B” or “C”. (Implicitly, also solution “D”.)
  • CW: OK to defer this after MVP?
  • RC/DM: yes.
  • MM: yes, can defer.
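
As a point of reference, here is a minimal sketch of the aliased-heap idea, again with hypothetical TypeScript names (AliasedHeap, createAliasedHeap, and the descriptor fields are illustrative, not agreed API). The point is that membership in the heap is the only aliasing relationship, so validation reduces to "at most one heap member alive at a time" instead of tracking an aliasing graph between arbitrary resources.

```ts
// All names and sizes below are hypothetical, for illustration only.
interface Texture {}
interface TextureDescriptor { format?: string; width?: number; height?: number; }

interface AliasedHeap {
  // Every resource created from the heap aliases the same allocation;
  // only one of them may be alive (in use) at any given time.
  createTexture(descriptor: TextureDescriptor): Texture;
}

interface Device {
  createAliasedHeap(byteSize: number): AliasedHeap;
}

// Usage: two transient render targets that are never live at the same
// time can share one allocation.
declare const device: Device;
const heap = device.createAliasedHeap(16 * 1024 * 1024); // 16 MiB
const shadowMap = heap.createTexture({ width: 2048, height: 2048 });
const bloomTarget = heap.createTexture({ width: 1024, height: 1024 });
// Valid: use shadowMap in one pass, bloomTarget in a later pass.
// Invalid: using both in the same pass, since they alias.
```

This is roughly the “simple set of resources that can all alias” framing MM described as much easier than describing a graph of which resources can alias which others.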

Agenda for next meeting

  • Pass discussion
  • SIGGRAPH is next week
  • MM: every hotel in Vancouver is full. We might not end up being there.
  • CW: Airbnb has been helpful.
  • Will send out a poll for meeting time.