Skip to content

Minutes 2018 06 27

Corentin Wallez edited this page Jan 4, 2022 · 1 revision

GPU Web 2018-06-27

Chair: Dean

Scribe: Ken

Location: Google Hangout

TL;DR

  • Apple’s IHV feedback
    • Render passes are critical, could be worth looking into framebuffer fetch
    • Need some form of multithreaded command recording
      • Metal has the ParallelRenderEncoder for this
      • Concern about NXT’s “record on submit” approach
      • Piece of data is that traversing recorded commands is faster than the app’s graph
    • WebGPU should lose context on a device reset.
    • Aliasing render-targets is critical
      • Could have a heap for fast reuse of render target memory
      • How important is exposing heaps?
  • Minimal interoperable MVP demos
    • WebGL had some interoperable demos before the CTS (like the aquarium)
    • We want demos but no benchmarks yet
    • Some NXT demos could be ported, maybe some DX SDK demos too
    • Should have some test but no need for a CTS for the MVP

Tentative agenda

  • Apple’s IHV feedback
  • Minimal interoperable demos for WIP MVP implementations
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
    • Myles C. Maxfield
    • Theresa O'Connor
  • Google
    • Dan Sinclair
    • James Darpinian
    • Kai Ninomiya
    • Ken Russell
  • Microsoft
    • Chas Boyd
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
    • Markus Siglreithmaier
  • Elviss Strazdiņš
  • Joshua Groves

Admin Items

  • DJ: Tentatively: October 1, 1-day meeting
  • Need to book a room, Apple will volunteer to host
    • Aiming to co-locate a WebGL F2F meeting at the same time on an adjacent day
  • Would be good to get an idea of vaguely how many people will go (aiming for about 12)
  • Will have WebEx dial-in for remote people
  • If you know of anyone who will make it and who didn’t come to the Montreal F2F, let us know
  • DJ: feedback on CLA
  • DJ: background:
    • Google’s, Apple’s and Microsoft’s counsels have been discussing Intel’s feedback
    • Suggestion: let them have another pass at working it out
    • Don’t want to speak for Apple’s counsel, but maybe Apple’s position is that it’s more of a misunderstanding than a disagreement
    • If you haven’t given feedback and strongly support the agreement as it is, may be worth making that public as well

Apple’s IHV feedback

  • DJ: received feedback from vendors
  • MM: regarding the email sent out last week or so
  • MM: not going to just read the email, but some bits from conversations that weren’t captured in it. Assuming you’ve read it.
  • MM: about on-chip memory: one item talks about framebuffer fetch. Only possible on mobile due binning GPUs. When any fragment gets rendered, it’s clear what was rendered just before it.
    • Don’t have any proposals about that thing, but something we should consider.
    • DM: largely the same for Vulkan. Would consider later when we focus on mobile GPUs the ability to define multiple sub-passes.
    • DM: it did catch my eye that it was marked “critical”. Not an optional thing. Critical is serious.
    • MM: “i’d consider a render pass approach critical” - that’s discussing Metal vs. D3D12. We’re doing the critical thing already.
    • DM: our current sketch API has Metal-style RenderPassDescriptor passed in when starting render pass. To map to Vulkan/D3D12’s passes would need to create the render pass in advance. May need to change the sketch API to create the render pass earlier.
  • MM: about parallelism:
    • On the CPU, recording drawing commands is expensive. Usually involves traversing a big node graph.
    • Usually want to put a whole lot of rendering commands into one command buffer.
    • Should be some way to let multiple threads render into the same command buffer.
    • This group didn’t decide how this could be done.
    • Way that prob makes the most sense: the “call-out” method, where you can call out to a second-level thing. When the second one has been completed, returns back to the same point in the first, like a function call. Only two levels deep.
    • Different than just having more command buffers. There aren’t barriers created at the call-out and return. Barriers are only at the outermost level.
    • DM: aren’t the barriers on the encoder boundaries anyway?
    • MM: yes. Talking about general strategies. Metal achieves the call-out strategy via ParallelRenderEncoder. Each individual command buffer recorded by the threads are secondary command buffers. Metal creates the top-level one which calls out into the others.
    • DM: have the Metal ParallelRenderEncoder. Have D3D12 Bundles as a different use case (reusable chunks of commands). Vulkan’s 2nd level cmd bufs can do both. Feels sound to me to provide these as two separate extensions. ParallelRenderEncoder like Metal, and Bundles like in D3D12.
    • MM: not sure how deep we want to go in discussion now. But if extensions are valuable and implementable on every platform, add them to core.
    • DM: aspect that’s imp’t: do we encode on multiple threads. NXT defers encoding until submission time, eliminating the benefits of parallel encoding.
    • MM: have investigated in WebKit. Traversal through the app’s scene graph is usually slower than traversing through recorded drawing commands.
    • DM: so we’ll save on this scene graph traversal because the app will be doing it in parallel?
    • MM: yes.
    • DM: hmm, ok. Goes across some of the advice from WWDC this year, where you have to record on multiple threads or you’re not taking full advantage of the API.
    • MM: the advice from WWDC is the best case advice. Even if you do defer passing each individual command to the api, you still get some benefit from parallel scene graph traversal.
  • MM: about CPU/GPU interaction:
    • Looking at Metal applications, many are callback-based, and also some that are busy-loop-based. Callback-based is like how Cocoa does it. But there are also apps which do “while (true) { … }”, and those are throttled in the nextDrawable call.
  • MM: GPU resets:
    • When Metal executes a device reset it doesn’t clear the GPU memory. To an app, this particular command buffer didn’t finish, and you get a callback. The status is “not valid / failed”. App can just keep going; context is still valid.
    • Inside the command buffer, something caused the device to reset. Could be that the device is in a weird state. Usually resets don’t leave the device in a weird state, so app can keep submitting new command buffers. Silently ignores the device reset.
    • But sometimes the device can be in a bad state.
    • Instead of taking Metal’s approach, WebGPU should take the more heavy-handed D3D approach, so if any device reset occurs your context is no longer valid.
  • MM: aliasing and resources:
    • Note from GPU SW team is, it’s generally not very useful except for aliasing render targets. We could probably find a middle ground where we come up with a system that’s just for render targets.
    • JD: what’s the use case for aliasing render targets?
    • MM: in a frame you usually render to multiple render targets. While you’re drawing to one of them, it’s no longer useful to keep around its contents for the next frame. Also not useful to keep the contents around for a later CommandEncoder.
    • Render to targets 1, 2, 3. While rendering to 3, contents of 1 are probably not useful. Would be cool if 1 and 3 could share the same memory.
    • JD: you could reuse the render target but if they don’t have the same format I understand why it could be useful. Surprising this is a big concern but understand it.
  • MM: these previous 4 points are about all that was lost from the conversation.
  • DM: the render target aliasing is almost required for frame graph-style applications. They dynamically traverse the frame graph and allocate/deallocate. Could have a heap where you allocate and free render targets during the frame rather than aliasing.
  • KN: I like that idea better than aliasing.
  • DM: MTLHeap question: if someone was trying to do a poor man’s heap, it’s not trying to answer how useful it is. It’s just saying, can’t use it for textures. This question needs more clarification from the Metal team.
  • KN: it lets you reuse memory you couldn’t otherwise reuse.
  • DM: want to know how useful this is. Roblox says that heaps are super important.
  • JG: placement in those heaps?
  • DM: no, bulk allocation. During the frame want to make sure no device allocation occurs.
  • MM: if question is, does Metal team think bulk allocation is one of the most important reasons to use heaps, the answer is no.
  • DM: ok.
  • MM: sort of answered question by not answering it. Happy to take more feedback in writing to metal team.
  • DM: more, how does the Metal team think exposing the heaps to the users is important, vs. our managing them ourselves. We settled on the second for now, but not sure whether that will be efficient or not.

Minimal interoperable demos for WIP MVP implementations

  • JG: support this idea but a matter of timing. Goal is to have an MVP that we don’t necessarily ship, but that everyone’s more or less supporting? And later, investigate things booted off MVP for V1? And once we have an MVP, we expect it to be interoperable?
    • Do want demos / tests, and demos are pretty tests?
    • Expecting to see things like this around Q4.
  • MM: was the goal of this idea to help us rank features? If a demo we want to write needs a feature, we write it? Or to convince ourselves that we’re interoperable? Or how snazzy things are?
  • KR: Before the webgl conformance suite there were ~6 demos that showcased functionality and cross-platform interop. Really basic. It might be useful to put together some demos (port Aquarium?) to showcase functionality and interop.
  • MM: is this the sort of thing we should brainstorm? Maybe each company represented here could write something.
  • DM: want those demos to be on gpuweb repo.
  • JG / KR: yes.
  • KN: and we can update those between MVP and V1.
  • JG: an AI I had was to get some very basic WebGPU API tests based on sketch IDL somewhere. If we agree those should go into Github repo then I can put them there.
  • KN: ultimately will go wherever tests go.
  • JG: don’t think we need a CLA for tests.
  • KN: repo should be fine.
  • JG: great, will submit a PR.
  • JG: porting Aquarium could be cool. Always wary about benchmarks. But if we make it and keep it simple, could be useful.
  • MM: when we port it we could get rid of the performance reporting.
  • JG: only against bad benchmarks. Hard to judge how good a benchmark is without diving into the architecture, but if we write it it’s less of a problem. People will turn on FPS counters, so might as well do it properly instead of other people doing it poorly.
  • KR: so let’s just sync up to make sure we don’t step on each others’ toes.
  • KN: we have some demos from NXT that we’d like to pick up and polish.
  • KR: was thinking some sort of global illumination algorithm.
  • CB: we had some samples in the DX SDK with async compute + global illumination. Been out there for a while. Fluid simulation, particle simulation, post processing. Separable kernel convolutions.
  • MM: is that volunteering? :D
  • DM: the postprocessing stuff is a good example of compute-assisted graphics. Even a simple blur using a compute shader.
  • JG: was something in WebGL-land that was a list of approaches showing e.g. transform feedback, swizzling, etc.
  • KN: have presentations like that
  • KR: thinking of WebGL2Samples repository.
  • JG: since I’m putting up a test subdirectory, could make another one for samples. Let’s just put it in the same repo because the API is in flux right now.
  • DM: thought I heard we need a conformance test suite for MVP. We don’t need it to be complete, just need the infrastructure set up.
  • JG: right, but question is how “done” the conformance suite is in your mind. We should have some tests, but don’t need to feel it’s complete yet.
  • MM: you’re getting at the idea of what should be shipped in different browsers. Each browser will decide that on the way toward MVP. Like, do we need to finish the conformance suite before we ship MVP?

Agenda for next meeting

  • JG: let’s try to burn down PRs and see if that uncovers issues and bring up agenda items for the next meeting
  • MM: will try to come up with a system to describe render target aliasing, but probably can’t do that by next meeting
  • Next meeting July 11 (skip July 4)
Clone this wiki locally