
Minutes 2018 01 31


GPU Web 2018-01-31

Chair: Corentin

Scribe: Ken

Location: Google Hangout

TL;DR

  • Apple’s buffer upload/readback proposal
    • Do the simplest thing possible for the MVP.
    • When reading back, the data is copied internally so that there are no data races.
    • The commands are put in a “pass”, which in Apple’s proposal is effectively a command buffer.
    • Suggestion of a specialized “client-side” buffer that can receive copies and be mapped at the same time.
    • Suggestion of access control on mapped buffers using ArrayBuffer “neutering”
      • JS constraint is that ArrayBuffer cannot be un-neutered.
  • WebGPUMappedMemory, Google’s proposal
    • An object that is both then-able (promise-like) and pollable.
    • Concerns about making it a Promise: Promises resolve only once, and their callbacks go to the end of the microtask queue.
  • Meta: How about having a “sketch” API that’s super-simple (and bad) to iterate on?
    • On the buffer mapping side, people feel we can make progress soon
    • Defer discussion on creating a sketch IDL until next week at least
  • WASM CG confirmed that all “host bindings” proposals will have efficient function calls on opaque objects and primitive types

Tentative agenda

  • (A)Synchronous buffer readback
  • Shape of the API (stretch)
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
  • Google
    • Brandon Jones
    • Corentin Wallez
    • James Darpinian
    • Kai Ninomiya
    • Ken Russell
  • Microsoft
    • Ben Constable
    • Chas Boyd
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
    • Markus Siglreithmaier
  • Yandex
    • Kirill Dmitrenko
  • Joshua Groves
  • Tyler Larson

Administrative

  • DJ: We should be in agreement on the CLA. It is similar to Apache 2.

(A)Synchronous buffer readback

Dean’s proposal

  • DJ: Apple’s proposal is very simple. Idea: do something as simple and stupid as possible for an MVP. Upload/download data. Downloading data returns a Promise; uploading is synchronous. Want to prevent synchronous readback, which causes a flush. No state kept around. No data races. The implementation is responsible for doing the transfer. (See the sketch at the end of this section.)
    • Example shows how enqueueing these commands would work.
  • CW: have been thinking about how to access things from WebAssembly, because Promises are hard to deal with in WebAssembly. But maybe we don’t have to think about WebAssembly initially.
  • DJ: hard to get info from WebAssembly folks, so hard to know the form we’ll want.
    • Requires the copy during download / readback.
  • JG: main issue: you not only eat a copy for the whole thing, you also eat a copy if you’re updating just a few sub-parts of a large buffer. Would be good to reduce the copying we have to do.
  • DJ: we could definitely add a SubBuffer API and still keep it simple.
  • JG: could enqueue a number of ~10 subrange updates. Could issue 10 calls. If you’re manipulating large amounts of sparse data, could be unfortunate.
    • Might be possible to tie this into SharedArrayBuffer work in particular for IPC-based browsers like Chrome
    • Still like the idea of being able to map the shared buffer (between IPC channels) on multi-process browser architectures.
  • DJ: not arguing against Mozilla’s proposal from last week; just saying that this is a really simple proposal. Expect that some people will say that they want at least a slightly more complex API. Easy to expand this API because it’s minimal right now.
  • JG: questions about which APIs could take mapped buffers. Could decide this after the MVP.
  • CW: would be great to have an MVP so that we have something in hand, working, and then go forward together from there.
    • Something with stupid simple semantics like this sounds good.
  • RC: what happens if you change InputBuffer between the time you call uploadData and when you enqueue it?
    • DJ: yes. When you call uploadData, that’s when a copy is made.
  • JG: do you “enqueue” the Download command? Same synchronization behavior? It would make the copy out at the point it’s executed in the command queue?
  • DJ: don’t actually have to allocate a copy of the ArrayBuffer. Can return the implementation’s version (?)
  • DJ: download is enqueued. It operates in order with regard to the other enqueued commands.
  • JG: just asking. Two ways of implementing this, including polling fences.
  • DJ: technically you could defer the downloadData to the end of the queue if you know nothing’s touching it before then.
  • RC: assume downloadData gives you a brand new ArrayBuffer?
  • DJ: yes. It’s up to the implementation to allocate the new ArrayBuffer and return it.
  • CW: why is this part of a pass instead of being an operation on the device itself?
  • DJ: could be done like that. Thought it made more sense to think of upload/download as a pass, to nail down the order of operations.
  • JG: all of the base APIs we’re building on top of have blit passes / blit encoder concepts. Part of why is you need to enqueue them into a command queue to execute them at some point in time. You can’t demand a download ‘right now’, for example; have to request it in the command queue.
  • CW: in the implementation you could have “pending commands”.
  • JG: just making a “virtual queue” instead of a “literal queue”.
  • DJ: the example at the bottom shows the goal was to make it as simple as possible. If you treat uploads/downloads as a pass, it’s really simple to express things like “upload, do some compute, and download”.
  • BC: my assumption was that these were commands you put in the command list. Only way it makes sense to me.
  • CW: Apple’s “pass” is actually a command buffer.
  • DJ: yes. The names don’t matter. Goal of this proposal is to…
  • BC: in D3D12 you put a CopyResource in your command list. Vulkan probably has something similar, as does Metal. Seems we’re building on top of that. Thought Jeff had a specific concern. When you’re pulling from the GPU, it’s not very problematic; just put this command in the stream. Until the command’s completed and you’re notified, you don’t have a copy of any bytes. Thought Jeff was concerned about: if you’re uploading, and you enqueue the upload, there’s a delay between the enqueue and when the GPU executes the copy/upload. That is a data race / synchronization issue we need to think about.
  • JG: what I was worried about is a little different. These APIs want to ensure atomicity, which requires multiple copies down the chain.
  • BC: in D3D12, it doesn’t make a copy for you when you call the API any more. You have your adult pants on now. If you say “copy from this buffer” it assumes you’re going to have fences, etc. in place to ensure you aren’t modifying the data while the GPU’s reading it. For the web-based API you don’t want to be able to break yourself, so you have to insert a copy. But: is there a way in JavaScript to make these buffers read-only?
  • JG: do think one of the keys here is: in GL you can map buffers. Client-side buffers and GPU-side buffers. Allows you to have stronger validation guarantees about reads/writes to the client buffer, and you enqueue a copy from the client buffer to the GPU buffer. Then we can think about this in terms of an API we can validate. Have different ideas about how deeply we want to eliminate data races. Maybe can’t have client buffer mapped and enqueued for copies at the same time. Would also get a natural way to implement a shared memory architecture for IPC implementations. Client side buffers would be in shared memory with the GPU process, so eliminating a copy. Maybe using SharedArrayBuffer, or maybe explicit “copy to/from client side buffer” primitives, that are actually synchronous – only going to/from shared memory.
  • BC: sort of see where you’re going. Looking at Apple’s proposal, thought you could eliminate data races using their proposal. Object would be locked until you’re done with it.
  • JD: there’s a similar concept in ArrayBuffer – Transferable.
  • BC: I’m not strong on the JS concepts. But if JS doesn’t support what we want, we could add “staging memory” to the WebGPU API, similar to what D3D12 has in the API. Prevent you from fiddling with memory while GPU might be accessing it.
  • CW: transferring ArrayBuffers to the impl could be a way to do that. Google’s proposal has a similar concept; mapping a buffer for reading asynchronously, which could return a SharedArrayBuffer for reducing memory copies. Would be similar to D3D12’s buffers in upload heaps.
  • JG: some advantages here in making the no-copy path to look like “Buffers” rather than “ArrayBuffers”, and offering a way to map the Buffer into e.g. a SharedArrayBuffer. Once you’ve unmapped you aren’t supposed to touch it any more.
  • JG: short of dealing with ArrayBuffer transferring, there’s not a way to prevent people from using a buffer. Instead could have a persistently mapped “Buffer”.
  • KR: It would be suboptimal / impossible to have a read-only typed array. Java went this route and perf was abysmal. The best thing in JS is to neuter the ArrayBuffer / SAB.
  • CW: map buffer, get SharedArrayBuffer. Write to it, then done writing. API neuters SharedArrayBuffer. Seems performant?
  • JG: problem design-wise is: not sure you can have an ArrayBuffer whose lifetime is cyclic.
  • DJ: Jeff’s right; once you transfer an ArrayBuffer, it’s dead to you.
  • KN: an ArrayBuffer handle cannot be un-neutered. Can return a new ArrayBuffer handle. In discussions about WebGL readbacks, it wouldn’t be a problem to return a new ArrayBuffer pointing to the same backing store.
  • JG: that’s fine. We’d still be producing a little garbage. It would be better to not.
  • KN: doesn’t produce much GC pressure because the memory the ArrayBuffer pointed to is reused.
  • JD: why can’t we un-neuter ArrayBuffers?
  • KR: simply not possible at the JS language level. Would have far too many ramifications in JavaScript VMs.
  • KN: producer/consumer example.
  • CW: getting back to Dean’s proposal: are we happy having this as the first MVP strawman?
  • JD: what happens if you execute a command buffer multiple times? Can’t resolve a Promise multiple times right?
  • CW: correct, can’t.
  • JG: that’s part of why I like the idea of bringing the buffer we’re copying to into the API. That’s the way most of these APIs work.
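For concreteness, here is a minimal TypeScript sketch of the upload/download flow discussed above. uploadData and downloadData are the names used in the discussion; everything else (GPUTransferPass, GPUBufferHandle, GPUQueueLike, createTransferPass, enqueue) is a placeholder invented for this sketch, not part of Apple’s actual proposal.

```ts
// Hypothetical sketch only. uploadData/downloadData come from the minutes;
// all other names are placeholders, not Apple's actual IDL.

interface GPUBufferHandle {}

interface GPUTransferPass {
  // Synchronous: the implementation copies `data` at call time, so mutating
  // `data` afterwards cannot race with the GPU.
  uploadData(destination: GPUBufferHandle, data: ArrayBuffer): void;

  // Enqueued: executes in order with the other commands in the pass; the
  // promise resolves with a freshly allocated ArrayBuffer once the copy-out
  // has completed.
  downloadData(source: GPUBufferHandle): Promise<ArrayBuffer>;
}

interface GPUQueueLike {
  createTransferPass(): GPUTransferPass;
  enqueue(pass: GPUTransferPass): void;
}

// Usage: upload input, do some (elided) compute, then read the result back.
async function roundTrip(
  queue: GPUQueueLike,
  buffer: GPUBufferHandle,
  input: ArrayBuffer
): Promise<ArrayBuffer> {
  const pass = queue.createTransferPass();
  pass.uploadData(buffer, input);           // copy of `input` is made here
  const result = pass.downloadData(buffer); // copy-out happens when the pass runs
  queue.enqueue(pass);
  return await result;                      // new ArrayBuffer owned by the caller
}
```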

<META>

  • CW: would be nice if we had something so that we had IDL rather than a bunch of documents.
  • DJ: that’s a good suggestion. We could add this even if we never use it. Could start adding other things.
  • KR: We already went through a bunch of similar proposals in WebGL etc. Apple’s proposal is beautiful but is very inefficient.
  • CW: trying to get to something that’s horribly inefficient and wouldn’t work everywhere; not a strawman API but at least you have something that you’re going to replace piece by piece.
  • KR: why not settle on a buffer API instead?
  • CW: because we’re never going to agree on them.
  • JG: it’s likely that once we put something in we’re going to stick with it. Agree that we’re going to move the stakes.
  • CW: in a way, if the proposal is going for simplicity rather than efficiency, we shouldn’t feel bad about replacing it later.
  • DM: concerns:
    • Proposal came out an hour before the call. Want to give it more time, even if we agree on it.
    • Want Mozilla’s proposal considered (we don’t have one written up yet) before we settle on something.
  • CW: we’ve had code in various documents up to this point. Arguing for having a toy graphics API / some existing thing like floooh’s sokol_gfx.h to have something to work with.
  • JG: understand your frustration. Lots of discussion and no tangible results. Think if we give it another week or two we may be able to have an actual good starting point, rather than one borne out of frustration. Give it a little more time?
  • DM: we have the toy APIs. Had two Metal-like APIs when we first met. DM’s and Apple’s were both Metal-like. Just because we do something quickly now doesn’t mean we have to agree on it as the long-term solution.
  • CW: we should probably put this IDL somewhere and implement it. Right now we have our own things that are slowly converging. Like JG said, maybe it’s out of frustration that we don’t have something in common. Maybe we should start with something that we all disagree with. I even disagree with NXT’s design. :)
  • DM: lots of issues with Apple’s proposal including reusability of command buffers.
  • DJ: if reusability of command buffers comes up then yes, we will come up with something different. Not saying that it’s going to be set in stone. Like the fact that it’s stupidly simple.
  • JG: recommend that we table canonizing an initial strawman IDL for at least another week.
  • CW: OK. This is the first meeting where this has been brought up.
  • JG: it’s a good point that we should aim to create IDLs for this stuff instead of just position documents.
  • RC: Dean’s stuff is good, IDL plus example code.
  • CW: let’s all take this meta-discussion back.

</META>

WebGPUMappedMemory

  • CW: Kai’s proposal had some good ideas about making things poll-able and promise-able.
  • CW: there’s a section “WebGPUMappedMemory”.
  • KN: built on the principle that a Promise is an implementation of a then-able: something you can call “.then()” on, and it calls that function when it’s done. Then-ables can be used in Promise resolution, e.g. Promise.all. Idea: make our own object (“WebGPUMappedMemory”) that you can both poll to see when it’s done, and use as a then-able (call .then on it, or pass it to something that takes a Promise). Not sure of other examples in web APIs. (See the sketch below.)
  • JG: polling vs. promises is like polling vs. events. My understanding is that if you have callback-based interfaces (without enqueuing), you can build polling on top of that. That’s part of the dilemma here, because Promises have to go to the back of the queue.
  • KN: good point, didn’t think of that. Maybe if this is a Then-able but not a Promise, then you can build the polling API on top of it.
  • RC: a question
  • JG: allow passing a Closure object. You decide what to pass.
  • RC: one surprising thing about Promises: if you do .then it’ll call your callback. An hour or two later if you take that Promise and call .then, it’ll still call your callback. That’s why I have concerns about a Promise, because it might keep ArrayBuffers alive much longer than expected.
  • KN: have getPointer() on here, returning an ArrayBuffer. Think it’s reasonable for the Then-able to get nothing, or the WebGPUMappedMemory, or something. Agree that passing ArrayBuffer may not be a good idea.
  • JG: in its purest form would be an event. In the callback you could access the mapped data but maybe it’s not passed to your callback.
  • KN: can make the WebGPUMappedMemory look as much like a promise as possible
  • JG: since it’s not a Promise you could make then() throw if it’s not mapped any more
  • CW: lot of options. Interesting part here is that it can merge the polling and promise based APIs.
  • JG: does anyone know: do Promises go on the event loop?
  • KN: there’s something called a “microtask” queue, in addition to the task queue. rAF, timeouts, etc. go on the task queue. Microtasks always get enqueued at the end of the current task, so they run before all the other queued tasks. (See the ordering example below.)
  • JG: like pushFront on the task queue.
  • KN: yes, except that Promises go on the microtask queue in the correct order.
  • JG: that’s probably the mechanism we want for pollability anyway. Think majority of folks here do not want you to spin-loop on polling.
  • KN: there is probably a good argument for allowing spin-waits on worker threads. Discussed this in WebGL as well. May not want to design that possibility out just yet.
  • CW: talked with Brad Nelson about WebAssembly host bindings with regard to Issue #46. Takeaway: in host bindings proposals, there’s at least one common denominator: there will be a fast way to call functions/methods on opaque objects and primitive types. This info is not clear in the WASM repos right now, but this will be true at some point.
  • JG: so you’ll be able to do an OO-style “Device.enqueue()” and it’ll be fast?
  • CW: yes. Not sure whether property bags will be fast, but this sort of operation will be fast. Not clear the shape it’ll take.
  • RC: does mapReadAsync return a new instance of WebGPUMappedMemory every time?
  • KN/CW: yes.
  • RC: if you call getPointer() multiple times will you get back the same ArrayBuffer?
  • CW: yes.
  • RC: different instances?
  • CW: different pointers. Different ranges of the buffer. You pass the offset/size to mapReadAsync.
  • CW: Since Chrome’s built using IPC we want explicit ranges on the buffers you’re going to read back.
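A rough sketch of the then-able-plus-pollable idea, to make the discussion concrete. mapReadAsync and getPointer are named above; the polling method (isReady), the resolution-with-nothing behavior, and the internal _resolve hook are assumptions made for this sketch, not part of the actual proposal.

```ts
// Sketch: an object that can be polled and also awaited (any object with a
// .then() method is a "then-able" and works with await / Promise.resolve).

class WebGPUMappedMemory {
  private data: ArrayBuffer | null = null;
  private callbacks: Array<() => void> = [];

  // Pollable: true once the readback has completed.
  isReady(): boolean {
    return this.data !== null;
  }

  // Returns the same ArrayBuffer on every call once the mapping is ready.
  getPointer(): ArrayBuffer {
    if (this.data === null) throw new Error("mapping not ready");
    return this.data;
  }

  // Then-able: resolves with nothing (per KN, passing the ArrayBuffer to the
  // callback may not be a good idea). Because this is not a real Promise, an
  // implementation could make .then() throw after unmap (JG's point).
  then(onFulfilled: () => void): void {
    if (this.data !== null) onFulfilled();
    else this.callbacks.push(onFulfilled);
  }

  // Called by the implementation when the GPU copy finishes.
  _resolve(data: ArrayBuffer): void {
    this.data = data;
    const pending = this.callbacks;
    this.callbacks = [];
    for (const cb of pending) cb();
  }
}

// Usage: await works on any then-able; alternatively, poll with isReady().
async function readBack(mapped: WebGPUMappedMemory): Promise<Uint8Array> {
  await mapped;
  return new Uint8Array(mapped.getPointer());
}
```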
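To illustrate KN’s point about ordering with plain JS: promise callbacks are microtasks, which run at the end of the current task, before queued tasks such as a zero-delay setTimeout.

```ts
// Microtask vs. task ordering.
setTimeout(() => console.log("task: setTimeout callback"), 0);
Promise.resolve().then(() => console.log("microtask: promise callback"));
console.log("synchronous: end of current task");
// Output order:
//   synchronous: end of current task
//   microtask: promise callback
//   task: setTimeout callback
```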

Agenda for next meeting

  • META: think about having a joint strawman that we can iterate on and maybe break.
  • Should iterate on buffer readback/upload APIs.
  • Shape of the API. Have some ideas about something nice in JavaScript and efficient in WebAssembly.