Minutes 2018 02 07

GPU Web 2018-02-07

Chair: Corentin

Scribe: Ken

Location: Google Hangout

Minutes from last meeting

TL;DR

Meta: have a sketch API in WebIDL form to iterate on
- Agreement on the general idea, and all proposals should be in WebIDL form.
- Been working in the abstract and nothing to show, having IDL would help folks that need to work on something concrete.
- Suggestion to use simplified, less-opinionated NXT WebIDL as a starting point.
- Hard to agree without seeing the result, Google will produce that WebIDL
Iterate on buffer readback/upload API
- Suggestion to use Apple’s simple IDL for the sketch API.
- Persistent mappings could be are possible even in Chrome, not clear if DRF
- Advantage of persistent mapping over re-using ArrayBuffers is that it could generate less garbage, but is harder to make DRF.
- Should investigate more and write down why we choose one over the other.
Shape of the API Google’s proposal
- API should do feature detection via the existence of specific methods.
- Technically two different APIs, but that are isomorphic.
- Descriptor could be faster even for JS.
- A constraint is that all WASM imports have to be visible by JS.

Tentative agenda

Meta: having a sketch API in IDL form
Iterate on buffer readback/upload APIs
Shape of the API.
Agenda for next meeting

Attendance

Apple
- Myles C. Maxfield
Google
- Corentin Wallez
- James Darpinian
- John Kessenich
- Kai Ninomiya
- Ken Russell
Microsoft
- Ben Constable
- Chas Boyd
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
- Markus Siglreithmaier
Yandex
- Kirill Dmitrenko
Elviss Strazdiņš
Joshua Groves
Tyler Larson

Administrative Items

AIs for everyone: ping our respective lawyers. We need to get an answer on the CLA for our CG.
Register for the F2F in Montreal.

Meta: having a sketch API in IDL form

CW: feel we haven’t made tangible progress. Feel that it would be more productive to make a strawman IDL to have something to talk about.
- What about IDL and some tiny spec language, and make faster progress that way?
BC: totally agree
MM: totally agree. We think we should go further and say that new proposals going forward should come with Web IDL.
CW: agree, for all new API proposals, would be best to have IDL.
DM: how will the IDL help us narrow our disagreements? We have 3 prototypes.
CW: we haven’t had an IDL presented to the group that we can all discuss. NXT’s is autogenerated.
MM: our prototype doesn’t have IDL.
CW: not sure if Mozilla’s IDL is the same as the Obsidian IDL?
DM: it’s similar.
CW: would be great if we had some IDL that we all agree is wrong, and that we try to make better. Need something concrete to build on and modify.
DM: not sure. Intuitively think we should just try to narrow down the big questions before going low level on IDL semantics.
CW: we’ve been doing that for 9 months, nothing to show.
DM: illusion of progress.
JG: would help some folks; I’m personally comfortable talking in the abstract, but some folks would like something more concrete. Agree it helps phrase the discussions, won’t need as much back and forth on some parts of the model.
MM: I’m not comfortable talking only in the abstract.
BC: why not both? Should make everyone comfortable.
JG: just need to synchronize our efforts.
BC: just a matter of someone proposing some IDL. Does the IDL file in nxt-standalone have all the builder stuff?
CW: the source of truth in NXT is a JSON file. Structure is extremely limited for code generation purposes.
BC: you have more familiarity with Obsidian vs. NXT at the interface design level.
JG: Obsidian is abstract. Long time since looked at it. Dzmitry’s work on gfx-rs has been disjoint from that.
MM: little concern that if we start from NXT that we won’t be treating all the platforms the same.
CW: yes. Agree that we should not start with NXT’s IDL. Has baked in assumptions about way mem barriers, render passes, etc. work. Eliminate it in the sketch API. Make mental model as simple as possible, then make it more complex as we add features.
KN: agree, don’t think that any of the prototypes’ APIs are a good starting point.
CW: the reason we did NXT instead of WebVulkan at the beginning was to put all 3 underlying graphics APIs on equal ground.
BC: Microsoft didn’t come up with NXT, was about to make the observation that Metal was working more than D3D12 was.
CW: Vulkan backend in NXT is not done yet.
BC: my proposal is to take the JSON file (from NXT), figure out a way to get an IDL from it, and have people look at and argue it.
JG: we have to have a starting point, so we have to pick one.
BC: NXT in my opinion is not biased toward any one platform or the other.
MM: seems to me that if the designers of NXT say it isn’t a good starting point, then we shouldn’t use it.
BC: disagree. Microsoft chose to help with NXT because we like the way it looks and the way it’s going. Most accessible starting point. Want everyone to have the faith and trust that we’re not trying to force the API on anyone.
KN: from a functionality standpoint it’s a good starting point, but because of the code auto-generation we might want to clean up the generated APIs before we start debating it.
BC: can discuss nits later. I don’t like the builder pattern but it also sounds like Google folks aren’t sold on it long term either.
MM: how about a compromise? Designers of NXT can look over the IDL, maybe clean it up or take parts out, then once we know it should look like we can discuss it as a starting point?
CW: I like that idea. How about over next few days, we start with the NXT API, trim it down, simplify it and put it out as a suggestion.
BC: sounds good, what do you think Dzmitry and Jeff?
JG: sounds good
DM: sounds to me like we’re blessing an implementation and its API.
JG: the only way I think we could make more progress is get everyone in a room fo 2-3 days.
MM: don’t feel like we’d be blessing it; we would be looking at it and evaluating it first.
CW: NXT “dumbed down” for a sketch – there’s basically nothing in there. Would look very simple and non-controversial.
DM: OK, let’s see in a few days.
MM: at a meta-level, saying it’s hard to evaluate without a concrete proposal – all the more reason to do this.
CW: AI for Google: put together this slimmed-down API.
CW: once this is there: should proposals always come with some sort of diff on the IDL?
JG: typically this is the job of pull requests
MM: ideally, new requests should come with IDL. But maybe it’s not necessary to go all the way to a pull request.
KN: agree, for API design changes, don’t think we want to put together a pull request modifying every API entry point if we want to discuss a stylistic change.

Iterate on buffer readback/upload APIs

CW: suggest we go with Apple’s simple API proposal.
MM: that is a fantastic idea.
CW: there was an unresolved discussion about persistently mapped buffers. There was some thinking we should have persistent mappings in the API. Dzmitry, can you comment?
DM: comes down to: Chrome can not do this because of specific portability requirements Chrome has to impose on the API.
CW: yes, persistent mappings are hard when you have a multi-process model.
MM: is this because you can map the GPU memory into some process, but then can’t map that between another process?
CW: yes, basically. If you want to share memory between processes on any platform have to do it via a file descriptor or Windows handle or similar. No way to represent your GPU mapped memory, except Vulkan with weird extensions. If Chrome is running on top of Vulkan with these extensions we might be able to take advantage of them, but not on Metal or D3D12.
JG: you couldn’t map the driver’s buffer, but could have a general persistent mapping between processes and have to flush ranges to have them seen. Difference between persistent and coherent. Coherent is when you update data on the GPU and see it magically in your mapping.
CW: persistent mappings are hard because you want to have the buffer given by the GPU. Or you’re saying to have a shadow copy mechanism?
JG: yes. Shadow copies / flushes let you have this in an IPC architecture.
MM: the point you made makes sense. Persistency makes sense, possible, not horrible. Coherency, not possible, horrible.
JG: depends. Couldn’t have something device-local, coherent. Could have something host-local, non-coherent.
MM: let’s not design our API around something that doesn’t work all the time.
JG: made points about whether something is technically possible and whether we should investigate it or abandon investigations.
CW: harkens back to WebGL APIs where MAP_READ usage bit hints to make a shadow copy. If you can tell the GPU what ranges are going to be written then you know exactly what parts to copy back.
JG: correct. A lot of this is still TBD. I’m confident that something like this works if you don’t care about portability, data races, dependencies, etc. Question is if you can avoid data races, etc.
MM: Apple cares about portability, data races, dependencies.
JG: to rephrase: quite confident that if you didn’t care about these you could have persistent mappings; but question is: how much does the IPC architecture and persistent mappings and requirement to not have data races collide with the desire to have persistent mappings? Not confident on this yet.
MM: so is the proposal that the runtime will copy things around between processes at its own rate?
CW: if you do a copy in a client buffer to something you submit, after fence for that command buffer, then it’s able to do the copy from the thing. It’s fairly defined when that happens. When JS can see the fence is passed, may be able to guarantee that the writes are done on the shared memory buffer. Hard and complex. Might be doable.
MM: I think I get it. This requires that all of the resources a command buffer can affect are known to the runtime when the command buffer is submitted. So the runtime knows when the GPU is using which resources. Copy in at beginning, copy out at end.
CW: yes, something like this.
JG: in multi-queue environment you need to understand this anyway. Not an extra amount of data you have to have; need to know these data dependencies anyways.
MM: bring this up because it relates to a topic: GPU-driven rendering, where the CPU doesn’t know which resources the command buffer affects.
CW: hard to validate.
MM: agree. Would be cool but validation would be impossible, and hard to know when the copy should be done.
KN: question: what is the benefit of persistently mapped buffers over the idea of mapping as creating an ArrayBuffer and maybe it gives you back the same buffer each time, maybe with transfer semantics to avoid data races?
JG: you can map one large range and do flushes of sub-ranges.
CW: it’s the same thing but you generate less garbage.
KR: In a same-process environment the map/unmap operation can be expensive, is that why persistent mapping were suggested?
DM: was thinking more like modifying many small things, creation of many small ArrayBuffer instances. But concern about cost of map/unmap also sounds right.
KN: diff between persistently mapped buffer exposed to JS, vs. having a persistently mapped buffer in the implementation.
JG: if we’re using it why not expose it to JS?
KN: because hiding it may allow us to hide data races and improve portability.
MM: agree.
CW: if you’re single process then you’re coherent, but if you’re multiprocess you’re not. Portability is lost.
KN: you wouldn’t give JS this ArrayBuffer until there’s no data race.
MM: Agreed, the implementation would make it look like there are different copies even though it has a single persistently mapped buffer underneath. Would ensure portability across browser/devices.
JG: don’t think it’s necessary to ensure portability.
KN: not sure what the detriment is to the idea of returning multiple ArrayBuffers on the same backing store, aside from garbage.
JG: don’t like the idea that we’re using something internally that we aren’t exposing in the API. If it’s useful we should expose it if we can address the portability concerns.
KN: there are a lot of things we’re using in the implementation we’re not going to expose in the API, like all of Vulkan.
JG: if we don’t want to do this exercise all over again in two years, we should expose all the primitives we’re using.
MM: these three platform APIs were not designed with the same portability concerns this community group has. “Exposing everything” is not a good default for our goals.
JG: concerned that we’re not giving some features a fair shake because we don’t want to spend the time investigating them.
MM: what if you think about whether you can do this in a way that’s portable on all the platforms?
JG: that’s the latitude I was requesting.
MM: great.
CW: kind of hard for people to wrap their heads around all the concepts. Easier to say, let’s look at this thing and add more detail.
JG: agree. Just don’t want to dismiss something because we don’t have the time to do the deeper investigation. At least, table them with a couple of sentences why.
CW: one of the artifacts we should produce: why things aren’t in the API.
MM: yes, having rationale will be valuable.
JG: will be valuable two years from now. “Why did we do things this way?”

Shape of the API.

CW: had discussion a while ago about how many JavaScript dictionaries we use. Had discussion with Brad Nelson (chair of WebAssembly WG). Had proposals on making fast function calls from WebAssembly.
- Proposals have in common: calls with numeric values and opaque types will be fast. Property bags, not sure.
- Reached out to Brandon Jones and a couple of other people, put together an issue (https://github.com/gpuweb/gpuweb/issues/46) with some suggestions for the shape of the API.
- Could be super-fast from WebAssembly and look nice in JavaScript.
- The issue does come with Web IDL.
RC: did read this. Don’t think we should use the presence of a function in the type system to know whether a feature is present on the device. Reason: if the API ever uses multiple devices at the same time, a feature might be exposed on one device or another.
CW: good point, agreed. Second block of code: “if canUseMaxAnisotropy…”. Can throw if the thing isn’t in an extension supported on that device or enabled, and also throw JavaScript exception if the browser doesn’t support the feature. It’s a side-effect of the proposal that you can check if the browser supports something by presence of the function.
RC: checking the type should mean “does the browser support this at all”. Should need to query the device to see if the hardware supports it.
KN: not sure whether there should be a separate way to tell if the browser / device supports the particular extension. Both should be hidden behind the same query.
MM: IDL says (WebGPUSamplerDescriptor or WebGPUSampler dictionary). This is a construct we don’t know if we can do in WebAssembly.
KR: Idea is that what’s exposed to WASM is only the variant that takes the opaque object (descriptor). You have to create an opaque descriptor, call methods on it and pass it to the typed-create function.
MM: So actually two different APIs.
KR: JS can use the same API as WASM or a more javascripty API.
KN: both of these would exist in the web platform and be exposed to JS. But a WebAssembly app would use the low-level APIs taking opaque handles. It could also use the JS entry points that would go through trampolines, though would be slower.
CW: this idea lets you take almost any web API and turn it into a fast WebAssembly API. Would be very regular (ideally) how you expose web APIs to WebAssembly.
RC: say you’re calling createSampler, using the dictionary approach. How do you know whether maxAnisotropy took effect?
KN: it will be ignored if not supported.
MM: the way you do this is to ask for it later.
RC: But the WebGPU sampler is opaque.
KN: could have dev layers to make sure you don’t have unknown keys in your dictionary. Those layers could have every supported extension.
RC: shouldn’t have to use a dev layer to know whether you have anisotropic filtering.
KN: the canUseAnisotropy bool came from a device query earlier.
CW: the dev layer / devtools extension would just be there to make sure you do your checks properly and don’t have any typos. If you can’t use maxAnisotropy then it would tell you that you didn’t enable the extension, had a typo in the name, etc. Still have to check availability at device creation.
MM: seems there are 4 questions, along two axis:
- 1. Should and API be available in JS/WASM?
- 1. Should that API take a dictionary or sampler descriptor?
For WASM piece I don’t know much. For JS side: when would the developer use the descriptor path if they could just pass a dictionary?
JG: the better one is the descriptor. Heavyweight, full object. Sucks to have to instantiate these all the time. Sort of complex when you have to create, then fill out, then use the descriptor.
- In my mind: anything taking Descriptor Dictionary: converts it to a real Descriptor first. Just create it once.
MM: why would the JS API not just take the dictionary?
KN: there is no way to expose something to WASM without also exposing it to JS.
CW: if you use WebGPU from WASM: you would hook in all the places in the API into the WASM module’s import table. Has to be exposed to JS to be able to go into the import table.
JG: this is the only way to get hooks into WASM.
KN: there are no direct hooks from WASM into the web platform and no plans to add them.
MM: so from WASM you could call into JS to create a dictionary, then call into JS again to pass in that dictionary?
CW: yes. You could also put in a JS function in your WASM import table which would take a string, then set attributes.
JG: descriptor API should be faster.
KN: in theory should not be faster from JavaScript. JIT can take a structure known at JIT time and turn it into an actual C struct. If you have a property bag at your bottommost layer it doesn’t need to use the property bag at the bottom.
JG: the dictionary is a map. A first-class object is defined in C++ and has a uint32 in it. Every time I pass in that object I’m directly reading members out of it instead of going into a map.
KN: JIT can elide the map operations.
KR: When you receive a dict in the C side of things and want to map it to what’s given to the C++ code of the browser, it can be super expensive. Gut feeling is that descriptor could be faster than dict.
JG: only reason I would propose only accepting dictionaries: you could have a helper library do it for you.
KN: think this is a really light overhead for our API but that’s certainly possible.
CW: let’s keep the discussion going on the mailing list.

Agenda for next meeting

Google: put together slimmed-down NXT API.
MM: should we try to enumerate the remaining big questions?
CW: memory synchronization?
JG: did we nail down memory heaps?
CW: no.
CW: maybe raise big questions at the end of next meeting. We’ll have IDL that covers all aspects of the API so we can think of what’s not addressed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2018 02 07

GPU Web 2018-02-07

Minutes from last meeting

TL;DR

Tentative agenda

Attendance

Administrative Items

Meta: having a sketch API in IDL form

Iterate on buffer readback/upload APIs

Shape of the API.

Agenda for next meeting

Clone this wiki locally