Minutes 2019 12 02

GPU Web 2019-12-02

Chair: Corentin

Scribe: Ken

Location: Google Meet

TL;DR

Agenda for next meeting
- Made the agenda in next week’s document.
Feature levels
- Experience from D3D12 and WebGL is that it’s impossible to make performance levels, only feature levels
- Proposal to define coarse feature levels in this group, but there is concern about the negotiation this will entail.
- Apple, Google and Microsoft each to make an analysis of feature levels for their platforms
#501 array buffer creation benchmark & Immediate uploads and friends #426, #481, #491
- Socialization of the idea of uploadBuffer + deferred closure that’s in the comments of #501.
- Discussion about the consequences of that design for validation, when the closure is called, feasibility in JS engines.

Tentative agenda

Agenda for next meeting
Feature levels
#501 array buffer creation benchmark & Immediate uploads and friends #426, #481, #491

Attendance

Apple
- Dean Jackson
- Myles C. Maxfield
Google
- Corentin Wallez
- David Neto
- James Darpinian
- Kai Ninomiya
- Ken Russell
- Shrek Shao
- Ryan Harrison
Intel
- Yunchao He
Microsoft
- Chas Boyd
- Damyan Pepper
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
Mehmet Oguz Derin
Timo de Kort

Agenda for next meeting

CW: Next meeting is with the Khronos Liaison. We should come up with an agenda.
MM: one topic should be requirements for a potential fork of the SPIR-V spec that is not owned by Khronos, and whether that’s possible at all.
CW: in same spirit: should have requirements for what happens if a spec external to Khronos wants to reference SPIR-V and subset / extend it.
CW: there’s also a Khronos Data Formats spec (not sure about its IP zone); it’s great because it defines compressed texture formats bit-by-bit. Never seen anywhere else. Would help a lot if we could reference that. Also float16, rgb9e5
- MM: for that proposal it sounds like we wouldn’t want a fork of it.
- CW: seems pretty clear that it would be a reference.
MM: worth discussing Vulkan portability?
- DM: think what Neil’s interested in is how a WebGPU backend could be made for Vulkan portability, but probably not something we should worry about.
RC: had wanted to ask, if the group decides we want to make changes to SPIR-V and fork it, are we going in a legally unsound direction; and/or if we want to introduce a feature very similar to something in SPIR-V (for example, we don’t like how tessellation shaders work), how obligated is the SPIR-V working group to take it?
- MM: and, are we even obligated to tell them about the changes we want to make?
KR: I think we should approach the next meeting with an information gathering approach, to figure out how this group could interact with the Khronos specs in ways that allows collaboration.
CW: from the last face-to-face: everyone at Khronos would be excited for WebGPU to use SPIR-V - not trying to argue for or against SPIR-V. Think the next meeting will be about Neil Trevett being excited about a potential new use of SPIR-V. Let’s use this time to figure out what all the rules are and about their consequences for potential collaboration
MM: beyond possibility of forking the SPIR-V spec, if we wanted to reorganize it into a single document, what would be the terms of that?
CW: how could this group make bug reports / feedback to the SPIR-V working group that don’t count as contributions, and how could this group make contributions to Khronos specs, if at all possible?
- I.e., not just “we’re going to fork”, “we’re going to reference and not touch it”, but middle grounds.
RC: also: what IP zone is SPIR-V in, and what would we need to join in order to make contributions to it?
- DN: it’s in the Vulkan/OpenGL/OpenCL IP zone, and you’d want to join if you want to contribute IP and make it available as frictionlessly as possible. Anyone can contribute IP (to my understanding) with appropriate legal signoffs.
- CW: the second question is a great one for next week.
JG: could we link to the agenda doc?
CW: I’ll make the agenda for next week now and copy these into it.
- Done: agenda for next week

Feature levels

MM: recap from 2 weeks ago: device limits. If you expose physical device limits there will be a lot (dozens) and if every device has different set of limits that’s a lot of entropy for fingerprinting. Hoping to make that not so bad.
MM: one way: no limits. Lump all devices into one bucket. Probably not best. Cheap phone should not look the same as a $1,000 graphics card.
MM: one proposal: buckets. Devices could be sorted into relatively small (5 or fewer, for sake of argument) buckets.
MM: Also: devices in same bucket should look the same to the web.
Questions:
- MM: technically, on browsers we support, can just do this without standardization support.
- MM: but would be cool if we could agree as a web platform to limit fingerprinting in general, and browsers to collaborate on what the buckets look like.
- MM: should definitions of buckets be written down anywhere? If so, in the spec or not?
JG: I like the first idea. The second, not sure. In WebGL, we’ve discussed what happens when you have a device / class of devices that need their max texture size restricted more than it was. One point about resisting high-level fingerprinting: an advantage in not spec’ing it tightly is if we have to restrict max texture size from e.g. 8K to 7K, it doesn’t (for example) also have to lose MRT support.
- Having it more loosely spec’d allows browsers to have more flexibility in how to treat them.
MM: think that makes sense. There are a bunch of devices which match pretty well except for max texture size. Would love to verify that as true before going down this route.
JG: for #1, FF has some fingerprint resisting technology, and I think having discrete feature levels will help. Writing it down makes it more difficult. It provides answers to developers like, what can I use? Gets a little tricky.
JD: I like the idea of feature levels not just for fingerprinting but also for portability.
KR: Historical context: we tried performance levels in WebGL a while ago, and did a lot of investigations. The performance characteristics of various pipeline stages were all over the place. Devices didn’t fall in nice buckets. Maybe buckets for limits would work better. In some other APIs, we’ve seen that the highest feature level needs to be split over and over again to expose the ever increasing features. (Android screen size is an example of that)
MM: you raised two points by my count. Figuring out which devices fit in which buckets, would need to do that analysis to do this proposal. One result of that analysis would be whether devices really fit into these buckets. Think we agree on that. On second point, instead of naming these with T-shirt sizes, what about numbers? The requirement is just that a considerable number of users fit in each bucket.
CW: one concern: feature levels for portability and preventing fingerprinting is great. Standardizing on feature levels will lead to discussions like: does this feature level require 16 storage buffers on compute stage? e.g., my company’s phone only supports 14, can we redefine the highest feature level? realize this isn’t a problem for some of us, but worry about the cross-org nature. Trading camels to get the bigger stamp on the release of our next devices.
MM: can we agree it’s worth gathering data on devices’ limits to see if they fall into buckets?
JG: very much in favor.
CW: different effective feature levels - agree with what Jeff said. Each browser vendor would be free to make the feature levels whatever they need to be to make devices “close” to a given feature level. Having it in the spec would be nice; some tradeoffs. Worry about iPhone 20 or Pixel 10A coming out, and all of a sudden they’re just under a feature level, and someone raises the question about redefining the feature levels.
MM: if the feature level’s shipped for ~5 years it’s not clear that it should change.
JG: Ken has more background, but Chrome started requiring EXT_robustness for WebGL support on Android and then Samsung shipped a new flagship phone that didn’t support it. There are cases where we aren’t a big enough spec body that people are looking at our feature levels to decide what to ship. It’s hard because some apps might not work on a whole new class of devices.
JD: I’m interested to hear Microsoft’s stance on feature levels because DX has had them for a long time. Developed with help from hardware vendors.
CB: I’ve spent a lot of time on this. Used to have massive array of caps bits. Spreadsheet the size of a wall, and that was just the PC space. Not mobile hardware, and not covering performance. Lessons:
- Think it’s a good idea to do this sizing thing. Helps vendors decide what the installed base they’ll hit is, if they hit a particular set of features.
- Getting into number of buffers that can be attached at certain points: may be hard to find a grouping. Might have to query into the buckets.
- We’ve never been able to have a flag that returns performance characteristics. Drivers always lie and say they’ll be the max performance. We’ve always encouraged apps to run a little benchmark loop during their loading screen because that takes real-world performance into effect.
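Illustrative sketch (not discussed in the meeting) of the "little benchmark loop" CB mentions. The function name `quickBenchmark`, the `budgetMs` parameter and the idea of scoring in iterations per millisecond are all assumptions made for this example. `work` stands in for whatever representative workload an app chooses; real GPU work would also need to wait for completion before the clock is read, which this sketch does not model.

```typescript
// Hypothetical helper: runs `work` repeatedly for roughly `budgetMs`
// milliseconds (must be positive) and returns iterations per millisecond.
function quickBenchmark(work: () => void, budgetMs: number = 20): number {
  const start = Date.now();
  let iterations = 0;
  let elapsed = 0;
  do {
    work();
    iterations++;
    elapsed = Date.now() - start;
  } while (elapsed < budgetMs);
  // The loop only exits once elapsed >= budgetMs > 0, so this never divides by zero.
  return iterations / elapsed;
}
```

An app could run this during its loading screen and pick a quality setting from the score, rather than trusting a driver-reported performance flag.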
MM: thanks, that helps. Because we’re targeting DX and because DX already has feature levels, was hoping that we could just reuse those in that space.
MM: would also like to do this analysis on extensions. Presence of extension is also a bit of entropy, so clustering would be good.
MM: Apple’s not in a very good place to run this analysis because our hardware is more constrained. Would appreciate volunteer to either work with me or take it on.
JG: I’d say Apple already has this done with the Metal feature level spreadsheet. Wondering what it looks like on the D3D12/Vulkan side.
CW: for a while we’ve meant to get better data than vulkan.gpuinfo.org because the data’s really hard to parse. We can/should get the data for Android. For the desktop world the analysis is already done for D3D12/Metal.
MM: maybe that’s my first step, to cross-reference the Metal and D3D tables.
CW: we’ll take an AI to get better data on Android, but it might be pretty hard to get the data. Not sure how we’re going to do it. Will take a while.
JG: we’ve talked about this and have some telemetry in Firefox, but mainly targeted at compositing backends. Wanted to do feature level checking for WebGL. Also wanted to draw a table based on WebGL stats
- https://jdashg.github.io/misc/webgl/webgl-feature-levels.html
- Maybe not particularly useful, but did include extensions in this. As a developer, it’s useful to know how many people will be able to use a particular feature (multiple render targets, vertex array objects, etc.)
DM: in the real world do we really have that happen? People coming to a standards body and redefining the feature tiers?
CW: Vulkan Portability is trying to reduce the requirements for Vulkan, no?
DM: not really the same thing.
JG: would be good to reduce the concern level.
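
Illustrative sketch (not discussed in the meeting) of the bucketing idea MM describes: a device is placed in the highest bucket whose limits it fully meets, and the page sees the bucket's limits rather than the device's. The limit names, the bucket values and the helper names `bucketFor` and `exposedLimits` are hypothetical and do not come from any spec or from the group's data.

```typescript
// Hypothetical limit names and values, for illustration only.
interface Limits {
  maxTextureSize: number;
  maxStorageBuffersPerStage: number;
}

// Buckets ordered from least to most capable, numbered rather than named.
const BUCKETS: Limits[] = [
  { maxTextureSize: 2048, maxStorageBuffersPerStage: 4 },
  { maxTextureSize: 8192, maxStorageBuffersPerStage: 8 },
  { maxTextureSize: 16384, maxStorageBuffersPerStage: 16 },
];

// Returns the index of the highest bucket whose every limit the device
// meets, or -1 if the device does not meet even the lowest bucket.
function bucketFor(device: Limits): number {
  let best = -1;
  for (let i = 0; i < BUCKETS.length; i++) {
    const b = BUCKETS[i];
    if (
      device.maxTextureSize >= b.maxTextureSize &&
      device.maxStorageBuffersPerStage >= b.maxStorageBuffersPerStage
    ) {
      best = i;
    }
  }
  return best;
}

// What the page would see: the bucket's limits, not the device's own.
function exposedLimits(device: Limits): Limits | null {
  const i = bucketFor(device);
  return i < 0 ? null : { ...BUCKETS[i] };
}
```

With these made-up values, a device reporting 14 storage buffers per stage lands in bucket 1 rather than bucket 2, which is the kind of near miss CW's concern about redefining levels is about.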

#501 array buffer creation benchmark & Immediate uploads and friends #426, #481, #491, #506

(Buffer uploads designs hackmd: https://hackmd.io/d6sXDReIQE-9njvHQdPUTA )
CW: my gut feeling is that this is a very difficult problem that doesn’t have a good solution, and we’re finding a lot of local maxima.
CW: question: what tradeoff do we want to make? Each local maximum has a downside.
CW: I don’t think there’s one true solution we can decide on.
CW: DM do you want to discuss your latest proposal with the closure?
DM: user would provide closure whose goal is to fill out an ArrayBuffer. Closure guaranteed to be invoked between call to upload buffer, and next submit on this queue (or any queue, if multiple queues). Closure guarantees implementation can do this from most efficient area (like IPC) - can directly provide access to the space - allows impl to guarantee that lifetime of the ArrayBuffer is limited.
CW: what happens if app calls uploadBuffer several times on one thread, and then there’s submit happening on same queue on different thread?
DM: effectively, when the queue receives the submit call, sees there are a few pending callbacks waiting. Has to run callbacks (...missed some…) and then proceed to submit.
CW: so only when all callbacks have been fired does submit happen?
DM: yes.
MM: so callbacks are fired inside submit?
DM: no, I don’t think so. Would be able to be fired after the JavaScript frame.
MM: interesting. So rAF ends, and then you get callbacks.
JG: is it heuristic that buffer is idle?
DM: heuristic for the impl, yes.
RC: isn’t this true for mapBufferAsync too? Promise resolves later.
CW: mapBufferAsync, the intent is that it can take multiple frames to come back. Usage is different. uploadBuffer is for things that have to happen before the next submit.
RC: if callback isn’t called until JS exits, then it’s not “right now” but later, no?
CW: it’s for the same frame.
JG: it’d be after submit then.
MM: might be the first time that our engine has been able to run a JS callback while JS is also running. Don’t think we’d want to run that immediately so would have to run it after rAF, every time. Or, run it inside a C++ function. Call commit(), then run all JS callbacks.
JG: think this is to RC’s point: similar to mapAsync, if you put stronger guarantees on when mapAsync returned. If you mapAsync something and it’s in use (maybe not possible because of restrictions on buffer mapping) - attach / detach the buffer. If you stack it on Promises instead of callbacks, maybe fits better into web stack.
CW: worry is that both options have lots of constraints on how they’re used. uploadBuffer uses pipeline. mapAsync has state machine. In both, you have an ArrayBuffer and there’s lifecycle enforcement. In one case you have to be perfectly pipelined between submits. Other case, transition of memory back and forth between CPU and GPU. Very different. Maybe middle ground to be defined.
MM: originally I was concerned about ease of use. Concerned about Promise executing after currently recording frame. Hard for app to get right, and have to have a pool of buffers. If this callback runs after rAF then I think I agree with Rafael that it’s harder to use.
DM: not sure what kind of pooling you have in mind. It’s still pipelined. Should upload, should upload, etc. Not supposed to pool too much.
MM: simple app that wants to upload. Inside rAF, create JS function that populates the ArrayBuffer, then it gives that callback to the impl. Impl guarantees it will call it for the currently recorded frame.
DM: impl’s guaranteed to run that callback before the next submission happens on the actual queue.
KN: if it happens during this frame then everything before it also happens this frame.
MM: that’s the big benefit and why this is easier to use than buffer mapping.
CW: due to the magic of JS GC, the closure keeps a reference to whatever data you had so it’s easier to manage than C++.
RC: why is a callback closure easier than a Promise?
MM: today, you call map, and the Promise is resolved within a certain number of frames. Impossible to populate a buffer for the currently recorded frame. With DM’s proposal it is possible. Impl will run it at its leisure.
KN: is it really easier for the impl to give you recorded space between when you asked for it and Submit, rather than immediately?
DM: hard to answer that. We’re still implementing this and running into corner cases. Thinking of corner cases that we may need to address. Might not need the closure, but not sure.
MM: so if you don’t need it, then it reduces to “give me an ArrayBuffer now”, and detach it.
DM: there’s still the question of when to detach the ArrayBuffer.
JG: you could say, at frame boundary, all these things become detached.
CW: you’d want it at submit boundary. Notion of frame doesn’t exist.
JG: worth noting: these are all different ways to talk about the same things. They’re all satisfiable with the framework we have today, with some tracking around which indirect upload buffers are mapped. Could see which buffers are idle, map them eagerly, unmap when don’t want them.
CW: agree. Also, there are use cases where buffer mapping the mapAsync way allows you to produce fewer (zero) JS objects while doing all this. DM and I walked through some examples. Looking at scene graph use cases: can get upload space garbage-free while recording objects. Also relatively similar to what devs are doing today. I’m worried about the callback idea. It’s very novel and cool and so has a lot of things that are scary. Sorry my thoughts are fuzzy.
DM: it’s still a rough idea. Don’t think we have to commit to the idea of having closures. If going on this path, we’d first agree on Queue.setSubBuffer and then go from there, figuring out more sophisticated ways to do uploads. If we can do what Jeff explains with minimal changes then I support that. Corentin and I commented on Jeff’s proposal.
JG: yes. Not going to be able to address all of this today. I will reply to those comments. Should I give 3 minute overview of mapBufferRange?
CW: maybe - sorry - how about we look and comment offline. Level of polish on that idea is not as far as the other two so think we have to iterate on it.
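
Illustrative sketch (not discussed in the meeting) of the ordering guarantee in DM's closure proposal: a closure passed to an upload call is not run immediately, but is guaranteed to have run before the next submit on that queue proceeds. `MockQueue`, its method signatures and the use of a plain `Uint8Array` as the destination are hypothetical and are not the WebGPU API. The sketch runs pending closures at the start of `submit`, which is only the latest point the proposal allows; an implementation could run them earlier. ArrayBuffer detachment is not modeled.

```typescript
type FillCallback = (data: Uint8Array) => void;

interface PendingUpload {
  dstOffset: number;
  size: number;
  fill: FillCallback;
}

// Hypothetical stand-in for a queue; `dst` plays the role of a GPU buffer.
class MockQueue {
  private dst: Uint8Array;
  private pending: PendingUpload[] = [];
  submitCount = 0;

  constructor(dst: Uint8Array) {
    this.dst = dst;
  }

  // Records the request; the closure is deferred, not called here.
  uploadBuffer(dstOffset: number, size: number, fill: FillCallback): void {
    this.pending.push({ dstOffset, size, fill });
  }

  // Runs every pending closure, in order, before the submit itself proceeds.
  submit(): void {
    for (const p of this.pending) {
      const staging = new Uint8Array(p.size); // space handed to the closure
      p.fill(staging);
      this.dst.set(staging, p.dstOffset);
    }
    this.pending = [];
    this.submitCount++;
  }
}
```

Because the closure captures whatever data it needs, the app does not have to manage a pool of mapped buffers to get data in for the work it is currently recording, which is the ease-of-use point MM raises.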

Next Steps

Agenda for next meeting
Two weeks from now:
- More upload stuff
- Maybe more to discuss after Khronos discussion next week?
