
Minutes 2021 03 22

GPU Web 2021-03-22

Chair: Corentin

Scribe: Ken / Austin

Location: Google Meet

Tentative agenda

  • (quick?) WebGPU logo/mark
  • (quick?) Taking advantage of Vulkan built-in YCbCr conversion #1537
  • ShaderModule hints / warmup #1064
  • Add GPUSwapChainAlphaMode = opaque/premultiplied/non-premultiplied. #1474
  • Add GPU.adapters_active #1477
  • Agenda for next meeting

Attendance

  • Apple
    • Myles C. Maxfield
  • Google
    • Austin Eng
    • Ben Clayton
    • Brandon Jones
    • Corentin Wallez
    • James Darpinian
    • Ken Russell
    • Shrek Shao
  • Kings Distributed Systems
    • Hamada Gasmallah
  • Microsoft
    • Damyan Pepper
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Andrew Varga
  • Dominic Cerisano
  • Francois Guthmann
  • Mehmet Oguz Derin
  • Timo de Kort

(quick?) WebGPU logo/mark

  • BJ: rather than the fake WebGPU logo you mentioned, people pick up the wgpu logo. There is a correlation - but unless they want to contribute it, we probably shouldn't use it.
  • MM: I have a couple of designs from my employer if anyone's interested.
  • BJ: there's an admin issue associated with this, could put them there.

(quick?) Taking advantage of Vulkan built-in YCbCr conversion #1537

  • DM: Vk has this extension, also part of Vk 1.1 core, lets you make YCbCr decoding samplers, backed by HW support for multi-planar sampling. Supported on most desktops. Required on Android Vk 1.1. Support quite widespread.
    • Unknown how much this gives us vs. doing it manually. No numbers yet.
    • Would be nice to design the fat sampler approach in a way that's compatible with YCbCr; it isn't right now.
    • Trying to come up with modifications to the API to provide information earlier: get the exact format / colorspace for the video frame as part of the pipeline. Code generation at pipeline creation, instead of a huge switch case inside the shader.
    • There are ways we can get this, but it seems difficult without an explicit mechanism for the user to get raw data, like the WebCodecs API.
    • After good convo with Corentin (thanks CW!), we'll revisit this when we get WebCodecs support. But right now, fitting this in seems hard.
  • CW: very dynamic on web platform. Can be watching livestream, video on a site, etc., change format at any time.
  • JG: how much do we know about when it can change format? Is that something coming from the container formats for videos? Or you can change out the .src of the video?
  • CW: think it can come from the container. Not 100% positive. We could ask our media team.
  • JG: suspect that's true; would be nice to have it written down. Interesting that it'd probably work with WebCodecs because it tells you what format you're dealing with. You can specify a video element, but you're still in the problem space: the sampler will work with that video and maybe other videos.
  • KR: WebCodecs works frame-to-frame, so you know what you get every particular frame. But mid-stream you can get a different format/resolution. Not as explicit with the <video> element.
  • DM: don't you have to change your decoder if things change with WebCodecs?
  • CW: don't think so - it's mainly that the video frame format & resolution, etc. can change frame-to-frame.
  • CW: for now - let's go with the more implicit route.
  • DM: would be nice to see what the full shader would look like for the fat sampler approach - multiple formats, clipping modes, color spaces - and how bad it will be. (A rough sketch follows at the end of this section.)
  • JG: we more or less know what it'll look like. These are constrained issues. Similar to blitting code for fast GPU uploads from various color spaces.
  • CW: AI: let's link to that code in our respective browser engines to see how bad it looks, in #1415.
  • DM: sounds great.
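
To make the "fat sampler" discussion concrete, here is a minimal sketch of the manual conversion such a generated shader might contain - WGSL embedded in a TypeScript string, as WebGPU apps commonly do. The NV12-style two-plane layout, the BT.709 limited-range coefficients, and the binding slots are illustrative assumptions, not anything the group agreed on.

```ts
// Hypothetical manual YCbCr -> RGB conversion for the "fat sampler" path.
// Assumes an NV12-style layout: plane 0 holds Y, plane 1 holds interleaved
// CbCr at half resolution. A real implementation would branch (or specialize
// at pipeline creation) on the frame's reported format and colorspace - the
// "huge switch case" DM mentions above.
const ycbcrToRgbWGSL = /* wgsl */ `
@group(0) @binding(0) var planeY : texture_2d<f32>;
@group(0) @binding(1) var planeCbCr : texture_2d<f32>;
@group(0) @binding(2) var samp : sampler;

@fragment
fn main(@location(0) uv : vec2<f32>) -> @location(0) vec4<f32> {
  let y = textureSample(planeY, samp, uv).r;
  let cbcr = textureSample(planeCbCr, samp, uv).rg;

  // Expand limited range ([16..235] luma, [16..240] chroma) to full range.
  let yf = (y - 16.0 / 255.0) * (255.0 / 219.0);
  let cb = (cbcr.x - 128.0 / 255.0) * (255.0 / 224.0);
  let cr = (cbcr.y - 128.0 / 255.0) * (255.0 / 224.0);

  // BT.709 YCbCr -> RGB.
  let r = yf + 1.5748 * cr;
  let g = yf - 0.1873 * cb - 0.4681 * cr;
  let b = yf + 1.8556 * cb;
  return vec4<f32>(r, g, b, 1.0);
}`;
```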

ShaderModule hints / warmup #1064

  • MM: tried to capture all the ideas from last time. Large block of IDL; understand if people didn't read through completely. Tried to recapture the general direction of what this approach is trying to accomplish.
  • MM: from our perspective: all of these approaches are fine. Others have opinions on what they prefer. There was a new proposal from Toji (Brandon Jones).
  • BJ: if we're going to allow the whole structure to be optional, then we need to re-declare every level of the structure - it gets messy. Maybe we can do without certain things. A cleaner thing, which you implicitly proposed: make the top-level structures optional, and everything below them required, as usual. A nice, clean in-between way. Vertex layout, depth/stencil, etc. - the logical sorting that DM proposed, based on the pipeline. It doesn't say anything about what the HW would need for the hinting stage, so it might not be the best layout. It would make the IDL simpler, but maybe not the most beneficial. Otherwise it would get a lot deeper / more complicated really quickly.
  • DM: had lots of hint proposals - all had fairly major issues - so I tried to analyze it from first principles. We were chatting on the webgpu channel, and Jasper concluded that our basic assumption - that without any extra hints we'll do more work - is incorrect. They provided a way to do it without additional APIs. If this is indeed the case, and we can take the time to analyze their comment in detail - I think they're correct - then we don't need any API.
  • MM: I don't think their comment is correct. The thesis of their comment is that we must compile "foo" twice; it's in the 3rd-to-last paragraph. I think that's false, and even if it's true, it's missing the point. If entry1 and entry2 have the same pipeline layout you don't have to compile them twice; and if you do have to compile them twice, then we want to move that compilation earlier. I tried to capture that in my comment a few minutes ago: https://github.com/gpuweb/gpuweb/issues/1064#issuecomment-796898279 . Many small compiles are slower than one big compile. In the situation we're interested in, createPipeline calls are executed lazily as necessary during app runtime - contrary to the idea that we want one big compilation. That's where this collection of proposals is coming from. If we allowed the author to specify some but not all of the pipeline state, they could do 1 big compilation upfront, which I've shown with many bar charts.
    • Thesis: the statement's not true, and even if it is true, the point is to move the compilation earlier.
    • DM: I didn't think that was the main point. Thought it was a nice addition to the problem of doing more work.
    • MM: 2 benefits. 1) do less work. 2) less work for applications that have many entry points. createReadyPipelines is a strict win - thumbs up for that. Second goal: move compilation earlier. Take the one big compile and move it to app startup rather than runtime.
    • DM: that's going to halve your pipeline creation time, if you do this earlier?
    • MM: for some definition of halve, yes.
    • MM: the counter-proposal is: have the application call createPipeline during runtime, like they would today - the browser would say, I'll remember the MtlLibrary we needed to service that call. That's good; browsers should do that. But it doesn't work for apps with many entry points. Then the solution is: have a call createReadyPipelines, joining them into 1 function call, so the browser can join all requests into 1 big MtlLibrary. (A sketch of this batching shape follows at the end of this section.) But apps creating pipelines lazily can't supply a whole bunch of pipeline descriptors to createReadyPipelines.
    • DM: on the first point - that it won't work for multiple entry points - the proposal is that we'd see, even at ShaderModule creation time, which functions are used by multiple entry points, and compile an MtlDynamicLibrary to reuse those functions. The idea is we'll never compile the same code twice if it doesn't need to be compiled for different pipeline layouts.
    • MM: MtlDynamicLibrary is only HW-accelerated on a few devices. More expensive than just a function call.
    • CW: MtlLibrary contains LLVM IL. On devices not supporting it in HW, imagine there's a linking stage, followed by lowering to GPU bytecode. That's good. Most of the work - the cost of compiling MSL to LLVM IL - is reused. That's the point Jasper was making.
    • MM: I can come back next week with a benchmark that describes why MtlDynamicLibrary would be slower than not using it.
    • DM: even if it saves us from recompiling the same functions it'd be slower?
    • MM: having this warmup call wouldn't cause us to recompile the same functions.
    • DM: you mean slower than the ideal situation where we can create the MtlShaderModule from the start. OK. For me it's more important how far toward this ideal goal Jasper's proposal gets us - because it doesn't require any changes. If it gets us to 80% of the goal then I think we should do it.
    • MM: what's the downside of this new entry point? App doesn't need to call it, browser doesn't need to implement it, it's just a win.
    • CW: apps don't know how to call it or on which browsers it matters; there's additional development cost for them, additional cost to the spec, additional paths for apps to consider. Point is, it's not just design purity - it helps everyone to keep things small and to-the-point.
    • MM: all of these proposals require nothing architecturally: in the app, either they call this function and nothing happens, or they don't call it and nothing happens.
    • CW: dev cost to getting the information up front in the app.
    • MM: none of these proposals request authors to provide info they don't have.
    • CW: my point was, if you tell people they might get a perf advantage, a lot of people will do it - but we're not providing clear info on what to do. If we can get an 80% improvement, we should take it. Even if browsers aren't smart enough, my claim is: createReadyPipelines does solve 80% of the problem.
    • MM: I listed the three axioms of the claim we're making, and I think createReadyPipelines solves some, but not all. 1. Applications that have many entry points, that know the pipelines ahead of time but not the rest of the state, and that create pipelines one by one lazily - createReadyPipelines doesn't help this case. I will come back next week with data about MTLDynamicLibrary, and we can go from there.
    • CW: I’m ready to take your word that we shouldn’t use MTLDynamicLibrary.
    • DM: I’m interested. If it doesn’t work, then we need to go back to Jasper’s plan to reevaluate.
    • CW: Let’s talk separately in a meeting about this in a WebGPU office hour.
    • MM: Probably won’t have anything useful in the next couple of days. We could do a week from this coming Wednesday.

Add GPUSwapChainAlphaMode = opaque/premultiplied/non-premultiplied. #1474

  • KR: non-premultiplied option is new to me.
  • JG: Was thinking we could strike that in the interest of expedience, but it does have some support on some systems. Nothing but normal premultiplied alpha with non-opaque alpha is possible with CoreAnimation today. Opaque is a hard path to hit, so that's not really supported either; nor does DirectComposition support it.
  • KR: Neither does ChromeOS's compositor. The only platform that supports it is Windows, so I can't get behind this.
  • CW: For Android as well, alpha=false is not portable. If you set alpha: false, then you need the alpha channel set to 0. Not clear if it's pre/non-multiplied.
  • KR: I dropped my objection to opaque because on macOS we tried clearing the alpha channel to 1.0 and it was the same as our other workaround. So the cost is the same at least on that platform, and it will probably be the same on tilers.
  • JG: Wonder if we could add tracking/validation to detect when we can hit the for-realz fast path and not even do the clear.
  • KR: Don’t know the cost of the color mask on the new APIs. In OpenGL, it’s a very high cost.
  • JG: I mean validating that the user does the full-screen alpha=1 clear, and eliding the one we would do.
  • CW: We talked in Chicago about having an RGBX format, only provided by the swapchain. When you use it in a pipeline, it forces the colorWriteMask to 0 for the alpha channel. (Sketched at the end of this section.)
  • JG: Feedback from ISVs is that they do use the alpha as a weird scratch area, so they would prefer for us to keep it accessible. Unfortunately, seems like it’s not implementable today.
  • CW: So we say opaque, and that’s it?
  • KR: I think, yes. And if not, it’s assumed premultiplied.
  • DM: Keep it an enum so we can grow it?
  • DM: On Windows, we have direct comp support for opaque, but we would still need to clear the alpha in case they read back. We need consistent behavior.
  • KR: You could do the setting on readback.
  • RC: I thought when you’re done with the frame, you can do whatever you want, and it’s cleared to zeros on the next frame? So we wouldn’t need to clear anything on readPixels.
  • KR: But if you read back at the end of the frame, you should see alpha=1.0.
  • RC: During presentation after the end of the frame..
  • JG: This proposal is not saying that alpha=1, it’s saying that you have an alpha channel and we ignore it.
  • DM: Can you get the contents from the canvas? Can you observe if it’s cleared.
  • JG: toDataUrl would probably get the cleared alpha. It’s supposed to give you the pixel data on the frontbuffer that’s presented to the page at the time. If that’s the case, we would want to give you opaque data.
  • KR: I believe it’s specified to be synchronous, so if you’re iteratively rendering to the canvas you should see it at each step. But with WebGPU as well the readback results should give alpha 1.0/
  • JG: Trick is that’s two different code paths depending on where you are in the frame. If you haven’t rendered yet, we give you the frontbuffer. IF you have started, we give you the backbuffer. The backbuffer would have alpha values, but the frontbuffer would not.
  • KR: Ok, we might need to have the code paths have consistent behavior otherwise there will be some horrible flickering.
  • JG: Yeah, we can spec toDataUrl etc. as clearing alpha to 1.0 for opaque. Probably reasonable if you want the midframe backbuffer.
  • JG: Oh, we can't do this - we use swapchains. Does toDataUrl give you the contents of the currently acquired swapchain surface?
  • RC: Why don't we say: if you've touched it and you call toDataUrl, you get what you've rendered so far.
  • JG: That does work - probably need to file an issue about what toDataUrl gives you. If you call it much later, you should get the front buffer for sure. That’s for right clicking and saving as an image.
  • KR: Might not be possible to get the front buffer. In order to satisfy toDataUrl, you always incur the cost of WebGL’s preserveDrawingBuffer. We don’t want to need to do that every time.
  • JG: Anyway, we shouldn’t hold up this proposal on how toDataUrl works. We can move forward.

Add GPU.adapters_active #1477

  • Closed. Adapters will be invalidated.

Agenda for next meeting

  • toDataUrl, right-click canvas "Save As" behavior?
    • JG: will have an issue filed for this.
    • KR: think about CanvasCapture to MediaStream.
  • CW: WebGPU office hours in 10 days. Discuss MtlLibrary further.
    • JG: will schedule it. Need list of email addresses.
  • CW: we could make the next meeting a WGSL meeting if not too many agenda items.
    • JG: we'll see how tomorrow goes.
  • CW: tomorrow WGSL discusses if they want to take the WebGPU meeting next monday. Otherwise decide if we want to cancel, do PR burndown, etc.
  • JG: not sure if everyone in WGSL meetings is available during this timeslot Mondays.
    • CW: please ask during WGSL meeting tomorrow.
  • CW (later): Query API: What is the GPU timestamp unit on Metal? #1325