Skip to content

Minutes 2021 08 30

Corentin Wallez edited this page Sep 10, 2021 · 1 revision

GPU Web 2021-08-30

Chair: Corentin

Scribe: Ken / others

Location: Google Meet

Tentative agenda

  • Administrivia
    • Cancelling next week's meeting
    • APAC friendly meetings
    • Define milestones to burn down existing issues
  • CTS updates
  • Dealing with holes in the pipeline layout. #2043
  • timestamp-query is unimplementable on TBDR architectures #2046
  • Should depth-stencil render attachments require views to have aspect = "all" for combined depth-stencil formats? #2062
  • Implicit GPUTextureSampleType for texture_2d<f32> #2064
  • (stretch) Feature request: ignore shader writes to color attachments #2060
  • Agenda for next meeting

Attendance

  • Apple
    • Myles C. Maxfield
  • Google
    • Austin Eng
    • Brandon Jones
    • Corentin Wallez
    • Gregg Tavares
    • Kai Ninomiya
    • Ken Russell
    • Loko Kung
    • Shrek Shao
  • Intel
    • Brandon Jones
  • Microsoft
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
  • Francois Guthmann
  • Michael Shannon
  • Mehmet Oguz Derin
  • Timo de Kort

Administrivia

  • APAC friendly meetings
    • 5 PM Pacific OK?
    • No concerns raised.
    • Every month. First Monday/Tuesday of the month.
  • Define milestones to burn down existing issues.
    • We should triage the incoming issues to make sure we're not missing anything.
    • What kinds of milestones should we use?
    • MM: most natural way - designate one/more meetings as burndown. Go through lists. If we can't go fast, that seems reasonable.
    • MOD: (Maybe a first thing could be checking whether some issues are WGSL-only but not tagged as such)
    • CW: sounds good. Might be OK with closing investigations because they're tagged as such - easy to find again on Github.
    • MM: investigation's not closed if the feature's not added yet.
    • CW: what should the milestones be that are defined? Like post-MVP. Is MVP the "version 1" that all browsers are meant to support? Then better version of WebGPU afterward? Or do we want to have separate tags for MVP, V1, and Future?
    • DM: thought you considered the OT to be MVP. How can this be merged with V1?
    • CW: OT's a Chrome thing - shouldn't influence this group except for providing feedback from users.
    • DM: that's the point of MVP - to get feedback from users.
    • CW: correct, but we've tagged a lot of things as post-MVP that maybe should be post-V1.
    • KN: think everything post-MVP can be after stable release. So it's really "post-V1".
    • MM: isn't Chrome close to shipping the OT? Even if we wanted to ask Chrome to put it in the OT it couldn't make it.
    • KN: it already branched.
    • DM: everything in Chrome will make it into the OT, just at a later date.
    • CW: correct. MVP also had the notion that code could work between Chrome and Firefox's early access releases.
    • DM: we'll have that feedback. We have a much quicker turnaround for FF API updates because we only care about Nightly. Several weeks of timeframe where ppl will run same content on both browsers each time you break Chrome's impl :) We will have that feedback, hopefully.
    • CW: right now in terms of milestones - should only have V1 and "later".
    • DM: sounds good.
    • KR: so we're considering the MVP milestone to have already been reached?
    • CW: yes. Still a lot of MVP issues, but they're mostly WGSL. Port them over to V1.
    • CW: MM, good suggestion for the bug triage meetings.
  • FD: when does V1 ship? Under what condition?
    • CW: when we feel the spec is in shape for a candidate recommendation. Goes through several steps to be W3C recommendation.
    • CW: some risk in releasing to Stable. Only recommendation status - IP issues coming in. WebRTC just reached recommendation stage; some APIs have been shipped far before the recommendation stage.
    • FD: any condition for impls? All browsers have caught up to the implementation?
    • CW: no.

CTS updates

  • KN: None this week

Dealing with holes in the pipeline layout. #2043

  • CW: if you preassign BGLs but some pipelines don't use anything for material BGL but use the per-draw BGL, you'll have to create empty BG to bind to that thing. Do we want to do anything to make that better? Bind null, for example?
  • DM: thinking about this - only semantic where it makes sense to me is to treat same as BGL entries not specified outside of your range. You specify 2 but the device supports 4. If we treat those the same as potential holes, this fits into the logic seamlessly.
  • DM: got concerned about what user was doing. Chatted on Matrix. Didn't reach any positive conclusion.
  • DM: concern: I don't understand bit where you're saying - if shader doesn't use any material BG then I have to create empty BGL and bind it...sounds like you're typing an exception path in the code. I'd expect code to handle 0 or more entries in BGL for material. Happens to be 0, nobody cares, it works. I don't understand what extra work is implied.
  • CW: is the confusion that you create a BG that contains nothing and it's valid?
  • DM: the code where you're doing it in an automated way. Scanning shader, and there's a piece of logic figuring out what to put in BGL. Happens to be a product of your logic.
  • FD: can be that way, but if there's nothing I don't create anything - that's my logic. If shader doesn't need it I don't add it. But right now I create an empty thing and it's fine.
  • DM: no exceptions, nothing to think about here really - it works.
  • FD: yes but it's weird. Other thing - non-idiomatic. Not sure it's very explicit anywhere what's the idiomatic way. You mentioned - it's Vk pipeline layouts, they're described in several places, but not why they were designed that way.
  • DM: my non-idiomatic discussion refers specifically to what FD was describing. But maybe we can discuss this somewhere with more space and less people.
  • CW: discussion around Francois's code can be had separately - but don't think we want to defer the decision around this.
  • FD: to be clear, I'm fine with the way it is.
  • MM: is creating the empty BGL something the app author can do better than the impl? If it's the same work, moving from app code -> browser code, makes sense for browser to do it.
  • DM: I disagree. Doing it in the browser can lead to more complex spec. Instead of saying, this uses 3 BGLs, it uses 3 slots but some can be empty. Hard to reason about state when some can be null.
  • MM: FD was saying, when he wrote this code - it was easier for him to not bind that stuff. The cognitive overhead seems to be to have to bind the empty BGL.
  • DM: I don't understand - if you have general piece of code that creates BGL, you have to say, if it's empty, don't create it - that's more work.
  • CW: to rephrase - if someone ends up with empty BGL, means they're trying to have something that reuses BGs between pipelines, has more factories / helpers that do things for your engine. You're more likely, for each binding in slot 1, add to the list to create a BGL. If it's empty it magically works.
  • DM: right.
  • CW: I think the only thing the browser could do for the user - deduplicate the empty things. Creating an empty BG - it's JS garbage being created, it's cheap, but also unnecessary. That's the only thing the browser could do better than the user.
  • DM: you mean internal rather than external deduplication? Internally you already deduplicate everything.
  • CW: if you have empty BGL, you have to create empty BG, when you could just say null. Less garbage, less objects living. Corner-casey.
  • DM: if we really think this is worth it, we can allow it. Doesn't have to happen now.
  • FD: good documentation / guideline sounds better.
  • CW: I'd be happy to drop this proposal, write docs, see if there's any feedback on this issue before we make a change.

timestamp-query is unimplementable on TBDR architectures #2046

  • MM: today in this extension there are 3 function calls. 1 inside compute pass encoder, 1 inside render pass, 1 inside top-level encoder. Meaning of these calls: please write down current time.
  • MM: inside compute & render pass encoders - unimplementable on tiling renderers because they're considered atomic. Can't insert between draws.
  • MM: for tilers - only meaningful to take time at beginning/end of passes. Desktop renderers can do that too.
  • MM: if API was changed so times were only taken at begin/end of passes, anyone could implement them. Same code would run on mobile/desktop.
  • MM: proposal removes 2 problematic functions, replaces them as part of pass descriptor. Web app says, when you execute this pass, write down times into these sequences. Impl can write time in multiple places.
  • MM: for compute/render - these are distinct objects. Render passes, we may want to add add'l checkpoints. Do at end of vertex processing for example. This proposal doesn't do that - keeps them distinct for future expandibility.
  • MM: think this can be implemented everywhere, is useful, maps naturally to the 3D graphics APIs.
  • CW: there were couple comments - DM do you want to start?
  • DM: can I have multiple queries for the same location?
  • MM: didn't consider that.
  • DM: render pass timestamp locations? Would you ahve multiple things written into multiple query indices on beginning of render pass?
  • MM: aiming for - it's expressible to say "at beginning of render pass X, please write current time into this position of this query set, and this position of this query set…". Same for end of processing. Didn't consider - what if beginning of RP tries to write into position 3 of query set, and so does end of RP. Don't have any opinions there.
  • CW: can Metal write into multiple times at beginning?
  • MM: yes. Reorganized what Metal does to be more understandable.
  • CW: no strong opinion on validation we need, if any.
  • DM: is this still an optional feature?
  • MM: that's a tradeoff I don't have the info to decide. When writing this, thought - yes. Think it's worth figuring out the actual cost. If you don't think it's valuable, I won't do that work. But if valuable, I'll do it.
  • CW: because of fingerprinting we'd like it to be optional. Also on some Vk devices, timestamps are garbage (0 useful bits).
  • CW: also to make it clearer to security folks - this isn't a fingerprinting vector.
  • DM: when we discussed original timestamp queries - one use case was that there could be a tool allowing you to profile draw calls, see how long each takes on the GPU, with fancy UI. With reduced power of the feature now, it's becoming less of an issue. Ppl want per-draw times, not per-pass times. Strange middle ground between timing your command buffers.
  • CW: looking at real-time rendering presos from SIGGRAPH - ppl care about per-pass not per-draw.
  • DM: if I want to measure passes, why not split command buffers and use the portable timing we already have?
  • MM: good question, I don't' have the data for that yet. Think there's nonzero, small cost. I'd publish our graphs and test samples.
  • DM: cost Myles talks about is probably CPU cost; this is GPU cost. The CPU cost isn't what we're measuring.
  • KN: does Vk let you time subpasses? Should make sense but looking at Vk API may not be possible. WebGPU doesn't have subpasses but someday it might. Might have divisions between renderpasses. Write tile data, read in later part of pass.
  • CW: AFAIK I don't know of API for per-tile timestamps. Write timestamp after top/bottom of pipe is finished - should be able to do so per subpass.
  • MM: think it's imp't for first version of WebGPU that timestamps work the same everywhere. Can add more fine-grained timestamp feature that's not everywhere in the future. But think we shouldn't start with that.
  • RC: in spec there's already measureExecutionTime. Is that optional? Is that sufficient for V1?
  • MM: not optional. Little different. It would be reasonable to have WebGPU that only has the thing you're suggesting, and no optional features at all. Reasonable path forward. Don't think WebGPU should have that facility you're describing and an optional feature, but the optional feature only works on desktop GPUs. It's execution time of a cmdbuf, rather than single pass. Also data you're describing is data on the CPU after you're finished - might need to wait a few frames. this optional feature never hits the CPU at all. Lives on the GPU. One cmdbuf can populate the data, then next cmdbuf in the same frame can read from it.
  • CW: can go with GPUCommandBuffer.writeTimestamp, or something like your proposal. Even if GPUCommandEncoder.writeTimestamp is similar, think this is more expressive for developers, so probably a bit better.
  • DM: is it viable to have writeTimestamp? Thought this info was needed to record passes. Out-of-band is unimplementable?
  • DM: or - on Metal, Q to MM - if there's GPUCommandEncoder.writeTimestamp, is that impementable on top of the queries discussed?
  • MM: yes, but wouldn't have as high fidelity. Think you're describing WebGPU feature where there's nothing on compute/render passes, only on the top-level encoder. If app wants to time how long a compute pass took, they'd writeTimestamp, record their render pass, then say writeTimestamp after pass was closed. Are you asking, is that sufficient?
  • DM: right.
  • MM: first, this proposal takes times just inside. NUmbers slightly different. More imp't: in future want to add things like take time just inside vertex processing of render pass, wouldn't work. Today, mostly identical, but in future unable to add more flexibility.
  • DM: info up front we need - sample buffer attachments (?). Not clear how you set it up if you issue writeTimestamp on teh command encoder itself, and pass doesn't know which times it needs to measure.
  • MM: you'd implement this on Metal by writeTimestamp -> tiny blit command encoder which does writeTimestamp.
  • DM: OK, I see.
  • CW: what about writeTimestamp in … ? Feels like less overlapping in compute passes. Think less value there.
  • MM: splitting passes is expensive. Different point than made with DM. If pass will be split, rather web developer know about it and initiate it, rather than hidden behind a call. That's true not just on Metal but all TBDR devices.
  • CW: this is just for compute passes though.
  • MM: oh, thought you were talking about compute+render.
  • CW: think there's relatively little cost for splitting pass for compute.
  • MM: splitting comptue pass does have costs. Resource tracking was about splitting command buffers, not passes. We think splitting compute passes should be explicit. Think splitting command buffers should be explicit too. But not same as splitting compute encoders.
  • CW: OK.
  • DM: overall, seems strange that we're exploring novel ground for this because it's an optional feature. Think we should focus on V1, put this on sideline. Just remove current timestamp from spec in the meanwhile.
  • MM: reasonable. Our desire: we think this is valuable, but stronger desire - don't want optional feature in WebGPU and not have it work on iPhones. If we can delete optional feature, that's acceptable, but not ideal. Also, leaving writeTimestamp on encoder but not on compute/render encoders - OK, but a little sad because this functionality's useful.
  • DM: think we should iterate on this more & not put ourselves under time pressure. Remove timestamps from spec, discuss re-adding them in a way compatible with Apple devices.
  • MM: remove the entire optional feature?
  • DM: yes.
  • CW: I don't think we're under time pressure. Closer to V1, could remove if needed.
  • DM: discussing entirely new API here not seen in other APIs. Not the state where we want to fix the spec for V1.
  • MM: it's seen in Metal.
  • DM: thought we wanted to stabilize spec by end of year. Could discuss then.
  • CW: we'd love to have this feature. My point - it's easy to remove this thing, but more difficult to talk about if it's removed. Could mark it something unfinished, shoudn't ship to stable, then keep iterating. Will be optional features that will still be in the spec.
  • MM: when Chrome ships V1, you might not ship with timestamp queries?
  • CW: if CG says they're not ready, you'll need a flag to enable them. Good reason why we should defer discussion.
  • MM: we'd like one change soon - our first order desire to make it implementable on iPhones. Unhappy if we came away without any changes.
  • DM: also feel a little unhappy about shipping spec with feature that we know is broken, but promise to not implement it. If we know we need more work on a feature we shouldn't have it in the spec for v1.
  • CW: we previously talked about proving grounds for optional features as separate docs. Could be the first one. Would be happy removing writeTimestamp from render pass encoder. Some concerns about compute pass encoder.
  • KN: think we should remove writeTimestamp from render passes. OK with removing from compute passes. Remove parts that are broken.
  • DM: you want to ship V1 with this timestamp that's broken?
  • CW: right now: can't use writeTimestamp without scary flag --enable-unsafe-webgpu at Chromium init time. No plan to make that not the case. Because behind flag, not part of web platform, can break any time we want. Even if we remove extension from spec, will probably leave the code there - nobody will use it.
  • DM: there is a concern. Imagine CHrome releases header where something will break. Ppl will autogenerate Web IDL bindings to it. You'll change the semantics and break their code.
  • KN: nothing about Chromium. It's about breaking changes to the spec. Remove the things that are broken/
  • DM: if we're considering MM's spec we shouldn't have broken things in the spec.
  • CW: splitting into 2 separate docs sounds fine.
  • KN: do we still want to have that doc include render pass writeTimestamp?
  • CW: doesn't matter. DM's concern - optional feature's still evolving, we shouldn't expose in the spec because ppl will think it's part of V1. Split into separate doc.
  • KN: OK. Like core API can add things after V1, optional features should be able to do so too.
  • MM: has to be discoverable. Add in extra members to your dictionaries, run a compute pass, see if things got written?
  • KN: wouldn't need to discover the difference between a partial impl and a full impl if no browser ever ships a partial impl. Don’t see the harm in the spec having optional features we know we want but no browser ships yet.
  • CW: think this is for editors' meeting.

Should depth-stencil render attachments require views to have aspect = "all" for combined depth-stencil formats? #2062

Implicit GPUTextureSampleType for texture_2d<f32> #2064

(stretch) Feature request: ignore shader writes to color attachments #2060

Agenda for next meeting

  • CW: Port over remaining topics from this time.
  • CW: What to do about WIP optional features.
  • CW: burndown of issues if not enough items on the agenda.
  • JG: Origin Trial vs MVP vs v1
  • Intel: timestamp queries
  • Intel: chroma reconstruction
  • Intel: TF.js experience report
Clone this wiki locally