Skip to content

GPU Web 2024 05 22

Kai Ninomiya edited this page May 28, 2024 · 2 revisions

GPU Web WG 2024-05-22 Atlantic-time

Chair: CW

Scribe: KR

Location: Google Meet

Tentative agenda

  • Administrivia
  • CTS Update
  • “Tacit Resolution” Queue
  • Hosting community discussions on advanced features like ray-tracing and bindless?
  • Hardware feature levels (and compat) #4656
  • Should there be a limit on the number of open GPUCommandEncoder instances? #4622
  • [editorial] Consider if there's a much better name for "image copies" #4573
  • Add synchronous GPUAdapter.info #4550
  • Agenda for next meeting

Attendance

  • Apple
    • Mike Wyrzykowski
  • Google
    • Austin Eng
    • Brandon Jones
    • Corentin Wallez
    • Kai Ninomiya
    • Ken Russell
    • Stephen White
  • Mozilla
    • Jim Blandy
  • Nvidia
    • Anders Leino
  • Myles C. Maxfield

Administrivia

CTS Update

  • KN: Not much news to highlight.

“Tacit Resolution” Queue

  • Nothing to report.

Add synchronous GPUAdapter.info #4550

  • KN: We more or less agreed to do this, the PR is up. There are a few points people should be aware of:
    • With this change, we’re saying that the available info about an adapter may not change: once you’ve accessed it once, it will keep returning the same thing. The only reason it would change if there were a permission prompt. Now the only way to get that new info would be to get a new adapter.
    • The old adapterRequestInfo should not open a user permission prompt, because it would create strange corner cases, depending on what you’d asked for first
    • This PR markes adapterRequestInfo as deprecated but not marked for removal. We should decide if we want to remove it. FF and Safari could just ship without it, Chrome could remove it all
  • JB: no opinion about deprecation; seems pretty easy. Just immediately resolve the Promise.
  • KR: agree.
  • MW: not much of an opinion - our impl already resolves the Promise synchronously
  • BJ: mainly a question - do we need a getter and the function version which fetches the same info. I'd argue "no". Given we have 2 major impls yet to ship, seems like good opportunity - if we won't use permission prompt for function version, phase it out now. FF / Safari ship without that variant.
  • JB: seems removal impacts Chrome more than anyone else.
  • CW: we're willing to do it, keep the wart for a bit, communicate with developers, then remove it. We don't expect a lot of usage of this API.
  • KN: should I remove it?
  • CW: yes, let's just remove it from the spec and we deal with it on the Chrome side.
  • KN: OK. Spec won’t serve as a reference for it anymore, but think MDN's a good enough reference for this.

Hosting community discussions on advanced features like ray-tracing and bindless?

  • CW: hearing lot of input from the community (more than a half dozen people) that they'd like to prototype more advanced features. Nice if the CG provided them a space to discuss designs and tradeoffs. Would still take a lot of time to add to the spec proper - but get these ideas ready. Try to make progress on these without taking time away from people implementing the spec, etc.
  • AL: sounds good to me. Have been discussing some such features at NVIDIA.
  • JB: we have some support for ray tracing in wgpu. Not production, but you can play with it.
  • CW: a while back, Felix Maier prototyped ray tracing in Chromium. Will discuss logistics with Kelsey. Probably this means concretely - a separate repo like gpuweb-exploratory-raytracing or similar. Also, could have meeting slots organized, if people want that. Nothing in these meetings can discuss normative stuff in the spec; but if people want to synchronize on things in real time.
  • BJ: in WebXR we set up a "proposals" repo. Process: file an issue on that repo, gave them outlines for the structure, Setting it up today, might use Discussions rather than Issues instead. Gave us a point under the official WebXR org, but that was distinct from the other repos containing more official things. Gave them a space where they can have these discussions. But not mistaken for anything official.

Hardware feature levels (and compat) #4656

  • KN: I've posted a proposal for feature levels. Reason this came up: looking at WebGPU Native, and in some cases, we might want to default to Compat - or require they specify Compat vs. Core, to write code that runs on more devices. Should do something here, and it sounds like feature levels - we discussed here for years, but never made a proposal for it. Here is one. Replaces "compatibilityMode" boolean passed into requestAdapter. Contains min feature level field. Give me only adapters that support at least this feature level.
  • BJ: do you envision being able to override the feature level you requested for the adapter, when you request the device?
  • KN: could add a shortcut - enable all features core to that feature level. But I didn't include that. Part of idea - you request Compat, and get an adapter which supports higher stuff, but they're not enabled by default. Would need some set of features - a way to get back to Core if you request a Compat feature-level adapter.
  • KN: also, folks are interested in a "Feature Level 2".
  • JB: over the years we changed WGSL so we can "just" add stuff to it. What does this add over the evolution strategy we chose there? More of a living spec, plus explicitly-enabled features for things that aren't backward compatible. Just because we need Compat/Core distinction doesn't mean that we need more levels. If it were just the Compat/Core enum I think this wouldn't be controversial.
  • KN: that'd be the case initially. We don't have more levels defined now. Idea behind feature levels - hardware will age out of the market, slowly. Apps will want to drop support for certain hardware classes before we will do so in the browser.
  • JB: so impls can publish support for certain feature levels? As opposed to many separate features?
  • KN: yes. And we could do this as a working group.
  • MM: suppose 10 feature levels, app asks for 3, the device supports 6. Would feature levels 4/5/6 be disabled by validation?
  • KN: depends on whether you enable them when you create the device. You make the minFeatureLevel request on the adapter. Min feature level 3 and no args during createDevice - you'll get a device supporting level 3.
  • MM: why's this part of the request at all?
  • KN: to upgrade the defaults. We could put a FL on the adapter, give you a way to enable that FL on the device. But you'd have to request the adapter, look at the feature levels. Doesn't work for Compat - we can't give you a Compat adapter unless you ask for one.
  • JB: so without spec'ing minFeatureLevel - are devices required to offer lower defaults? Which defaults are affected?
  • KN: defaults for limits, and defaults for features enabled automatically (which we don't have now, but would have for future feature levels).
  • JB: can't you just offer really high limits right now?
  • KN: yes, excluding Compat. The adapter provides some higher limits/features and you can enable them if you want. Feature level gives you a shortcut for doing that. FL could be part of the device request, too. Bunch of that doesn't work for Compat, because we're moving backward from the default.
  • CW: other nice thing about getting it from the Adapter - FLs might not be totally ordered. Might have some FLs for mobile GPUs - tilers, PLS. Or immediate GPUs - raytracing, bindless, sparse, etc. Nice to be able to ask - I want something that has these capabilities. Only need to know what you care about, which is a specific FL.
  • JB: I'd be surprised if the FLs aren't totally ordered.
  • BJ: envisioning that maybe we'd have something like - this is the desktop 2026 FL, and the mobile 2026 FL. Those keep stepping forward.
  • KN: maybe those would both be supersets of the 2020 desktop/mobile, as things converge. Complicated graph of FLs.
  • CW: makes the proposal more complicated.
  • BJ: if I ask for FL3 on an FL6 device - you get FL3, and can selectively enable things up to FL6 - that's one variant. Other one - you might have a browser that says, I only expose FL6 and above. Allow us to phase out tech debt over time. In that case, you'd ask for FL3, but you'd get FL6. That's a distinct case. Exists both for what I said, as well as upgrading Compat to Core.
  • KN: we would only do it if e.g. Dawn removes Compat. We'd give you Compat validation as long as we can, for portability reasons.
  • BJ: FF and Safari would have that behavior from day 1 - people request Compat, only support Core, and get a Core adapter.
  • JB: FF does intend to ship Compat.
  • MM: clustering features together - makes a lot of sense. Having this problem in Vulkan right now. Is this part of the adapter request because Compat's a subset of full WebGPU? Other question - if you want level 3 and the device supports 6, that would work. Why are these two features, Compat as distinct from aging out old cards - using the same mechanism, when it seems they don't need to?
  • CW: because it can - if there's a FL mechanism that allows phasing out old cards, Compat looks a lot like that. An old GPU that supports OpenGL ES. Interested in FLs more than making Compat a bit nicer. Can live with the second Compat flag, just nicer to use the same mechanism.
  • KN: I agree.
  • JB: as soon as these things aren't totally ordered - people will want to enable more than one at the same time. No longer picking a specific level, but instead have a set of levels.
  • KN: in that case, you'd request a lower minimum feature level common to both of them. It's a convenience.
  • JB: can we explain more clearly how you'd desugar this into the existing API?
  • KN: yes, with the exception of Compat, because it replaces the current way of requesting it.
  • AL: if these aren't ordered, could call them Feature Sets instead of Levels. Levels make it sound ordered.
  • JB: if it's a DAG, Sets are the way to go.
  • MM: comparison to "Families" in Metal - 3 lineages of "feature levels". Mobile, desktop and common. Each has level 1/2/3. In each family it's ordered, but 3 parallel lineages. Worried about - if they're called Sets, the combinatorial explosion is large. 3 strictly increasing FLs, makes it easier to reason about.
  • CW: it's up to this WG to make sure there isn't a proliferation of FLs. These things have to be thought through, and spec'd.
  • BJ: if we talk about Feature Sets, Kai's proposal has a concept of a minimum. If the sets are disjoint, might be hard to define what that minimum is.
  • AL: couldn't you force each set to contain the Compat set, or some other minimum?
  • CW: think we'll need to come back to this topic.

Should there be a limit on the number of open GPUCommandEncoder instances? #4622

  • JB: Moz is not happy with imposing a limit on the number of GPUCommandEncoders. Memory consumption of buffering and replaying commands isn't substantial. If we impose a restriction on all users of this API because of this particular Metal limitation - would be nice to know that the numbers on Safari are so severe, and the workaround has such a severe impact, that it's prohibitive. Need a big weight to impose this kind of limitation. Would want to see some measurements - increases of KB/MB etc. Without details like that, it's not appealing.
  • MM: you mean - if every CommandQueue was created with the option "1000 max command buffers on this queue"?
  • JB: several points in the design space we discussed. One, doing an over-reservation. What's the impact of doubling the # buffers that a queue's ready to create? Another, command recording shouldn't do command encoding immediately, but buffer it up. wgpu does this already. So does Dawn. Talked with Mike about this - using memory for 2 things - my feeling is that it's true there's overlap but it's not a doubling. Think that first representation can be very compact. Not persuasive enough to me that we want to impose it on all our users.
  • MW: our biggest concern is the latency this adds. Have to wait for submit() to issue commands to the driver.
  • MM: historically, when Apple was implementing its GPU process, we started with a design like this. Queue commands, then replay upon submit(). Big perf regression, was a big deal. Immediately obvious from profiles. The first thing that we had to fix to get the GPU process to work performantly.
  • JB: maybe just do this as a fallback. When people create too many command buffers, fall back to buffering them up. A price they pay, that they opt into.
  • MW: that info's not really available to us. If we spec this, what happens when you get to this limit? The first line in the issue says that the validation for createCommandEncoder's empty in the spec.
  • CW: we could do an OOM there. Always possibility of device loss, too. Not ideal - wgpu has use cases that run into this. We don't see many. People will run into this, in production, when they least expect it. Device loss or OOM isn't great for the users. If we add a hard error, it's bad for every browser. If Safari wants graceful fallback, it can have it. Seems it should be easy to write a fallback path, because WebGPU is very stateless in its command encoding.
  • AL: a well written app would check roughly how many CPU cores are available. Would be fine on iOS where there are strict limits. Wouldn't hit the slow case where you virtualize the buffers. Why not make it robust and fall back to virtualized buffers?
  • MM: a lot of this hinges on whether apps will hit this limit or not. People are posting in the thread saying they do hit this limit. Nail down this question? A sufficient number will never be hit by real apps? Then raise the internal impl to that number, and go forward.
  • MW: WebKit doesn't let the core count be queried for security reasons. Want something in GPU supported limits - in Compat, if dev knows they'll use a certain number than what might cause problems, they request it at their own risk. Also talked with Metal team - there's a way to raise this limit, often needed when porting Vulkan games to Metal, but this isn't the optimal way to use the API.
  • JB: don't think a well written app can know in advance how many simultaneous command buffers it needs. 64 > 8, but < 1024. Can I have an architecture plugging together very disparate things? Folks concerned about load are not technical. Part of the web API responsibility to support that use case. Requiring global coordination is an anti-pattern.
  • MW: we don't understand the global coordination argument. With WebGPU, not allowing command encoding from multiple threads today, if you don't call submit() - the command buffer won't be executed.
  • MM: in safari i just checked navigator.hardwareConcurrenncy and it returns 8.
  • KN: according to a StackOverflow thread from 2019, desktop Safari returns 8 regardless of how many cores your mac has; and mobile safari returns 2 regardless of the number of cores the device has.
  • KN: you traverse scene graph and queue up all buffer updates, then render afterward. In single frame, possible to use multiple command buffers inside the frame. Don't know if people would be likely to go over 64. Esp if some are being recorded asynchronously. Plausible it could happen. I agree with Jim that there are probably good app designs where this can happen.

[editorial] Consider if there's a much better name for "image copies" #4573

[spec] What to do with the “Detailed Operations” section

Agenda for next meeting

  • Items to revisit on Pacific time
    • Add optional feature clip-distances (if there are new questions) #4588
    • Add optional feature dual_source_blending #4621
    • Push constant proposal #4612
  • Keep discussing:
    • Should there be a limit on the number of open GPUCommandEncoder instances? #4622
  • Your agenda requests here:
Clone this wiki locally