
Minutes 2022 10 19


GPU Web 2022-10-19

Chair: CW

Scribe: KR

Location: Google Meet

Tentative agenda

  • Administrivia
  • CTS Update
  • “Tacit Resolution” Queue
  • createPipelineAsync() doesn't reject its promise, even if the created pipeline is invalid #3296 (or is it agreed upon?)
  • mapAsync(WRITE) zero-fills the relevant region of the buffer #2926
  • Canvas texture expiry timing is unpredictable #3295
  • Passive Fingerprinting Surface #3101
  • (stretch) Permissions Policy for WebGPU and for unmaskHints #3483
  • Agenda for next meeting

Attendance

  • Apple
    • Myles C. Maxfield
  • Google
    • Austin Eng
    • Corentin Wallez
    • Gregg Tavares
    • Kai Ninomiya
    • Ken Russell
  • Microsoft
    • Jesse Natalie
  • Mozilla
    • Jim Blandy
    • Kelsey Gilbert

Administrivia

  • CW: When do we start APAC meetings?
  • MM: last day in office is Nov 4
  • CW: can you make Nov 2?
  • MM: yes.
  • Nov 2 -

CTS Update

“Tacit Resolution” Queue

  • Nothing in the queue!

createPipelineAsync() doesn't reject its promise, even if the created pipeline is invalid #3296 (or is it agreed upon?)

  • KN: It’s agreed upon, but the editors have work to do, to find out if we can do the thing we agreed upon. Probably no discussion needed here yet.
  • MM: if we reject the Promise with nothing, the user doesn't get any error messages
  • MM: TAG says - when you reject, you should reject with DOMException or JavaScript Error. Neither can hang structured data off of them.
    • There's stuff in the pipeline descriptor needing to be validated, and can fail.
  • KG: this is because we don't reject shader creation? Just give you something back?
  • MM: model in spec - we give you pipeline state object, and it's invalid state. Asking why it's invalid - should have surrounded create() call in ErrorScope.
  • KG: strange that shader & pipeline creation are different.
  • CW: because shader creation's where most errors should be. A shader's only output is to other fat object creation, …
  • KN: ErrorScope around shader creation - you'll get an error, just not structured data. Don't have anything for this on Pipeline yet. Want to be able to in the future. If rejection can only produce a String - can't do that in the async path. Need to know the web platform design guidelines. (Both paths are sketched after this list.)
  • KG: will study more offline.
  • MM: to make this actionable - think we should resolve today is - we've resolved that we should reject the Promise. We should follow the web platform's guidance and attempt to attach structured error messages to the thing we reject with.
  • KN: we won't be adding the structured return yet. Will do it in the future.
  • CW: from previous discussions - not having ErrorScopes was why we had async creation in the first place.
  • MM: if we can't attach useful info to Promise rejection - I think we can't reject the Promise. More important to get good error messages than to reject the Promise.
  • JB: we could format the error strings and return them.
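
For reference, a minimal sketch of the two error-surfacing paths being compared, written in TypeScript against the WebGPU API of the time. The setup code, the pipeline descriptor, and the shape of whatever the async rejection carries are assumptions for illustration, not settled spec.

```ts
// Sketch only: trivial device and pipeline setup so both paths below run.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter!.requestDevice();

const module = device.createShaderModule({
  code: `
    @vertex fn vs() -> @builtin(position) vec4<f32> {
      return vec4<f32>(0.0, 0.0, 0.0, 1.0);
    }
    @fragment fn fs() -> @location(0) vec4<f32> {
      return vec4<f32>(1.0, 0.0, 0.0, 1.0);
    }
  `,
});
const descriptor: GPURenderPipelineDescriptor = {
  layout: 'auto',
  vertex: { module, entryPoint: 'vs' },
  fragment: { module, entryPoint: 'fs', targets: [{ format: 'bgra8unorm' }] },
};

// Path 1: synchronous creation with an error scope around the call -- the
// "surround create() in an ErrorScope" pattern. The pipeline comes back
// possibly-invalid, and the popped scope says why.
device.pushErrorScope('validation');
const pipeline = device.createRenderPipeline(descriptor);
const scopeError = await device.popErrorScope();
if (scopeError) {
  console.warn('createRenderPipeline validation error:', scopeError.message);
}

// Path 2 (under discussion): async creation whose promise rejects when the
// pipeline is invalid. Whether structured data can ride along on the
// rejection (TAG guidance: reject with a DOMException or JS Error) is the
// open question; the catch just logs whatever arrives.
try {
  const asyncPipeline = await device.createRenderPipelineAsync(descriptor);
} catch (e) {
  console.warn('createRenderPipelineAsync rejected:', e);
}
```

If rejections can only carry a message string, the error-scope path stays the only place to hang structured information, which is the tension MM and KN describe above.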

mapAsync(WRITE) zero-fills the relevant region of the buffer #2926

  • (Myles didn’t do his homework on this)
  • MM: would prefer we defer for a week. But if we can make useful progress without me let's do that.
  • KG: nothing useful to add now.
  • CW: defer for a week.

Canvas texture expiry timing is unpredictable #3295

  • KR: I have a comment in progress in https://github.com/gpuweb/gpuweb/issues/3295 but didn't have time to commit it yet, sorry.
  • KR: Two classes of issues. One, Scirra made their RAF callback async. You're supposed to complete the work in the function, but they want to await on stuff for lazy loading inside the RAF callback. Turns out the flickering wasn't that they would get intermediate results. Either they had all the data and drew, or they didn't and skipped. Essentially stuttering; they were seeing dropped frames.
  • KR: Don't know if they stopped using async RAF callbacks. Anyway that's one category of problem: users doing incorrect things with RAF callback.
  • KR: Two, applications expecting the backbuffer to be cleared by the browser, but for some reason the browser didn't have time (GPU back pressure, etc.), so the frame isn't cleared for them and it causes bad rendering.
  • KR: That's why we advise applications to do the glClear at the beginning if they need the frame cleared at the start (should get merged with the lazy clearing).
  • KR: However, the problem is that this behavior is really hard to test; there are CTS tests that have heuristics to wait for N frames, etc.
  • KR: But basically some applications have bugs where the browser didn't clear for them, and on top of that the CTS is hard to write. That's why we should have a clear definition of when the browser expires the texture.
  • Scirra's bug was this one: https://bugs.chromium.org/p/chromium/issues/detail?id=995235#c20
  • MM: WebGL behavior you described - is any browser intending to implement that same behavior in WebGPU? Where the back buffer might/might not be invalidated?
  • KG: it's the behavior you'd get if you call getTexture() and render into it when we hadn't expired the previous texture.
  • MM: current proposal - spec will not guarantee but will strongly encourage impls to expire texture at end of task / between tasks. An impl is allowed to always clear textures between tasks. Is any impl intending to have there be the possibility of the problem Ken described? Are we disagreeing about philosophy, or the spec wording? Sounds like we all agree on what we want.
  • KG: kind of agree on goals. My main objection - don't think the problems WebGL's had were large here. We'd like to continue doing what WebGL does because I know how to implement that. New stuff - OK if impl may expire earlier. I shouldn't stand in the way if people want to add this allowance in. I'm just saying, I recommend against trying to find the specific way to write this.
  • MM: specific proposal: we accept the previous proposal (Kai's?) - there's another task queue, we expire textures on it, it's super high priority, likely between 2 different tasks your texture'll be expired.
  • KG: I don't know how to implement that though. When we pull a frame / composite it - I know how to implement that. I don't know how to add a task queue.
  • KN: task queues are a spec thing. WebGL has one for context loss events. You can use an existing one. There can be looser ordering between the two.
  • MM: tasks in task queues get interleaved. Interleaving is impl-defined. User's standpoint - what Kelsey wants to implement is like having 2 task queues interleaving in undefined way.
  • JB: so specifying it that way doesn't pin anything down?
  • MM: it goes.
  • JB: it doesn't seem to constrain impls. Feels like you've spec'd something but you haven't. What's the advantage?
  • MM: part 2 - spec would say interleaving's undefined - but Chrome and Safari would definitely invalidate between tasks. Makes everyone happy?
  • CW: reason to add a task queue - handling non-displayed canvases. Fixes the issue because image production / detachment happens on the task queue as well. Spec'ing it as a task queue - understand that implementing it right now on composition transactions is most convenient / usual - but if we spec things in terms of task queues, we can go one way or the other later, e.g. require that browsers implement this task queue in a certain way.
  • KG: OK.
  • KR: To be clear, my goal is portable CTS, so allowing e.g. Chrome to be tighter if e.g. Firefox is looser, doesn’t really satisfy my goal for portable CTS tests.
  • KN: Myles found - WPT tests can test "should" behavior. I'd like to write in the spec: "should" do this with high priority, so it happens after every task. CTS test has concept of warnings - could issue one, that's probably what we'll do. Can also write a test and have it fail and be documented that it's not technically required by the spec. Probably just throw a warning, and Chrome runs with warnings-as-errors. (A rough sketch of such a test follows this list.)
  • MM: sounds like FF's desired impl strategy would still require countdowns in the CTS like we described before.
  • KG: one thing in WebGL that we have tests for that can't be in the CTS - generate a hard GL error if a texture upload doesn't go through GPU-GPU fast path. Something we do in our own test harness. Wanted to test this detail of our impl. Also an option for solving this in WebGPU - test harness carveouts.
  • KG: also have had WebGL CTS flakiness in the past - we're happy with the stability of the tests in this area today. Had to make sure canvases are in visible window area, otherwise wouldn't issue rAFs. Now we have the tests for them, and think they're successful.
  • KG: interesting about how this interacts with encoding MediaStreams from canvases.
  • KR: can we reach consensus? This solves this third problem, and potentially addresses others in the future.
  • KG: yes, I think so.
  • MM: also - if CTS is unrunnable in one browser or another, we can revisit.
  • Consensus - describe expiry as a task on a separate task queue. Tests will test the actual spec behavior. Browsers can have optional tests in CTS, or in their own harnesses, if they want tighter behavior. Revisit if things look unworkable in an impl.
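
A rough sketch, in TypeScript, of what a test of this consensus could look like. The single setTimeout(0) hop standing in for "the expiry task ran between tasks" is exactly the should-level looseness above, so the check ends in a warning rather than a hard failure.

```ts
// Minimal sketch, assuming expiry is queued as a high-priority task on a
// separate task queue as described above.
const canvas = document.createElement('canvas');
const context = canvas.getContext('webgpu') as GPUCanvasContext;
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter!.requestDevice();
context.configure({ device, format: navigator.gpu.getPreferredCanvasFormat() });

const before = context.getCurrentTexture();

// Yield to the event loop so the expiry task gets a chance to run between
// this task and the next one.
await new Promise(resolve => setTimeout(resolve, 0));

const after = context.getCurrentTexture();
if (before === after) {
  // Allowed by the proposed wording, but discouraged; a conformance test
  // would surface this as a warning.
  console.warn('Canvas texture was not expired between tasks');
}
```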

Passive Fingerprinting Surface #3101

  • CW: passive fingerprinting means you give away info that can be used for fingerprinting, for free. Discussed this last time; Safari wasn't in the room. Gist - browsers can expose as much info as they want, including nothing - or only things that are correlated with things already exposed by the web platform. We expect few bits to be added by WebGPU.
  • KG: I drafted a post - way too long - can make it shorter, can link to it: https://hackmd.io/G9b5FePeRU-lYvjTrg0Dxw
  • KG: high level - looks like we're exposing more bits than we actually are. Some are un-disguisable. It's in UA's joint interests to not expose them. Not sure what more we can do at the spec level, and don't think we should modify the spec more to guard against this. As UAs we're all in this together to minimize fingerprintable info.
  • CW: think this blog post is close to Chrome's position as well.
  • KG: will clean up, post, and link to it.

Permissions Policy for WebGPU and for unmaskHints #3483

  • CW: Kai, is this more of a future thing?
  • KN: think we need to discuss before V1. Or we'll break iframes.
  • KN: related to fingerprinting surface. Permissions policy / feature policy for a given feature can be:
    • I allow myself to use it
    • I allow this iframe I embed to use it
  • Default policy for WebGPU would be - the page can use it, first-party iframes can, but third-party iframes would need to put the allow attribute on the iframe element to enable WebGPU in them (see the sketch after this list).
  • Maybe for unmaskHints - have to figure out.
  • Commonly used for things that can produce permission prompts. A third-party iframe asking for permissions is doing so on behalf of the embedder, and the user can't figure out the origin of the iframe.
  • We don't have permission prompt for WebGPU. Given there's fingerprintability in the API - more than Canvas & WebGL - we could require this policy for WebGPU as a whole.
  • Or, only for unmaskHints - might be a permission prompt about allowing the page to identify your machine. Could put that behind an iframe feature policy.
  • Want to do this now - if we later default to third-party iframes not being able to use these features, it would break that content when hosted in third-party iframes.
  • KG: maybe ideal if we banned third-party iframes not enabled by the embedder. More ideal - you get software rendering. Would run, just not fast. Want faster? Enable attribute in iframe.
  • KN: yes. We could have more - you can access more than the base device, or you get hardware acceleration.
  • CW: we have 3 points where a permission prompt can happen: requestAdapter, createDevice, and getting adapter info. Can't just guard getting adapter info because we have these 2 other places.
  • MM: making software device for us (Apple) is difficult - we probably wouldn't do that. So fingerprinting fighting technique can't be to create software devices.
  • MM: Kelsey mentioned our plan - third-party iframes don't get WebGPU. If embedder delegates WebGPU privs to iframe - hadn't considered that, but sounds reasonable. Our plan - that's the level of granularity. Not appealing to say you get an adapter but not a device. Also for us, getAdapterInfo won't return anything, even for main content.
  • KG: with the understanding you'd be able to tell if you're on Intel/Apple GPU based on rendering differences?
  • MM: we'd like to mask that info as much as possible. If an app wants to render a bunch of things to see details of rasterization, and we see that as a fingerprinting problem - we'll inject junk noise all over the place. We did this for 2D canvas, for example. Timing - quantize timers, add timing noise to shaders, dither the output, …
  • KG: understanding that this might break apps?
  • MM: yes, with that understanding. It's a tradeoff.
  • KG: you'd detect this via heuristics?
  • MM: don't want to answer that.
  • KG: some concerns are - there are differences in math in shaders, antialiasing, etc. Some of these you can probably mask via dithering. Others, worry about ability to inject noise for some kinds of apps that are doing calculations they want to be more reliable. Running folding@home for example, trying to get math answers out, not just rendering. Injecting noise is problematic. In FF there's privacy.resistFingerprinting. Enabling - we inject noise into glReadPixels, etc. This breaks real content, unfortunately.
  • MM: we understand this is a tradeoff. We're willing to iterate to find the tradeoff.
  • KG: Here’s a survey I ran with a dozen or so people: https://kdashg.github.io/misc/webgl/fingerprint-v1.html
  • KG: Hashes a triangle's rendered output - consistent across operating systems on all devices from a specific GPU vendor, like Intel or NVIDIA. Struggled to get around this.
  • Some novel research from Adobe to provide precision guarantees in shaders - too expensive to ship. This is just some info we gathered over the years.
  • CW: jumping back: MM you said granting WebGPU permission to third-party iframes would be the way you'd go. Either embedder sends all WebGPU rights, or none - or says it can't send all of them. All-or-nothing WebGPU would probably work for us.
  • KN: fine with me. I'd like to list out all of the possible policies we could have.
    • WebGPU, at all.
    • Some form of WebGPU fingerprinting. Maybe you get software only.
    • Maybe another level, you can get hardware, but not higher limits. Could fingerprint through rendering, but not the limits.
    • There's using the unmasked hints.
    • We could have as many as all of them.
  • KR: Instead of very fine-grained permissions, can we define "webgpu" right now, and expand later with things like "webgpu-fingerprinting" or whatever?
  • MM: no opinion, but that's a little backwards. You're expanding the keywords, but we should be thinking about the order of adding these things.
  • KN: as long as we don't break anything. If we have a "webgpu" flag now, we could later add "webgpu-restricted", which would grant more permissions and wouldn't break existing stuff.
  • MM: better to start with a small set of abilities now and grow that set in the future. If we give everything away now, web sites will rely on it.
  • CW: good point, we should discuss next week. Maybe: no permission, full permission, and you get "some" WebGPU.
  • CW: Kai, can you write down the list of permissions you described? And Myles' last comment on the issue, so we can discuss next week?
  • KN: I can. BTW not here next week.
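
A hypothetical sketch of the iframe delegation discussed above, in TypeScript. The "webgpu" Permissions Policy feature name and the null-adapter behavior in a blocked frame are assumptions for illustration; the actual names and granularity are exactly what the group still has to decide.

```ts
// Embedder side: explicitly delegate the (proposed) "webgpu" feature to a
// third-party iframe via the allow attribute. Under the default policy
// sketched above, omitting this would leave WebGPU blocked in the frame.
const frame = document.createElement('iframe');
frame.src = 'https://third-party.example/viewer.html';
frame.allow = 'webgpu';
document.body.appendChild(frame);

// Iframe side: the entry points CW lists are where the policy would bite; a
// blocked frame might simply get no adapter back.
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  console.warn('WebGPU unavailable (possibly blocked by Permissions Policy)');
}

// The same delegation could also come from an HTTP response header on the
// embedding page, e.g.
//   Permissions-Policy: webgpu=(self "https://third-party.example")
// -- again hypothetical for WebGPU at this point.
```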

Agenda for next meeting

  • mapWrite
  • Can we make progress on external texture expiry, same as canvas texture expiry?
    • KN: won't be able to write much up, but might be able to do this without me.
    • My proposal will basically be to do the same thing.
  • Permission discussion
  • CW: shader stage input/output subsetting rules. Have more info based on Chrome's impl experience.
  • KN: maybe more on the pipeline promise rejection - depends on feedback from TAG.
  • CW: pretty light agenda for next week. Please add more items to this doc.