Minutes 2018 09 27

GPU Web 2018-09-27 Apple Park F2F

Chair: Corentin

Scribe: Ken, Myles, and others, thanks all!

Location: Google Meet

TL;DR

  • Potential WG creation:
    • Difference: in CG you give IP for what you contribute, in WG for the whole spec
    • Chair has to be impartial in the WG
    • Agreement to have WG just stamp drafts from CG and provide the rationale for the decision to stamp or not.
  • MVP timeline:
    • Agreed to snapshot an MVP soon (like Xmas) and have the API be reasonable enough by then. Then snapshot regularly like every quarter.
    • We promise changes between MVP and v1, content will break.
  • Command buffer submission model
    • Is the submit operation on queues, or is it commandbuffer.commit()? Are command buffers transferable between threads?
    • Implicit submit on rAF could work but not for compute workloads.
    • Metal has a limit of in-flight command buffers.
    • DM will write a design doc in the repo.
  • Buffer mapping:
    • It is broken because it used explicit barriers to know when to map.
    • Proposal to add Buffer.map[Read|Write]Async
    • Need to be discussed along with multi-queue to know what the interface should be (there could be a “CPU queue”) and whether concurrent CPU-GPU reads can happen.
  • Numerical fences
    • Vulkan WG got feedback from ISVs that they wanted numerical fences.
    • Numerical is a “zero-cost” abstraction over one-shot fences.
    • They would have monotonicity constraints.
    • CW to put up a proposal.
  • Copy alignment requirements:
    • Benchmark data from MM is conclusive: emulation with compute is slower.
    • Consensus to not emulate, JG to gather constraints from different APIs
  • How extensions will work
    • Consensus: every extension supported by the browser is exposed on the objects (actually an impl detail). Extensions are enabled at device creation. The device has a list of limits and extensions it was created with. Non-enabled extension methods throw.
    • Discussion of whether extension validation happens synchronously or via CIN, KN to make a proposal.
  • SSBO operations (HLSL, GLSL, or Metal style)
    • Metal’s general pointers cannot be emulated efficiently on other APIs
    • HLSL and GLSL-style are equivalent up to compiler transformations.
  • Shading language
  • Push constants
    • MM will gather data.
    • Some hardware translates push constants directly; other hardware always emulates them with UBOs.
  • Dynamic buffer offsets
    • Defer to post-MVP1; need ISV feedback.
  • Test suite
    • See table of options below.
    • Debate about having the CTS in JS or WASM, will need to discuss more.

Agenda

  • Potential WG creation
  • MVP timeline
  • Command submission model
  • Buffer mapping with implicit barriers
  • Numerical fences
  • How extensions will work
  • 2pm Shading language
  • Memory model
  • SSBO operations (HLSL-style, GLSL-style, or MSL-style?)
  • Dynamic buffer offset
  • Test suite
  • Memcpy alignment requirements
  • Push constants
  • Possible API Review
  • System integration
  • Agenda for next meeting

Action Items

  • DJ: draft a charter for the WG and send to the CG for review.
  • DM: Make a command submission model explainer doc in the repo.
  • JG: Investigation and proposal for multi-queue.
    • JG: Proposal for buffer mapping based on that.
  • CW: Make a PR for numerical fences
  • JG: Collect copy alignment requirements
  • KN: Formally describe how extensions throw or use CIN
  • KN: (From WebGL AIs) Investigate revamping WebGL harness and using it here

Attendance

  • Apple
    • Dean Jackson
    • Justin Fan
    • Myles C. Maxfield
    • Thomas Denney
  • Google
    • Corentin Wallez
    • Dan Sinclair
    • David Neto
    • James Darpinian
    • Kai Ninomiya
    • Ken Russell
  • Intel
    • Yunchao He
  • Microsoft
    • Chas Boyd
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert

Potential WG creation

  • DJ: Should we transition to a WG?
  • DJ: Benefits: We can make a real “recommendation” (according to the W3C’s notion of “recommendation”).
  • DJ: Disadvantage: Membership is sort of restricted to W3C members, with exceptions (with permission from W3C) but they will be pressured to join the consortium at large. Some people may not be able to do this because of company policies.
  • DJ: Other groups have both! Throwing specs over the wall from the CG to the WG
  • DJ: Payments specs do this.
  • DJ: It gives the CG more freedom to do experimentation
  • DJ: Other thing: To make a WG, we would need an official charter, which the W3C membership votes on. It will almost certainly get approved, but everyone has to re-join the WG, which is another IPR requirement
  • CW: Can we change the charter over time?
  • DJ: After it’s chartered, it’s difficult. We want the charter to be somewhat broad and somewhat narrow, there is a balance to be had
  • DJ: Opinions?
  • DJ: This was Apple’s goal from the start. We want a WG. But we also like the CG idea. Realistically, the meetings would be in the CG, but the WG would be done via email
  • RC: Why can’t we just have a CG and ship whatever we want?
  • DJ: We can ship whatever we want, but it won’t be a W3C recommendation
  • DJ: Recommendations have a clear patent policy. More of the community stamps it to be royalty free
  • RC: Are there any patent concerns that would make WG better? Or do we just want a WG because it’s what everyone should do?
  • DJ: probably something in between. Two basic ways web standards are developed: the HTML spec way, and the W3C way. Boils down to appearance. There are assurances that if it’s a W3C standard it’s royalty-free, etc., which they can’t really guarantee due to random patent trolls everywhere, but at least you have the W3C behind you.
  • CW: stronger IP protection appealing to IHVs in particular. Makes sense to have a WG so people in that space can participate more easily. (Would be easier for them to participate in the CG, honestly, since it’s contributory)
  • DJ: everyone in the room is a W3C member, and Yandex, too. Where’s Elviss from?
  • CW: individual.
  • DJ: easier to have an individual non-member join than, e.g., NVIDIA.
  • CW: not nvidia. Qualcomm is.
  • DJ: my suggestion: bulk of work continue to be in CG.
  • CW: WG is for stamping and IP protection.
  • DM / RC: makes sense to me.
  • MM: What would we have to do differently?
  • DJ: not much. I’ll draft the charter. You should ask your legal people to review it. Broad enough for flexibility, but not overly broad. In Apple when we join a new group or see a new charter, we do a review even if we never intend to join it. Then vote, comment. We encourage friends to vote yes too. This is a niche area and I don’t think people will vote against this.
  • YC: do we need to change the license for the WG?
  • CW: spec still under W3C doc license. Test suite under W3C Software License. Because we have our own BSD+ CLA, that stays the same as well.
  • YC: how long will the transition be?
  • DJ: not a transition. To start up the WG will take 3 months. Tell W3C we intend to, work with them, produce a charter; the review period is 1 month, and there might be comments on the charter.
  • CW: doesn’t change how we do work. Every now and then the WG looks at it and says, this is the official spec.
  • MM: and the WG is the same people.
  • CW: Yes.
  • RC: Lawyers have told me that in CG you’re only responsible for things you contribute, but in the WG you have to give things away royalty-free even if you didn’t contribute it.
  • DJ: you’re granting a royalty-free license for what’s essential for implementing the spec. For Apple, we’re not granting patents specific to Metal for WebGPU since we’re building WebGPU on top of Metal. 90 day periods at a couple of points to specifically exclude certain patents. This triggers a process where the WG thinks about it and figures out whether the spec’s affected. This has happened in the past and the WG decided to publish anyway.
  • RC: also doesn’t the chair need to be more neutral?
  • DJ: It’s fine for us unless Yunchao or you want to be a chair.
  • CW: in WG you have to be impartial but in CG you don’t.
  • MM: other chairs of other CGs have done a good job of balancing their point of views and their ability to chair.
  • DJ: we’ll assume we’re going ahead unless we hear from people in a week or two. I’ll take action items to draft a charter and send to CG.
  • RC: Is the WG going to make any formal decisions?
  • DJ: I suggest the only decision they make is whether or not to approve the document that was handed to them
  • CW: and provide info as to why they didn’t publish it.

MVP timeline

  • CW: It feels that we’re making progress on the API side, and converging. Google is implementing more and more of the specification, and making tests. Every change causes a fair amount of work.
  • CW: Thus it would be nice if we could snapshot the API soon and call it e.g. MVP 1. It might not be the final MVP.
  • CW: our proposal: by end of year we’re happy enough with API structure that we can snapshot it for the first time, and then work on MVP2.
  • MM: in 5 years will we have 17 MVPs?
  • CW: hopefully just a release candidate. Look at WebVR, WebXR. WebVR is the MVP for WebXR. There were MVPs of both. These things take time. I estimate 3 revisions before final product. They shouldn’t be shipped by default. We promise people their content will break.
  • JG: there is no content because you have to flip switches.
  • CW: people will put stuff on the Internet.
  • JG: will we consciously not take into consideration people coding to the MVP?
  • CW: yes
  • JG: like how we do it in WebGL. Behind a draft flag, people can experiment with it. But release browsers should not have WebGPU MVPs in them.
  • CW: plan in Chrome: by next year we host binaries, non-official, big warning about not secure, but have the WebGPU MVP enabled by default. Developers can use them.
  • KR: I wonder if that is a good idea - I suggest we just put it in regular Chrome behind a flag. WebVR had lots of difficulties doing this.
  • JD: Was the WebVR problems due to device/hardware drivers and bundled software?
  • KN: Yes.
  • CW: we have the problem with spirv-cross.
  • JG: main issue with shipping in browser; as we work on MVP we have to restrain ourselves to binary download sizes, etc. May restrict problem space we have to experiment with. Security doesn’t matter though since it’s not on by default.
  • CW: obviously we have disagreements about what to do in Chrome. But intent is next year you can try WebGPU either with a flag --enable-unsafe-webgpu or a special binary. Basically, developers can try it, but it’s not on by default. What do people think about trying to reach a snapshottable version of WebGPU in 3 months?
  • DM: snapshot of the spec?
  • CW: yes.
  • DM: if we continue with our same pace from the last 2-3 months I think we can reach it.
  • JG: is intent there to have a mutually interoperable MVP?
  • CW: as for interface, yes.
  • JG: context of my q: we could snapshot the API today and say that’s what we’re going to work on for next months.
  • CW: reasons for snapshot: 1) interoperable implementation / interface. 2) if we don’t as a group try to get a stable spec, there will be WIP parts for a long time.
  • JG: so, it’s a forcing function.
  • CW: yes.
  • DJ: I think it’s a good idea.
  • JG: it’s a good goal. I like the forcing function aspect of it.
  • RC: seems fine to me.
  • CW: one thing that might not be resolved in 3 months is shading language, but intersection of shading language and API is literally 2 members of a descriptor, or what you put in an ArrayBuffer and give to the API, so we don’t need to block on that if we haven’t resolved the questions by then.

Command submission model

  • DM: trying to see if we can tie command buffers to threads, i.e. require a command buffer to be created, recorded, and committed on the same thread.
    • Not that useful for reusable command buffers.
    • Requires Metal-like commitment model where you commit() on a command buffer rather than asking queue to submit cmdbufs.
  • Advantages:
    • easier to reason about in impl. Use TLS.
    • on API side, constrain IDL.
  • MM: command queue takes a lock when you submit a cmdbuf?
  • DM: yes, same as Metal semantics.
  • CW: is there a cmd queue at that point?
  • MM: cmdbuf is associated with cmdqueue because you created it from it.
  • CW: maybe no cmdqueue because it’s associated with a device?
  • JG: why does this require a Metal-like commitment model? I get that you want to record into cmdbuf without thread sync overhead. After recording, should be a thing you can pass around between threads.
  • DM: doesn’t require Metal semantics. If you restrict cmdbuf to live on same thread, you either need to copy a handle to the queue to the thread. You’re saying record cmdbuf on one thread and submit on one thread. Not expressing in Web IDL that you want it recorded on the same thread. Need recording type restricted to one thread? Would be nice to not have the extra type. I don’t think commit vs. submit makes diff to user.
  • JG: you can’t batch submit this way.
  • DM: do you need to? Dawn does deferred recording of cmdbufs. Doesn’t restrict as much.
  • JG: this is the explicitness question. Do we want to make perf predictable? Or do we want our impls to have heuristics for batching / flushing?
  • CW: we can force a queue flush at end of rAF for graphics. If you’re using WebGPU for compute when do you know when you should submit them to GPU?
  • MM: you get another call?
  • CW: then cmdbuf commit is a shim over postMessage to a thread, and queue submit is later.
  • DM: fair point that no frame boundary for compute jobs.
  • MM: Q: in Metal it’s common for engines to traverse over large scene graph that can end up making one large encoder that has a bunch of draw/binding calls. Metal has facilities for splitting this among threads. Is this incompatible?
  • DM: i think it’s compatible. Could have parallel encoders like Metal and say that they can move to different threads. We’re not talking about encoders as much as cmdbufs.
  • JG?: maybe I’m being stubborn but I don’t see the advantage of having a separate top-level encoder and cmdbuf.
  • KN: types don’t add API complexity. Methods do.
  • DM: when you make a cmdbuf, you want top-level encoder or cmdbuf?
  • CW: device.createCmdEncoder(). encoder.finish() gives you a cmdbuf that’s transferable / sharable.
  • DM: and you don’t want to be able to move cmdbuf before you start encoding?
  • MM: definitely want different threads to be able to encode different cmdbufs.
  • CW: device is sharable.
  • KN: want to make textures, etc on different threads.
  • CW: device.giveMeAnEncoder(). Can postMessage the encoder. Or wasm sharable array of things. Main thread does submit of all of them.
  • RC: I thought cmdbufs were the immutable things between threads.
  • CW: Right. Encoder has TLS.
  • MM: encoder.finalize() gives you the immutable thing that you can pass around?
  • CW: yes, so the type can be sharable.
  • KN: sharable’s not a tag. Says that when you make a copy you get a reference to the same thing.
  • CW: not sure what’s disadvantage of this model vs. other model where you can commit from a thread. When you commit from a thread you have a lot of flushes, or explicit flush.
  • MM: would all worker threads have to hand off finalized cmdbufs to a magic thread for ordering?
  • CW: yes. If you commit in a worker then workers have to commit in the right order. In Metal solved by making cmdbufs in a queue in the right order. Order was decided when you created the cmdbufs.
  • MM: no.
  • JG: you can reserve your line in the queue.
  • MM: but you don’t have to do it that way.
  • JG: correct.
  • CW: thought in Metal you reserve your place in the queue. Encode everything, then flush and everything was ordered in the beginning.
  • CW: other way is that you decide the ordering at the end.
  • DM: another model is that the queue is sharable.
  • CW: then workers have to signal each other when it’s one’s turn to submit. That doesn’t help.
  • DM: essentially, commit semantics drags with it queue semantics, and this makes it more complicated than having a separate encoder object. Difficulty is multiple types of encoders, and we got confused as a team when discussing them.
  • JG: part of the reason we’re confused is that they’re different from Metal encoders and it’s a namespace conflict. Consumers of our API will mainly only be using our API, so not as much confusion. Won’t be thinking about Metal, Vk, and our API all the same time.
  • JG: if it’s valuable to batch cmdbuf submissions then that’ll change the API.
  • CW: when doing Skia/Dawn, senorblanco@ found that reducing queue submits sped things up a lot on Mac AMD. DM wasn’t able to repro this though.
  • MM: so making bigger cmdbufs helped?
  • CW: rather, queue flushes were expensive.
  • DM: are you sure you weren’t hitting the same issue that MoltenVK did, the max number of cmdbufs supported by a queue?
  • CW: don’t think so b/c Dawn records 1 cmdbuf anyway. Wait, number of in-flight cmdbufs?
  • DM: yes. This would explain the huge difference. Did Dota 2 tests today. Submits 130 cmdbufs per frame. Can either stitch together into one big one or not. On AMD got difference of 5 frames out of 80, 7% speedup. Not too convincing.
  • MM: talked with metal team about this. Reason: in rendering app, you block on next drawable buffer. In compute application that mechanism doesn’t exist. This slows down the system so the GPU can keep up.
  • DM: back to original topic. Conclude: I like the finalize() step because it lets us free resources. Don’t think extra encoding stage is useful for reusable cmdbufs. Would want all types to be changed. It’s independent of finalize() stage. I’m fine with having separate encoder stage.
  • MM: where will the results of this be recorded? Won’t be obvious from the Web IDL.
  • CW: since we’re aiming for snapshottable thing in 3 months we should start writing Markdown for the concepts. We should comment on the relevant issue.
  • MM: and then close it.
  • CW: DM can you file an issue and have PR?
  • RC: pull request with Markdown explainer?
  • MM: should Web IDL have an attribute like Sharable?
  • KN: Transferable means that you pass it.
  • MM: it’s a device reference.
  • JG: all objects are references.
  • KN: some object reference can’t be copied.
  • CW: some can be to a different thread.
  • MM: if it’s clear which objects are references / copyable.
  • JG: what is it? DisableCopyable / Cloneable? We can talk about it being Sharable.
  • KR: We could put “*Ref” at the end of all interfaces that are references?
  • (discussion about whether a Command Buffer is sharable/copyable/transferable)
  • CW: in wasm we might have sharable arrays of sharable objects. Send index via shared memory. Don’t know who to send it to.
  • MM: we should focus on what’s real.
  • KN: It’s possible, yes. You’d maintain two arrays in the two threads.
  • CW: Either way is fine for now.
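
A rough sketch of the encoder / command buffer split discussed above. All names here (createCommandEncoder, beginRenderPass, finish, getQueue, submit) are placeholder assumptions rather than agreed API, and `device`, `pipeline`, and the pass descriptor are assumed to already exist:

```js
// worker.js: record on a worker thread; the encoder never leaves this thread.
const encoder = device.createCommandEncoder();      // hypothetical name
const pass = encoder.beginRenderPass(renderPassDescriptor);
pass.setPipeline(pipeline);
pass.draw(3, 1, 0, 0);
pass.endPass();
const commandBuffer = encoder.finish();             // immutable, shareable result
postMessage({ commandBuffer });                     // hand it off for ordering

// main.js: a single thread decides submission order, so batching stays possible.
worker.onmessage = (e) => {
  device.getQueue().submit([e.data.commandBuffer]);
};
```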

Buffer Mapping

  • CW: Buffer mapping doesn’t work because we no longer have explicit barriers. We should fix that! Recap:
    • At Chicago F2F we said buffer should be mappable for reading or writing.
    • Fast path: in cmdbuf, put transition to MAP_READ or MAP_WRITE. Triggers async thing where buffer will be mapped. Look at fences to know whether it’s passed / buffer’s mapped.
    • We don’t have transitions any more. There’s a doc about this in the repo.
  • https://github.com/gpuweb/gpuweb/blob/master/design/buffer-mapping.md
  • CW: map operation was enqueued in cmdbuf. Done as part of “transition” of explicit barrier. When you do transition to MAP_READ, buffer will eventually be mapped readable in your JS thread. To know when ArrayBuffer is there, have to wait for a fence to be passed. That was the idea.
  • MM: reason this is in the command stream is that you want to say, I want to be able to say I saw the results of the GPU commands previous to it in the stream?
  • (Discussion about ownership between CPU/GPU, what happens on unmap)
  • MM: first map operation is async because you have to wait for GPU to finish its work, and unmap is sync because you want flushed to GPU?
  • CW: yes. Previously, map operation redundant because transition to MAP_READ was there.
  • MM: now we no longer have MAP_READ transition.
  • CW: Right. What about WebGPUBuffer.mapReadAsync()?
  • MM: this won’t be in an encoder? Between encoders?
  • CW: between queue submits.
  • DM: noooooo…..
  • DM: let’s not mix the timeline streams. One way to communicate between the CPU and GPU timelines. I have a strawman proposal.
      1. Doesn’t make sense to transition between readable and non-readable inside a cmdbuf.
  • CW: you’re saying unmap is implicit?
  • DM: you just start using it. It’s mapped. Map on the command buffer, MapBuffer(), which makes it inaccessible to further operations in cmdbuf, and visible to CPU when cmdbuf finishes.
  • MM: you don’t want unmap in the GPU command stream. Must be synchronous.
  • CW: want unmap to be a sync operation, so we can copy into shmem.
  • DM: ok, interesting.
  • MM: so DM you mentioned 2 points.
  • KN: mapping isn’t a queue operation.
  • MM: we don’t have multiple queues. Has to be associated with a cmd queue because you have to know which cmds’ results are visible when it’s mapped.
  • KN: have to know exactly which commands.
  • JG: once you have submitted it for mapping, can’t allow enqueuements against that buffer.
  • KN: Before, we talked about transitioning resources in queues, so it made sense to transition on a queue.
  • MM: so it’s the queue it’s owned by? If we’re in a world where any resource can be used by any queue, I don’t know how mapping can be safe.
  • JG: you have refcounts on the buffer, are there commands in flight.
  • CW: we talked about this with an explicit Map/Unmap on the buffer. That’s where the CPU owns the buffer. In Dawn, when you call MapAsync, what are the commands on all the queues, right now? When all finished, safe to use the buffer. We know all previous execution, so give you the signal.
  • MM: so you wait for last use on all the queues.
  • CW: or, let’s say if one queue, all cmds submitted on queue prior to mapping, even if they don’t use the buffer. Might add more latency, but probably fine because if the app’s going to map a buffer it’ll make a copy into it and then know it’ll have to map it. We think, anyway.
  • MM: when I said there was a point associated in the queue, it’s all the writes in all queues before now.
  • CW: points about latency
  • RC: so to call MapAsync you’d have to submit all the cmdbufs that use that buffer?
  • MM: no, but if you didn’t submit it yet, and call MapAsync, the command in your pocket won’t affect when it’s mapped.
  • CW: the command in your pocket you can’t submit because the buffer’s being mapped.
  • CW: 1. submit commands. 2. MapAsync. 3. Can’t touch buffer until GPU’s done with it and you get a signal that buffer’s ready.
  • Diagram drawn about implicit transitions of buffers between 3 states (CPU-owned, GPU-owned, and “in transition”). Illegal to submit new cmdbufs referring to this buffer while in transition, and until unmapped.
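
A rough sketch of the flow in the diagram above. The method names (mapReadAsync, unmap) are assumptions, and `queue`, `buffer`, and `commandBuffer` are assumed to exist already:

```js
// GPU-owned: commands in this command buffer write into `buffer`.
queue.submit([commandBuffer]);

// "In transition": after this call, submitting new command buffers that
// reference `buffer` is invalid until it is unmapped again.
buffer.mapReadAsync().then((arrayBuffer) => {
  // CPU-owned: the GPU work is done, read the results.
  const data = new Float32Array(arrayBuffer);
  console.log(data[0]);
  buffer.unmap();   // synchronous; ownership returns to the GPU timeline
});
```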

[BREAK]

  • MM: If the map is not associated with a queue, should be with a device
  • KN: All objects already associated with a device
  • MM: Kind of scary for backreferences
  • CW: The device is internally synchronized
  • MM: If the unmap call is on the buffer, then it knows which device it’s part of
  • CW: It might be interesting to have a distinction between buffered commands and--
  • MM: Commands that are in command buffers are in the encoder API
  • CW: Conclusion: Like before, but explicit map read/map write synchronous call
  • RC: Why would someone call async map write
  • MM: It’s like “prepare it for writing”
  • RC: I understand read async.
  • JG: write-async doesn’t make sense
  • CW: it does for browsers that have the write in process, so they can hand the buffer immediately to the app. So you could say Promise the read, Promise the write and you avoid the copy.
  • MM: How can this be portable, if some applications can see into it? On a discrete device I call mapWriteAsync, and somewhere else I call read. I shouldn’t see the memory, but I might on a phone.
  • CW/KN: It would be zero filled in that case.
  • MM: So you want to pass a range so you control which part gets clobbered.
  • JG: You can do explicit flush.
  • MM: On phones in Metal it is the same buffer. But there is also a shared mode where you have to make calls.
  • CW: We agree that you get an ArrayBuffer. The way to get it is you have a fence, until that fence is passed, buffer.mapping is null. After it is what you are looking for.
  • CW: It should be a Promise, and one promise per range.
  • CW: Let’s talk about this later, and move on to another subject.
  • JG: We rejected promises earlier because you might be waiting on multiple things, and then you’d have to wait on multiple promises.
  • CW: You will call the buffer with a map command. That’s how we fixed buffer mapping without explicit transitions. Now, how do you get the data back?
  • MM: buffer.mapReadAsync() and buffer.mapWriteAsync()
  • DM: I am not a fan of a buffer command with implicit hooks into all the queue logic. I want it to be more explicit mapping data from the GPU to CPU. Maybe the command could be on the Q or the device?
  • MM: I think he’s saying that the buffer can only be owned by one Q at a time.
  • CW: It could be the case that Q1 and Q2 have the buffer for reading. Then after you can map the buffer for reading on the CPU.
  • MM: I think the model Dzmitry is describing is where each resource is owned by a single queue. However, that doesn’t allow for two queues to be reading the same resource at the same time.
  • CW: This is an argument for mapWriteAsync()
  • KN: If you can transfer resources between queues, then we should have single-writer, multiple-reader. This means that you can mapWrite on that queue because the queue owns it.
  • CW: Then how about the CPU is a queue
  • JG: This is forcing the concept of queue.
  • MM: Are you, Kai, describing a new API or just a mental framework to think about this stuff with
  • KN: Thinking about this as an API, think it would make it easier to spec.
  • CW: Thinking as a mental framework.
  • RC:
  • JG: Could treat as transitioning to “no queue” for writing.
  • KN: For reading it can be on a CPU queue and a GPU queue.
  • RC: Will implementations know whether the buffer is read or written. Then the implementation can do the necessary waits.
  • KN: If we have a queue ownership model we should just use it.
  • RC: If you are writing on the CPU and enqueue, then ??
  • KN: Would have to transfer to another queue.
  • DM: What’s the usecase for CPU and GPU concurrent reading?
  • JG: For compute use cases?
  • MM: Like particle system where you want to “step” and get the data on the CPU and GPU
  • RC: CPU writes to a different section the GPU is reading from.
  • CW: No
  • MM: Would have problems with page size.
  • RC: So full read or full write.
  • CW: We need to talk about how to do multiple queues before we can have a consensus.
  • CW: For buffer mapping and multi-queue, we need more research; put it on the next meeting’s agenda and make proposals.
  • CW: No meeting on Monday.
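
A rough sketch of the mapWriteAsync idea discussed earlier in this section (a range plus one promise per range). The method name, the (offset, size) arguments, and the copy helper are all assumptions:

```js
// Ask for a writable view of bytes [0, 256); only that range may be clobbered.
uploadBuffer.mapWriteAsync(0, 256).then((arrayBuffer) => {
  // Zero-filled where the implementation cannot expose existing GPU contents.
  new Float32Array(arrayBuffer).set([1, 2, 3, 4]);
  uploadBuffer.unmap();                              // synchronous flush back to the GPU copy
  queue.submit([copyIntoDestination(uploadBuffer)]); // hypothetical helper
});
```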

Numerical fences

  • DM: The Vulkan WG is discussing problems with Vulkan non-numerical fences. There are some problems about how to use them properly. Using D3D12-style fences is way more developer-friendly. However, it doesn’t apply to us as much because we don’t have to have low-level fences. One of their problems is about deleting fences, but we can figure this out. So their reasoning doesn’t strictly apply to us, but it seems worth considering. Back in the day we already discussed whether we want numerical fences or not. Corentin wanted them.
  • RC: Vulkan backends will have to emulate them?
  • CW: Yes.
  • RC: And in D3D you get the real one
  • DM: It would be fine to move to numerical fences. It’s not a huge pain to implement. Developers are familiar with the workflow they expose.
  • JG: I didn’t think they were necessary. I’m trying to page everything in and remember what we said previously. The only change since then was feedback from the Vulkan advisory panel.
  • RC: Should we have to make new fence objects or re-use the same one again and again
  • CW: More context: in D3D, the concepts behind VkFence and VkSemaphore are merged; both CPU-GPU and GPU-GPU sync use D3D-style fences.
  • JG: Can we call it D3D-style fences instead of Numerical fences
  • JG: It’s the difference between CPU-GPU and GPU-GPU synchronization. Vulkan has a clear divide between these things with Fences and Semaphores. D3D uses the same object for both of these things. We don’t have agreement on GPU to GPU synchronization because we haven’t talked about multi-queue, and they aren’t needed for a single queue. Once you have multi-queue, it matters how you construct your dependency graph. This isn’t true for a single queue (it’s just a line).
  • MM: Not in Metal. You have a single queue and it figures out the dependency graph.
  • JG: and you can’t submit to multiple?
  • KN: there isn’t really a “single” queue in Metal, it’s abstract.
  • JG: We can talk about fences but right now the only fences are to figure out if the GPU timeline is past a certain point. If we have multiqueue we want semaphores to create a dependency graph.
  • JG: In Vulkan you have to do that via semaphores.
  • JG: the structure of cmdbuf submit in Vulkan is that it takes a set of semaphores to wait on, and a single semaphore to signal when done.
  • MM: I should know, but what’s the difference semantically between fences and semaphores?
  • JG: signal fence in cmdbuf, wait on fence on CPU. Semaphores are for submission order of jobs. Can’t query semaphore.
  • MM: aside from who waits on what what’s the difference between them? Just a name and call different functions?
  • DN: semaphore is only signaled or waited for on GPU. No CPU side representation.
  • MM: mechanically they’re the same but difference in who uses which?
  • MM: so here, are we talking about whether those are numbers / booleans, or are they different APIs?
  • JG: semaphores are distinct. You could pretend they don’t exist.
  • CW: we’d have dependencies if we had multiqueue and ownership transfer.
  • JG: in D3D can you wait on a fence in the cmdbuf?
  • CW: can make a queue wait on a fence. Wait happens on the GPU. Can get an event from it.
  • CW: would like to argue: even if we only care about fences (GPU-CPU sync), we probably still want numerical fences.
  • CW: 1. Easier for developers porting from D3D. 2. Useful for people wanting to double/triple buffer things. Don’t have to track a “monotonically increasing number” themselves, we do it for them. It’s generally useful. We have to do this tracking in our Vulkan/Metal backends, most engines would need to do the same.
  • RC: that’s so you can reuse the fence object?
  • CW: yes, creating less garbage.
  • JG: we just give them one because it’s free to do so.
  • JG: it’s a zero-cost abstraction. Sounds cool. Is it zero-cost?
  • CW: for us, yes. It’s something we do anyway.
  • KN: for resource lifetimes.
  • CW: remember someone said, might not do automatic tracking all the time like that.
  • JG: or different form.
  • CW: we’d have to map between our internal numbers and the app’s. But almost free.
  • JG: trying to figure out how you build one on top of the other. Closed problem? If so we can bake in this method.
  • KR: Problem with applications waiting on random numbers? In WebGL we wrap these in opaque objects for a reason, so that people don’t come up with random IDs.
  • JG: don’t think that’s a problem. You’ll look up to see if we’ve submitted a fence for “5”, for example. If not, the fence hasn’t been hit.
  • KN: “5” is just a number. It’s not an ID that represents an object.
  • CW: these fences are strictly monotonic.
  • JG: I think this is not true in D3D.
  • CW: it’s true in Vulkan and our proposed API. Also, you can only wait on a number that’s already been signaled.
  • JG: what about waiting on a number that hasn’t been signaled?
  • RC: wait forever?
  • CW: if we have queues waiting on each other we don’t want them to wait forever.
  • MM: you can forbid deadlocks.
  • JG: you can know what you haven’t queued yet.
  • CW: nice to be monotonic so you can say “I want to be signaled when it’s at least 5.”
  • JG: is that really Vulkan proposal? Think D3D works differently.
  • RC: SetEventOnCompletion. It’s an equals comparison.
  • JG: that lets you do triple-buffering with “0, 1, 2, 0, 1, 2”.
  • MM: so not true that everyone has numbers that go up forever. Also, wraparound.
  • KR: not 64-bit numbers in JS.
  • JG: 53 bits are enough for everybody.
  • MM: seems scary to bake something in to the API that will cause programs to stop working after a while.
  • JG: the assertion that everyone does things this way is not true. D3D12 is counterexample.
  • CW: Vulkan proposal is strictly monotonic.
  • JG: noted.
  • RC: in D3D12 doc it says it’s signaled when it says “reaches” a certain value, implying “>=” relationship.
  • JG: assuming that’s true: if it’s 2 of the 3 APIs…
  • KN: this tutorial says >= too.
  • RC: “reaches” is pretty clear to me.
  • JG: this is a place where I’d be unsurprised to see things go one way or another.
  • CW: if Chas says it’s “>=” then I’ll put up a proposal.
  • MM: you can do the 1,2,3 thing by making 3 fences?
  • JG: you can say “has triggered or not”, then arbitrary value “==” with a one-shot fence.
  • RC: will you be able to wait for a fence to finish?
  • CW: a Promise. With Kai’s proposal for TaskQueue, that’s equivalent to waiting.
  • JG: with a lot of steps.
  • KN: It’s Promise.then(stopExecutingTasks).beginExecutingTasks().
  • RC: we already have ReadAsync and WriteAsync.
  • CW: in Skia/Dawn we found that triple buffering of buffers was important.
  • RC: Read/WriteAsync doesn’t give that?
  • CW: if you only have one buffer and wait for it to be done until you submit next time then you have no parallelism.
  • RC: in Skia these are render targets?
  • CW: or vertex buffers, or whatever. You want to help the GPU have more parallelism so at least double-buffering even for GPU-only resources.
  • MM: how do you know how many resources to buffer up?
  • CW: GPU-only, two resources.
  • MM: so you just arbitrarily picked 2?
  • CW: yes. And on the CPU given the depth of our graphics pipeline we made it 3.
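
A rough sketch of the numerical-fence pattern discussed above, applied to the triple-buffering case. The fence API shape (createFence, onCompletion, queue.signal) is an assumption, and `device`, `queue`, the three staging buffers, and encodeFrame() are assumed to exist:

```js
const fence = device.createFence({ initialValue: 0 });
const staging = [bufA, bufB, bufC];   // triple-buffered per-frame data
let frame = 0;

async function renderFrame() {
  // Reuse a staging buffer only when the frame that last used it has completed.
  // With ">= value" semantics this is a single wait on one monotonic fence.
  if (frame >= 3) await fence.onCompletion(frame - 2);
  const buf = staging[frame % 3];
  // ... write this frame's data into `buf`, record commands ...
  queue.submit([encodeFrame(buf)]);
  queue.signal(fence, frame + 1);     // signaled values only ever increase
  frame++;
}
```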

Copy alignment requirements

  • MM: did some data gathering.
  • MM: this is using BlitCommandEncoder vs. compute shader on two different phones.
  • JG: so it’s clear that we need to prefer Blit to shaders.
  • MM: we don’t want to give up 2-3x transfer performance.
  • CW: and the GPU got wider between the 6+ and 8+.
  • JG: we should make sure we don’t have to polyfill shaders. We have to surface alignment requirements.
  • CW: can see the cache effects on the red line.
  • MM: think we want to bake in alignment requirements in our work.
  • JG: what req’s are coming to us? It’s a standard size in D3D12?
  • CW: Let’s find out! (->AI)
  • MM: you saying why it’s different in Metal and D3D?
  • RC: the number in the docs is what we negotiated with the IHVs.
  • KN: there’s a query for this in Vulkan?
  • CW: yes, optimal alignment thing. There’s a minimum.
  • MM: so it might support 4, but the query might return 32.
  • JG: so you can copy as low as 256 bytes but only at 512 byte offsets?
  • MM: we should have a note in the spec that if you’re copying small amounts that you should write a compute shader.
  • DM: or you can copy line by line.
  • MM: for textures. We’re talking about buffers.
  • DM: 256 does not apply to buffer-to-buffer copies. Only buffer to texture. Don’t think there’s a minimum though.
  • CW: Vulkan spec says 4 is the minimum. (I think)
  • KN: table has no entries for this.
  • JG: I’ll take the AI.

[LUNCH BREAK]

How extensions will work

  • CW: discussion about shader subgroups. Not available everywhere. How do we expose this?
    • Enabling extension or not?
    • Do we expose the methods?
    • Is there an extension object?
  • MM: proposals?
  • CW: ours is, you enable extension at device creation, because of Vulkan. However, if you don’t enable the extension you get an exception.
  • MM: you call the methods on the object and you get a JS exception if they’re not there?
  • CW: the functions for all extensions Chrome supports will always be there, and they’ll throw an exception.
  • KN: you can’t feature detect the wrong way.
  • MM: why not extension objects?
  • CW: if you want to call Buffer.frobulate(), you’d have to call extensionObjectOverThere.frobulate(buffer). That’s why extension objects aren’t nice from a usability point of view.
  • KN: Not everything works with extension objects: methods and some arguments do, but lifting constraints in the extension does not.
  • RC+MM: The set of functions on the object is all the things the browser knows about.
  • RC: How to query supported extensions?
  • CW: It is on the adapter.
  • CW: Concerns
  • JG: Should be able to reflect on the extensions that are enabled. Can’t always feature detect, for example float textures lifted a restriction. No way to feature detect.
  • KN: This is what WebGPUDevice.extensions does: it gives you a dictionary of enabled extensions.
  • RC: Would be nice for framework developers to enable extensions at runtime.
  • JG: Emscripten had the same problem with WebGL. Had to query everything.
  • KN: it’s fine for frameworks to turn everything on.
  • JG: didn’t want people to turn everything on and then expose to end users.
  • KN: only a problem if Three.js turns on all the extensions and then the end user grabs the WebGPU context.
  • JG: it’s their own fault in that case.
  • CW: we can expect framework developers to have a mode, “don’t enable anything at all”.
  • JG: I think we do have a pref for this in WebGL in Firefox. Disable all extensions.
  • Resolution: everything the browser supports is exposed on the objects. (This is an implementation detail.) You enable extensions at device creation. The device has the list of enabled extensions and limits, for querying. If the exposed functions are called without the extension being enabled, you throw an exception. (See the sketch at the end of this section.)
    • Also see below.
  • MM: beginning of each extension function is: if not enabled, throw.
  • CW: I think the function should throw. If you create an object with an argument not supported without an extension being enabled, that would trigger Continuous Internal Nullability (make the thing null / invalid).
  • RC: why can’t you say, calling a function not enabled throws, and if you use an enum not enabled, throw also?
  • CW: using an enum not defined by the extension is fine. Imagine set of extensions that implies we have to do all validation synchronously.
  • MM: but the device knows what extensions are enabled.
  • CW: imagine you can create a sparse texture, but not if it’s a certain format.
  • MM: you’re just checking the presence of extensions.
  • KN: we normally don’t validate that in the frontend, we do it in the backend.
  • JD: you don’t know you have to check an extension. The enums aren’t defined by the extension, just the combination is valid / invalid.
  • Discussion about deferring validation to the GPU process.
  • CW: smaller things (valid enum values, etc.) are easy to validate. More complex combinations which are enabled via extensions are harder to validate.
  • KN: you get to look at each argument and validate them individually. But validation of combinations of arguments should be validated by the backend.
  • RC: slippery slope. Some browsers will validate early and throw an exception.
  • CW: that’s what I’m saying.
  • MM: don’t want to add new things that use Maybe monad.
  • CW: they all use it.
  • MM: every object in the API?
  • CW: almost all of them. A fence, maybe not?
  • KN: what if your device is lost?
  • CW: you still have the fence, just question is how it’s specified.
  • KN: if you have a lot of stuff and then your device is lost, everything’s dead. Everything depending on device is internally nullable.
  • RC: maybe we need to do this on a case-by-case basis. e.g. these circumstances will throw, these will generate internal nullability. Have to have tests.
  • KN: what about extensions the browser doesn’t know about? Behavior has to be the same between those two cases.
  • RC: too loose to say “combinations of things works this way, and checking of single parameters throws an exception”.
  • MM: if you can define this rigorously it’d be best.
  • KN: think about comprehensibility of the API too. Want to throw an exception when developer does something wrong. Want to use internal nullability when something else goes wrong.
  • CW: no. Throw exception when developer’s developing. Apart from that we just want to push commands.
  • RC: we could say, just the functions throw exceptions.
  • MM: if everything did the Maybe monad, you’d think everything is working but get a black screen?
  • KN: you’d turn on debugging layer and get exceptions.
  • Discussion of errorLog() API call.
  • KN: another case: extension adds field to descriptor. Do you silently ignore the field?
  • CW: same problem as yesterday’s ImageBitmap thing. We don’t want to ignore it. Want to say, you weren’t allowed to use this field.
  • CW: I’ll make a proposal of what errors look like with extensions, when you do something that needed the extension enabled.
    • Accessing method not enabled: exception.
    • Adding field to descriptor…
    • Suggesting every browser does the same thing, and use internal nullability.
    • Question is when exceptions vs. internal nullability are used.
  • KN: from user perspective, looks the same as a bindings check; if you pass the wrong thing into a binding, it fails. If you pass in an enum you shouldn’t use, if browser doesn’t know about it, it’ll be a binding error. If it does, it’ll be a usage error. Want to match binding errors.
  • KR: if they access an enum the browser doesn’t know about they’ll get undefined.
  • KN: enums are just strings in the API.
  • MM: it’s fine if you can rigorously describe the behavior, what throws, etc.
  • KN: think we can match the way it works to call into various browser APIs.
  • YC: in Vulkan we have physical device features. How to expose?
  • CW: physical device features are like built-in optional extensions. Built into the core Vulkan 1.0 spec. In the end, it’s just an extension conceptually.
  • Resolution: extensions are what we said before, and we make a formal description of what happens if you use extension functionality without enabling it.
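
A rough sketch of what the resolved extension model could look like. The entry points (gpu.requestAdapter, adapter.requestDevice), the extension name "shader-subgroups", and the frobulate() method from the notes are all illustrative assumptions:

```js
async function init() {
  const adapter = await gpu.requestAdapter();   // hypothetical entry point
  console.log(adapter.extensions);              // what this adapter could enable

  const device = await adapter.requestDevice({
    extensions: ["shader-subgroups"],           // enabled only at device creation
  });

  // Reflection: the device lists the extensions and limits it was created with.
  if (device.extensions["shader-subgroups"]) {
    // safe to use subgroup features here
  }

  // Methods for every extension the browser supports exist on the objects
  // (an implementation detail), but calling one that was not enabled throws.
  const buffer = device.createBuffer({ size: 16, usage: 0 });
  try {
    buffer.frobulate();                         // made-up extension method from the notes
  } catch (e) {
    // synchronous exception: frobulate's extension was not enabled on this device
  }
  return device;
}
```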

SSBO operations

  • CW: Metal storage buffer works by having a pointer, and you can see it as integer, float array, etc. OpenGL style: you have a structure and can have an unsized array at the end. D3D12: bag of bytes. Or, small structured buffer. <1024 bytes.
  • MM: or Metal approach, where you get a C pointer and just dereference it.
  • MM: from API point of view do we need to distinguish between these?
  • CW: no. Just in shading language.
  • RC: are we talking about enforcing one or the other?
  • CW: can’t do Metal’s model because it doesn’t translate to D3D or Vulkan. Do we have byte-addressed or structured buffer?
  • CB: does anyone know anyone who has scenarios using the OpenGL functionality?
    • KN: you could have 2 UAVs.
  • MM: OpenGL approach has value because fields that aren’t part of the unsized array are essentially globals. Just convenience though. Propose we do it the structured buffer Microsoft way.
  • DN: it’s whatever their code uses and it uses basically all the HLSL stuff.
  • MM: if they use HLSL stuff then the OpenGL support isn’t mandatory.
  • CW: you can implement the OpenGL style using ByteAddressedBuffer.
  • DN: in Vulkan we wrap the thing up and it’s another level of “boxedness”.
  • CW: customers using SSBO + unsized array at end?
  • DN: don’t know.
  • CW: both styles translate to one another.
  • RC: is one use case for dereferencing arbitrary memory: wanting to read an image?
  • CW: we’re talking about UAV buffers. Either way’s fine. They’re both emulatable on top of the other.
  • RC: if we do what HLSL does can we emulate this in Metal and Vulkan?
  • CW: Metal can do everything. Can Vulkan emulate ByteAddressedBuffer?
  • DN: not in the baseline spec. There’s a 16-bit storage extension. Maybe followed by 8-bit storage.
  • JG: can you alias buffers?
  • DN: yes.
  • JG: so, can alias it as R8?
  • DN: alias texel buffers as images. Consistency is tricky.
  • CW: concerned about the memory model.
  • DN: you can alias any of these storage thing as each other.
  • MM: I don’t think Metal supports arbitrary aliasing.
  • JG: I’m just concerned about polyfilling ByteAddressedBuffer.
  • DN: yes. Interleaving reads/writes, you have to put special operations in between.
  • CW: since they’re basically equivalent then we could say we’re doing both.
  • MM: this is a shading language concept. Discuss at same time as shading language, just not Metal style?
  • CW: yes, sounds fine.
  • JG: not related to text based shaders vs. binaries. Related to the concepts the shader execution uses.
  • DN: it’s programming model, linked from API to shaders.
  • Discussion about structures vs. ByteAddressedBuffer.
  • CW: e.g., 3 aliased views.
  • MM: we shouldn’t make our API depend on it. The only viable one is ByteAddressedBuffer.
  • CW: SSBO one has a structure and then an unsized array. Structured buffer is a <1024 length structure + unsized array.
  • DN: isn’t byte addressed problem that you get byte offsets but can’t always write 8 bytes at a time? Some architectures are 32-bit aligned.
  • MM: why can’t you emulate it, and do masking?
  • DN: conflicting writes from threads.
  • CW: OK, ByteAddressedBuffer is implementable trivially on Vulkan. There are alignment constraints. Just a matter of language rather than whether one’s implementable on the other.
  • JG: DWORD addressable buffers.
  • CW: we can implement Vulkan SSBO, OpenGL SSBO, things using ByteAddressedBuffer, and vice versa. It’s just a shading language discussion.
  • JG: these are the concepts for feeding in data. Orthogonal from the actual shading language.

Shading languages

  • TD: presentation on WHLSL null checks and pointers.
  • MM: 1-year progress update on WHLSL.
  • DN: presentation on Vulkan memory model.
  • CW: we’ll schedule another conversation on shading language choices.

Push constants

  • MM: not much to say, this is a similar situation to memcpy, it’s something you can do mechanically but you can sometimes do it another way.
  • DN: anecdotally, push constants are desired by the people I’m supporting as a way to pass scalar arguments to kernels.
  • MM: I think an API that looks/smells like push constants is fine, but such an API should have a fallback to buffers.
  • JG: we should transparently upload as much as we want.
  • CW: implementable on top of buffers? But fewer restrictions so sometimes it can be implemented on top of buffers?
  • DN: the question on my mind is when that decision is made, etc.
  • RC: there was a proposed Metal API. Did you get to the bottom of whether that has performance cliffs?
  • MM: it does have perf cliffs.
  • TD: and a cap of 4 KB.
  • KN: hundreds of times larger than the cap on push-constants.
  • MM: apparently some devices have 64 bytes of push constants in total. Some developer will say that this is a great way to send my matrix, and run into a cliff.
  • CW: agree that an MVP matrix is not a good use case. I’d say e.g. 16 bytes as a minimum.
  • MM: I’m asking for a week to gather data.
  • DN: my customers aren’t coming from transpiling HLSL to SPIR-V. It’s a bit speculative, I need to do some experimentation.
  • RC: I’ve been told on the D3D12 side that there are >=1 IHVs who will emulate root constants as a root constant buffer. And there are also >=1 IHVs where if you have a descriptor that points to a constant buffer and you say it’s static they’ll take the constant buffer and make it a root constant buffer. And the root constant buffer can be any size.
  • MM: that’s pretty strong ammunition that there are performance cliffs.
  • KN: it’s not a perf cliff, it’s a hardware limitation.
  • CW: some hardware doesn’t have push constants. They have to use uniform buffers to implement it. The root table maps almost 1:1 to AMD’s hardware for example.
  • RC: there are levels. Root constants. Root constant buffer. Descriptor to constant buffer.

Dynamic buffer offsets

  • DM: there’s clearly a case for dynamic offsets. When you want some data to be different between draw calls and you don’t want to rebind descriptors. You can do it with push constants. Persistently mapped buffer, fill some parts, and specify offsets when you bind descriptors.
  • CW: it’s a way to avoid creating tons of descriptors just to change an offset for the uniform buffer.
  • DM: scratch space where you don’t know the offsets beforehand. Doesn’t necessarily apply to us since we don’t have persistent mapping.
  • MM: implementing this on Metal’s trivial because we don’t have descriptors.
  • CW: arguably you could do this with push constants, and an array of the data in the uniform buffer. When you have multiple draw calls using the same data but different versions, declare the uniform buffer an array of the structure, and index the array with the push constant. But if people (Dota2) don’t do that today then maybe we should expose offsets.
  • RC: how are offsets done outside Metal?
  • CW: it’s in the Vulkan API. Not sure how you do it with D3D12.
  • DM: we use the root constants to provide those offsets.
  • RC: why can’t a web developer do that?
  • DM: they can if they have access to root constants. It won’t be as efficient on Metal b/c you don’t have to use any extra data to specify offsets. On Metal, supporting dynamic offsets will be more efficient than root constants.
  • CW: I would argue the opposite.
  • CW: applications should be able to do without it, but either I’m wrong, or they don’t have the engineering to do the refactor and we should do it for them.
  • DM: if we don’t expose push constants there’s no alternative for them. And we haven’t decided on push constants.
  • CW: even if we have push constants, I would argue for it as long as it’s efficient on D3D12 because it wouldn’t require devs to restructure their shaders to use push constant + array.
  • DM: if we do both, then dynamic offsets would reduce the number of supportable push constants on D3D12.
  • CW: then reduce number of push constants to 16 bytes.
  • JG, MM: no. :)
  • RC: what’s the feature?
  • CW: concept in Vulkan. DescriptorSetLayout has a slot for uniform buffer with dynamic offset. Offset not baked in to descriptor set. Pass it when you bind the descriptor set. Allows one descriptor set to be used but you move the offset little by little.
  • MM: is it expensive to make descriptor sets?
  • CW: it’s cheap but something that’s done very very often.
  • MM: thought it was something you shouldn’t do very often.
  • CW: Draw, set bind group, set push constants. These are the 3 most common calls. Would reduce the actual changes to the bind group. I’m neutral on this proposal, can wait for ISV feedback.
  • DM: fine with waiting for feedback from ISVs.
  • MM: can implement with root constants or uniform buffer arrays on D3D12?
  • CW: On Metal, could potentially be faster on some hardware
    • KN: Dynamic offset in the shader instead of dynamically binding.
  • More discussion about implementation possibilities.
  • Resolution: defer after MVP.
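
A rough sketch of the dynamic-offset idea for reference. The createBindGroup/setBindGroup shapes and the 256-byte stride are assumptions, not agreed API:

```js
// One large uniform buffer holds per-draw data at fixed-stride offsets.
const bindGroup = device.createBindGroup({
  layout: layoutWithDynamicUniform,   // the layout marks this binding as "dynamic"
  bindings: [{ binding: 0, buffer: uniformBuffer, offset: 0, size: 256 }],
});

for (let i = 0; i < drawCount; i++) {
  // Same bind group for every draw; only the offset into the buffer changes,
  // instead of creating one bind group (descriptor set) per draw call.
  pass.setBindGroup(0, bindGroup, [i * 256]);
  pass.draw(vertexCount, 1, 0, 0);
}
```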

Test suite

  • KN: options for how to build the test suite:
|                               | Web          | Native        |
|-------------------------------|--------------|---------------|
| JS only                       | JS source    | Node          |
| JS that generates C++         | JS source    | C++ trace     |
| Compile something to JS & C++ | Generated JS | Generated C++ |
| C++ that generates JS         | JS trace     | C++ source    |
| C++ only                      | WASM         | C++ source    |
  • DJ: native node bypasses browser?
  • CW: that would work, but much harder to debug in native
  • KN: node is conceptually possible but debugging means everyone has to figure out how to debug a node app that has native code behind it.
  • DJ: so, what you showed yesterday, only harder.
  • Discussion of tradeoffs.
  • DJ: Kai showed debugging wasm is impossible.
  • CW: I would argue the fourth row would be good - JS trace + wasm module to run in continuous integration.
  • DJ: the fourth row is an interesting option. Easier than second row, getting C++ out of JS.
  • JG: the trace is just a recording.
  • CW: you make a WebGPU shim that says, you record this.
  • JG: impossible. Once you have logic it falls down.
  • CW: you say, I expect this to be this value, etc. Doesn’t cover all cases of suite though. Not fences, buffer mapping, when the buffer gets mapped, etc.
  • RC: and all the promises.
  • JG: if you’re just doing a recording then you can’t have any control flow. Has to be straight through.
  • CW: still covers 90%. Look at dEQP. Other 10%, if we do this, there would be a bunch of tests of JS-specific behavior: promises, exceptions.
  • KN: only generate a trace if you’re debugging a test.
  • CW: thought it was an interesting tradeoff.
  • MM: how would promises work?
  • CW: when would use a promise in a test that checks e.g. that a sampler’s working correctly?
  • MM: map buffer async.
  • KN: I think we can improve this.
  • MM: maybe we say the JS test suite contains everything that would use Promises.
  • CW: validation rules, rendering results: things that don’t depend on specific Promises and JS-ness. Arguable, but 90% of API testing falls in that category.
  • JG: the majority of WebGL tests have control flow.
  • CW: the control flow is OK as long as the expectations you have are constant.
  • JG: we can call it JavaScript-script.
  • CW: I suggest it’s based on lisp.
  • MM: parens facing outward. More open.
  • JG: the crux is, some of us want to run the tests directly on a native impl of the API. Others don’t want to.
  • CW: are you going to write your own WebGPU-to-Metal thing?
  • MM: our browser has lots of tests, we’ll add them.
  • CW: the question is, are you going to embed your backend in the browser? Or write a Dawn-like thing?
  • DJ: we don’t know yet.
  • CW: anyone who chooses to use an external library benefits from native tests.
  • JG: in order to run your native library against the test suite there has to be a native API to do so.
  • MM: we aren’t spec’ing a native API.
  • CW: Dawn’s API is meant to be an obvious wasm API for WebGPU.
  • JG: I don’t think we should write all the tests as Dawn tests and compile them with wasm for WebGPU tests.
  • JG: our core need is to test the web API that we’re building. The Q is how best to go about that.
  • KN: we’re talking about testing the rendering, etc.
  • JG: we’re not looking to cross-compile the dEQP tests.
  • CW: it’s not what we did
  • KR: We backported some WebGL tests into dEQP so Android developers can run them easily. It is unlikely that they will run the WebGL tests. Don’t want WebGPU to catch driver bugs.
  • JG: so what do we write in?
  • CW: Dawn’s wasm API. We say, Dawn matches GPU 100%.
  • JG: absent Dawn existing, it’s a surprising decision for this group to write a C++ test suite.
  • KR: What is the majority of the content brought to this API. Likely WASM.
  • JG: that presumes that those APIs will be using WebGPU directly via wasm.
  • CW: even if they go through a Vulkan shim they’ll still be using the WebGPU interface.
  • JG: that’ll run against the JavaScript bindings.
  • CW: we have host bindings for wasm. Bind directly to the browser’s C++ code, no JS shim.
  • JG: writing it in C++ so you can test it with your native stack is helpful to Chrome. It’s not helpful to Firefox, running it against the JS stack.
  • CW: so Firefox’s WebGPU will be directly on top of gfx-rs?
  • JG: we should test the JS API. Not the native shim around the JS API.
  • CW: it’s not a shim. It’s the wasm bindings.
  • KR: The WebGL CTS is an end2end system test of graphics and browser-side constructs. It found a lot of driver bugs that are a burden on us browser vendors. It will happen for WebGPU too.
  • JG: Why is a native test suite on Dawn better than JS test suite on browser
  • KR: Because the translation to the driver is thinner.
  • DJ: Same thing if you run on node.
  • JG: Not attractive to write the test suite in something other than JS and have a compile step to get to JS. If we want the testing people want for the native stack, a node shim would be better.
  • DJ: It is almost less work because ??
  • JG: The test would both run in node Dawn and in Firefox. The node integration is not something we would touch.
  • (Discussion about various possibilities)
  • KR: In WebGL, over time we are converging on using ANGLE + additional WebGL 2 semantics. Now cemented in ANGLE. Don’t we see value in doing the same thing for WebGPU? If folks used the same interface then it would be easier to share testing of it at the lower level. We are going to have this header if only for WASM.
  • DJ: People will make node WebGPU the same way they did node-WebGL.
  • JG: There isn’t a common WebGL header.
  • KR: They have the WebGL implementation inline. In Chrome the WebGL layer is a shim over the command buffer.
  • KR: Think browsers will end up in this direction, with less code implementing the low-level stuff and have a thin JS layer.
  • DJ: Whether the amount of native code we share and the amount is big or small, still valuable if it is JS tests.
  • KR: Would be easier to upstream the test suite to GPU vendors if it is smaller. Might have a chance at convincing them the smaller it is.
  • DJ: People writing test suite in this field want the least complexity integrating tests.
  • DJ+JG: Encourage testing of the browser
  • KR: We’ve been doing this and it is way harder.
  • KN: The test suite is much more unreliable with the browser.
  • JG: Suggestion is basically it would be nicer if you can hand the Dawn CTS to GPU vendors
  • CW: And the same library from Mozilla.
  • KR: Minimizing the effort to port the test suite is in our own best interest. For WebGL we have GC bugs and it is a lot of code to carry to run basic graphics tests.
  • MM: When you get a WebGL bug that is a driver bug what do you do?
  • KR: Contact the GPU and/or OS vendors (we file radars). This week’s bug is a macOS Intel GPU-specific bug. It is an oversight in the WebGL and native test suites. Would be easier if we already had our tests running upstream. Did this with some vendors but not all. Would be easier to do with a native test suite.
  • DM: Solution is to integrate in VK-GL-CTS?
  • KR: No, if we found a driver bug, we would need to port it to the VK-GL-CTS or point Apple to it. Or we could have a thin stack (no browser and GC etc >_>) and integrate it in vendors’ CI systems.
  • MM: You already do reductions of test cases.
  • KR: Takes a HUGE amount of time, should learn from the WebGL experience. We need to trim down the browser if we want to upstream.
  • RC: If we do what you suggest, don’t we want to run the tests in the browser itself?
  • KN+KR:
  • DS: we’ve been talking a lot about what language we’re going to write the test suite in, but how are we going to actually produce the test suite?
  • CW: I propose several people take apart the OpenGL, Vulkan, etc. suites, see what they cover, think about our specific validation, think about what we missed, and then write tests.
  • JG: can also just write tests. We’re going to write the validation, and can write tests against it.
  • CW: that’s what WebGL 1.0 was doing but it wasn’t enough. Need a formal test plan. Interactions.
  • JG: can choose to dig into a feature and write a bunch of tests for edge cases. But going out and coming up with a list of tests is busy work we’ll do anyway. Don’t think we have to do a lot of planning.
  • DN: the Vulkan CTS requires a test lead and a separate group focusing on it.
  • Resolutions: unclear. Need to start work on test plan, need to figure out test harnesses, still need to figure out what language to write tests in.
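
A rough sketch of the “recording shim” idea CW mentions above: wrap the JS API in a proxy, log every call, and replay or diff the trace against another implementation. Purely illustrative, not a proposed harness design:

```js
// Wrap an API object so every method call is appended to `trace`.
function record(target, trace, path) {
  return new Proxy(target, {
    get(obj, prop) {
      const value = obj[prop];
      if (typeof value !== "function") return value;
      return (...args) => {
        trace.push({ call: `${path}.${String(prop)}`, args });
        const result = value.apply(obj, args);
        // Wrap returned objects so chained calls are recorded too.
        return result && typeof result === "object"
          ? record(result, trace, `${path}.${String(prop)}()`)
          : result;
      };
    },
  });
}

// Usage: run a straight-line test against the proxy, then serialize `trace`
// and compare it with a run against a native implementation.
const trace = [];
const device = record(realDevice, trace, "device");
// ... test body using `device` ...
```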

Agenda for next meeting

  • Multi-queue