Skip to content

GPU Web 2020 03 09

François Daoust edited this page Dec 1, 2020 · 1 revision

Chair: Corentin

Scribe: Kai / Austin / Corentin / Ken

Location: Google Meet

TL;DR

  • Next F2F is cancelled
  • Buffer upload
    • Discussion around Synchronous non-blocking mapWrite #594
      • Concern that there is zeroing or memcpy that needs to happen to initialize the content.
      • Some discussion about readonly/writeonly arraybuffers.
      • The proposal is not palatable and is closed.
      • Getting back to mapAsync the problem is that it over-flushes.
    • GC
      • WebAudio found that GC caused audio hitching.
      • Could make ArrayBuffer replace their pointer to data internally.
      • Browser folks need to reach out to their JS engines and see if it is possible.
  • PR burndown
    • #595 Add "comparison-sampler" binding type
      • Accepted, with some changes
    • #590 Rename BindGroup[Layout]Binding to ...Entry
      • Merged
    • #589 Rename BGLBinding.textureDimension to viewDimension
      • Merged

Tentative agenda

  • Cancelling the next F2F
  • Buffer upload
  • Investigation: Managing on-chip memory
  • PR Burndown
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
    • Justin Fan
    • Myles C. Maxfield
  • Google
    • Austin Eng
    • Corentin Wallez
    • James Darpinian
    • Kai Ninomiya
    • Ken Russell
    • Shrek Shao
    • Ryan Harrisson
  • Intel
  • Microsoft
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Mehmet Oguz Derin

Cancelling the next F2F

  • Agreed to cancel Phoenix F2F; Khronos meeting is already cancelled
  • We should think of things we’d do at F2F meetings and do them in these meetings instead - like status updates.
  • CW: Dean, looks like everyone agrees on the charter, can you take one more edit pass and we can stamp it?

Buffer upload

  • Synchronous non-blocking mapWrite #594
  • KR: There are a lot of points in there I think we should discuss.
  • KR: The issues that JG and I have discussed, that the current designs don’t address:
    • Mapping subranges
    • Eliminating copies in gpu process architectures
  • KR: The design issue with writeToBuffer is that it’s taking client memory and copying it to shmem. At least one copy. Another concern is that it’s a fallback to GL style queued buffer uploads with staging behind them, rather than being explicit. mapWriteAsync gives you the ability to eliminate copies but maps the whole buffer causing overflushing.
  • KR: CreateBufferMapped removes a copy but it does a new allocation. Ideally we don't want to allocate all the time but reuse memory over and over again.
  • KR: Ideally want to be able to fallibly map subranges synchronously for fewer copies.
  • KR Should discuss the form of the API and what developers will want to do, and also the efficiency we can give them.
  • MM: You mentioned 0-copy but also the API lets you write only a subrange, doesn't there need to need a merge step with the existing content?
  • KR: Not zero-copy in the current proposal but faillable would give zero-copy because there is just on copy in the shared memory that's visible on the GPU. On the GPU process side it is possible to take a shmem chunk and wrap it into a GPU resource. And then there's a single GPU-GPU copy.
  • KR: JG had more thoughts on this, JG can you explain them?
  • JG: writeToBuffer because of the implementation details becomes as sophisticated as mapWrite unmap.
  • CW: not sure I follow
  • JG: imperative writeToBuffer - best you can do is ‘map/write/unmap’, can we just do that instead? Unsatisfying to punt on mapping - ‘we have a really good story on copying’. Can we implement the more powerful primitive?
  • CW: Overall problem with saying yes, we can do a shim, is that the shim needs to be tightly integrated into the application. Mapping always has a number of unfortunate side effects: garbage, or over copying, or more tracking, or needing some for of asynchronicity. We could do a shim that is just as good as writeToBuffer, but to make that easier to write, we’re adding more and more negative stuff to mapping -- GC, etc. I think we can have both primitives.
  • JG: It shouldn’t be fine to pick one or the other. I would prefer userland to do it because they know the right tradeoffs. We should offer the platform that we use, up until the point we can’t offer better tools to users.
  • KR: Do we think that mapWrite, allocating out of shmem will be less efficient than writeToBuffer?
  • DM: Yes.
  • KR: Sorry didn’t respond on Github. Creating a new memory region is already zero’ed in the implementation. I don’t know it’s better.
  • JG: That zero-ing does have some cost, but perhaps we could just give them back the same mapping that has their old data in it. (if they’re mapping for read and write).
  • DM: Don’t think you can do this. GPU process might need to copy the data into the GPU resource asynchronously. If the client maps multiple times, you’ll have to change the backing shmem to give them a new region. Can’t always return the same shmem for multiple maps.
  • JG: Sorry, don’t understand. Can you clarify?
  • DM: When you unmap (in the proposal), the GPU process gets a signal to do a copy into the resource. This copy takes time. The client at the same time could request another map of the same buffer. So you can’t give back the same data because the GPU process may still be using it. There’s a race between the CPU writing the data and the GPU reading it for the copy.
  • JG: Could stall.
  • CW: Not palatable for the web platform, but also we would have to know when to stall, which adds more negative sides.
  • JG: I think we know. For map, unmap, and remap -- that could incur a copy, and I think that would be okay.
  • CW: Yes, that’s essentially what #594 suggests. If you know the buffer is not in use, you can eagerly make a new shared memory region.
  • JG: Still think it’s interesting if you map for read/write, in the fast path we can return it directly. …
  • CW: Don’t think we can make ArrayBuffers read/writeonly
  • JG: If that’s the crux of the issue, I think we can make that happen.
  • KR: Heads up: In Java, sometimes that was a 10x performance hit. All writes and reads became virtual calls. JavaScript is different, but there may be a pitfall here.
  • KR: BTW, if data pooling is written by the content side and isn’t zero’ed out, is that okay for this proposal? not a security risk.
  • JG: Not super bullish. If some implementations give you back previous data, and some don’t, that’s not portable.
  • MM: +1. Developers might rely on that.
  • DM: Should either zero everything, or provide existing contents…
  • JG: Think my proposal does allow that. If you’re MAP_WRITE | COPY_SRC | MAP_READ..
  • KN: Because of usage flags, you know you anything you put there will always be there.
  • CW: Ah, thanks for the explanation; makes more sense.
  • KR: Does that mean copies into that buffer will require shadowing, especially on the content side?
  • JG: Yes, that would be a problem.
  • KN: I would probably say you can’t have a GPU usage and MAP_READ at the same time.
  • KR: Is GPU-writable and GPU-copyable different?
  • KN: Yes, if you only have MAP_WRITE | COPY_SRC | MAP_READ you don’t have that problem: the data can never change on the GPU process.
  • JG: Vertex / uniform data too.
  • CW: All these ideas always making tradeoffs. They still don’t reach zero-copy. Require shadowing, need state tracking, etc. A lot of applications already have the data somewhere (like they used the Fetch API) or they’re WebAssembly and already have it.
  • JG: I don’t think I’ve seen cases where you already have data and it’s not worse to map, copy into to, and unmap. Since we have to copy from the userland buffer to the shmem, it’s the same.
  • KR: mapWriteAsync over-flushes. You have to map the entire buffer. If they only write part of it.
  • MM: Yea, right now mapWriteAsync and mapReadAsync require you to map the whole buffer. But we can add offset/length.
  • CW: There was an old proposal from Kai where when you mapWriteAsync you get back an opaque object, and then can ask for ranges from that.
  • MM: So this is an API where you have an ArrayBuffer, and every time you modify part of it, you want to annotate that area, and the impl knows which part you want to update?
  • MM: That’s better than adding an offset and length?
  • AE: One promise instead of multiple promises.
  • MM: Only for writing, not reading?
  • CW: Yes.
  • KR: Appealing. Would be difficult for apps to work with asynchronous MapWrite APIs and try to figure out what regions they’re going to update in the next frame.

Design Requirements

  • DM: My main concern is that there are multiple proposals, and ideas for extensions or extra flags. It’s hard to know what exactly is being proposed. The point that Jeff made earlier is if we can shim writeToBuffer via the mapping then we shouldn’t have writeToBuffer. I actually agree with that. Convenience is not essential. One of the big things that is essential is that no garbage is produced. This can be pretty important. The other problem is zeroing things out which makes returning an ArrayBuffer slower than writeToBuffer.
  • KR: Assuming you have the data flattened like in an entity component system, you may have a lot of data that’s not tightly packed in various nodes. I don’t know if it’s realistic for applications to have data prepared and ready.
  • CW: Can do a lot of writeToBuffers.

Garbage

  • KR: Few weeks ago, one of the WebAudio architects pointed out there was a regression because of GC which caused audio hitching. This was around the collection of ArrayBuffers. I was hoping we could press our JS engine devs to improve this. A proposal was floated a while ago that would let an ArrayBuffer be a repointable object. You could pass something in and change what it points to. There might be details about detaching views, etc. Do you think that’s something worth exploring?
  • KN: What ArrayBuffer wrapper would be taken to re-map?
  • KR: Old results from map.
  • KN: So they would suddenly become valid and then invalid again?
  • CW: Kind of similar to an idea we’ve floated around. Could be avoided if for every creation, you replaceInto some existing object. Maybe that would help?
  • KN: Do think that would help a lot.
  • RC: Does that get us back to previous drawback where on different implementations, you get back different contents? Or could that be the same?
  • KN: Developer would give mapWrite the exact ArrayBuffer to repoint.
  • KR: If they’re doing upload loops, for example, they’ll have a neutered array buffer which they pass in.
  • RC: Okay, so either nothing, or a neutered existing one.
  • KR: In the general case, we could probably take in any ArrayBuffer.
  • JG: I like the [move?] semantics. I think it’s a good answer.
  • CW: Okay, so people ask JS engines; I can make a proposal about mapWriteAsync returning a promise, and then you can ask the buffer for mapped ranges.
  • KR: JG / DM is this palatable?
  • JG: Sort of what I want. fallible mapping with Promises instead of fences.
  • KR: But this makes writeToBuffer the synchronous upload option. Are you okay with that?
  • JG: Definitely not the direction I want to go.
  • MM: I’d like to see it written down. Hard for me to follow.
  • JG: I’d like to get to the place where we don’t have writeToBuffer because it’s imminently polyfillable. I can make a proposal where there’s no extra garbage with callbacks instead of Promises, or use ArrayBuffer moves. Anytime you would get a Promise, you can pass a callback instead to get rid of the garbage from the Promise object.
  • DM: Bigger picture of how I see us having uploads. Just writeToBuffer I don’t think is enough. CW was suggesting we still need mapWriteAsync which makes sense in the use case AE suggested - decoding into the mapping. In my view, we need three things. writeToBuffer+writeToTexture, mapWriteAsync, and createBufferMapped with the rule that you don’t get to map the buffer after you initially map it. One time with initial content.
  • MM: Makes sense to me.

Investigation: Managing on-chip memory

  • [Deferred to next week]

PR Burndown

  • https://github.com/gpuweb/gpuweb/pull/595 Add "comparison-sampler" binding type
    • Maps well to Vulkan and HLSL primitives
    • DM: why not just have nullable “compare” field?
    • AE: that was the suggestion on the PR.
    • DM: sure, then.
    • KN: more important part is adding the binding type.
    • MM: if it’s required for 2 APIs then that’s fine.
    • CW: we haven’t seen whether it’s okay to do this in Metal / D3D, could you check if there are constraints there?
    • AE: there’s nothing different on creation in Metal and D3D. Just matters how you write the shader and use it in there.
    • CW: resolve this offline. Need to change the IDL slightly.
    • RC: does this add any validation to shaders?
    • CW: no. Have to validate in the shader that sampler is set to comparison:true. Also set in the BindGroupLayout.
    • RC: ok, that’s the validation I was referring to.
  • https://github.com/gpuweb/gpuweb/pull/590 Rename BindGroup[Layout]Binding to ...Entry
    • ...agreed
  • https://github.com/gpuweb/gpuweb/pull/589 Rename BGLBinding.textureDimension to viewDimension
    • RC: SGTM given that the type is already GpuTextureView...Dimension.
    • MM: sounds fine.
  • Need to make more progress on canvas gpu-gpu present. Need to get it into the spec where previously we had only IDL.
    • MM: can we have a pull request on the HTML spec?
    • KN: we can document it in our spec first.
    • JG: probably should not document just in our spec, but in the higher level spec.
    • JG: actually sorry, was thinking of a different issue.

Agenda for next meeting

  • Managing on-chip memory.
  • MOAR data upload
  • PR burndown
  • PR #557
    • KN: will review it before editors’ meeting today.
Clone this wiki locally