
Minutes 2018-12-10


GPU Web 2018-12-10

Chair: Corentin

Scribe: Ken

Location: Google Meet

TL;DR

  • Agreed not to meet on Dec 24 and Dec 31
  • Next shading language meeting is Jan 7th 2019
  • Push constants #75
    • Myles provided a new benchmark. Results are mixed: using push constants can result in large regressions.
    • The result that push constants aren’t faster is surprising, but the benchmark looks correct.
    • AI CW (actually done by DM) to look at the benchmark more.
  • Dynamic buffer offsets #116
    • Looks good, AI CW to make an IDL PR
  • Compressed textures investigation #144
    • No format is universally supported. Could skip support for PVRTC.
    • Consensus: spec describes compressed texture support but doesn’t mandate them. Around ~3 compressed texture formats. No guaranteed support. Would be great if Basis shows up.
  • PR burndown:
    • Approved #139, #141 and #143.
    • #146 will be split in multiple PRs to make the discussion easier.
      • Discussion around inputSlot as an explicit member or an implicit value that is the index of the vertex buffer in the sequence.

Tentative agenda

  • Push constants
  • Compressed textures
  • Dynamic buffer offsets
  • More buffer mapping
  • <Your topic here>?
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
    • Justin Fan
    • Myles C. Maxfield
  • Google
    • Austin Eng
    • Corentin Wallez
    • Dan Sinclair
    • John Kessenich
    • Kai Ninomiya
    • Ken Russell
  • Microsoft
    • Chas Boyd
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Elviss Strazdiņš
  • Mehmet Oguz Derin

Administrative Items

  • CW: Please register for F2F!
  • CW: Vacation soon. Should we skip meetings? If yes, which ones?
    • Dec 24? (yes, skip)
    • Dec 31? (yes, skip)
    • Next week (17th)?
      • Consensus is to meet.
    • Meet again on Jan 7.
  • CW: Edge & Chromium news. RC says this changes nothing about Microsoft’s participation in this group.

Push constants

  • Investigation: https://github.com/gpuweb/gpuweb/issues/75
  • https://github.com/gpuweb/gpuweb/issues/75#issuecomment-445000392
  • CW: Myles provided new benchmark - thanks!
  • MM: there were two independent requests to update the benchmark. Comparing writing into buffer, and attaching buffer to draw call, vs. writing a root constant.
    • Did this. Only have a limited number of Windows computers, ran on all available ones. 3 graphs:
      • Pushing root constant+draw call
      • Uploading data, doing draw call
    • 3 cases: push constants give a +7.5% progression, a +1.~% progression, and a -18% regression
  • MM: goal was to determine whether to include them in the API. Because the results were mixed, unclear what to do. Is the benchmark bad? Or are root constants sometimes good and sometimes bad? I think the benchmark is pretty good, and the source code is out there in case anyone has comments.
    • Given these results, I’m not sure we should be adding a new API concept of push constants b/c they’re not an improvement.
    • Tried to port the benchmark to Vulkan but it was really difficult and gave up.
  • CW: someone with AMD machine should run this there. Feedback on issue says should be a win there. In light of these results what do we think?
  • JK: seems strange that push constants would be slower than alternative, but not sure what the alternative is here.
  • MM: I was surprised too. Included the pseudocode. In the root constant case: set root constant, draw indexed. In the buffer case: upload data, set the root constant buffer view, draw indexed. (A sketch of both paths appears at the end of this section.)
  • DM: Does SetGraphicsRootConstant copy from CPU to GPU memory?
  • CW: assume the memcpy in the second pseudocode is copying into a staging buffer, then SetGraphicsRootConstantBufferView (?) takes the GPU VA (virtual address).
  • MM: Yes, the “...” is a big expression computing the VA of the buffer on the GPU.
  • DM: so there’s only one buffer, both CPU/GPU visible?
    • Will look closer at the code.
  • MM: for the second option (using buffers), using a buffer writable from the CPU and readable from the GPU. No staging buffer. Think that’s correct because in this case you’ll tweak, draw, etc.; a staging buffer would be worse.
  • CW: agreed.
  • DM: you aren’t overwriting the same pieces of this buffer?
  • MM: No. The different draw calls have to see different values, not just overwrite them.
  • DM: is this a coherent memory type?
  • MM: yes. That’s why memcpy is part of the benchmark.
  • DM: thanks for benchmark, very impressive. Unexpected though.
  • JG: is the only work in this benchmark these updates? No other workload interspersed?
  • MM: correct.
  • JG: last axis to test, is this a pipeline issue? If the driver isn’t doing buffer subregion invalidation properly, you could have an issue where it could be slower in some cases because it’s waiting for the next thing.
  • CW: one question: what’s the Y axis measured in?
  • MM: microseconds per iteration of the for loop. Lower is better.
  • JG: do you have the raw data?
  • MM: yes, it’s linked from the second paragraph in the issue comment.
  • CW: will try to play with the benchmark a bit next week. Seems OK to proceed with just dynamic buffer offsets and try to gather more data later.
  • KR: possible to get any more ISV feedback from Vulkan WG?
  • JK: in Vulkan, push constants are supposed to be an optimization over using a uniform buffer. If there’s nothing special in the system it should be the same as a uniform buffer; if there is something special, it should go faster.
  • CW: agreed. Data shows that that’s not happening.
  • JG: super surprising. As much as benchmark is sophisticated it points to either a driver bug or a subtle issue in the benchmark.
  • CW: NVIDIA’s hardware doesn’t special case push constants. AMD’s hardware does. Fact that this shows a regression is...surprising.
  • JG: it’s suspect. Definitely not the intent of these APIs.
  • CW: I’ll look at the benchmark next week.
  • JG: would be helpful to push the benchmark to the driver vendors where it’s misbehaving and ask them what’s happening. We have contacts at NVIDIA, etc.
  • CB: how much data are we pushing?
  • MM: 8 floats. Has to be small because it has to fit into root constants.
  • JG: 32 bytes of data.
  • CW: assuming the benchmark is correct…
  • JG: I’d like to not assume the benchmark is correct
  • CW: Myles has put a lot of work into this benchmark. Onus is on us to push the benchmark forward.
  • JG: a little sunk-cost fallacy. Onus is on us to make the right call. One of these two things is at fault. Before we make feature decisions on the API we should figure out which it is. I’ll go to NVIDIA and ask them what’s happening.
  • CB: with NVIDIA there’s a risk they’ll optimize their driver for that piece of code. We’ve been talking about getting a vertical API slice working. An API delivering a few percent perf improvement should be lower priority.
  • JG: agree, but I don’t think we should resolve to not have push constants.
  • MM: suggest tabling this for the MVP.
  • JG / CW: agreed.
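
For orientation, here is a minimal C++ sketch of the two D3D12 paths the benchmark compares, as described above. This is not the benchmark’s actual code: the root signature layout (parameter 0 as either eight root constants or a root CBV), the 256-byte per-draw stride, and the `kIndexCount` constant are all illustrative assumptions.

```cpp
#include <d3d12.h>
#include <cstdint>
#include <cstring>

static const UINT kIndexCount = 3;  // placeholder draw size

// Path A: push the 8 floats directly as root constants.
void DrawWithRootConstants(ID3D12GraphicsCommandList* cmdList,
                           const float values[8]) {
    cmdList->SetGraphicsRoot32BitConstants(/*RootParameterIndex=*/0,
                                           /*Num32BitValuesToSet=*/8,
                                           values,
                                           /*DestOffsetIn32BitValues=*/0);
    cmdList->DrawIndexedInstanced(kIndexCount, 1, 0, 0, 0);
}

// Path B: memcpy into a persistently mapped upload-heap buffer, then bind
// that region as a root CBV. Each draw gets a fresh 256-byte-aligned slice
// (D3D12's CBV placement alignment) so different draws see different values.
void DrawWithBuffer(ID3D12GraphicsCommandList* cmdList,
                    uint8_t* mappedUploadBuffer,  // from ID3D12Resource::Map
                    D3D12_GPU_VIRTUAL_ADDRESS bufferGpuVa,
                    size_t drawIndex,
                    const float values[8]) {
    const size_t offset = drawIndex * 256;
    std::memcpy(mappedUploadBuffer + offset, values, sizeof(float) * 8);
    cmdList->SetGraphicsRootConstantBufferView(/*RootParameterIndex=*/0,
                                               bufferGpuVa + offset);
    cmdList->DrawIndexedInstanced(kIndexCount, 1, 0, 0, 0);
}
```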

Dynamic buffer offsets

  • Issue #116 (see the Vulkan sketch after this section)
  • CW: last time, no concern, but Mozilla wasn’t here last meeting. Any more issues?
  • MM: would love to see pull request.
  • JG: agreed.
  • CW: will do.
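
A minimal sketch of the native mechanism behind this feature, using Vulkan’s dynamic uniform-buffer descriptors. The WebGPU shape was still pending the IDL PR at this point, so the names below are Vulkan’s, not WebGPU’s.

```cpp
#include <vulkan/vulkan.h>

// Bind a descriptor set whose uniform buffer uses
// VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC. The offset supplied at bind
// time is added to the base offset in the descriptor, so one set can be
// rebound cheaply at different positions inside a single large buffer.
void BindWithDynamicOffset(VkCommandBuffer cmd,
                           VkPipelineLayout layout,
                           VkDescriptorSet set,
                           uint32_t dynamicOffset) {
    vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout,
                            /*firstSet=*/0, /*descriptorSetCount=*/1, &set,
                            /*dynamicOffsetCount=*/1, &dynamicOffset);
}
```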

Compressed textures

  • Investigation by Jiawei from Intel about compressed texture formats
  • #144
  • CW: Conclusion: compressed texture formats exist on every platform, but there is no universal one.
  • CW: like GL, WebGPU should define what happens with compressed textures, but not specify any individual one.
  • JG: commented just before the meeting. There are 2 choices here. Background: there’s value in having one compressed texture format. Tried to do this in WebGL with ETC2 textures. Had software decode for running these emulated in ANGLE. Pulled this out due to developer feedback.
  • MM: question, emulation that was done, was it done in software or was it transcoding?
  • JG: impl was to decode into uncompressed texture format.
  • CW: Microsoft contributed the latter, transcoding, but it’s not enabled on the web; it’s only to be used in Android phone applications that were OK with a lossy transcode.
  • JG: that’s the problem; the quality gets even worse. Two approaches:
      1. Only allow formats that have actual support
      2. Always choose some set of texture formats that we can support, and qualify “what the support is” - emulated, HW supported, etc.
  • KN: would be nice if there were a universal compressed texture format.
  • JD: still research problem. Not ready to be in our standard yet.
  • KN: think we want to enable the Crunch use case. Apps should be able to ship Crunch. (A texture transcoding library and file format.)
  • CW: it’s like JPEG but instead of storing compressed RGBA8 it stores “compressed ETC2” or DXT, depending. Can be decoded on GPU using compute shaders. My hope is that this can be done on the application side.
  • KN: right. Want the ability to be there for applications to include Crunch; all the platform has to provide is the hardware supported compressed texture format. Whether we want to transcode ETC2 is orthogonal.
  • MM: can we have an API that compresses textures?
  • KN: compressing textures is very expensive.
  • JG: not free to ship these.
  • RC: value of shipping compressed textures is to ship less data to the client.
  • JG: if you have a lossy JPEG which you transcode, you’d get better over-the-wire sizes. The value on the GPU is that they’re linearly addressable. JPEGs and WebMs are free of that restriction.
  • CW: do we want to have some compressed texture format, where we say “it’s available everywhere but might be software decompressed”? Or no compressed texture formats guaranteed?
  • MM: Do any graphics APIs support emulated compression?
  • CB: in what sense?
  • MM: in your two options Jeff, is there a precedent for that?
  • JG: we tried to emulate in ANGLE and had to take it out.
  • MM: is there any shipping emulated one?
  • JG: no. I’m just presenting the options. We could still decide we want to do option (b).
  • KN: if you use ANGLE on a platform that doesn’t have ETC2, you can still use it but we’ll decompress internally.
  • JG: propose that we do option (a) and just expose formats and extensions when we have hardware support.
  • CW: agree.
  • RC: agree. Can cross that bridge later.
  • CW: would like this group to provide an ETC2 decoder.
  • JG: do you want to publish a JS library that will let you transcode to uncompressed?
  • MM: instead of tons of extensions we should bucket them and have a small number of buckets to avoid fingerprinting.
  • KN: there are only ~3 or ~4.
  • CW: PVRTC for iOS too.
  • JG: think PVRTC is on par with ETC2 but not on par with ASTC. ASTC doesn’t have full backward compatibility support.
  • CW: so ~3 bits, and whether platform is desktop or not.
  • CB: don’t understand - thought you’d want e.g. JPEG/JPEG2000, and up to each implementation to transcode to native compressed formats?
  • JG: we can’t compress like that.
  • JD: the browser does decode JPEG for you, but you can’t re-encode it into a compressed texture format.
  • KN: would also violate license agreements.
  • JD: “transcoding” JPEG into DXT is essentially: decode, then re-encode.
  • CB: each browser would be responsible for transcoding JPEG into the compressed texture format.
  • KR: Basis is a research problem that companies like Binomial have been working on for years.
  • KR: If Basis is standardized, then this group could adopt that format in the spec.
  • CB: that’s another compressed texture format.
  • KR: yes, but the last one. :D
  • CW: API would have a way to upload JPEGs to graphics memory, uncompressed.
  • CB: that’s not been my experience - but thanks.
  • KN: on introducing other wire formats: we’re not introducing one if app chooses to use Crunch. They do whatever they want with the bytes. Just have a blob of data, and they have a wasm module that transcodes to ETC / BC / whatever. Even if we had support in the platform for Basis textures, for example, it doesn’t impact the wire format. It’s just how the bytes are consumed.
  • KR: They use dedicated hardware to produce H.264 streams; that doesn’t exist for ETC2 / ASTC.
  • JG: complicated problem that we shouldn’t go down.
  • CW: to summarize: the spec describes compressed texture support but doesn’t mandate them. Around ~3 compressed texture formats (see the sketch after this section). No guaranteed support. Would be great if Basis shows up. Anything else we agree on?
  • CW: can we say each WebGPU impl has to support at least one of these 3 extensions?
  • KN: don’t think that’s possible on older iOS?
  • JG: true.
  • MM: what’s the benefit?
  • JG: you’re guaranteed to have at least one compressed texture format available.
  • DJ: if the system has to support one, but it can’t be hardware and has to software emulate it…?
  • JG: are there any platforms we want to support that don’t have support for these?
  • CW: SwiftShader…they decompress it. Too much codegen to handle them compressed.
  • JG: we could just not require support for any of them.
  • MM: on Vulkan these are all extensions so there’ll be some devices that don’t support them.
  • KN: understood, we’re already agreed.
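
As background for the “~3 buckets” consensus above, Vulkan already exposes exactly these three compression families as core feature bits. A short C++ sketch of querying them (the calls and field names are core Vulkan; the bucketing into BC/ETC2/ASTC mirrors the consensus, not a WebGPU API):

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

// Query which of the ~3 compressed-texture families a device supports.
// Vulkan exposes them as three core feature bits, matching the
// "around ~3 formats / no guaranteed support" conclusion above.
void PrintCompressionSupport(VkPhysicalDevice physicalDevice) {
    VkPhysicalDeviceFeatures features{};
    vkGetPhysicalDeviceFeatures(physicalDevice, &features);
    std::printf("BC (DXT): %s\n", features.textureCompressionBC ? "yes" : "no");
    std::printf("ETC2/EAC: %s\n", features.textureCompressionETC2 ? "yes" : "no");
    std::printf("ASTC LDR: %s\n", features.textureCompressionASTC_LDR ? "yes" : "no");
}
```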

More buffer mapping

  • CW: apologies for missing the deadline for producing homework. Would like to give Mozilla a chance to provide feedback on the earlier proposal.
  • #138
  • DM: sorry, haven’t looked, all of us have been busy.
  • JG: we can look for next meeting.
  • CW: sounds good.

<Your topic here>?

  • MM: are we talking about shading languages next week or the meeting after?
  • CW: meeting after that is January. Let’s discuss Jan 7, 2019.
  • DJ: next week we’ll get Moz’s feedback on buffer mapping. Is everyone else happy with it?
  • CW: everyone else had needs that must be addressed, but was mainly happy with Proposal #1 (WebGPUMappedMemory). Everyone is happy with setBufferSubData, and most are happy with mapWriteSync.
  • DJ: can you update the proposal with these results?
  • CW: think I did. There was a lot of discussion.
  • RC: I have some opinions (mostly good) but haven’t put them in.
  • MM: I also have some and will comment today.

PR burndown

  • #139
    • WebGPUBinding renaming
    • JG: agree, WebGPUBinding is very ambiguous.
  • #141
    • Intel’s WebGPURasterizationState
    • Already got +1 from Google. Any more concerns?
    • RC: seems fine as long as we can implement it easily on all the APIs. Seems to be the case based on Yunchao’s description.
    • MM: I believe him when he says it’s implementable.
    • DM: we can put all of the DepthBias values into a struct.
    • CW: I’m OK with DepthBias being in the same struct as long as we have good defaults - which is a separate discussion.
    • JG: CORS (?) usually isn’t in there, so CCW shouldn’t be
  • #143
    • Remove ImageAspect from WebGPUTextureCopyView
    • Copying only depth or stencil of a texture is not supported on D3D12.
    • CW: Seems fine.
    • DM: reasonable.
  • #146
    • Cleanup WebGPUVertexDescriptor, from MM
    • CW: bit more contentious.
    • MM: three pieces. Should probably separate them:
        1. The idea of naming vertex attributes and buffers. Currently we have a bunch of buffers and attributes, and the way you associate them is an int32 name. Seems fine, except that when you want to fill these buffer slots, the API you use is “range based”: you give “this name” plus a certain number of offsets. Having arbitrary names and range-based APIs don’t match very well. Better solution: keep the range-based model, and instead of names being arbitrary, derive them from the vertex buffer’s position in the sequence. Buffers are then identified by their number in the sequence rather than by an arbitrary int.
        • CW: feedback I’ve received: it was difficult to add holes in the bind groups, etc. For example, a shader that doesn’t use skeletal animation has to leave a hole to stay compatible with other shaders.
        • MM: these are names of buffers, not attributes. Holes are totally possible using arrays in JavaScript.
        • KN: If I remember correctly, Dawn’s “input slot” came from D3D12’s D3D12_INPUT_ELEMENT_DESC struct (see the sketch after this list). Limited to 0..15.
        • CW: on every API there is a limit
        • MM: there’s a limit, around 31 or 32. But you can still use arbitrary names in that range.
        • KN: I still think the descriptor should be a sparse array. Aside from that, not sure how it should work.
        • MM: you think it should be a sparse array instead of the input slot value, or both?
        • KN: instead of the input slot value.
        • MM: should add some background. Couple people on our team who were implementing this stuff and couldn’t understand how it worked. Had to explain it’s an arbitrary name.
        • KN: suggesting we remove that layer of indirection entirely?
        • MM: no. Just removing the input slot value should do it.
        • KN: I agree.
        • DJ: so then the input slot becomes an index into the array? Will we have a max length on the array?
        • KN: yes. Will be ~16 for D3D.
        • DJ: so you’ll have to update start slot.
        • DM: I agree. Even though API allows you to spread vertex buffers wherever, it encourages you to use linear ones.
        • CW: OK. So we say, the index is given by its position in the array, but you can skip elements if you really want to.
        • DJ: and you can skip by calling multiple times or passing null?
        • CW: no, this is a descriptor, used only once.
        • DM: we only have to specify descriptors with nulls.
        • MM: we could do that. Either specify arrays with [1], [3], or use undefined.
        • KN: somebody else pointed this out. What if you specify undefined for (2)? Would it unbind or leave alone? Also, native APIs have range-based APIs. So your SetVertexBuffer is split into multiple SetVertexBuffer calls.
        • DJ: that’s what I’m asking, is it a big deal to do that?
        • CW: probably not.
        • MM: let’s say slot #2 is a hole and you want to fill in slots 1 and 3. WebGPU program could call it twice, or app could call it twice.
        • KN: the only reason I’m against it is it’s ambiguous what it does.
        • MM: I’m not married to it.
        • DM: is there value in unbinding? Will our validation rules fail if something is bound and it’s in conflict?
        • MM: I’ll split this into a couple different pieces.
    • JG: do appreciate that this is the third remaining PR we have, with only 2 others blocked on these. Good job team!
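
For background on the inputSlot discussion above, here is a sketch of D3D12’s input layout struct, where each attribute names the vertex buffer it reads from by an explicit InputSlot index rather than by position in a sequence. The specific attributes and slot numbers are illustrative, not from the PR.

```cpp
#include <d3d12.h>

// D3D12's input layout: each attribute carries an explicit InputSlot that
// names which bound vertex buffer it reads from (slots come from a small
// fixed range), which is the model Dawn's "input slot" mirrored.
static const D3D12_INPUT_ELEMENT_DESC kInputLayout[] = {
    // SemanticName, SemanticIndex, Format, InputSlot,
    // AlignedByteOffset, InputSlotClass, InstanceDataStepRate
    {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, /*InputSlot=*/0,
     0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
    // Slot 1 is intentionally left unused here ("a hole"); attributes may
    // reference any slot, so slots need not be densely packed.
    {"TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, /*InputSlot=*/2,
     0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
};
```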

Agenda for next meeting

  • JG / CW: homework to re-investigate push constants
  • CW: Buffer mapping. Will put together more spec-y text.
  • JG: still owe a proposal for barriers + multi-queue. Not sure if I’ll get to that this week.
  • CW: agenda will be open for other ideas.