Skip to content

WGSL 2020 12 01

François Daoust edited this page Dec 2, 2020 · 1 revision

:

🪑 Chair

:

⌨️ Scribe
Google Meet

:

🗺 Location
https://webgpu.dev/wgsl

:

Specification
WGSL Issues

:

Open Issues
Marked Issues

:

Meeting Issues

Agenda

  • update IO-shareable to include bool (#1244)
  • Bools are IO-shareable, but only for builtins (#1246)
  • Add textureLoad() and textureStore() (#1261)
  • Consider whether to allow trailing commas (#1243)
  • WGSL is missing a sample_mask builtin. (#1259)
  • texture builtins: For most texture builtins, Vulkan only allows compile-time-constant offset, and with with bounded range (#1235)
  • conflict between mat2x2 layout in uniform buffer and storage buffer (#1258)
  • wsgl: Add rule that offsets must be compile time constant (#1238)
  • Out-of-bounds clamping brings big perf regression on TFJS ML demos. (#1202)
  • Proposal to enhance defining input/output variables. (#1155)
  • should empty structs be permitted? (#1222)
  • texture sampling with integer offset - portability concern (#1206)
  • WGSL: Use let instead of var? (#1234)
  • spell out when an implementation must reject a shader module (#1241)
  • Placement of read_only attributes (#1159)

📋 Attendance

WIP, the list of all the people invited to the meeting. In bold, the people that have been seen in the meeting:

  • Apple
    • Dean Jackson
    • Fil Pizlo
    • Myles C. Maxfield
    • Robin Morisset
  • Google
    • Corentin Wallez
    • Dan Sinclair
    • David Neto
    • Kai Ninomiya
    • James Darpinian
    • Brandon Jones
    • Ryan Harrison
    • Sarah Mashayekhi
    • Ben Clayton
  • Intel
    • Yunchao He
    • Narifumi Iwamoto
  • Microsoft
    • Damyan Pepper
    • Greg Roth
    • Michael Dougherty
    • Rafael Cintron
    • Tex Riddell
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Kings Distributed Systems
    • Daniel Desjardins
    • Hamada Gasmallah
    • Wes Garland
  • Dominic Cerisano
  • Joshua Groves
  • Kris & Paul Leathers
  • Lukasz Pasek
  • Matijs Toonen
  • Mehmet Oguz Derin
  • Pelle Johnsen
  • Timo de Kort
  • Tyler Larson
  • Eduardo H.P. Souza

⚖️ Discussion

update IO-shareable to include bool (#1244)

Bools are IO-shareable, but only for builtins (#1246)

  • DS: I think we are just waiting on David to update the PR and then it is ready to merge. This blocks frag depth.
  • DM: I think you mean front-facing, which is the boolean flag.
  • DM: Why are we disallowing booleans?
  • DS: I believe Vulkan cannot do that.
  • DM: Kai said that it does not conflict. It is supported for input and output.
  • DS: Only for built-in in/out, not user in/out. See comment in 1244.
  • DS: We could convert the boolean to something else. But why?
  • DM: It’s just a weird inconsistency.
  • KN: We could promote it to a small integer type without the implementation knowing.
  • DM: Is this supported for host-sharable? We’re not including uniforms.
  • DM: Let’s accept this for now. I’ll file a post-MVP issue to relax this.

[wgsl] Update storage textures to have template params for access type. #1182

  • DM: Please be ready to discuss this next time! We need to make progress.

Add textureRead() and textureWrite() (#1261)

  • DM: Last point of contention is int vs uint. Related to different issue we have filed by CW.
  • DS: That’s they way does the spec say i32 vs u32.
  • DS: Spec currently says i32 everywhere due to parsing. If you made it a u32 you’d have to suffix everything with “u”. So we parse it as an int, and then validate that it is positive.
  • DM: Didn’t we agree to have mipmap levels as a u32?
  • DS: Not sure. But same thing - it is a u32 for validation, but an i32 for parsing.
  • DM: I think the PR now should be using i32, until we have proper support for u32.
  • DS: Yes, and that change should be done everywhere that it is applicable.
  • Resolution: Accept once the PR has removed the bits changing i32 to u32s (spec text says that the value is limited)

Consider whether to allow trailing commas (#1243)

  • DJ: Sounds like everyone wants it. Is anyone against? No, resolved, accepted.

WGSL is missing a sample_mask builtin. (#1259)

  • DS: Biggest question is that this is both input and output, so what do we do?
  • DM: Could come up with different names to avoid confusion and avoid the special case of a builtin that is used in both cases.
  • DS: You’re suggesting we could have providedSampleMask and returnedSampleMask?
  • DM: I suggest investigating the uses of the input sample mask to see what we should name it.
  • DS: Agreed. I like that it is split into two variables. So, we update the PR to say we need an investigation, with the idea being we’ll probably end up with 2 sample mask builtins. One for input, one for output. Names TBD.

texture builtins: For most texture builtins, Vulkan only allows compile-time-constant offset, and with with bounded range (#1235)

  • Discussed yesterday - will use literals for the moment.
  • [Some discussion on const_expr]

should empty structs be permitted? (#1222)

  • DS: My question here is “are these useful?”. I kind of get Kai’s point. So I’m ok with having them.
  • DS: As long as there is no issue with generating them in SPIR-V. I don’t know if it is possible. This came up because it crashed on NVIDIA.
  • DM: Is it going to be a problem with the host size? Doesn’t it require us to support zero-sized bindings?
  • DS: So it is restricted to only between shader stages. Not host sharable.
  • Resolved: As above.

WGSL: Use let instead of var? (#1234)

  • DS: It is only “var” because I couldn’t describe it well enough to David at the time.
  • KN: I think david liked var because it implied that it was storage.
  • DS: But we can make that a rule for let. The only difference in JS is scoping.
  • KN: I have always said that we should go in the direction of our host language, in this case JS and thus “let” (module tradeoffs)
  • DM: Isn’t this a bit confusing, to have a “let” without an initializer at the global scope?
  • KN: JS allows this.
  • DM: Weirdly confusing. How do you feel about “let” in local scope, and “var” at the global scope.
  • DJ: In JS it doesn't matter, it’s if it’s scoped or not. JS is the host language, so maybe we should follow that, but don’t care either way.
  • DM: Trying to mimic JS could lead us on weird path for the shading language. Weak stance is to follow Rust.
  • KN: Follow rust as const->let and var -> ???
  • DM: Replace const by let at local scope. Const in local scope is different from const at module scope, so a tension to resolve.
  • KN: Talked about 3 different things for constexpr. Const could mean constexpr, let for non-constexpr values and var as is.
  • DJ: The fact everyone describes in terms of another language and which language you choose is a different thing, we need to decide whats most appropriate here
  • GR: If understanding correctly, Let means const in Rust and not in JS. That seems like a good reason to avoid altogether.
  • KN: Let means the same thing in every language except for JS. Not a good reason either as our host language is JS.
  • DJ: Is there any confusion sticking with Var. That seems like the easy thing.
  • DS: In modern JS, you’re pushed to let instead of var.
  • DJ: Which is due to weird scoping rules which you have to be aware of in JS.
  • ?: If you expect WGSL var to work like JS var (larger scope than JS let), you will get an error and be corrected.
  • KN: [that + the fact that let means const in many languages is a] Fair argument for keeping var.
  • DM: Great
  • KN: Keep const as is and come back when we have constexpr.

Out-of-bounds clamping brings big perf regression on TFJS ML demos. (#1202)

  • KN: Basically, folks at Intel working on TF.js saw a big perf loss when turning on our robustness pass in Dawn. I think it was on D3D12 backend.
  • DS: They tested both clamping and min?
  • KN: I dont’ think they tested min, because we didn’t have it implemented. What they did was look at how FXC translates and it makes an imax and imin instruction. Probably woudln’t be enough. Telling us a couple things. 1. Probably need to make robustness rules relaxed enough so we can use vulkan robustness rather then requiring implementations making it well defined with clamp. 2. In presence of that, probably need to do some optimization to remove unnecessary clamps in cases where we do use clamps.
  • DM: Are you suggesting ar[4] and using non 0-3 then implementation can silently allow that without warnings? Thats what robustness gives you.
  • KN: Yes.
  • DM: No way to discover from programmer perspective. Weird nonportable behaviour
  • KN: Could have dev option to issue warnings somehow, but we have to achieve performance and don’t see anyother way to do it. Maybe we can do it with clamps if we have enough analysis to emit unnecessary clamps. But that’s not a given and is hard. My understanding si range analysis on numbers is hard.
  • RM: Idea of clamp was portability. At least it runs same way everywhere. If we think a different default like picking min or max or something else would be more efficient then fine with changing it for everyone. Unless we have numbers showing that I’m concerned what we pick is going to be unsafe or have a cost to it. Not clear optimizing clamp will be harder then the other option we want.
  • KN: Right, I don't think optimizing clamp is harder than optimizing min. We expect to replace clamp with min as indexing will be unsigned so we no longer need the clamp
  • GR: Yup.
  • DJ: How can you replace clamp with min?
  • KN: If the index types in WGSL are unsigned we do a unsigned min with the length before using ti to do the index in the generated code
  • DJ: And that’s faster then clamp?
    KN: Don’t have an answer to that. Probably faster in some clases. FXC makes clamp imax + imin. Wasn’t that the intent? Didnt’ WGSL already make the indices unsigned to allow min instead of clamp.
  • DS: I believe we talked about it but I don’t know if spec text was written.
  • KN: We also need to know what the numbers look like with min vs clamp. But, regardless of min or clamp this is an early indicator that perf will not be good enough.
  • RC: Were the perf issues on windows?
  • KN: I think they checked on windows on D3D and Vulkan.
  • RC: On windows, unless a root constan buffer you should be getting robust buffer access
  • KN: What does that mean in D3D12?
  • RC: A descriptor where you give it a size then RB access. For computek shaders you get 0 and for vertices you get alpha 1.0 or something like that.
  • KN: So well defined?
  • RC: Yes. As long as it’s same defined with other APIs we should pick that
  • KN: Vulkan is looser.
  • RC: What does vulkan say?
  • KN: 0, 1, -1, inf, nan. Any value inside allocation could be returned. Basically anything but safely.
  • RC: That means safely but not portability guarantees
  • KN: May not be able to get perf without going there
  • RC: If you do vulkan robust buffer do you get perf issues? What’s delta?
  • DM: Like 50% of the app
  • KN: We may have it always on. They have 2 columns, enable and disable RBA. Pretty sure vulkan RBA is on in both columns, and difference is if transform is applied. Don’t know perf of Vulkan RBA but it should be better then clamping, but no worse.
  • DM: Different issues. Local array clamping vs robustness on the binding array indexes. Does’t guard local array access. If understand correctly perf is lost on local array buffer accesses not binding ranges
  • RC: Do we clamp on binding ranges?
  • KN: Yup.
  • DJ: Seems like if we ignore portability reqs, at least D3D12 could not clamp on bound arrays but still needs clamps on everything else. Third point in issue, if we always clamp for write, then could have multiple threads writing to same memory value which is still undefined.
  • KN: That came up before, but you can do that anyway.
  • DM: Really like D3D12 semantics. Mini dream for webgpu is to follow d3d12 semantics. Including local arrays. So we insert code to discard write or return 0 for read on everything. On d3d12 you can omit the code. On Vulkan can use robustness2 which is the same as d3d12
  • KN: Is it as strict as d3d12?
    DM: I believe so
  • KN: Didn’t realize that
  • RC: One caveat robustness is applied if given a descriptor with a size. Some APIs where there is no size and it does not help you. All new apis like mesh and ray tracing all given a buffer address and no clamping. Some cases in the future where we have to do something. Sometimes it saves you and sometimes it doesn’t. In the future we’ll have to save the developers themselves, we should pick the most efficient ones evne when we don’t need it. When it does save us we can omit it, but when it does’t it’s efficient.
  • RM: Curious, in a typical shader, how many of these checks can we elimient with range analysis? In my experience, returns working on this type of analysis is decreasing. Naive approach find lots of things, finding all of them is super hard. Typically we have text for entire shader can optimize across functions. Should be possible to, not a perfect job, but good enough to eliminate many of these checks. Don’t know how hard or how much it saves us.
  • KN: This inner loop they have with 2 nested loops going from 0-4 and inner 0-4. That’s the most common case. The body of the loops are guarded by inner_rows < 4 inner_cols < 4 and if these were unsigned it would be easy to omit the clamps entirely. Maybe that is the majority of the problem. Maybe range analysis can solve the majority of the analysis for us. We should find out, but I think it will require unsigned numbers. Harder to now signed integer < 4 is a valid index.
  • DM: Sounds like we can say this is block on someone doing range analysis prototype in Tint or Naga and see if that helps
  • KN: I think that’s what needs to be done. We implemented robustness pass in the first place.
  • DN: In the meantime we can suggest not do multi-dimensional arrays. That woudl speed up by a factor of 2.
  • KN: They already turned off robustness for testing until we’ve fixed some problems.
  • KN: I’ll write up a response.
  • DJ: Does D3D have plans to add robust buffer to local arrays?
  • GR: Not at this time, I don’t know of any plans.
  • RC: For the bindings, been told the trend is to not have help. Continue with no robust buffer access. You’re safe that the memory accessed is part of process.
  • DJ: Right, similar in metal. Doesn’t help us that much.
  • RC: Unless each WebGPU gets own GPU process.
  • GR: DXC does the same as FXC and replaces clamp with min + max.
  • KN: Good to know. In order to write range analysis we can do it for signed integers but if indexing were unsigned, or allowed unsigned, we could implementat it simpler since unsigned numbers have a slightly simpler analysis.
  • GR: Switching to unsigned we remove max calcls. Don’t see reason to do that, only time using negative index is using pointers and we don’t have that.
  • KN: Only problem is you have to type the u.
  • DS: Can have spec say that it’s >=0.
  • KN: We have for int now. Could write uint(i)
  • RM: If it starts non-negative and we only add we can expect any range analysis to conclude it’s positive..
  • KN: Yes, but harder. Want to prototype for speed determination. When prototyping may just ignore integers can be negative and run on targeted shaders where it doesn’t matter. Just seeing value is < 4 and dropping clamp is easy. Seeing increment is harder, or seems harder. Plus we have goal of the compiler doing as little as possible and making indices unsigned would lower the burden on the compiler.
  • RM: I think it’s reasonable. In this particular case could likely do a good job even with signed.
  • KN: I think that’s true of most cases.

Proposal to enhance defining input/output variables. (#1155)

texture sampling with integer offset - portability concern (#1206)

spell out when an implementation must reject a shader module (#1241)

Placement of read_only attributes (#1159)

Clone this wiki locally