Minutes 2019 08 26

GPU Web 2019-08-26

Chair: Corentin, Dean

Scribe: Ken, Austin

Location: Google Meet

TL;DR

Process for spec PRs:
- PRs with behavior changes require 1 editor review, 2-3 implementer reviews and must span the “meeting homework time” (2 working days before the meeting)
- Others require review from a spec editor.
- Spec editors will set up project tracking and ensure there are no work collisions.
- Style of the spec doesn’t have to Web-y but must be non-ambiguous.
PR Burndown
- #423, #378, #376 accepted with nits.
Coordinate systems and winding #416
- Concern that letting applications choose the Y-flip can lead to worse performance in some cases (would need an extra blit).
- Need to discuss more and CW will make a proposal.
Out of bounds clamping #311
- Requiring validation to be on always would induce worse perf. But letting validate on the CPU or clamp on the GPU induces portability issues.
- Needs more discussion.

Tentative agenda

PR Burndown
Coordinate systems and winding
Out-of-bounds clamping
Agenda for next meeting

Attendance

Apple
- Justin Fan
- Myles C. Maxfield
Google
- Austin Eng
- Corentin Wallez
- James Darpinian
- Kai Ninomiya
- Ken Russell
- Shrek Shao
Intel
- Yunchao He
Microsoft
- Chas Boyd
- Damyan Pepper
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
Timo de Kort

PR Burndown

JF: Is there a process for discussing PRs after the fact?
KN: I’m OK with either. Nicer if we didn’t split everything into 2 steps. File PR, discuss and resolve disagreements. Want spec language to not be blocking on pull request; more around the meat of the specification. Nice to keep it as one step so we have all the discussion in one place.
MM: what’s the process for knowing when it’s safe to merge? Are we going to bring up each of these large PRs on the call?
KN: only have to talk about behavioral changes on the call. Don’t have to talk about how the text is written. If disagreements on behavioral things, we should add it to the meeting agenda.
CW: I think a process that could work: PR goes through comments/fixes; if sticky point, then on call, talk about that specific part.
MM: if someone writes PR and no comments at all, is the requirement for pushing it a time limit? An actual approval from each major browser vendor?
KN: I think we should make it based on approval from spec editors. I’m fine with it being 2 spec editors, but maybe it should be 3.
CW: I think spec text should be gated on at least 1 spec editor. Gated on maybe 2-3 browser impls, and 1 spec editor.
KN: makes sense.
MM: sounds good.
CW: I’d say, spec editor responsible for following these rules before it’s landed. If sticky point, add TODO for next meeting.
MM: sounds pretty good. Worried about: 1 person who’s spec editor talks with other browser company and then push it. Can we also add a time limit of at least a few days?
CW: I think spec editor doesn’t count for their own reviews. Don’t want to gate editorial stuff and things everyone agrees on on a time limit like a week; seems fairly long. But does seem to make sense to have time limit so everyone can look at it.
MM: we can have multiple PRs in flight at once so it doesn’t seem to hold up the process.
CW: we have the homework deadline for this meeting, e.g. ~48 hours. We could say, everyone has to do their homework in these 2 days. If you do something before e.g. Thursday noon, and if no objection before this meeting, sounds ok.
MM: sounds ok as long as we have some time to review the changes.
CW: let’s make that Thursday afternoon / Fri.
MM: to clarify: every change to the spec has to span a 48 working hour period from Thursday afternoon to Monday afternoon Pacific?
CW: yes, for spec changes with behavioral changes. Refactoring, etc., shouldn’t wait.
MM: including spec’ing previously unspecified stuff?
KN: yes, that’s included in behavioral changes.
CW: we should have a process so we don’t step on each others’ toes like just happened to Kai and Justin. Maybe have list of all aspects we can do and have everyone contribute.
MM: on other specs the editors talk with each other and tell them what they’re doing.
CW: I’d like to do buffer mapping spec (for example) so talk with spec editor?
JF: I think having central place where the spec decisions are made would be better.
KN: having a tracker spreadsheet would be good.
CW: I can make a tracker spreadsheet if you like or perhaps you could do it. Why don’t you sync up.
JF: sounds good.
CW: I think we’re going to have to decide how the spec will be targeted: e.g. more at users? Implementers?
KN: I would be OK doing it the Khronos way and not quite the W3C way. I see the spec as a useful reference for reading, and the tests as the ground truth for what they implement. The ultra-precision should mostly be in the tests. When things are more or less clear in the spec language, we don’t need to ultra-specify it in the spec.
RC: I’m OK with that as long as the tests are well commented - both negative and positive parts.
KN: most of the language can go in the spec. Things not useful for users at all, consider not including if they’re too verbose. Example: the way web specs are written, it’s framed as “here’s the algorithm the browser implements”. Seems to be more verbose than writing what the rules are, and letting the tests verify “this is the order in which you check things”, etc. Hard to say in general and not on a case by case basis.
MM: not much of an opinion. Goal is specificity. Way to accomplish is to describe every line of code the browser has in it. If a collection of goals is as well specified that can be fine too.
DM: Think there is a difference. I used to work with GL/Vulkan/D3D. When I see spec written from the browser’s perspective, I think it gives the user a mental model of what the browser is doing and reason about the API better. See value in describing algorithms and steps. Maybe not super-detailed, but the steps themselves are valuable.
KN: ok. As we write spec text we can decide “we can make this more specific”, but doesn’t mean we can’t land it first and make it more specific afterward.
CW: makes sense. Seems responsibility of spec editors to make sure incoming contributions respect the style of the spec. We can grow this over time.
KN: I think it’ll grow over time. We’ll first have most of a spec document and then we can fill things out. Hard to make things consistent when things don’t exist yet.
MM: steps are particularly important in subsystems like media (video / audio) where there are many events which fire. Steps give you orderings of events which might not fall out from a collection of rules. Maybe for a sequence of events like buffer mapping / unmapping guidelines we need steps.
KN / CW: sounds good.
CW: we’ll need to spec CPU-side events for validation, etc.
CW: think we agree roughly on the level where we write specs. Let’s go back to the PR review.
#422 - Start writing spec for device/adapter, introduce internal objects
- Skipping for today, not enough reviews
#419 - Initial spec for GPUDevice.createBuffer
- Skipping for today, not enough reviews
#423 - replace u32/i32/u64 with normal int types or flags32/buffersize64
- DM: why is this buffersize64 and not just buffersize?
- KN: doesn’t have to be - I don’t care.
- DM: also flags32 is weird. Also is it weird that they’re all lowercase, no underscores etc?
- MM: this at least would help me. I didn’t know what a u32 was in WebIDL.
- CW: I think this is fine to go aside maybe from bikeshedding on names of things. Please discuss on the PR and land.
#378 - Do not require vertexInput in GPURenderPipelineDescriptor
- CW: this is because vertexInput only has a sequence of vertex attributes and a default for vertexFormat. If we can say the sequence is 0 by default then we can default the whole descriptor.
- MM: so this lets authors just skip empty square brackets?
- DM: I think we should reserve default cases for the situations where it’s really a default. Most people will have vertex inputs so this isn’t really a default.
- CW: WebIDL rules say that dictionaries with default values should be defaulted.
- DM: But you still need to specify the vertexStage
- CW: No, but this is only for the VertexInputDescriptor.
- DM: but will most people have to specify it anyway?
- CW: yes.
- DM: then why?
- CW: because it’s a bunch of curly braces and empty brackets. vertexInput: { vertexBuffers: [] }
- MM: I think DM’s saying that having to write that and think about it is better than forgetting, leaving it off and getting black screen.
- DM: they’ll get validation error because pipeline will need vertex inputs and they won’t have any. There’s no person and use case here that would say, yes, we need that.
- MM: one piece of evidence, most of the tutorials for learning Metal, the intro ones don’t have vertex inputs but do vertex pulling in the shader.
- CW: that’s why Francois did it this way, in his Hello, World he didn’t need it.
- DM: In Metal, those tutorials can easily bind a buffer to the stage. For us it implies a lot of things: bind group layout, pipeline layout, etc. Makes it more complicated than simply a buffer.
- KN: the example Francois wrote was using constant data in the shader so it didn’t have to bind anything. (vertices hard-coded into the shader to draw a simple triangle).
- CW: it’s more limited than Metal.
- DM: I’m a weak no here. I don’t think it’s needed but it’s fine if it’s added.
- CW: I think we have a very weak yes here
- MM: On the fence
- CW: Jeff was a weak no.
- RC: I noticed the vertex inputs had the vertex formats by default. We don’t have a strong opinion about this. Can have it be required. Weak yes. But if tomorrow you put up a PR making it required again I’ll support that too.
- CW: let’s submit the pull request then.
#376 - Add a default for GPURenderPassColorAttachmentDescriptor.storeOp
- CW: is this controversial?
- MM: great idea, we should do it.
- CW: done.
#411 - Remove vertex input * * *
Multithreading PRs #399 #400 #401 #402 * * *
#408 - Clarify render bundle state inheritance * * *
#412 - Change vertex stride/offset to u32 * * *
#389 - Add defaults to GPUTextureViewDescriptor, remove createDefaultView * * *
#424 - Fully specify the defaults of GPUTextureViewDescriptor * * *

Coordinate systems and winding

CW: this is Yunchao’s nice investigation. #416
CW: APIs don’t agree in coordinate systems; have to choose one.
CW: proposal: use WebGL’s coordinate system / Vulkan’s. Just like WebGL except in swap chain you say whether you want to flip Y or not.
KR: Not sure about that. Been having a lot of discussion with ChromeOS about doing this. Doing flip-y could demote overlays.
KN: Does that mean we always pay that cost in WebGL.
KR: Do a lot of contortions to hide this from the user. Extension was added to Mesa to flip Y framebuffer coordinates. So I would like to point out it’s not completely free.
RC: If we make it flip Y for render targets, not everything will go directly to the screen..
CW: This would be provided on the swapchain -- so format and flip Y. So everything would be like WebGL except for the last present step.
RC: So tex coords would be top-left.
CW: Well all of these things only matter when presenting. (0, 0) is top-left based on what option you set.
MM: ??
CW: In our experience you can flip Y in the shader which is basically free. We have a set of tests, a fair number, that test this and show you can implement it either way correctly.
MM: How many tricks?
CW: Swap some cull modes sometimes and do one or two flip operations in the shaders.
DM: Just wanted to ask KR to follow-up. Have investigation from Yunchao, but it doesn’t mention blitting anywhere in the costs. Shouldn’t we describe this in the investigation before making a decision?
YH: What’s missing?
KR: Whether or not we have to do one copy at the end of the rendering pipeline to do the flip.
CW: Essentially everything can be translated without blits excepts for the final presentation.
MM: Does that mean Chrome compositor doesn’t composite using glDraw***(), and sometimes uses blits?
CW: CC tries to defer as much as it can to the system to save a lot of power.
YH: If I understand correctly, you need to do the flip because in WebGL the framebuffer is Y up but the camera is Y down. My proposal is to use Y-down for framebuffer coordinates. Don’t think we have the problem.
CW: Worried it makes it difficult to.. Babylon JS assumes everything is Y-up. So it’s hard for them to do stuff.
RC: We should pick whatever gives the best performance. ex. for the case Ken described. In VR, all these extra costs add up. We should gather all the cases we have to do a blit and pick the one that has the fewest.
CW: That would be Y-down for presentation.
MM: Isn’t the result of what you [RC] said going to be different on different platforms?
RC: ..
KR: Depends on what the native API does. I don’t remember what D3D’s coord system looks like. If it’s top-left for presentation of D3D textures in a direct comp layer, then it’s backwards from what WebGL would typically does. That’s why ANGLE supports the flip. So we should try to enumerate the native APIs coordinate systems to figure out if top-left for swapchain origin or lower left produces the fewest number of flips on the most platforms. Don’t think we should offer an option. On WebGL, we’re finding on some hardware, the overlay planes are rotated. So the optimal presentation is actually 90 degree rotation. So lie to the app unless they’re sophisticated and query the real native rotation. Emulating this is really nasty because you have to flip a lot of stuff and change the behavior of the coordinate system.
CW: So swapchain.getPreferredOrientation() or on the canvas. So they can query and ask what is specifically the best.
KR: Maybe, but I think we should defer that and go with the simplest solution for now. Still up in the air how we want to spec this at the web level. Like if we past context-creation attribution to the canvas, then would you have to specify…. Do a CSS transform? or does the coord system implicitly flip?
CW: I would image texture size and coord system is different. But yes, it does seem we need more information about presentation cost and blitting.
KR: And also how hard it would be to emulate to avoid an extra blit. Unfortunately the MESA extension is not bug-free. There have been some issues with it.
CW: In WebGPU not super easy. Might have to recompile pipelines. We can talk about it more on the bug.
YH: I would recommend you TAL at my investigation. For WebGL, framebuffer and textures are Y-up. So if you do a blit or copy, I suppose in the driver there is a flip when you present. For all modern APIs, they all use Y-down for frame buffer and texture coordinates. Which is the same as camera coordinates, so there’s no issue with a Y flip
CW: So would be great if you can add a discussion of presentation
MM: is it worth resolving that we want to pick one of them and not making them configurable?
CW: I was going to write the proposal with the configurable bit / query of the preferred orientation.
YH: In my opinion, even if it’s configurable, we need a default.
CW: Yes.

Out of bounds clamping #311

MM: think there’s two ways we can do this. 1) it’s illegal for anyone to attach a buffer to a draw call when the buffer’s smaller than the first item of the draw call. (Assuming buffer’s an array.) Because then you can’t clamp, and you have to choose a different strategy, reads return 0 and writes are dropped. One option is, guarantee ability of impl so it can always clamp. That means the buffers have to be at least the appropriate size. Other way: don’t provide that guarantee. Shader, when compiled, at least has to handle the case where the buffer’s too small and the impl couldn’t inject clamping instructions. Impl would have to do the reads -> 0, writes -> nothing. Depending on the perf of the machine you’re on you might have 2 code paths. Or shader is compiled twice, one with clamping and one without. Hard question. Compiles might get slower with one option, draw calls get slower with the other.
CW: Validating that all the buffers are the right length is a cpu side queue-submit time validation. Or even at recording.
MM: would have to do it in the draw call. Earliest time the shaders are accompanied by buffers.
CW: There’s ways to optimize, I would assume. Like when you bind bind groups you can put the size of the buffers somewhere, and then the compares can be done in batches with fixed-function.
MM: another piece: number of buffers user can attach is 4 * 32 = number of comparisons in the brute force case. Maybe 128 comparisons is not that expensive.
CW: It might not be that bad. How about: would it make sense to say that implementations have a choice and can either do the validation at draw-call time and return an error, or give you robust-buffer access behavior?
MM: think that would lead to non-portable impls. Some draw calls with some buffers -> errors = nonportable impls.
KR: You’ll get zeros anyway.
MM: No I’m talking about a UAV. So in one you no-op the draw call. In the other, you read zeros for that UAV, but the others will work just fine.
KR: In WebGL, we used to generate errors all the time like this, but the perf impact was too great and it was changed to be optional. You now get errors, 0, or modulo. We should try to achieve the best performance possible here.
MM: I’m worried about implementing validate-at-draw-time solution, and then Chrome does something else and then we (WebKit) have to reverse-engineer what Chrome is doing and throw out our validation.
CW: Chrome on Mac would have the same constraints you do. Actually if we had the choice in Dawn we’d probably rely on robust buffer access where available, but on Mac we’d do the validation. Not a good answer because someone could develop on Windows/Linux and deploy on Mac.
KR: In the clamping case.. I see we can’t even clamp the index to 0. Might not be enough data.
CW: Something tells me if we tested this in WebGL, there might be bugs.
KR: Right, maybe in arrays in uniform buffers which hasn’t been as thoroughly tested in out-of-range clamping in WebGL 1.
MM: relevant even if there aren’t arrays. Like if there’s a Uniform Buffer block where the bound buffer isn’t large enough for it.
KR: Just had a bug this past week with a uniform block with an array of 1000 elements, and I only supplied 20. And WebGL wouldn’t let the draw call.
KN: it was worse than that. When you asked the OGL driver how much data you needed to fill the uniform buffer, Mac was giving a larger answer than the other platforms.
CW: The right solution is to do the validation CPU-side and throw an error if there isn’t enough data. At least for UAVs and not-big-enough for uniform buffer. But for platforms where we don’t have RBA, we might be throwing perf out the window.
MM: if we can’t decide on the right thing to do then the conservative thing is to do validation at the draw call site and relax it later. But we can’t make draw calls that used to work, not work in the future.
RC: on the WebGL side in practice have we seen important people rely on different things on different platforms?
KR: I don’t think anyone has complained about different error cases.. actually, we have had that. With lines showing up differently. Turns out they weren’t doing something right with vertex attributes. We have had the perennial problem of apps getting slower with more and more validation added.

Agenda for next meeting

CW: F2F is 4 weeks away. Let’s skip brainstorming until next meeting.
Brainstorm F2F - next call will be 2 weeks before the F2F.
Coordinate systems again.
- DM: proposal before that?
- CW: yes.
Rehash validation issue
PR burndown
KN: will continue having lots of PRs.
YH: it’s a public holiday in America next week. Labor Day.
CW: discuss in 2 weeks then. Try to push PRs into the repo [online] and not rely on the call. Can rework the agenda as it gets closer.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2019 08 26

GPU Web 2019-08-26

TL;DR

Tentative agenda

Attendance

PR Burndown

Coordinate systems and winding

Out of bounds clamping #311

Agenda for next meeting

Clone this wiki locally