Combine draw commands to improve rendering performance #2421

Open
wants to merge 27 commits into
base: dev

@douira
Collaborator

douira commented Apr 14, 2024

This PR combines draw commands that read from adjacent vertex data. This reduces the number of draw commands by around 30% and improves fps on my system by up to 57%, depending on the scene and circumstances. I'm on macOS with a 6900 XT. As jellysquid stated on Discord, the improvement likely comes from reduced CPU overhead in the driver and better GPU occupancy.
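To illustrate the idea, here is a minimal sketch, not the code in this PR. It assumes each draw command can be modeled as a contiguous vertex range in a single buffer, and that the ranges arrive in buffer order without overlapping. `DrawRange`, `DrawCommandCombiner`, and `combine` are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class DrawCommandCombiner {
    /** Hypothetical stand-in for a draw command: a contiguous run of vertices in one buffer. */
    public record DrawRange(int firstVertex, int vertexCount) {
        /** Index one past the last vertex this range reads. */
        int end() {
            return firstVertex + vertexCount;
        }
    }

    /**
     * Merges ranges whose vertex data is directly adjacent.
     * Assumption: the input is sorted by firstVertex and the ranges do not overlap.
     */
    public static List<DrawRange> combine(List<DrawRange> ranges) {
        List<DrawRange> out = new ArrayList<>();
        DrawRange current = null;
        for (DrawRange next : ranges) {
            if (next.vertexCount() == 0) {
                continue; // nothing to draw, so no command is needed
            }
            if (current != null && current.end() == next.firstVertex()) {
                // Adjacent: grow the pending command instead of emitting a new one.
                current = new DrawRange(current.firstVertex(),
                        current.vertexCount() + next.vertexCount());
            } else {
                if (current != null) {
                    out.add(current);
                }
                current = next;
            }
        }
        if (current != null) {
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        // Four commands: the first two and the last two touch, with a gap in between.
        List<DrawRange> visible = List.of(
                new DrawRange(0, 24), new DrawRange(24, 12),
                new DrawRange(60, 8), new DrawRange(68, 4));
        System.out.println(combine(visible)); // two commands remain
    }
}
```

In this sketch a gap between ranges (for example, vertex data that is skipped because it is not visible) always starts a new command, so how much gets merged depends on which ranges are drawn in a given frame.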

Please test whether this results in a similar improvement or some other effect, as it probably depends on the graphics card, memory bandwidth, and platform (OS, driver, vendor, etc.).

Here's a recording of the number of draw commands per pass:
ts on, before:
Draw total for pass Solid: 15531
Draw total for pass Cutout: 13277
Draw total for pass Translucent: 2298

ts on, after:
Draw total for pass Solid: 9571
Draw total for pass Cutout: 8306
Draw total for pass Translucent: 2298

ts off, before:
Draw total for pass Solid: 15531
Draw total for pass Cutout: 13277
Draw total for pass Translucent: 3812

ts off, after:
Draw total for pass Solid: 9571
Draw total for pass Cutout: 8306
Draw total for pass Translucent: 3645
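For reference, the per-pass reduction can be worked out from the "ts on" numbers above with a small helper. This is only an arithmetic sketch; `DrawCountReduction` and its methods are hypothetical names for this example and are not part of the PR.

```java
public class DrawCountReduction {
    /** Percentage by which a count dropped going from before to after. */
    public static double reductionPercent(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    /** Percentage reduction of the summed counts across all passes. */
    public static double totalReductionPercent(int[] before, int[] after) {
        int sumBefore = 0;
        int sumAfter = 0;
        for (int i = 0; i < before.length; i++) {
            sumBefore += before[i];
            sumAfter += after[i];
        }
        return reductionPercent(sumBefore, sumAfter);
    }

    public static void main(String[] args) {
        // "ts on" draw totals copied from the recording above.
        String[] passes = {"Solid", "Cutout", "Translucent"};
        int[] before = {15531, 13277, 2298};
        int[] after = {9571, 8306, 2298};

        for (int i = 0; i < passes.length; i++) {
            System.out.println(passes[i] + ": "
                    + reductionPercent(before[i], after[i]) + "% fewer draws");
        }
        System.out.println("Total: "
                + totalReductionPercent(before, after) + "% fewer draws");
    }
}
```

For this particular recording that comes to roughly 38% for Solid, 37% for Cutout, no change for Translucent, and about 35% across all three passes.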

Here are some screenshots without and with this patch. The fps numbers in them are outdated, since this branch has been updated in the meantime. See the newest comments at the bottom of this thread instead.

[Six screenshots, taken 2024-04-14 between 04:27 and 04:49]

…ing distance sorting through

the detection of primary intersectors when geometry is intersecting and then sorting them in a fixed order
…iately instead of keeping them to avoid memory usage

buffer caching would be a better solution but that's complicated and doesn't currently work correctly
also removed the warning message about unpartitionable geometry as it seems to not be a relevant problem
… not recalculated when the normal is quantized.

also fixed aligned quads not receiving the more accurate center based on the average of the unique vertexes.
@douira
Collaborator Author

douira commented May 8, 2024

Testing on Discord has shown that these changes can improve performance by around 35%, though the result is highly variable and depends on specific combinations of many system and scene-related factors. There don't seem to have been any statistically significant regressions.

An even more radical optimization, which attempts to organize sections so that draw commands can also be combined across sections, did not yield useful results, but I suspect the implementation has a bug. It can be found here (link), but isn't included in this PR.

@douira
Collaborator Author

douira commented May 8, 2024

I'm marking it as ready for review/merging

@douira douira marked this pull request as ready for review May 8, 2024 03:01
@jellysquid3
Member

Because the changes from #2352 have been squashed into dev, this pull request needs to be rebased to properly isolate the relevant changes.

# Conflicts:
#	src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/compile/ChunkBuildBuffers.java
@douira
Collaborator Author

douira commented May 20, 2024

I think it's good to go now

# Conflicts:
#	common/src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/compile/ChunkBuildBuffers.java
#	common/src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/compile/tasks/ChunkBuilderMeshingTask.java
#	common/src/main/java/net/caffeinemc/mods/sodium/client/render/chunk/data/SectionRenderDataStorage.java
#	src/main/java/net/caffeinemc/mods/sodium/client/gl/util/VertexRange.java
@douira
Collaborator Author

douira commented Sep 20, 2024

Updated to dev, with some fixes and changes. The effect on performance seems to have increased. On my computer it now goes from 260 fps on dev to 410 fps (peak) on this branch. Note again that the result somewhat depends on the point of view from which each section was initially loaded.

[Screenshot: this branch (draw command combining)]

[Screenshot: dev]
