How do those improvements work? #5
Comments
I ran some benchmarks and noticed that the stack allocations were hurting performance. When I used standard arrays, which are allocated on the heap, the code performed better. As for removing the aggressive inlining: when those functions were inlined, the caller would run out of registers, so heaps of CPU cycles were wasted swapping variables between registers and the stack rather than doing meaningful calculations. So yes, it was a combination of benchmarks and inspecting the JITed assembly code.
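For readers unfamiliar with the attribute being discussed, here is a minimal sketch of what an aggressive-inlining hint looks like next to an unannotated method. The class and method names are hypothetical and are not taken from the library; the sketch only shows the annotation, and whether inlining helps or hurts still has to be measured and checked in the JITed assembly, as described above.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical example type; not part of the library discussed in this issue.
public static class InliningSketch
{
    // Asks the JIT to inline this method into its callers where possible.
    // In a large caller this can increase register pressure.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static uint RotateLeftInlined(uint value, int bits)
        => (value << bits) | (value >> (32 - bits));

    // Same logic with no hint: the JIT applies its own inlining heuristics.
    public static uint RotateLeftDefault(uint value, int bits)
        => (value << bits) | (value >> (32 - bits));
}
```

Both methods compute the same result; only the hint given to the JIT differs, which makes it straightforward to benchmark one variant against the other.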
@master131 stackallocs can be sped up by up to 30% by disabling zeroing (see https://github.com/dotnet/coreclr/issues/1279). Also, the heap allocations can be cached via ArrayPool ;-) Great work, by the way!
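As a rough illustration of the ArrayPool suggestion, the sketch below rents a scratch buffer from `ArrayPool<int>.Shared` instead of allocating a new array on every call. The `PooledScratch` class and its `SumOfSquares` method are hypothetical stand-ins, not code from the library; on .NET Framework, `System.Buffers` is assumed to come from the NuGet package of the same name.

```csharp
using System;
using System.Buffers;

// Hypothetical example type; not part of the library discussed in this issue.
public static class PooledScratch
{
    public static long SumOfSquares(int[] values)
    {
        // Rent may return an array longer than requested, and its
        // contents are not guaranteed to be zeroed.
        int[] rented = ArrayPool<int>.Shared.Rent(values.Length);
        try
        {
            // Only touch the first values.Length elements of the rented array.
            for (int i = 0; i < values.Length; i++)
                rented[i] = values[i] * values[i];

            long sum = 0;
            for (int i = 0; i < values.Length; i++)
                sum += rented[i];
            return sum;
        }
        finally
        {
            // Always hand the buffer back so later calls can reuse it.
            ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```

The `try`/`finally` ensures the buffer is returned even if the work in between throws, which is the usual pattern when renting from a pool.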
Thanks for bringing that to my attention, @EgorBo. As an experiment, I removed .initlocals and reinstated the stackalloc to see whether performance was any better than in v0.3.2. On .NET Framework 4.5: On .NET Core 2.2: I also checked the JITed assembly to verify that the stack allocation was not being zero-initialized. For simplicity, I may just leave the allocation as-is (the performance increase on .NET Core is negligible) and not mess around with micro-optimisations.
Hello,
I was looking through the commits and noticed two commits: 1c452a9 and 429a82a
The first one changed some code, and I'd like to ask what changes were made here, and more importantly how exactly that improved performance? What was the big bottleneck here?
I understand the code (at least somewhat), but the diffs are a bit mangled and hard to follow.
The second one surprises me as well: how were those aggressive inlining attributes identified as harmful to performance? Did you manually comment them out and re-run the benchmarks? (I doubt it a bit, that'd be a somewhat "blind" approach, no??) Did you inspect the generated asm code and identify some issues there?
Great work on the library! I love it! :)