Suggestion: align fib() function to DSB boundary (64 byte) #129

Open
Kogia-sima opened this issue Mar 3, 2021 · 3 comments

@Kogia-sima

Kogia-sima commented Mar 3, 2021

It seems that some benchmarks for statically compiled languages (C, Rust, Fortran, etc.) depend heavily on where the instructions of the fib() function are placed. On modern processors, which have 64-byte DSB boundaries, a small loop or recursive function may fit within a single 64-byte region of the μop cache, but whether it does depends on the code alignment.

Here are my experiments for the C benchmark:

| Memory address | Total execution time [s] |
|---|---|
| 0x1880 | 11.208 |
| 0x1890 | 11.323 |
| 0x18a0 | 13.320 |
| 0x18b0 | 10.769 |

This alignment issue causes different benchmark results across platforms, compiler versions, compiler options (e.g. #28), and even linkers. If you stick with the fib() function benchmark, you should manually specify the function alignment to get consistent results.
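As a minimal sketch of what "manually specify function alignment" could look like in C, assuming GCC or Clang (both accept the `aligned` attribute on functions): the `fib()` body below is the textbook recursive definition and is not copied from the benchmark repository, and `fib_offset_in_window()` is a hypothetical helper added only to check the result.

```c
#include <stdint.h>

/* Assumption: GCC or Clang. The aligned attribute requests that the
   first instruction of fib() be placed on a 64-byte boundary. */
__attribute__((aligned(64)))
uint64_t fib(uint64_t n) {
    if (n <= 1) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}

/* Hypothetical helper: offset of fib() inside its 64-byte window.
   A result of 0 means the function starts exactly on a boundary.
   Converting a function pointer to an integer is implementation-defined,
   but works on the common GCC/Clang targets. */
uintptr_t fib_offset_in_window(void) {
    return (uintptr_t)&fib % 64;
}
```

With GCC, the option `-falign-functions=64` should have a similar effect for every function in the translation unit without touching the source. Either way, it is worth confirming the final address in the linked binary (for example with `objdump -d`), since that is what the measurements above depend on.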

@drujensen-happymoney

Hi @Kogia-sima, interesting stuff! The original goal was to show speed differences between Crystal and Ruby, but it has since grown to include more than just the top 10 languages.

I will look closer into the code alignment issue. Someone mentioned it before but didn't have a suggestion on how to address it. I think there are other issues with this benchmark when trying to compare languages like C vs Rust and I probably should add a disclaimer.

@Kogia-sima
Author

Kogia-sima commented Mar 3, 2021

Even though other factors also affect performance, I bet code alignment is dominant here. For example, I see that Rust 1.42.0 and 1.50.0 produce different results with 16-byte alignment, but almost the same results with 64-byte alignment.

In the case of Rust, there is actually one more factor that affects performance: LLVM inserts nops before loops to avoid alignment issues on some processors. With this padding, the fib() function exceeds 64 bytes, which may cause DSB cache misses (not always, but in some situations). You can avoid this behavior by passing -C llvm-args=-x86-experimental-pref-loop-alignment=0 to rustc. When I specified this flag, C, C++, and Rust all showed the same performance.

> The original goal was to show speed differences between Crystal and Ruby

I understand your goals, so the best solution would be to add a proper disclaimer to the README.

@Kogia-sima
Author

Here is another experiment showing that 64-byte alignment produces consistent results.

| Memory address | Total execution time [s] |
|---|---|
| 0x1a00 | 11.205 |
| 0x19c0 | 11.196 |
| 0x1980 | 11.232 |
| 0x1940 | 11.204 |
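To make the relation between the two sets of addresses explicit, here is a small sketch that computes where each reported address falls within a 64-byte window. The addresses are copied from the two tables in this thread; the names `window_offset`, `count_aligned`, `first_run`, and `second_run` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Window size discussed in this thread. */
#define WINDOW_BYTES 64u

/* Addresses from the first experiment (16-byte steps). */
const uintptr_t first_run[] = {0x1880, 0x1890, 0x18a0, 0x18b0};
/* Addresses from the second experiment (64-byte steps). */
const uintptr_t second_run[] = {0x1a00, 0x19c0, 0x1980, 0x1940};

/* Offset of an address inside its 64-byte window; 0 means the address
   sits exactly on a boundary. */
unsigned window_offset(uintptr_t addr) {
    return (unsigned)(addr % WINDOW_BYTES);
}

/* Counts how many of the n addresses start on a 64-byte boundary. */
size_t count_aligned(const uintptr_t *addrs, size_t n) {
    size_t aligned = 0;
    for (size_t i = 0; i < n; i++) {
        if (window_offset(addrs[i]) == 0) {
            aligned++;
        }
    }
    return aligned;
}
```

In the first experiment the offsets are 0, 16, 32, and 48 bytes, so only 0x1880 is on a boundary, and its time (11.208 s) is in line with the roughly 11.2 s measured for all four boundary-aligned addresses in the second experiment.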

@Kogia-sima changed the title from "Suggestion: align fib() function to DSB boundary (64 bit)" to "Suggestion: align fib() function to DSB boundary (64 byte)" on Mar 4, 2021