Suggestion: align fib() function to DSB boundary (64 byte) #129

Kogia-sima · 2021-03-03T22:32:12Z

It seems that some benchmarks for statically compiled languages (C, Rust, Fortran, etc.) depend heavily on where the instructions of the fib() function are placed. On modern processors, which have 64-byte DSB boundaries, small loops or recursive calls may fit in the μop cache (DSB), but whether they do depends on code alignment.

Here are my results for the C benchmark:

Address of fib()	Total execution time (s)
0x1880	11.208
0x1890	11.323
0x18a0	13.320
0x18b0	10.769

This alignment issue leads to different benchmark results across platforms, compiler versions, and compiler options (e.g. #28), and even depending on which linker you use. If you keep the fib() benchmark, you should specify the function alignment explicitly to get consistent results.


drujensen-happymoney · 2021-03-03T23:18:26Z

Hi @Kogia-sima, interesting stuff! The original goal was to show the speed difference between Crystal and Ruby, but the benchmark has since grown to include more than just the top 10 languages.

I will look more closely into the code alignment issue. Someone mentioned it before but didn't have a suggestion on how to address it. I think there are other issues with this benchmark when comparing languages like C and Rust, and I should probably add a disclaimer.

Kogia-sima · 2021-03-03T23:51:26Z

Other factors also affect performance, but I bet code alignment is the dominant one here. For example, Rust 1.42.0 and 1.50.0 produce different results with 16-byte alignment but almost the same results with 64-byte alignment.

In the case of Rust, there is actually one more factor that affects performance: LLVM inserts NOP padding before loops to avoid alignment penalties on some processors. With this padding, fib() exceeds 64 bytes, which can cause DSB misses in some situations (not always). You can avoid this behavior by passing -C llvm-args=-x86-experimental-pref-loop-alignment=0 to rustc. With this flag, C, C++, and Rust all show the same performance.

> The original goal was to show speed differences between Crystal and Ruby

I understand your goals, so the best solution would be to add a proper disclaimer to the README.

Kogia-sima · 2021-03-04T01:26:54Z

Here is another experiment showing that 64-byte alignment produces consistent results.

Address of fib()	Total execution time (s)
0x1a00	11.205
0x19c0	11.196
0x1980	11.232
0x1940	11.204

Kogia-sima changed the title from "Suggestion: align fib() function to DSB boundary (64 bit)" to "Suggestion: align fib() function to DSB boundary (64 byte)" on Mar 4, 2021.
