Bart Massey
This code is originally by /u/bruce3434 on this Reddit thread.
The fundamental issue was that dropping a `BufWriter` on top of `StdoutLock` sped the code up by a factor of 2× even though the writes contained no newlines. This Reddit comment explains what is going on; this codebase is the underlying code being measured.
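As a rough illustration of the stdout styles being compared, here is a minimal sketch of the same writes going through an unlocked `Stdout`, a held lock, and a `BufWriter` wrapped around that lock. The `emit` helper and its workload (small integers separated by spaces, with no newlines) are hypothetical and are not taken from the benchmark sources.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical workload: write `n` short records containing no newlines.
fn emit<W: Write>(out: &mut W, n: u32) -> io::Result<()> {
    for i in 0..n {
        write!(out, "{} ", i)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Unlocked `Stdout`: every `write!` call takes and releases the lock.
    let mut unlocked = io::stdout();
    emit(&mut unlocked, 3)?;

    // Locked stdout: the lock is taken once and held for all writes.
    let stdout = io::stdout();
    let mut locked = stdout.lock();
    emit(&mut locked, 3)?;

    // `BufWriter` atop the lock: writes accumulate in the `BufWriter`'s
    // own buffer and reach stdout in larger chunks.
    let mut buffered = BufWriter::new(locked);
    emit(&mut buffered, 3)?;
    buffered.flush()
}
```

Because `emit` is generic over `Write`, the same loop can be timed against each sink without changing the workload.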
- `glacial.rs` uses unlocked `Stdout`. This is really slow due to all the locking.
- `slow.rs` uses `StdoutLock`. This is still pretty slow, for reasons explained in the comment above.
- `fast.rs` uses a `BufWriter` atop `StdoutLock`. This is the version that is 2× faster than the slow version.
- `speedy.rs` uses a `BufWriter` atop a raw UNIX `File`. It is a little faster than the fast version, but is portable only to UNIX systems and has an `unsafe` in it.
- `turbo.c` is the original inspiration and about the fastest, a C implementation authored by DEC05EBA. Its speedup tricks are used by the other fast versions here.
- `turbo.rs` is a fairly straightforward port of `turbo.c`, which avoids standard library routines in favor of hand-calculation. `turbo.rs` is about 30% slower than `turbo.c`.
- `lightning.cpp` is a port of `turbo.rs` authored by DEC05EBA and contributed by Hossain Adnan. It uses a manual buffer. It is comparable in performance to `turbo.c`.
- `lightning.rs` is a port of `lightning.cpp` contributed by Hossain Adnan. It uses a manual buffer currently backed by `Vec<u8>` along with POSIX `write()`. It's about 30% slower than `turbo.rs`.
- `ludicrous.rs` is a version by DEC05EBA that uses a handmade buffer. It is about 10% slower than `turbo.c`.
- `serious.rs` (not actually serious) is a C-like Rust implementation with tons of `unsafe` employing all the tricks. It is the same speed as `turbo.c` (currently insignificantly faster, actually), which is reasonable given that it's even uglier and no safer. "You can write FORTRAN in any language."
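The manual-buffer and hand-calculation ideas mentioned above can be sketched roughly as follows. This is not the code from any of the listed files: `ManualBuf`, `push_u32`, and `push_bytes` are hypothetical names, and the sketch flushes through the standard `Write` trait rather than calling POSIX `write()` directly, so that it stays safe and portable.

```rust
use std::io::{self, Write};

/// Hypothetical manual buffer: a `Vec<u8>` that is flushed to `sink`
/// whenever the next append would exceed `cap` bytes.
struct ManualBuf<W: Write> {
    sink: W,
    buf: Vec<u8>,
    cap: usize,
}

impl<W: Write> ManualBuf<W> {
    fn new(sink: W, cap: usize) -> Self {
        ManualBuf { sink, buf: Vec::with_capacity(cap), cap }
    }

    /// Append the decimal digits of `n`, computed by hand instead of
    /// going through the `fmt` machinery.
    fn push_u32(&mut self, mut n: u32) -> io::Result<()> {
        // A u32 has at most 10 decimal digits; fill the array from the right.
        let mut digits = [0u8; 10];
        let mut start = digits.len();
        loop {
            start -= 1;
            digits[start] = b'0' + (n % 10) as u8;
            n /= 10;
            if n == 0 {
                break;
            }
        }
        self.push_bytes(&digits[start..])
    }

    /// Append raw bytes, flushing first if they would not fit.
    fn push_bytes(&mut self, bytes: &[u8]) -> io::Result<()> {
        if self.buf.len() + bytes.len() > self.cap {
            self.flush()?;
        }
        self.buf.extend_from_slice(bytes);
        Ok(())
    }

    /// Hand the buffered bytes to the sink and empty the buffer.
    fn flush(&mut self) -> io::Result<()> {
        self.sink.write_all(&self.buf)?;
        self.buf.clear();
        self.sink.flush()
    }
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = ManualBuf::new(stdout.lock(), 64 * 1024);
    for i in 0..10u32 {
        out.push_u32(i)?;
        out.push_bytes(b" ")?;
    }
    out.flush()
}
```

The 64 KiB capacity is an arbitrary choice for the sketch; the buffer size is one of the knobs such an implementation would tune.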
Many of these will run only on a POSIX system. I have tried them only on Linux.
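To make the POSIX dependence concrete, here is a minimal sketch of the "`BufWriter` atop a raw UNIX `File`" idea. It is an assumption about the general shape of the technique, not a copy of `speedy.rs`: the `raw_stdout` helper is hypothetical, and wrapping the `File` in `ManuallyDrop` is a choice made here so that file descriptor 1 is not closed when the handle goes out of scope.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::mem::ManuallyDrop;
use std::os::unix::io::FromRawFd;

/// Hypothetical helper: view file descriptor 1 (stdout) as a `File`.
fn raw_stdout() -> ManuallyDrop<File> {
    // SAFETY: fd 1 is assumed to be open for the life of the process.
    // `ManuallyDrop` keeps the `File` destructor from closing it.
    ManuallyDrop::new(unsafe { File::from_raw_fd(1) })
}

fn main() -> io::Result<()> {
    let file = raw_stdout();
    // `&File` implements `Write`, so the buffer can borrow the handle.
    let mut out = BufWriter::new(&*file);
    for i in 0..3 {
        write!(out, "{} ", i)?;
    }
    out.flush()
}
```

This bypasses Rust's own `Stdout` handle entirely, which is why it compiles only on UNIX-like targets and needs an `unsafe` block.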
Compiler choice matters for the faster C / C++ benchmarks here. `clang` / `gcc` and `clang++` / `g++` will give different answers. By default `clang` and `clang++` are used to increase comparability with Rust's LLVM toolchain.
- To run the benchmarks:
  - Install Hyperfine with `cargo install hyperfine`
  - Build the Rust benchmarks with `cargo build --release`
  - Say `make bench`

  The results will be available in `BENCH.md`. Here are my results from 2022-11-29 on an AMD Ryzen 9 3900X with `rustc` 1.64.0 and `clang` / `clang++` 14.0.6. They are not significantly different than when run several years ago on older hardware.
- To check that the benchmarks produce the same output say `make check`. The `md5sum`s should match.