
issues with OS X build #16

Open
kimbauters opened this issue Jul 9, 2013 · 3 comments

@kimbauters

There are two issues with the build on OS X. Firstly, there is an error in src/lz4mt_compat.cpp. Specifically, the untested code references count, but count does not exist; I would assume this should be &c instead of &count. Secondly, LDFLAGS should not be "-lrt -pthread" on OS X, since -lrt is, in general, not supported on OS X. I would assume this needs to be changed to LDFLAGS = -pthread.
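A minimal sketch of selecting the link flags per platform (the uname test and the echo are illustrative assumptions, not the project's actual Makefile logic): on OS X there is no separate librt, since the realtime functions live in libSystem, so -lrt fails to link.

```shell
# Pick link flags per platform: -lrt exists on Linux (glibc) but not on
# OS X, where the equivalent functionality is part of libSystem.
case "$(uname -s)" in
  Darwin) LDFLAGS="-pthread" ;;
  *)      LDFLAGS="-lrt -pthread" ;;
esac
echo "building with LDFLAGS=$LDFLAGS"
```

One could then invoke make with `LDFLAGS="$LDFLAGS"` so the Makefile itself needs no platform logic.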

The resulting code compiles and seems to work as expected. I have compared against lz4c: the resulting archive can be decompressed and yields the original file. In terms of encoding speed, lz4mt is roughly 20% to 25% faster on a Core i7 Haswell CPU (most likely because single-core performance is considerably boosted when running lz4c). However, decoding speed takes an 8% hit compared to lz4c. In single-thread mode, lz4mt is generally about 20% slower (for both encoding and decoding).

@t-mat
Owner

t-mat commented Jul 9, 2013

Thanks for the report!
I'll investigate this problem.

(1) I have two questions:

  • What compiler/version did you use?
    • gcc -v or clang -v will show precise information.
  • Which branch did you use?

(2) Your fix for LDFLAGS and &count looks right.

(3) And your benchmark is very interesting.
On Linux and Windows, lz4mt is N^0.5 .. N^0.8 times faster than lz4c (N: number of cores).
I think my code has a synchronization problem somewhere.

@ghost ghost assigned t-mat Jul 9, 2013
@kimbauters
Author

Glad I could help.

(1) I am using a freshly compiled version of gcc 4.8.1. Everything is completely standard, so no Graphite loop optimizations or any other fancy additions. See below for the output of gcc-4.8.1 -v:
Using built-in specs.
COLLECT_GCC=gcc-4.8.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-apple-darwin12.4.1/4.8.1/lto-wrapper
Target: x86_64-apple-darwin12.4.1
Configured with: .././configure --enable-languages=c,c++ --program-suffix=-4.8.1
Thread model: posix
gcc version 4.8.1 (GCC)

(3) The speedup is not too far off. My processor is an i7-4650U (http://ark.intel.com/products/75114), an ultra-low-power dual-core processor with four threads. According to your estimate, I should see performance in the range of 140% to 175%. Perhaps my samples were worst-case scenarios. If you have any updated code in the future, I would be more than happy to test it and report back on the results.
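The quoted range follows directly from the N^0.5 .. N^0.8 estimate with N = 2 physical cores; a quick check:

```shell
# N^0.5 .. N^0.8 for N = 2 physical cores gives the 140%..175% range above
python3 -c 'n = 2; print(f"{n**0.5:.2f}x .. {n**0.8:.2f}x")'
# prints "1.41x .. 1.74x"
```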

edit: until now, I had run the benchmark on those files for which I would actually use lz4. However, these files are already heavily compressed, which I assume is a worst-case scenario for the speed difference between the standard lz4 implementation and lz4mt. I reran the benchmarks with a more suitable test case. In the first test I tarred and then compressed a folder containing mostly PDF files, along with a few htm, gif and txt files. This gave lz4mt a performance benefit of roughly 70%, so clearly in the ballpark. The second test ran on a folder containing epub files (i.e. zip files) and their unzipped versions (mostly html files). This gave a very appreciable speedup of 120%. For decoding, the speedups were roughly 15% and 25%, respectively. The slower decoding in my initial benchmark thus seems mainly due to the kinds of files I used (mostly tar.bz2 files). Nevertheless, it does seem odd that test cases can be selected where using multiple cores actually results in lower performance.

t-mat added a commit that referenced this issue Jul 11, 2013
@t-mat
Owner

t-mat commented Jul 17, 2013

Here is my experiment (memo):

in short

  • Build errors have been resolved @342c7e63be & @ed575c4
  • I've run the benchmark (enwik8) on ramdisk
    • Compression speed is good 😃
    • Decompression speed is not so good 😥

TODO

  • Investigate reported strange behavior for incompressible (.tar.bz2) files.
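For that investigation, a high-entropy stand-in file can reproduce the incompressible-input setup without the original .tar.bz2 archives (the file name and the 16 MiB size are arbitrary choices, not from the report):

```shell
# Create a 16 MiB incompressible test file; /dev/urandom output has
# essentially no redundancy, so LZ4 cannot shrink it.
dd if=/dev/urandom of=incompressible.bin bs=1048576 count=16 2>/dev/null
wc -c incompressible.bin
```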

Install gcc 4.8.1

$ brew install gcc48

$ gcc-4.8 -v
Using built-in specs.
COLLECT_GCC=gcc-4.8
COLLECT_LTO_WRAPPER=/usr/local/Cellar/gcc48/4.8.1/gcc/libexec/gcc/\
x86_64-apple-darwin11.4.2/4.8.1/lto-wrapper
Target: x86_64-apple-darwin11.4.2
Configured with: ../configure --build=x86_64-apple-darwin11.4.2 \
--prefix=/usr/local/Cellar/gcc48/4.8.1/gcc \
--datarootdir=/usr/local/Cellar/gcc48/4.8.1/share \
--bindir=/usr/local/Cellar/gcc48/4.8.1/bin \
--enable-languages=c --program-suffix=-4.8 --with-gmp=/usr/local/opt/gmp \
--with-mpfr=/usr/local/opt/mpfr --with-mpc=/usr/local/opt/libmpc \
--with-cloog=/usr/local/opt/cloog --with-isl=/usr/local/opt/isl \
--with-system-zlib --enable-libstdcxx-time=yes --enable-stage1-checking \
--enable-checking=release --enable-lto --disable-werror --enable-plugin \
--disable-nls --disable-multilib
Thread model: posix
gcc version 4.8.1 (GCC)

Check CPU Spec

$ sysctl -a | grep brand_string
machdep.cpu.brand_string: Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz

Benchmark on ramdisk

$ diskutil erasevolume HFS+ 'MyRamDisk512Mib' `hdiutil attach -nomount ram://1048576`
$ cd /Volumes/MyRamDisk512Mib
$ curl -O https://cs.fit.edu/~mmahoney/compression/enwik8.bz2
$ bzip2 -dk enwik8.bz2
$ ln -s /your/path/to/lz4c
$ ln -s /your/path/to/lz4mt

$ ./lz4c -c -y enwik8 enwik8.lz4c
*** LZ4 Compression CLI , by Yann Collet (Jul  9 2013) ***
Compressed 100000000 bytes into 56995506 bytes ==> 57.00%
Done in 0.68 s ==> 139.66 MB/s

$ ./lz4mt -c -y enwik8 enwik8.lz4mt
Total time: 0.31985sec

$ cmp -b enwik8.lz4c enwik8.lz4mt

$ ./lz4c -d -y enwik8.lz4c enwik8.out
*** LZ4 Compression CLI , by Yann Collet (Jul  9 2013) ***
Successfully decoded 100000000 bytes
Done in 0.34 s ==> 283.23 MB/s

$ cmp -b enwik8 enwik8.out

$ ./lz4mt -d -y enwik8.lz4c enwik8.out
Total time: 0.256552sec

$ cmp -b enwik8 enwik8.out

$ cd
$ hdiutil detach /Volumes/MyRamDisk512Mib
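Since lz4mt prints only a total time, its results can be converted into the same MB/s form lz4c reports (assuming lz4c's "MB" is MiB, which matches its own output: 100000000 bytes in 0.68 s ≈ 140 MB/s):

```shell
# enwik8 is 100,000,000 bytes; convert lz4mt's timings to MiB/s
python3 -c 'print(f"compress:   {100000000 / 1048576 / 0.31985:.2f} MB/s")'
python3 -c 'print(f"decompress: {100000000 / 1048576 / 0.256552:.2f} MB/s")'
```

Relative to lz4c's reported 139.66 MB/s and 283.23 MB/s, that is roughly a 2.1x speedup for compression but only about 1.3x for decompression, consistent with the "not so good" note above.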
