issues with OS X build #16
Comments
Thanks for the report!
(1) I have two questions:
(2) Your fix for
(3) And your benchmark is very interesting.
Glad I could help.

(1) I am using a freshly compiled version of gcc 4.8.1. Everything is completely standard, so no graphite loops or any other fancy additions. See below for the output of gcc-4.8.1 -v:

(3) The speedup is not too far off. My processor is an i7-4650U (http://ark.intel.com/products/75114), which is an ultra-low-power dual-core processor (with four threads). According to your estimate, I should see performance in the range of 140% to 175%. Perhaps my samples were worst-case scenarios. If you have any updated code in the future, I would be more than happy to test it for you and report back on the results.

edit: Until now, I had run the benchmark on the files for which I would use lz4. However, these files are already heavily compressed, which I assume is a worst-case scenario for the speed difference between the standard implementation of lz4 and lz4mt. I reran the benchmarks with a more suitable test case.

In the first test I tarred and then compressed a folder containing mostly PDF files, along with a few htm, gif and txt files. This gave a performance benefit of roughly 70% with lz4mt, so clearly in the ballpark. The second test was run on a folder containing epub files (i.e. zip files) and their unzipped versions (mostly html files). This gave a very appreciable speedup of 120%. In terms of decoding, the speedup was roughly 15% and 25%, respectively.

It thus seems that the slower decoding was mainly due to the kinds of files that I used in my initial benchmark (mostly tar.bz2 files). Nevertheless, it does seem weird that test cases can be selected where using multiple cores actually results in lower performance.
Here is my experiment (memo): in short
TODO
Install gcc 4.8.1
Check CPU Spec
Benchmark on ramdisk
There are two issues with the build on OS X. First, there is an error in src/lz4mt_compat.cpp: the untested code refers to count, but count does not exist. I assume this should be &c instead of &count. Second, LDFLAGS should not be "-lrt -pthread" on OS X, since -lrt is generally not supported there. I assume this needs to be changed to LDFLAGS = -pthread.
The resulting code compiles and seems to work as expected. I have compared it against lz4c: the resulting archive can be decompressed and yields the original file. In terms of encoding speed, lz4mt is roughly 20% to 25% faster on a Core i7 Haswell CPU (the gain is most likely this small because single-core performance is considerably boosted when using lz4c). However, decoding speed takes an 8% hit compared to lz4c. In single-thread mode, lz4mt is in general about 20% slower (for both encoding and decoding).
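The percentages quoted in this report do not state how they were computed. As a minimal sketch of one possible convention, the helper below expresses the throughput gain of one tool over another from two wall-clock timings of the same input. The function name `speedupPercent` and its parameters are hypothetical and are not part of lz4mt or lz4c.

```cpp
#include <cassert>

// Relative throughput gain of a candidate run over a baseline run, in percent.
// Both arguments are wall-clock times in seconds for the same input file.
// Assumed convention: positive means the candidate is faster, negative means
// it is slower, and 0 means both runs took the same time.
double speedupPercent(double baselineSeconds, double candidateSeconds) {
    assert(baselineSeconds > 0.0 && candidateSeconds > 0.0);
    return (baselineSeconds / candidateSeconds - 1.0) * 100.0;
}
```

With made-up timings of 1.25 s for the baseline and 1.00 s for the candidate, `speedupPercent(1.25, 1.0)` gives 25, which would be read as "25% faster" under this convention. A report of "125%" for the same pair would instead be quoting the plain ratio, so it helps to say which of the two is meant.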