Requirements
The benchmarks should be written entirely in Rust.
The benchmarks should be portable and not rely on the presence of platform defined dictionary files.
The benchmarks should be runnable with specific parameters:
Number of input lines
Fraction of duplicates
Distribution of input line length
Character set (binary/text)
The benchmarks should still be able to run against all the preexisting commands (sort|uniq).
Design
A CLI application should be written that produces a set of random tokens according to the parameters specified on the CLI:
genbench --charset ascii/binary --delim CHAR --number NUM --duplicates PERCENTAGE --short LEN --long LEN
The --short and --long parameters each specify a 90th percentile of string length, with lengths drawn from a Gaussian distribution.
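The generator described above could be sketched roughly as follows. This is only an illustration under several assumptions: the `Params` struct, the `Rng` helper and the `generate` function are hypothetical names, the flags are passed in as a struct rather than parsed from the command line, and the length parameters are read as "90% of lengths fall between --short and --long" (so they sit at the 5th and 95th percentiles of the Gaussian). A small xorshift generator stands in for a proper random number crate so that the sketch has no dependencies.

```rust
/// Hypothetical parameter set mirroring the proposed genbench flags.
pub struct Params {
    pub binary: bool,    // --charset binary (false means printable ASCII)
    pub delim: u8,       // --delim
    pub number: usize,   // --number
    pub duplicates: f64, // --duplicates, as a fraction in 0.0..=1.0
    pub short: f64,      // --short
    pub long: f64,       // --long
}

/// Tiny xorshift64* generator so the sketch needs no external crates.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        Rng(seed.max(1)) // the state must never be zero
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545F4914F6CDD1D)
    }

    /// Uniform float in [0, 1).
    pub fn uniform(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    pub fn normal(&mut self) -> f64 {
        let u1 = 1.0 - self.uniform(); // in (0, 1], so ln(u1) is finite
        let u2 = self.uniform();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Draw one fresh token whose length follows the assumed Gaussian.
fn token(rng: &mut Rng, p: &Params) -> Vec<u8> {
    let mean = (p.short + p.long) / 2.0;
    // 1.645 is the z-score of the 95th percentile of a standard normal.
    let sigma = (p.long - p.short) / (2.0 * 1.645);
    let len = (mean + sigma * rng.normal()).round().max(1.0) as usize;
    (0..len)
        .map(|_| loop {
            let b = if p.binary {
                (rng.next_u64() >> 56) as u8
            } else {
                b' ' + ((rng.next_u64() >> 32) % 95) as u8 // 0x20..=0x7E
            };
            if b != p.delim {
                break b; // never emit the delimiter inside a token
            }
        })
        .collect()
}

/// Produce `number` delimiter-terminated tokens; each one repeats an
/// earlier token with probability `duplicates`.
pub fn generate(p: &Params, seed: u64) -> Vec<u8> {
    let mut rng = Rng::new(seed);
    let mut tokens: Vec<Vec<u8>> = Vec::with_capacity(p.number);
    let mut out = Vec::new();
    for _ in 0..p.number {
        let tok = if !tokens.is_empty() && rng.uniform() < p.duplicates {
            let i = (rng.next_u64() % tokens.len() as u64) as usize;
            tokens[i].clone()
        } else {
            token(&mut rng, p)
        };
        out.extend_from_slice(&tok);
        out.push(p.delim);
        tokens.push(tok);
    }
    out
}
```

Keeping every token in memory is the simplest way to pick duplicates, but it would not scale to very large inputs; a real genbench might keep only a bounded sample. A fixed seed makes runs reproducible, which matters when comparing implementations.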
For the actual benchmark we should write a benchmark executor that runs each of the implementations with a variety of parameters handed to genbench.
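As a rough sketch of such an executor, the code below builds the cross product of a few parameter values as genbench argument lists and times one command line on a given input. The function names `genbench_matrix` and `time_command` are hypothetical, only a subset of the proposed flags is covered, and running candidates through `sh -c` is an assumption made so that pipelines such as `sort | uniq` can be timed; it would need a different approach on platforms without a POSIX shell.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Build one genbench argument list per combination of the given values.
pub fn genbench_matrix(
    numbers: &[u64],
    duplicates: &[u8],
    charsets: &[&str],
) -> Vec<Vec<String>> {
    let mut runs = Vec::new();
    for &n in numbers {
        for &d in duplicates {
            for &c in charsets {
                runs.push(vec![
                    "--charset".to_string(),
                    c.to_string(),
                    "--number".to_string(),
                    n.to_string(),
                    "--duplicates".to_string(),
                    d.to_string(),
                ]);
            }
        }
    }
    runs
}

/// Feed `input` to a shell command line and return the wall-clock time.
/// Output is discarded so the child can never block on a full pipe.
pub fn time_command(cmdline: &str, input: &[u8]) -> io::Result<Duration> {
    let start = Instant::now();
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(cmdline)
        .stdin(Stdio::piped())
        .stdout(Stdio::null())
        .spawn()?;
    // The temporary stdin handle is dropped at the end of this statement,
    // which closes the pipe and lets the child see end-of-file.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(input)?;
    let status = child.wait()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("`{cmdline}` exited with {status}"),
        ));
    }
    Ok(start.elapsed())
}
```

An executor along these lines would generate the input once per parameter set, then call something like `time_command("sort | uniq", &input)` and `time_command("huniq", &input)` on the same bytes so that every implementation sees identical data.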
Tests
We can reuse the same strategy for testing by generating test data with genbench and then comparing the output of the full huniq against a deliberately naive, unoptimized huniq implementation. We should specifically make sure that buffer growing is tested (by supplying some very long strings, >20 kB).
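The comparison could look roughly like the sketch below. It assumes that the expected behavior is "keep the first occurrence of each line, in input order"; `naive_uniq` is a hypothetical reference implementation, and `hashed_uniq` is only a stand-in for the optimized implementation, which a real test would exercise by running the actual huniq code or binary.

```rust
use std::collections::HashSet;

/// Reference implementation: keep the first occurrence of each line,
/// found by a linear scan over everything seen so far (O(n^2) overall).
pub fn naive_uniq<'a>(lines: &[&'a [u8]]) -> Vec<&'a [u8]> {
    let mut seen: Vec<&'a [u8]> = Vec::new();
    for &line in lines {
        if !seen.contains(&line) {
            seen.push(line);
        }
    }
    seen
}

/// Stand-in for the optimized implementation: same contract, but
/// membership is checked with a hash set.
pub fn hashed_uniq<'a>(lines: &[&'a [u8]]) -> Vec<&'a [u8]> {
    let mut seen = HashSet::new();
    lines
        .iter()
        .copied()
        .filter(|line| seen.insert(*line)) // insert returns true if new
        .collect()
}
```

A test would feed both functions the same generated lines, including at least one line longer than 20 kB, and assert that the two outputs are equal.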