[Rust] Will this work with Criterion? #8
I'm currently not a user of Criterion (since I could not run it on my local macOS). However, judging from your output example, we can extract the benchmark result and put the remaining outputs to
Thanks @rhysd. I suppose the range would be the high value minus the low value. CC @bheisler
Hey, Criterion.rs maintainer here. The command-line output isn't really intended for programmatic access, so the format isn't guaranteed to stay stable. I don't currently plan to change it, so this is more of a heads-up that you may need to fix it if it changes in the future.

Criterion.rs generates a file that is designed for programs rather than humans (see here for more on that), but it contains the raw measurements, not the statistics that Criterion.rs calculates from them, so if you wanted to use that you would need to decide how you want to summarize those numbers. If you go that direction, I would recommend also reading up on the measurement and analysis process to clarify what some of those numbers mean.

Unrelated: @rhysd - if it's not too much trouble, would you mind explaining your difficulties with Criterion.rs on macOS, or perhaps opening an issue on the Criterion.rs repository? I don't have a Mac to test with, but it ought to work.
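One way "summarizing those numbers" could look, as a hedged sketch: assuming each raw measurement is an (iteration count, total measured time) pair as in Criterion.rs's raw output, one robust summary is the median of the per-iteration times. The function name and the choice of the median here are illustrative, not anything Criterion.rs itself provides:

```javascript
// Sketch: summarize raw benchmark samples into one "typical" value.
// Each sample is assumed to be [iteration_count, total_measured_time];
// the per-iteration time is total / iterations, and the median of those
// ratios is one robust way to summarize them.
function medianTimePerIteration(samples) {
  const perIter = samples
    .map(([iters, total]) => total / iters)
    .sort((a, b) => a - b);
  const mid = Math.floor(perIter.length / 2);
  return perIter.length % 2 === 1
    ? perIter[mid]
    : (perIter[mid - 1] + perIter[mid]) / 2;
}

// Example: three samples of [iteration_count, measured nanoseconds]
console.log(medianTimePerIteration([[10, 1000], [20, 2100], [40, 4000]])); // → 100
```

The median is only one reasonable choice; the estimate Criterion.rs itself reports comes from a linear regression over the same pairs, which weights larger samples more heavily.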
@bheisler Thank you for the comment. I did not know that it supports a machine-readable format. That format would be preferable.

I'm sorry, I cannot remember clearly, but when I tried the Criterion example at the time, the benchmark program crashed with a SEGV. Since that was about 6 months ago and I was using an older macOS (10.12), it might work fine now.
OK, thanks. If you decide to try it again and run into the same problem, please feel free to file an issue.
I'm now trying to take over @jasonwilliams's work. I'll let you know if I run into an issue. Thank you for taking care of this.
I tried to implement a simple calculation based on the analysis process document:

```javascript
const { linear } = require('regression');
const fs = require('fs');

// Parse the raw measurements CSV written by Criterion.rs
const lines = fs.readFileSync('target/criterion/Bench Fib 10/new/raw.csv', 'utf8').split('\n');
const measurements = lines
  .slice(1) // skip the header row
  .filter(l => l !== '')
  .map(l => {
    const data = l.split(',');
    const measured = parseFloat(data[5]); // measured sample time
    const iter = parseInt(data[7], 10);   // iteration count
    return [iter, measured];
  });

// Track the fastest and slowest per-iteration times
let max = measurements[0][1] / measurements[0][0];
let min = max;
for (const m of measurements.slice(1)) {
  const v = m[1] / m[0];
  if (v > max) {
    max = v;
  }
  if (v < min) {
    min = v;
  }
}

console.log('min:', min);
// Estimate via linear regression: slope of measured time over iterations
console.log('estimate:', linear(measurements).equation[0]);
console.log('max:', max);
```

and the output was:
Criterion.rs's output was
so I think there is a difference in the calculation of the estimate, and the min/max seem wrong. Hmm, I'm not confident I can maintain this logic. After all, I'm feeling that using the CLI stdout might be better, though it may change without a major version bump of Criterion.rs. The CSV format document says

so we might adopt some machine-readable summary output in the future.
Yeah, a good idea would be to have
Or Criterion could add the calculated info to the machine-accessible file. @bheisler What do you think? Or maybe it could support the normal
Hey folks. I've been thinking about adding better machine-readable output to this. I'm thinking it would be broadly similar to Cargo's existing.

The messages themselves are still TBD, but I do want to keep my options open for the future, so I'm thinking that the only field guaranteed to be there would be some estimate of a typical value. All of the other stats (mean, variance, MAD, confidence bounds, etc.) would be officially documented as optional so that I still have the option to not report them. For example, I'm imagining an alternate sampling method for long-running benchmarks where we just run the benchmark, say, 100 times and record each individual time. Those measurements would produce a different set of statistics, and I wouldn't want to break downstream code by only sometimes reporting, say, the slope measurements.

I'm also thinking I might set it up to parse

None of this is a promise, mind - I'm still designing a lot of this stuff and figuring out what makes sense. Would that work for what you folks need? Do you have any comments or concerns with this idea so far? Is there anything in particular you want to have in the message format?
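To make the proposal concrete, here is a purely hypothetical sketch of what consuming such line-delimited JSON messages might look like. Since the format is explicitly TBD, every field name below (`reason`, `id`, `typical`, `mean`) is invented for illustration; the only idea taken from the comment above is that one "typical" estimate is guaranteed and the rest is optional:

```javascript
// Hypothetical sketch: parse line-delimited JSON messages mixed into
// benchmark stdout, in the style of Cargo's JSON message output.
// All field names here are invented, not a real Criterion.rs format.
function parseCriterionMessages(stdout) {
  return stdout
    .split('\n')
    .filter(line => line.startsWith('{'))        // skip human-readable lines
    .map(line => JSON.parse(line))
    .filter(msg => msg.reason === 'benchmark-complete')
    .map(msg => ({
      name: msg.id,
      value: msg.typical,                                  // guaranteed field
      mean: msg.mean !== undefined ? msg.mean : null,      // optional field
    }));
}

const fakeOutput = '{"reason":"benchmark-complete","id":"fib 10","typical":123.4}\nrunning 1 test\n';
console.log(parseCriterionMessages(fakeOutput));
```

Treating every statistic except the typical value as optional, as described above, is what lets the producer change sampling methods without breaking consumers like this one.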
@bheisler This is already solved by the bencher-compat output Criterion provides. It is the reason I asked you to add it originally.
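For reference, the bencher-compat output mentioned here emits lines in the libtest style that `github-action-benchmark`'s existing `cargo bench` parser understands. A minimal sketch of extracting one result from such a line (the exact whitespace in the sample line is illustrative):

```javascript
// Sketch: parse a libtest/bencher-style result line such as
//   test fib_10 ... bench:       1,234 ns/iter (+/- 56)
// into { name, value, range }. Returns null for non-matching lines.
function parseBencherLine(line) {
  const m = /^test (\S+)\s+\.\.\. bench:\s+([0-9,]+) ns\/iter \(\+\/- ([0-9,]+)\)/.exec(line);
  if (m === null) return null;
  const toNum = s => parseInt(s.replace(/,/g, ''), 10); // strip thousands separators
  return { name: m[1], value: toNum(m[2]), range: toNum(m[3]) };
}

console.log(parseBencherLine('test fib_10 ... bench:       1,234 ns/iter (+/- 56)'));
```

Because this format predates Criterion and is shared with the built-in `#[bench]` harness, consuming it sidesteps the stability concerns raised about Criterion's human-readable CLI output.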
If anyone else is blocked by this/#145, Bencher supports Criterion: https://github.com/bencherdev/bencher#supported-benchmark-harnesses
It seems that `github-action-benchmark` does not support Criterion at the moment, and the suggested alternative, Bencher, requires payment for private projects and does not seem to work on Windows. See: benchmark-action/github-action-benchmark#8
A lot of Rust projects use Criterion, which overrides `cargo bench` and may output something different. Would this project work with that? What does the output need to look like?

For example, I'm using https://github.com/jasonwilliams/criterion-compare-action, which is a wrapper around https://github.com/bheisler/criterion.rs.

This is the output from Criterion on https://github.com/jasonwilliams/boa:
https://bheisler.github.io/criterion.rs/book/user_guide/command_line_output.html