
Slow parsers #5

Open
vsbuffalo opened this issue Feb 29, 2024 · 2 comments

Comments

@vsbuffalo (Owner) commented Feb 29, 2024

[flamegraph image]

sudo cargo flamegraph --bin granges -- map  --genome tests_data/hg38_seqlens.tsv \
  --left windows_1Mb.bed  --right test_bed5.bed.gz --func mean > /dev/null

GRanges' parsers are slow-ish. I got a 20-25% speed gain from using serde + csv, but I think String types are killing us compared to raw bytes. To realize those benefits, though, a new ASCII or raw byte vector type is needed throughout (I think noodles uses a similar approach?).
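A minimal sketch of the raw-bytes idea, using only the standard library (this is not granges' implementation; the function names and the `u32` coordinate type are illustrative). Working on `&[u8]` slices borrowed from the input line avoids both the per-field `String` allocation and the UTF-8 validation that `String`-based parsing pays for:

```rust
// Sketch: parse a BED3 line as raw bytes, borrowing the seqname from the
// input rather than allocating a String for it.
fn parse_bed3(line: &[u8]) -> Option<(&[u8], u32, u32)> {
    let mut fields = line.split(|&b| b == b'\t');
    let seqname = fields.next()?;
    let start = parse_u32(fields.next()?)?;
    let end = parse_u32(fields.next()?)?;
    Some((seqname, start, end))
}

// Parse an ASCII decimal integer directly from bytes, with no
// intermediate &str conversion or UTF-8 validation.
fn parse_u32(bytes: &[u8]) -> Option<u32> {
    if bytes.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        value = value.checked_mul(10)?.checked_add((b - b'0') as u32)?;
    }
    Some(value)
}

fn main() {
    let (seq, start, end) = parse_bed3(b"chr1\t100\t200").unwrap();
    assert_eq!(seq, b"chr1");
    assert_eq!((start, end), (100, 200));
}
```

The csv crate supports a similar zero-validation path via `ByteRecord` (as opposed to `StringRecord`), which may be a middle ground if serde + csv stays.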

@vsbuffalo (Owner, Author)

cd05f1f brought in serde+csv.

From an API perspective, this is much cleaner: users can just define structs and use serde's Deserialize derive attribute to handle parsing.

However, comparing against the earlier implementation, there appears to be a performance hit. Here is f144ab4, but with the benches/bedtools_comparison.rs from HEAD:

command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_multiple  139.37 s         67.22 s                       51.7679
adjust        60.24 s          29.68 s                       50.7238
map_min       66.54 s          45.44 s                       31.7153
map_mean      65.98 s          45.53 s                       30.9976
map_max       72.54 s          45.03 s                       37.9216
map_sum       64.87 s          45.21 s                       30.3143
map_median    65.95 s          46.16 s                       30.012
flank         83.87 s          47.29 s                       43.6118
filter        78.89 s          39.74 s                       49.6282
windows       280.89 s         47.56 s                       83.0676

Here are two runs on HEAD:

# run 1
command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_multiple  134.51 s         73.06 s                      45.6827
adjust        59.98 s          65.52 s                      -9.23242
map_min       66.54 s          54.71 s                      17.7839
map_mean      65.75 s          55.76 s                      15.2025
map_max       67.37 s          55.96 s                      16.942
map_sum       64.78 s          54.68 s                      15.5909
map_median    66.60 s          54.59 s                      18.0371
flank         84.39 s          31.96 s                      62.1299
filter        78.31 s          41.01 s                      47.6287
windows       281.53 s         149.98 s                     46.7274

# run 2 
command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_multiple  137.36 s         73.51 s                      46.4823
adjust        59.91 s          65.41 s                      -9.19381
map_min       66.01 s          54.51 s                      17.4298
map_mean      66.05 s          57.25 s                      13.3131
map_max       69.62 s          54.67 s                      21.4671
map_sum       64.61 s          54.60 s                      15.4947
map_median    66.99 s          55.53 s                      17.1005
flank         84.60 s          32.04 s                      62.1338
filter        78.90 s          41.21 s                      47.7687
windows       283.60 s         150.69 s                     46.8675

So far, it looks like serde's deserialization led to speedups, but the serialization (or maybe csv) is slower.

Making windows is a fast operation in absolute terms, so this matters little, but one can see the cost of the serde-based approach versus the old TsvSerialize approach here:

[screenshot, 2024-02-28]
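For context, a hand-rolled TsvSerialize-style writer amounts to one formatted write per record, with no serde trait dispatch or csv field machinery in the hot path. A minimal sketch (the `Range` struct and `write_tsv` helper are illustrative, not granges' actual code):

```rust
use std::io::Write;

// Illustrative record type for a BED3-like range.
struct Range {
    seqname: String,
    start: u64,
    end: u64,
}

// Hand-rolled TSV serialization: a single writeln! per record.
fn write_tsv<W: Write>(out: &mut W, ranges: &[Range]) -> std::io::Result<()> {
    for r in ranges {
        writeln!(out, "{}\t{}\t{}", r.seqname, r.start, r.end)?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let ranges = vec![Range { seqname: "chr1".into(), start: 0, end: 100 }];
    let mut buf = Vec::new();
    write_tsv(&mut buf, &ranges)?;
    assert_eq!(buf, b"chr1\t0\t100\n");
    Ok(())
}
```

In either approach, wrapping the output in a `BufWriter` matters as much as the serialization strategy, since windows-style commands emit millions of short lines.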

@vsbuffalo (Owner, Author)

Update: this is on f991e23, which may disappear as it gets squashed, etc.

python scripts/benchmark_summary.py
command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_median    110.73 s         95.96 s                       13.3354
map_sum       108.87 s         95.72 s                       12.0726
map_max       114.47 s         84.39 s                       26.2822
adjust        109.28 s         80.88 s                       25.9942
flank         145.60 s         50.54 s                       65.2901
map_multiple  296.48 s         119.34 s                      59.7466
map_mean      109.97 s         94.83 s                       13.7727
filter        118.36 s         58.79 s                       50.3318
merge_empty   63.51 s          31.01 s                       51.181
windows       515.97 s         173.53 s                      66.3682
map_min       114.82 s         94.74 s                       17.4876

with --features=bench-big

python scripts/benchmark_summary.py
command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_median    524.46 s         547.00 s                     -4.29831
map_sum       510.89 s         539.91 s                     -5.67975
map_max       505.62 s         535.22 s                     -5.85555
adjust        968.14 s         696.75 s                     28.0319
flank         22.22 min        397.84 s                     70.1543
map_multiple  21.01 min        577.71 s                     54.1614
map_mean      502.51 s         540.29 s                     -7.51975
filter        20.25 min        641.71 s                     47.1769
merge_empty   447.83 s         210.99 s                     52.8856
windows       519.41 s         172.34 s                     66.8201
map_min       503.93 s         538.19 s                     -6.79718

So parsing is slow across the board, but something in particular isn't scaling well: several map_* commands that beat bedtools on the standard benchmarks become slower than it with --features=bench-big.
