Slow numerics in operations #4
So interestingly this isn't an issue on Apple silicon:
vs Ubuntu x86_64
vsbuffalo added a commit that referenced this issue, Feb 28, 2024
…` added, etc.
- New `Sequences::region_apply_into_granges()`, which applies a function to regions of a sequence, and puts the results in a new `GRanges`.
- New `Sequences` test cases corresponding to `granges_test_case_01()`, and lots of new `Sequence`-related tests.
- Added `map_into_array1()` and `map_into_array2()` for the --feature=ndarray, with tests.
- `take_ranges()` and `take_both()` added.
- New `GRanges::midpoints()` method, with tests.
- New `GRanges::data_indices()` method, to build a `GenomeMap` out of the data indices.
- New `GRanges::data_by_seqnames()` which builds a `GenomeMap` of `Vec<U>` of the data, by sequence name. `GRanges::data_refs_by_seqnames()` also added, for references.
- `GRanges::into_vecranegs()` added, with `From` method for `COITreesEmpty` to `VecRangesEmpty` added since it's needed.
- Cleaned lifetime out of `IndexedDataContainer`, brought down to assoc. type level. Cleans up things a lot!
- Remove section on readme on slow benchmarks -- this is now GH issue #4.
vsbuffalo added a commit that referenced this issue, Feb 28, 2024
…added, etc.
- Renamed `region_apply` in `region_map` to be more consistent with Rust's naming.
- `retain_seqnames`, etc for `TsvRecordIterator`.
- New `Sequences::region_map_into_granges()`, which applies a function to regions of a sequence, and puts the results in a new `GRanges`.
- New `Sequences` test cases corresponding to `granges_test_case_01()`, and lots of new `Sequence`-related tests.
- Added `map_into_array1()` and `map_into_array2()` for the --feature=ndarray, with tests.
- `take_ranges()` and `take_both()` added.
- New `GRanges::midpoints()` method, with tests.
- New `GRanges::data_indices()` method, to build a `GenomeMap` out of the data indices.
- New `GRanges::data_by_seqnames()` which builds a `GenomeMap` of `Vec<U>` of the data, by sequence name. `GRanges::data_refs_by_seqnames()` also added, for references.
- `GRanges::into_vecranegs()` added, with `From` method for `COITreesEmpty` to `VecRangesEmpty` added since it's needed.
- Cleaned lifetime out of `IndexedDataContainer`, brought down to assoc. type level. Cleans up things a lot!
- `into_array1()` and `into_array2()` and tests.
- Remove section on readme on slow benchmarks -- this is now GH issue #4.
Is this "apples to apples"? (Pun intended.) The raw times reported for Apple Silicon are 10X less than for Ubuntu/x86. Are you sure that the Apple numbers reflect "large data" tests?
On a second read, maybe the Apple Silicon numbers are okay? They're not 10x smaller but more like 2x?
On large (1M ranges) datasets, `bedtools map` is beating GRanges. It looks like it comes down to numerics:
where inline here refers to `#[inline(always)]` above the `Operations::run()` (suggested by @molpopgen).
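For readers unfamiliar with the attribute, here is a minimal sketch of what placing `#[inline(always)]` above a `run()` method looks like. The `Operation` enum, its variants, and the `run()` signature below are hypothetical stand-ins for illustration only, not the actual GRanges code. The attribute is a strong hint asking the compiler to inline the method body at each call site, which can matter when the method is called once per range in a hot loop.

```rust
/// Hypothetical stand-in for a set of numeric summary operations.
/// This is NOT the GRanges `Operations` type, only an illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Operation {
    Sum,
    Min,
    Max,
    Mean,
}

impl Operation {
    /// Apply the operation to a slice of values.
    ///
    /// `#[inline(always)]` asks the compiler to inline this method at
    /// every call site, which may remove call overhead and let the
    /// optimizer specialize the `match` when the variant is known.
    #[inline(always)]
    pub fn run(&self, data: &[f64]) -> Option<f64> {
        // No meaningful summary exists for an empty slice.
        if data.is_empty() {
            return None;
        }
        let value: f64 = match self {
            Operation::Sum => data.iter().sum::<f64>(),
            Operation::Min => data.iter().copied().fold(f64::INFINITY, f64::min),
            Operation::Max => data.iter().copied().fold(f64::NEG_INFINITY, f64::max),
            Operation::Mean => data.iter().sum::<f64>() / data.len() as f64,
        };
        Some(value)
    }
}
```

Whether the attribute helps depends on the target and the surrounding code, so any speedup should be confirmed by benchmarking with and without it on each platform of interest.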