Commit

Benchmark windows and lots of doc clarity updates
vsbuffalo committed Feb 26, 2024
1 parent 3e22670 commit 9d1cf48
Showing 6 changed files with 244 additions and 98 deletions.
48 changes: 46 additions & 2 deletions benches/bedtools_comparison.rs
@@ -18,7 +18,7 @@ fn bench_range_adjustment(c: &mut Criterion) {
     let input_bedfile = random_bed3file(BED_LENGTH);
 
     // configure the sample size for the group
-    // group.sample_size(10);
+    group.sample_size(10);
 
     // bedtools slop
     group.bench_function("bedtools_slop", |b| {
@@ -145,10 +145,54 @@ fn bench_flank(c: &mut Criterion) {
     });
 }
 
+fn bench_windows(c: &mut Criterion) {
+    let width = 100_000;
+    let step = 1_000;
+
+    // create the benchmark group
+    let mut group = c.benchmark_group("windows");
+
+    // configure the sample size for the group
+    // group.sample_size(10);
+    group.bench_function("bedtools_makewindows", |b| {
+        b.iter(|| {
+            let bedtools_output = Command::new("bedtools")
+                .arg("makewindows")
+                .arg("-g")
+                .arg("tests_data/hg38_seqlens.tsv")
+                .arg("-w")
+                .arg(width.to_string())
+                .arg("-s")
+                .arg(step.to_string())
+                .output()
+                .expect("bedtools makewindows failed");
+            assert!(bedtools_output.status.success());
+        });
+    });
+
+    group.bench_function("granges_windows", |b| {
+        b.iter(|| {
+            let granges_output = Command::new(granges_binary_path())
+                .arg("windows")
+                .arg("--genome")
+                .arg("tests_data/hg38_seqlens.tsv")
+                .arg("--width")
+                .arg(width.to_string())
+                .arg("--step")
+                .arg(step.to_string())
+                .output()
+                .expect("granges windows failed");
+            assert!(granges_output.status.success());
+        });
+    });
+}
+
+
 criterion_group!(
     benches,
     bench_filter_adjustment,
     bench_range_adjustment,
     bench_flank,
-);
+    bench_windows,
+);
 criterion_main!(benches);
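The new `bench_windows` benchmark times `bedtools makewindows -w 100000 -s 1000` against `granges windows --width 100000 --step 1000` on the same genome file. A minimal sketch of what "sliding windows" means is below; `sliding_windows` is a hypothetical helper, not part of granges or bedtools, and it assumes the final window is truncated at the sequence end (how each tool treats that trailing partial window is not shown in this commit). Under this sketch a sequence yields about `seqlen / step` windows, and with these settings each base falls in up to 100 windows, so the benchmark output is large.

```rust
/// Hypothetical helper (not part of granges or bedtools): tile a sequence of
/// length `seqlen` with windows `width` bp wide whose starts advance by `step` bp.
/// Coordinates are 0-based and half-open, as in BED. A window that would run
/// past the end of the sequence is truncated to `seqlen` (an assumption).
fn sliding_windows(seqlen: u64, width: u64, step: u64) -> Vec<(u64, u64)> {
    assert!(width > 0 && step > 0, "width and step must be positive");
    let mut windows = Vec::new();
    let mut start = 0;
    while start < seqlen {
        // Clamp the window end to the sequence length.
        let end = (start + width).min(seqlen);
        windows.push((start, end));
        start += step;
    }
    windows
}

fn main() {
    // Toy sequence of 25 bp, 10 bp windows, 5 bp step.
    for (start, end) in sliding_windows(25, 10, 5) {
        println!("{start}\t{end}");
    }
    // With the benchmark's width and step, a 1 Mbp sequence gives one window per 1 kb.
    println!("{}", sliding_windows(1_000_000, 100_000, 1_000).len());
}
```

In the toy call the last window, `(20, 25)`, is shorter than `width`; a tool that instead drops or keeps such windows differently would produce a different count.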
8 changes: 4 additions & 4 deletions src/granges.rs
@@ -632,7 +632,7 @@ where
     /// [`GRanges<VecRangesIndexed, JoinData>`].
     ///
     /// The [`JoinData`] container contains the owned left data container and has
-    /// a reference to the right data container, as as well as a [`Vec<LeftGroupedJoin`]
+    /// a reference to the right data container, as as well as a [`Vec<LeftGroupedJoin>`]
     /// that contains information about each overlap between a left and zero or more right
     /// ranges.
     fn left_overlaps(
@@ -674,7 +674,7 @@ where
     /// [`GRanges<VecRangesIndexed, JoinDataRightEmpty>`].
     ///
     /// The [`JoinData`] container contains the left data container and has
-    /// a reference to the right data container, as as well as a [`Vec<LeftGroupedJoin`]
+    /// a reference to the right data container, as as well as a [`Vec<LeftGroupedJoin>`]
     /// that contains information about each overlap between a left and zero or more right
     /// ranges.
     fn left_overlaps(
@@ -726,7 +726,7 @@ where
     /// [`GRanges<VecRangesIndexed, JoinDataLeftEmpty>`].
     ///
     /// The [`JoinDataLeftEmpty`] contains no left data, and a reference to the
-    /// right data container, as as well as a [`Vec<LeftGroupedJoin`]
+    /// right data container, as as well as a [`Vec<LeftGroupedJoin>`]
     /// that contains information about each overlap between a left and zero or more right
     /// ranges.
     fn left_overlaps(
@@ -778,7 +778,7 @@ where
     /// [`GRanges<VecRangesIndexed, JoinDataBothEmpty>`].
     ///
     /// The [`JoinDataBothEmpty`] contains no data, since neither left of right
-    /// [`GRanges`] objects had data. However, it does contain a [`Vec<LeftGroupedJoin`],
+    /// [`GRanges`] objects had data. However, it does contain a [`Vec<LeftGroupedJoin>`],
     /// and each [`LeftGroupedJoin`] contains information about the number of overlapping
     /// ranges and their lengths. This can be used to summarize, e.g. the number
     /// of overlapping basepairs, the overlap fraction, etc.
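The `JoinDataBothEmpty` doc comment says each `LeftGroupedJoin` records the number of overlapping ranges and their lengths, which is enough to summarize overlapping basepairs or an overlap fraction. The sketch below illustrates that kind of summary under stated assumptions: `GroupedOverlapSketch` and `overlap_len` are hypothetical names, not the granges `LeftGroupedJoin` API, and ranges are assumed to be 0-based and half-open.

```rust
/// Overlap length of two 0-based, half-open ranges; zero if they are disjoint.
fn overlap_len(a: (u64, u64), b: (u64, u64)) -> u64 {
    let start = a.0.max(b.0);
    let end = a.1.min(b.1);
    end.saturating_sub(start)
}

/// Hypothetical stand-in for a grouped left join: one left range plus the
/// overlap length with each overlapping right range. Illustrative only.
struct GroupedOverlapSketch {
    left: (u64, u64),
    overlap_lengths: Vec<u64>,
}

impl GroupedOverlapSketch {
    fn new(left: (u64, u64), rights: &[(u64, u64)]) -> Self {
        // Keep only right ranges that actually overlap the left range.
        let overlap_lengths = rights
            .iter()
            .map(|&right| overlap_len(left, right))
            .filter(|&len| len > 0)
            .collect();
        Self { left, overlap_lengths }
    }

    /// Number of right ranges that overlap the left range.
    fn num_overlaps(&self) -> usize {
        self.overlap_lengths.len()
    }

    /// Total overlapping basepairs. Right ranges that overlap each other are
    /// counted separately, so this can exceed the left range's width.
    fn overlapping_bp(&self) -> u64 {
        self.overlap_lengths.iter().sum()
    }

    /// Overlapping basepairs divided by the left range's width
    /// (`None` for a zero-width left range).
    fn overlap_fraction(&self) -> Option<f64> {
        let width = self.left.1.saturating_sub(self.left.0);
        if width == 0 {
            None
        } else {
            Some(self.overlapping_bp() as f64 / width as f64)
        }
    }
}

fn main() {
    // Left range [100, 200); the third right range does not overlap it.
    let join = GroupedOverlapSketch::new((100, 200), &[(150, 250), (90, 110), (300, 400)]);
    println!(
        "{} overlaps, {} bp, fraction {:?}",
        join.num_overlaps(),
        join.overlapping_bp(),
        join.overlap_fraction()
    );
}
```

Here the left range overlaps `[150, 250)` by 50 bp and `[90, 110)` by 10 bp, so the summary is 2 overlaps, 60 bp, and a fraction of 0.6.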
29 changes: 0 additions & 29 deletions src/io/parsers.rs
@@ -70,35 +70,6 @@
 //! iterator types defined in [`GenomicRangesParser`]. The best examples of this are the
 //! `granges` subcommands implementations.
 //!
-//! ## File Format Terminology
-//!
-//! - BED3
-//! - BED*
-//! - BED-like
-//!
-//!
-//! : it could be ranges-only (i.e. a BED3), or contain data (e.g. BED5). The lazy BED parser will
-//! output a [`GenomicRangeRecord<Option<String>>`], where the data would be `None` only in the
-//! case that three columns were encountered in the file (which must be a BED3).
-//!
-//! In GRanges, there are two types of ranges: ranges with an index to an element in the data
-//! container, and ranges without indices (i.e. what we would use in the case of processing a BED3
-//! file). Since a [`GRanges`] object needs to have a single, concrete range type in its range
-//! containers, it must be known *at compile time* how one should convert the
-//!
-//! Downstream pipelines must immediately determine how to handle whether there is additional data,
-//! or all of the [`GenomicRangeRecord`] entries are `None`, and
-//!
-//!
-//! # BED-like File Parser Design
-//!
-//! All BED formats (BED3, BED5, etc) are built upon a BED3. Often when working with these types
-//! of formats, many operations do not immediately require full parsing of the line, past the
-//! *range components*. This is because downstream operations may immediately filter away entries
-//! (e.g. based on width), or do an overlap operation, and then filter away entries based on some
-//! overlap criteria. Either way, it may be advantageous to work with ranges that just store some
-//! generic data.
-//!
 //! [`GRanges`]: crate::granges::GRanges
 //! [`GRanges<R, T>`]: crate::granges::GRanges
 //! [`GRangesEmpty`]: crate::granges::GRangesEmpty
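The module documentation removed in this hunk described a lazy BED parser: only the range components (first three columns) are parsed up front, and the rest of the line is kept as an `Option<String>` that is `None` only for a three-column (BED3) line, so downstream steps can filter by width or overlap before parsing any data. A minimal sketch of that idea follows; `LazyBedRecord` and `parse_bed_line_lazy` are hypothetical names and are not the granges `GenomicRangeRecord` type or its parser.

```rust
/// Hypothetical lazily parsed BED-like record: the range components are parsed,
/// and everything after the third column is kept as unparsed text.
#[derive(Debug, PartialEq)]
struct LazyBedRecord {
    seqname: String,
    start: u64,
    end: u64,
    /// `None` only when the line has exactly three columns (BED3).
    data: Option<String>,
}

fn parse_bed_line_lazy(line: &str) -> Result<LazyBedRecord, String> {
    // Split into at most four pieces: three range columns plus the untouched remainder.
    let mut fields = line.splitn(4, '\t');
    let seqname = fields
        .next()
        .filter(|s| !s.is_empty())
        .ok_or("missing seqname")?;
    let start: u64 = fields
        .next()
        .ok_or("missing start")?
        .parse()
        .map_err(|e| format!("invalid start: {e}"))?;
    let end: u64 = fields
        .next()
        .ok_or("missing end")?
        .parse()
        .map_err(|e| format!("invalid end: {e}"))?;
    if end < start {
        return Err(format!("end {end} is before start {start}"));
    }
    // Remaining columns (if any) are stored as-is, without further parsing.
    let data = fields.next().map(str::to_string);
    Ok(LazyBedRecord { seqname: seqname.to_string(), start, end, data })
}

fn main() {
    let lines = [
        "chr1\t10\t20",
        "chr1\t100\t105\tgeneA\t0.5",
        "chr2\t0\t50\tgeneB\t1.2",
    ];
    // Filter on width using only the parsed range components; `data` is never parsed.
    for line in lines {
        let record = parse_bed_line_lazy(line).expect("valid BED-like line");
        if record.end - record.start >= 10 {
            println!("{:?}", record);
        }
    }
}
```

Because a BED3 file gives `None` for every record, a caller can tell after parsing whether it needs a container that carries data or a ranges-only one; how granges makes that choice at compile time is not shown in this sketch.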