Dividing heterogeneous long-read sequencing into groups with de Bruijn graphs

bluenote-1577/devider

devider - long-read haplotypes from mixtures of "small" sequences

devider is a method that separates long reads (Nanopore or PacBio) into groups with similar alleles. This is called "phasing" or "haplotyping".

devider is a "local haplotyping" method, so it works best when the sequence-of-interest is approximately the size of the reads. For bacterial genome-scale haplotyping, consider another tool such as floria.

Example use cases:

  • mixed viral long-read samples (e.g. co-infections or quasispecies)
  • amplicon/enriched sequencing of specific genes
  • haplotyping small sections of multi-strain bacterial communities

[Figure: high-depth, heterogeneous sequencing that spans a 1 kb gene.]

[Figure: separated groups ("haplotypes") after running devider.]

Why devider?

Other tools exist for detecting similar haplotypes in mixtures. devider was developed to fill the following gaps:

  • Speed and low memory - devider scales approximately linearly with sequencing depth and the number of SNPs. Genes with > 30,000x coverage can be haplotyped in minutes.
  • High heterogeneity and coverage - devider uses a de Bruijn graph approach, which works with very diverse samples (> 10 haplotypes).
  • Ease of use + interpretable outputs - conda installable, engineered in Rust, simple command line. Outputs are easy to interpret (haplotagged BAM or MSA).

Install

Conda (preferred)

mamba install -c bioconda devider
devider -h 

Static binary (x86_64 architectures only; does not include the extra pipeline scripts)

wget https://github.com/bluenote-1577/devider/releases/download/latest/devider
chmod +x devider
./devider

See the installation instructions on the wiki if you want to compile devider (written in Rust) or you're not on an x86_64 CPU.

Quick Start after install

Option 1 (more flexible): Running devider with VCF + BAM

git clone https://github.com/bluenote-1577/devider
cd devider
devider -b hiv_test/3000_95_3.bam  -v hiv_test/3000_95_3.vcf.gz  -r hiv_test/OR483991.1.fasta -o devider_output

# results folder
ls devider_output

Option 2 (easier): Running devider with reads

If installed from conda:

git clone https://github.com/bluenote-1577/devider
cd devider
run_devider_pipeline -i hiv_test/3000_95_3.fastq.gz -r hiv_test/OR483991.1.fasta -o devider_pipeline_output 

# results folder
ls devider_pipeline_output

# intermediate files (bam + vcf files)
ls devider_pipeline_output/pipeline_files

If you did not install via conda and want to run the pipeline script, ensure the following are in your PATH:

  • tabix
  • minimap2
  • lofreq
  • devider
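
A quick way to confirm that the four tools above are visible is a small shell check like the sketch below. The helper name `missing_tools` is made up for illustration and is not part of devider. It only tests whether each program can be found on PATH, not whether the installed versions are compatible.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with devider): print every named
# tool that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The four programs listed above as requirements for the pipeline script.
missing=$(missing_tools tabix minimap2 lofreq devider)

if [ -n "$missing" ]; then
    echo "Not found in PATH:" >&2
    echo "$missing" >&2
else
    echo "All pipeline dependencies found."
fi
```

If anything is reported as missing, install it or add its directory to PATH before running the pipeline script.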

Then run scripts/run_devider_pipeline from the GitHub repository.

How to use devider

Citation

devider: long-read reconstruction of many diverse haplotypes. Jim Shaw, Christina Boucher, Yun William Yu, Noelle Noyes, Heng Li. bioRxiv (2024).
