
ABySS Performance Tips

Ben Vandervalk edited this page Oct 22, 2013 · 2 revisions

Parallel Assemblies (MPI)

  1. Split your input files into smaller pieces.

As of version 1.3.6, ABySS can load up to N files in parallel, where N is the number of ABySS MPI processes ("ranks"). Loading files in parallel can significantly reduce assembly time. For example, we recently benchmarked an assembly of human individual NA12878 (from the 1000 Genomes Project) with a single input BAM file versus 10 input BAM files, and found that using 10 files reduced the overall assembly time of roughly 3 days by about 12 hours.

Caveat: Each input PET/MPET library should be split into no more than about 50 files. abyss-map, the default alignment program used by the ABySS pipeline, opens all of the files in a library simultaneously, and it performs poorly when too many files are open at once.
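
One way to split a read file while respecting the limit above is sketched below. This is only an illustration, not part of ABySS: the function name `split_fastq`, the constant `MAX_FILES_PER_LIBRARY`, and the output naming scheme are all hypothetical, and the sketch assumes uncompressed FASTQ input with exactly 4 lines per record. It deals records out round-robin, so applying it with the same `n_parts` to both files of a paired library keeps mates at the same position in correspondingly numbered pieces.

```python
from contextlib import ExitStack
from pathlib import Path

# Hypothetical constant: the approximate per-library ceiling from the caveat above.
MAX_FILES_PER_LIBRARY = 50


def split_fastq(path, n_parts, out_dir, lines_per_record=4):
    """Distribute the records of one FASTQ file round-robin across n_parts files.

    Assumes uncompressed input with a fixed number of lines per record.
    For an interleaved paired file, pass lines_per_record=8 so mates stay together.
    Returns the list of output paths.
    """
    if not 1 <= n_parts <= MAX_FILES_PER_LIBRARY:
        raise ValueError(
            f"n_parts must be between 1 and {MAX_FILES_PER_LIBRARY}, got {n_parts}"
        )

    src = Path(path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_paths = [
        out_dir / f"{src.stem}.part{i:02d}{src.suffix}" for i in range(n_parts)
    ]

    # ExitStack closes the reader and every writer, even if an error occurs.
    with ExitStack() as stack:
        reader = stack.enter_context(src.open())
        writers = [stack.enter_context(p.open("w")) for p in out_paths]
        for line_no, line in enumerate(reader):
            record_no = line_no // lines_per_record
            writers[record_no % n_parts].write(line)

    return out_paths
```

Note that the check in this sketch only caps the pieces produced from a single input file. If a library consists of two files (one per mate), the total number of files in the library is twice `n_parts`, so choose `n_parts` with that in mind.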