VincentGardeux/CAW

 
 

Cancer Analysis Workflow

CAW is a complete open-source pipeline for detecting somatic variants from whole-genome sequencing (WGS) data, developed at the National Genomics Infrastructure at SciLifeLab in Stockholm, Sweden.

The pipeline is written in Nextflow, a domain-specific language for building bioinformatics workflows.

This pipeline is primarily used on a SLURM cluster on the Swedish UPPMAX systems. However, it should be able to run on any system that Nextflow supports. We have done some limited testing with Docker, and the pipeline ships with configuration for such systems. See the installation documentation for more information.

We follow the GATK best practices to align, realign and recalibrate short-read data in parallel for both the normal and the tumor sample. After these preprocessing steps, several variant callers scan the resulting BAM files: MuTect1, MuTect2 and Strelka find somatic SNVs and small indels, and GATK HaplotypeCaller is run on both the normal and the tumor sample. For structural variants we use Manta. In addition, we apply ASCAT to estimate sample heterogeneity, ploidy and CNVs.

The pipeline can begin the analysis from raw FASTQ files, from the realignment step, or directly with any subset of variant callers using recalibrated BAM files. At the end of the analysis the resulting VCF files are merged to facilitate downstream processing, and the results from each caller are also retained. The workflow can accommodate additional variant callers or CNV callers. It is also prepared to process a normal sample, a tumor sample and several relapse samples.
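The three entry points described above can be pictured with a small sketch. This is not CAW code: the stage names, the `ENTRY_POINTS` table and the `plan` function are hypothetical, chosen only to show how the starting point determines which preprocessing steps still run before the selected tools and the final VCF merge.

```python
# Illustrative only: stage names and this planner are hypothetical,
# not taken from the CAW code base.
PREPROCESSING = ["align", "realign", "recalibrate"]
TOOLS = ["MuTect1", "MuTect2", "Strelka", "HaplotypeCaller", "Manta", "ASCAT"]

# Index of the first preprocessing stage to run for each entry point.
ENTRY_POINTS = {
    "fastq": 0,                              # raw FASTQ: run all preprocessing
    "realign": 1,                            # start at the realignment step
    "recalibrated_bam": len(PREPROCESSING),  # skip preprocessing entirely
}


def plan(entry, tools=None):
    """Return the ordered list of stages to run for a given entry point."""
    if entry not in ENTRY_POINTS:
        raise ValueError(f"unknown entry point: {entry!r}")
    # Default to every tool; otherwise keep the caller-supplied subset.
    selected = list(TOOLS) if tools is None else list(tools)
    unknown = [t for t in selected if t not in TOOLS]
    if unknown:
        raise ValueError(f"unknown tools: {unknown}")
    return PREPROCESSING[ENTRY_POINTS[entry]:] + selected + ["merge_vcf"]


if __name__ == "__main__":
    print(plan("recalibrated_bam", ["MuTect2", "Strelka"]))
```

Calling `plan("fastq")` in this sketch lists every preprocessing stage followed by all tools, while `plan("recalibrated_bam", [...])` goes straight to the chosen subset. How the real pipeline selects its starting step is covered in the "Running the pipeline" documentation.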

Besides variant calls, the workflow provides quality-control reports, presented with MultiQC.

The CAW-containers repository contains Dockerfiles for each process for easier deployment.

Documentation

The CAW pipeline is documented in the doc/ directory:

  1. Installation documentation
  2. Reference files documentation
  3. Running the pipeline
  4. Examples
  5. TSV file documentation
  6. Processes documentation
  7. Tools and dependencies
  8. More information about ASCAT
  9. Folder structure

For further information or help, contact [email protected], [email protected] or join the Gitter chat: gitter.im/SciLifeLab/CAW.

Authors

