Asmultipcf implementation for multi-sample ASCAT CNV calling #1646

Open
wants to merge 12 commits into
base: dev
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [1638](https://github.com/nf-core/sarek/pull/1638) - Added additional documentation detailing ASCAT WES usage.
- [1640](https://github.com/nf-core/sarek/pull/1620) - Add `lofreq` as a tumor-only variant caller
- [1642](https://github.com/nf-core/sarek/pull/1642) - Back to dev
- [1646](https://github.com/nf-core/sarek/pull/1646) - Added asmultipcf functionality for multisample ASCAT calls.
- [1653](https://github.com/nf-core/sarek/pull/1653) - Updates `sarek_subway` files with `lofreq`
- [1660](https://github.com/nf-core/sarek/pull/1642) - Add `--length_required` for minimal reads length with `FASTP`
- [1663](https://github.com/nf-core/sarek/pull/1663) - Massive conda modules update
24 changes: 24 additions & 0 deletions conf/test/tools_somatic_ascat_asmultipcf.config
@@ -0,0 +1,24 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/sarek -profile test,<extra_test_profile>,<docker/singularity> --outdir <OUTDIR>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

params {
    input = "${projectDir}/tests/csv/3.0/ascat_somatic_asmultipcf.csv"
    genome = 'GATK.GRCh37'
    germline_resource_tbi = "${params.modules_testdata_base_path}/genomics/homo_sapiens/genome/chr21/germlineresources/gnomAD.r2.1.1.vcf.gz.tbi"
    ascat_loci = "G1000_loci_hg38.zip"
    ascat_min_base_qual = 30
    chr_dir = "${params.modules_testdata_base_path}/genomics/homo_sapiens/genome/chr21/sequence/chromosomes.tar.gz"
    germline_resource = "${params.modules_testdata_base_path}/genomics/homo_sapiens/genome/chr21/germlineresources/gnomAD.r2.1.1.vcf.gz"
    intervals = "${params.modules_testdata_base_path}/genomics/homo_sapiens/genome/chr21/sequence/multi_intervals.bed"
    step = 'variant_calling'
    tools = 'ascat'
    wes = false
}
7 changes: 7 additions & 0 deletions docs/output.md
@@ -712,6 +712,13 @@ The output is a tab delimited text file with the following columns:

The file `<tumorsample_vs_normalsample>.cnvs.txt` contains all segments predicted by ASCAT, both those with normal copy number (nMinor = 1 and nMajor =1) and those corresponding to copy number aberrations.

When `--asmultipcf` is enabled, the local `asmultipcf` module runs ASCAT multi-sample segmentation, which corrects segment calls across multiple samples from the same patient. This produces two additional output files:

- `<tumorsample_vs_normalsample>_asmultipcf_purityploidy.txt`
  - file with information about purity and ploidy corrected for multiple samples
- `<tumorsample_vs_normalsample>_asmultipcf_segments.txt`
  - file with information about copy number segments corrected for multiple samples

</details>

#### CNVKit
8 changes: 8 additions & 0 deletions modules/local/asmultipcf/environment.yml
@@ -0,0 +1,8 @@
name: asmultipcf
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bioconda::ascat=3.1.1
  - bioconda::cancerit-allelecount=4.3.0
66 changes: 66 additions & 0 deletions modules/local/asmultipcf/main.nf
@@ -0,0 +1,66 @@
process ASMULTIPCF {
    tag "$meta.id"
    label 'process_medium'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/mulled-v2-c278c7398beb73294d78639a864352abef2931ce:ba3e6d2157eac2d38d22e62ec87675e12adb1010-0':
        'biocontainers/mulled-v2-c278c7398beb73294d78639a864352abef2931ce:ba3e6d2157eac2d38d22e62ec87675e12adb1010-0' }"

    input:
    tuple val(meta), path(tumor_logr_files), path(tumor_baf_files), path(normal_logr_file), path(normal_baf_file)


    output:
    tuple val(meta), path("*_asmultipcf_segments.txt"), emit: asmultipcf_segments
    tuple val(meta), path("*_asmultipcf_purityploidy.txt"), emit: asmultipcf_purityploidy
    path "versions.yml", emit: versions

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    #!/usr/bin/env Rscript
    library(ASCAT)

    # Column-bind the tumor LogR files; strsplit() returns a list, so take its first element to iterate over file names
    tumor_logr_data <- do.call(cbind, lapply(strsplit("${tumor_logr_files}", " ")[[1]], function(file) {
        read.table(file, header = TRUE, check.names = FALSE)
    }))
    write.table(tumor_logr_data, file = "combined_tumor_logr.txt", sep = "\t", quote = FALSE, row.names = FALSE)

    # Column-bind the tumor BAF files in the same way
    tumor_baf_data <- do.call(cbind, lapply(strsplit("${tumor_baf_files}", " ")[[1]], function(file) {
        read.table(file, header = TRUE, check.names = FALSE)
    }))
    write.table(tumor_baf_data, file = "combined_tumor_baf.txt", sep = "\t", quote = FALSE, row.names = FALSE)

    # Load the combined tumor data together with the matched normal
    ascat.bc <- ascat.loadData(
        Tumor_LogR_file = "combined_tumor_logr.txt",
        Tumor_BAF_file = "combined_tumor_baf.txt",
        Germline_LogR_file = "$normal_logr_file",
        Germline_BAF_file = "$normal_baf_file"
    )

    # Run multi-sample segmentation
    ascat.bc <- ascat.asmultipcf(ascat.bc, penalty = ${params.ascat_asmultipcf_penalty ?: 5})

    # Run ASCAT
    ascat.output <- ascat.runAscat(ascat.bc)

    # Write out segmented regions
    write.table(ascat.output[["segments"]], file="${prefix}_asmultipcf_segments.txt", sep="\t", quote=FALSE, row.names=FALSE)

    # Write out purity and ploidy info
    purity_ploidy <- data.frame(
        Sample = names(ascat.output\$aberrantcellfraction),
        Purity = unlist(ascat.output\$aberrantcellfraction),
        Ploidy = unlist(ascat.output\$ploidy)
    )
    write.table(purity_ploidy, file="${prefix}_asmultipcf_purityploidy.txt", sep="\t", quote=FALSE, row.names=FALSE)

    # Version export
    writeLines(c("\\"${task.process}\\":", paste0("    ascat: ", packageVersion("ASCAT"))), "versions.yml")
    """
}
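
A note on the concatenation step above: `cbind` on whole tables repeats every column of every input, so any position columns shared by the per-sample files would appear once per sample in the combined file. The sketch below illustrates, in Python rather than R, one way to keep shared key columns once and append only the sample columns. It assumes each per-sample table starts with the same two key columns (for example chromosome and position) followed by the sample values; that layout, the function name `combine_sample_tables` and the example data are assumptions, not something this diff confirms.

```python
def combine_sample_tables(tables, n_key_cols=2):
    """Merge per-sample tables that share leading key columns.

    tables: list of (header, rows) pairs, where header is a list of column
    names and rows is a list of lists. The key columns are kept once and
    each table's remaining columns are appended in input order.
    """
    first_header, first_rows = tables[0]
    header = list(first_header[:n_key_cols])
    rows = [list(r[:n_key_cols]) for r in first_rows]
    for hdr, tbl_rows in tables:
        if len(tbl_rows) != len(rows):
            raise ValueError("tables have different numbers of loci")
        header.extend(hdr[n_key_cols:])
        for out, r in zip(rows, tbl_rows):
            # Refuse to merge if the loci are not listed in the same order.
            if list(r[:n_key_cols]) != out[:n_key_cols]:
                raise ValueError("loci are not in the same order")
            out.extend(r[n_key_cols:])
    return header, rows


# Hypothetical LogR tables for two tumor samples from one patient.
s1 = (["chrs", "pos", "S1"], [["21", "100", "0.1"], ["21", "200", "-0.2"]])
s2 = (["chrs", "pos", "S2"], [["21", "100", "0.3"], ["21", "200", "0.0"]])
header, rows = combine_sample_tables([s1, s2])
```

Checking that the loci match before appending guards against silently pairing values from different positions, which a plain column bind would not detect.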
119 changes: 119 additions & 0 deletions modules/local/asmultipcf/meta.yml
@@ -0,0 +1,119 @@
name: asmultipcf
description: Performs multi-sample segmentation using ASCAT
keywords:
  - bam
  - copy number
  - cram
tools:
  - ascat:
      description: ASCAT is a method to derive copy number profiles of tumour cells, accounting for normal cell admixture and tumour aneuploidy. ASCAT infers tumour purity (the fraction of tumour cells) and ploidy (the amount of DNA per tumour cell), expressed as multiples of haploid genomes from SNP array or massively parallel sequencing data, and calculates whole-genome allele-specific copy number profiles (the number of copies of both parental alleles for all SNP loci across the genome).
      documentation: https://github.com/VanLoo-lab/ascat/tree/master/man
      tool_dev_url: https://github.com/VanLoo-lab/ascat
      doi: "10.1093/bioinformatics/btaa538"
      licence: ["GPL v3"]
input:
  - args:
      type: map
      description: |
        Groovy Map containing tool parameters. MUST follow the structure/keywords below and be provided via modules.config. Parameters must be set between quotes. (optional) parameters can be removed from the map, if they are not set. For default values, please check the documentation above.

        ```
        {
          [
            "gender": "XX",
            "genomeVersion": "hg19",
            "purity": (optional),
            "ploidy": (optional),
            "gc_files": (optional),
            "minCounts": (optional),
            "BED_file": (optional) but recommended for WES,
            "chrom_names": (optional),
            "min_base_qual": (optional),
            "min_map_qual": (optional),
            "ref_fasta": (optional),
            "skip_allele_counting_tumour": (optional),
            "skip_allele_counting_normal": (optional)
          ]
        }
        ```
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - input_normal:
      type: file
      description: BAM/CRAM file, must adhere to chr1, chr2, ...chrX notation. For modifying chromosome notation in bam files please follow https://josephcckuo.wordpress.com/2016/11/17/modify-chromosome-notation-in-bam-file/.
      pattern: "*.{bam,cram}"
  - index_normal:
      type: file
      description: index for normal_bam/cram
      pattern: "*.{bai,crai}"
  - input_tumor:
      type: file
      description: BAM/CRAM file, must adhere to chr1, chr2, ...chrX notation
      pattern: "*.{bam,cram}"
  - index_tumor:
      type: file
      description: index for tumor_bam/cram
      pattern: "*.{bai,crai}"
  - allele_files:
      type: file
      description: allele files for ASCAT WGS. Can be downloaded here https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS
  - loci_files:
      type: file
      description: loci files for ASCAT WGS. Loci files without chromosome notation can be downloaded here https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS Make sure the chromosome notation matches the bam/cram input files. To add the chromosome notation to loci files (hg19/hg38) if necessary, you can run this command `if [[ $(samtools view <your_bam_file.bam> | head -n1 | cut -f3)\" == *\"chr\"* ]]; then for i in {1..22} X; do sed -i 's/^/chr/' G1000_loci_hg19_chr_${i}.txt; done; fi`
  - bed_file:
      type: file
      description: Bed file for ASCAT WES (optional, but recommended for WES)
  - fasta:
      type: file
      description: Reference fasta file (optional)
  - gc_file:
      type: file
      description: GC correction file (optional) - Used to do logR correction of the tumour sample(s) with genomic GC content
  - rt_file:
      type: file
      description: replication timing correction file (optional, provide only in combination with gc_file)
output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - allelefreqs:
      type: file
      description: Files containing allele frequencies per chromosome
      pattern: "*{alleleFrequencies_chr*.txt}"
  - metrics:
      type: file
      description: File containing quality metrics
      pattern: "*.{metrics.txt}"
  - png:
      type: file
      description: ASCAT plots
      pattern: "*.{png}"
  - purityploidy:
      type: file
      description: File with purity and ploidy data
      pattern: "*.{purityploidy.txt}"
  - segments:
      type: file
      description: File with multi-sample segments data
      pattern: "*.{asmultipcf_segments.txt}"
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
authors:
  - "@aasNGC"
  - "@lassefolkersen"
  - "@FriederikeHanssen"
  - "@maxulysse"
  - "@SusiJo"
maintainers:
  - "@aasNGC"
  - "@lassefolkersen"
  - "@FriederikeHanssen"
  - "@maxulysse"
  - "@SusiJo"
8 changes: 8 additions & 0 deletions modules/nf-core/asmultipcf/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

66 changes: 66 additions & 0 deletions modules/nf-core/asmultipcf/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
