nf-core · alexanderchang1 · Sep 4, 2024
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [1638](https://github.com/nf-core/sarek/pull/1638) - Added additional documentation detailing ASCAT WES usage.
 - [1640](https://github.com/nf-core/sarek/pull/1620) - Add `lofreq` as a tumor-only variant caller
 - [1642](https://github.com/nf-core/sarek/pull/1642) - Back to dev
+- [1646](https://github.com/nf-core/sarek/pull/1646) - Added `asmultipcf` functionality for multi-sample ASCAT calls
 - [1653](https://github.com/nf-core/sarek/pull/1653) - Updates `sarek_subway` files with `lofreq`
 - [1660](https://github.com/nf-core/sarek/pull/1642) - Add `--length_required` for minimal reads length with `FASTP`
 - [1663](https://github.com/nf-core/sarek/pull/1663) - Massive conda modules update

@@ -0,0 +1,24 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a fast and simple pipeline test.
+
+    Use as follows:
+        nextflow run nf-core/sarek -profile test,<extra_test_profile>,<docker/singularity> --outdir <OUTDIR>
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+params {
+    input                 = "${projectDir}/tests/csv/3.0/ascat_somatic_asmultipcf.csv"
+    genome                = 'GATK.GRCh37'
+    germline_resource_tbi = "${params.modules_testdata_base_path}/genomics/homo_sapiens/genome/chr21/germlineresources/gnomAD.r2.1.1.vcf.gz.tbi"
+    ascat_loci            = "G1000_loci_hg38.zip"
+    ascat_min_base_qual   = 30
+    chr_dir               = "${params.modules_testdata_base_path}/genomics/homo_sapiens/genome/chr21/sequence/chromosomes.tar.gz"
+    germline_resource     = "${params.modules_testdata_base_path}/genomics/homo_sapiens/genome/chr21/germlineresources/gnomAD.r2.1.1.vcf.gz"
+    intervals             = "${params.modules_testdata_base_path}/genomics/homo_sapiens/genome/chr21/sequence/multi_intervals.bed"
+    step                  = 'variant_calling'
+    tools                 = 'ascat'
+    wes                   = false
+}
@@ -712,6 +712,13 @@ The output is a tab delimited text file with the following columns:
 
 The file `<tumorsample_vs_normalsample>.cnvs.txt` contains all segments predicted by ASCAT, both those with normal copy number (nMinor = 1 and nMajor =1) and those corresponding to copy number aberrations.
 
+If `--asmultipcf` is enabled, Sarek also runs the local `ASMULTIPCF` module, which uses ASCAT's multi-sample segmentation (`ascat.asmultipcf`) to segment multiple tumor samples from the same patient jointly. This produces two additional output files:
+
+- `<tumorsample_vs_normalsample>_asmultipcf_purityploidy.txt`
+  - purity and ploidy estimates per tumor sample, derived from the multi-sample segmentation
+- `<tumorsample_vs_normalsample>_asmultipcf_segments.txt`
+  - copy number segments derived from the multi-sample segmentation
+
 </details>
 
 #### CNVKit

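The `*_asmultipcf_purityploidy.txt` table is written by the `ASMULTIPCF` R script with `write.table(..., sep="\t", quote=FALSE, row.names=FALSE)`, so it is plain tab-separated text with the columns `Sample`, `Purity` and `Ploidy`. A minimal Python sketch for loading it downstream might look like the following; it assumes exactly that layout and maps `NA` cells (R's default missing-value marker in `write.table`) to `None`. The function names and the example values are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, TextIO, Tuple


def _to_float(value: str) -> Optional[float]:
    # R's write.table() writes missing values as the literal string "NA"
    return None if value == "NA" else float(value)


def read_purity_ploidy(handle: TextIO) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Map each sample to its (purity, ploidy) pair.

    Assumes a tab-separated file whose header is Sample, Purity, Ploidy,
    as written by the ASMULTIPCF module.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Sample"]: (_to_float(row["Purity"]), _to_float(row["Ploidy"]))
        for row in reader
    }


if __name__ == "__main__":
    # Hypothetical content; real files live in the pipeline output directory
    example = "Sample\tPurity\tPloidy\ntumour_A\t0.62\t2.9\ntumour_B\tNA\tNA\n"
    for sample, (purity, ploidy) in read_purity_ploidy(io.StringIO(example)).items():
        print(sample, purity, ploidy)
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_purity_ploidy`.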
@@ -0,0 +1,8 @@
+name: asmultipcf
+channels:
+  - conda-forge
+  - bioconda
+  - defaults
+dependencies:
+  - bioconda::ascat=3.1.1
+  - bioconda::cancerit-allelecount=4.3.0
@@ -0,0 +1,66 @@
+process ASMULTIPCF {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda "${moduleDir}/environment.yml"
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/mulled-v2-c278c7398beb73294d78639a864352abef2931ce:ba3e6d2157eac2d38d22e62ec87675e12adb1010-0':
+        'biocontainers/mulled-v2-c278c7398beb73294d78639a864352abef2931ce:ba3e6d2157eac2d38d22e62ec87675e12adb1010-0' }"
+
+    input:
+    tuple val(meta), path(tumor_logr_files), path(tumor_baf_files), path(normal_logr_file), path(normal_baf_file)
+
+
+    output:
+    tuple val(meta), path("*_asmultipcf_segments.txt"), emit: asmultipcf_segments
+    tuple val(meta), path("*_asmultipcf_purityploidy.txt"), emit: asmultipcf_purityploidy
+    path "versions.yml", emit: versions
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    #!/usr/bin/env Rscript
+    library(ASCAT)
+
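+    # Read one ASCAT LogR/BAF table. Assumes the ASCAT layout: SNP IDs (read as
+    # row names), then chromosome, position and one sample column
+    read_ascat <- function(file) read.table(file, header = TRUE, check.names = FALSE)
+
+    # Combine per-sample tables column-wise, keeping chromosome and position only once.
+    # Assumes every file lists the same SNPs in the same order.
+    combine_ascat <- function(files, out_file) {
+        tables <- lapply(files, read_ascat)
+        sample_cols <- lapply(tables[-1], function(tbl) tbl[, -(1:2), drop = FALSE])
+        combined <- do.call(cbind, c(tables[1], sample_cols))
+        write.table(combined, file = out_file, sep = "\t", quote = FALSE, col.names = NA)
+    }
+
+    # strsplit() returns a list, so take its first element to get a vector of file names
+    combine_ascat(strsplit("${tumor_logr_files}", " ")[[1]], "combined_tumor_logr.txt")
+    combine_ascat(strsplit("${tumor_baf_files}", " ")[[1]], "combined_tumor_baf.txt")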
+
+    # Load the data
+    ascat.bc <- ascat.loadData(
+        Tumor_LogR_file = "combined_tumor_logr.txt",
+        Tumor_BAF_file = "combined_tumor_baf.txt",
+        Germline_LogR_file = "$normal_logr_file",
+        Germline_BAF_file = "$normal_baf_file"
+    )
+
+    # Run multi-sample segmentation
+    ascat.bc <- ascat.asmultipcf(ascat.bc, penalty = ${params.ascat_asmultipcf_penalty ?: 5})
+
+    # Run ASCAT
+    ascat.output <- ascat.runAscat(ascat.bc)
+
+    # Write out segmented regions
+    write.table(ascat.output[["segments"]], file="${prefix}_asmultipcf_segments.txt", sep="\t", quote=FALSE, row.names=FALSE)
+
+    # Write out purity and ploidy info
+    purity_ploidy <- data.frame(
+        Sample = names(ascat.output\$aberrantcellfraction),
+        Purity = unlist(ascat.output\$aberrantcellfraction),
+        Ploidy = unlist(ascat.output\$ploidy)
+    )
+    write.table(purity_ploidy, file="${prefix}_asmultipcf_purityploidy.txt", sep="\t", quote=FALSE, row.names=FALSE)
+
+    # Version export
+    writeLines(c("\\"${task.process}\\":", paste0("    ascat: ", packageVersion("ASCAT"))), "versions.yml")
+    """
+}
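The `ASMULTIPCF` script combines the per-sample tumor LogR and BAF files column-wise before passing them to `ascat.loadData`. The Python sketch below illustrates that kind of merge outside R, under stated assumptions: each file is tab-separated with an empty header cell over the SNP ID column (as R writes with `col.names = NA`), followed by chromosome, position and sample columns; every file lists the same SNPs in the same order; and chromosome and position should appear only once in the combined table. The function `merge_ascat_tables` is hypothetical and not part of the module.

```python
import csv
import io
from typing import List, TextIO


def merge_ascat_tables(handles: List[TextIO]) -> List[List[str]]:
    """Merge per-sample ASCAT-style LogR/BAF tables column-wise.

    Assumed layout: column 0 = SNP ID (empty in the header row),
    columns 1-2 = chromosome and position, columns 3+ = sample values.
    """
    tables = [list(csv.reader(h, delimiter="\t")) for h in handles]
    merged = [row[:] for row in tables[0]]  # keep ID, chromosome, position once
    for table in tables[1:]:
        if len(table) != len(merged):
            raise ValueError("tables have different numbers of rows")
        for out_row, row in zip(merged, table):
            # Rows must describe the same SNP (header rows share the empty ID cell)
            if out_row[0] != row[0]:
                raise ValueError(f"SNP mismatch: {out_row[0]!r} vs {row[0]!r}")
            out_row.extend(row[3:])  # append only the sample column(s)
    return merged


def write_table(rows: List[List[str]], handle: TextIO) -> None:
    # Write the merged rows back out as tab-separated text
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


if __name__ == "__main__":
    a = "\tchrs\tpos\tS1\nsnp1\t1\t100\t0.1\n"
    b = "\tchrs\tpos\tS2\nsnp1\t1\t100\t0.3\n"
    out = io.StringIO()
    write_table(merge_ascat_tables([io.StringIO(a), io.StringIO(b)]), out)
    print(out.getvalue())
```

Checking SNP IDs row by row keeps the sketch simple; a join on SNP ID would tolerate differently ordered files at the cost of more code.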
@@ -0,0 +1,119 @@
+name: asmultipcf
+description: Performs multi-sample segmentation using ASCAT
+keywords:
+  - bam
+  - copy number
+  - cram
+tools:
+  - ascat:
+      description: ASCAT is a method to derive copy number profiles of tumour cells, accounting for normal cell admixture and tumour aneuploidy. ASCAT infers tumour purity (the fraction of tumour cells) and ploidy (the amount of DNA per tumour cell), expressed as multiples of haploid genomes from SNP array or massively parallel sequencing data, and calculates whole-genome allele-specific copy number profiles (the number of copies of both parental alleles for all SNP loci across the genome).
+      documentation: https://github.com/VanLoo-lab/ascat/tree/master/man
+      tool_dev_url: https://github.com/VanLoo-lab/ascat
+      doi: "10.1093/bioinformatics/btaa538"
+      licence: ["GPL v3"]
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - tumor_logr_files:
+      type: file
+      description: ASCAT-format LogR files for all tumour samples from the same patient
+  - tumor_baf_files:
+      type: file
+      description: ASCAT-format BAF files for all tumour samples from the same patient
+  - normal_logr_file:
+      type: file
+      description: ASCAT-format LogR file of the matched normal sample
+  - normal_baf_file:
+      type: file
+      description: ASCAT-format BAF file of the matched normal sample
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', single_end:false ]
+  - asmultipcf_segments:
+      type: file
+      description: File with copy number segments from the multi-sample segmentation
+      pattern: "*_asmultipcf_segments.txt"
+  - asmultipcf_purityploidy:
+      type: file
+      description: File with purity and ploidy estimates per tumour sample from the multi-sample segmentation
+      pattern: "*_asmultipcf_purityploidy.txt"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+authors:
+  - "@aasNGC"
+  - "@lassefolkersen"
+  - "@FriederikeHanssen"
+  - "@maxulysse"
+  - "@SusiJo"
+maintainers:
+  - "@aasNGC"
+  - "@lassefolkersen"
+  - "@FriederikeHanssen"
+  - "@maxulysse"
+  - "@SusiJo"