Skip to content

Commit

Permalink
Merge pull request #1475 from nf-core/umi_dedup_log_path
Browse files Browse the repository at this point in the history
Fix log publishing around umitools/ umicollapse
  • Loading branch information
pinin4fjords authored Dec 20, 2024
2 parents a27ec8e + 0a21c3f commit 8033764
Show file tree
Hide file tree
Showing 6 changed files with 137 additions and 34 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ Special thanks to the following for their contributions to the release:
- [PR #1471](https://github.com/nf-core/rnaseq/pull/1471) - Fix prepare_genome subworkflow for sortmerna
- [PR #1473](https://github.com/nf-core/rnaseq/pull/1473) - Bump STAR modules
- [PR #1474](https://github.com/nf-core/rnaseq/pull/1474) - Bump versions to 3.18.0
- [PR #1475](https://github.com/nf-core/rnaseq/pull/1475) - Fix log publishing around umitools/ umicollapse

## Parameters

Expand Down
8 changes: 4 additions & 4 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ If multiple libraries/runs have been provided for the same sample in the input s

</details>

[UMI-tools](https://github.com/CGATOxford/UMI-tools) deduplicates reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on UMI identifier after mapping as highlighted in the [UMI-tools dedup](#umi-tools-dedup) section.
[UMI-tools](https://github.com/CGATOxford/UMI-tools) and [UMICollapse](https://github.com/Daniel-Liu-c0deb0t/UMICollapse) deduplicate reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on UMI identifier after mapping as highlighted in the [UMI dedup](#umi-dedup) section.

To facilitate processing of input data which has the UMI barcode already embedded in the read name from the start, `--skip_umi_extract` can be specified in conjunction with `--with_umi`.

Expand Down Expand Up @@ -305,7 +305,7 @@ The original BAM files generated by the selected alignment algorithm are further

![MultiQC - SAMtools mapped reads per contig plot](images/mqc_samtools_idxstats.png)

### UMI-tools dedup
### UMI dedup

<details markdown="1">
<summary>Output files</summary>
Expand All @@ -314,7 +314,7 @@ The original BAM files generated by the selected alignment algorithm are further
- `<SAMPLE>.umi_dedup.sorted.bam`: If `--save_umi_intermeds` is specified the UMI deduplicated, coordinate sorted BAM file containing read alignments will be placed in this directory.
- `<SAMPLE>.umi_dedup.sorted.bam.bai`: If `--save_umi_intermeds` is specified the BAI index file for the UMI deduplicated, coordinate sorted BAM file will be placed in this directory.
- `<SAMPLE>.umi_dedup.sorted.bam.csi`: If `--save_umi_intermeds --bam_csi_index` is specified the CSI index file for the UMI deduplicated, coordinate sorted BAM file will be placed in this directory.
- `<ALIGNER>/umitools/`
- `<ALIGNER>/umitools/` (UMI-tools only)
- `*_edit_distance.tsv`: Reports the (binned) average edit distance between the UMIs at each position.
- `*_per_umi.tsv`: UMI-level summary statistics.
- `*_per_umi_per_position.tsv`: Tabulates the counts for unique combinations of UMI and position.
Expand All @@ -323,7 +323,7 @@ The content of the files above is explained in more detail in the [UMI-tools doc

</details>

After extracting the UMI information from the read sequence (see [UMI-tools extract](#umi-tools-extract)), the second step in the removal of UMI barcodes involves deduplicating the reads based on both mapping and UMI barcode information using the UMI-tools `dedup` command. This will generate a filtered BAM file after the removal of PCR duplicates.
After extracting the UMI information from the read sequence (see [UMI-tools extract](#umi-tools-extract)), the second step in the removal of UMI barcodes involves deduplicating the reads based on both mapping and UMI barcode information. UMI deduplication can be carried out either with [UMI-tools](https://github.com/CGATOxford/UMI-tools) or [UMICollapse](https://github.com/Daniel-Liu-c0deb0t/UMICollapse), set via the `umi_dedup_tool` parameter. The output BAM files are the same, though UMI-tools has some additional outputs, as described above. Either method will generate a filtered BAM file after the removal of PCR duplicates.

### picard MarkDuplicates

Expand Down
2 changes: 1 addition & 1 deletion tests/.nftignore
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ umitools/*.umi_extract.log
{hisat2,star_rsem,star_salmon}/stringtie/*.ballgown/t_data.ctab
{hisat2,star_rsem,star_salmon}/stringtie/*.gene.abundance.txt
{hisat2,star_rsem,star_salmon}/stringtie/*.{coverage,transcripts}.gtf
{hisat2,star_rsem,star_salmon}/umitools/genomic_dedup_log/*_UMICollapse.log
{hisat2,star_rsem,star_salmon}/{umitools,umicollapse}/{genomic,transcriptomic}_dedup_log/*.log
{multiqc,multiqc/**}/multiqc_report.html
{multiqc,multiqc/**}/multiqc_report_data/fastqc_{raw,trimmed}_top_overrepresented_sequences_table.txt
{multiqc,multiqc/**}/multiqc_report_data/hisat2_pe_plot.txt
Expand Down
2 changes: 2 additions & 0 deletions tests/umi.nf.test
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ nextflow_pipeline {
umi_dedup_tool = 'umicollapse'
aligner = 'hisat2'
outdir = "$outputDir"
save_umi_intermeds = true
}
}

Expand Down Expand Up @@ -49,6 +50,7 @@ nextflow_pipeline {
umitools_dedup_stats = true
skip_bbsplit = true
outdir = "$outputDir"
save_umi_intermeds = true
}
}

Expand Down
92 changes: 71 additions & 21 deletions tests/umi.nf.test.snap
Original file line number Diff line number Diff line change
Expand Up @@ -612,6 +612,10 @@
"star_salmon/RAP1_IAA_30M_REP1",
"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.sorted.bam",
"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.sorted.bam.bai",
"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.transcriptome.bam",
"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.transcriptome.filtered.bam",
"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.transcriptome.sorted.bam",
"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.transcriptome.sorted.bam.bai",
"star_salmon/RAP1_IAA_30M_REP1/aux_info",
"star_salmon/RAP1_IAA_30M_REP1/aux_info/ambig_info.tsv",
"star_salmon/RAP1_IAA_30M_REP1/aux_info/expected_bias.gz",
Expand All @@ -629,6 +633,9 @@
"star_salmon/RAP1_UNINDUCED_REP1",
"star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.sorted.bam",
"star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.sorted.bam.bai",
"star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.transcriptome.bam",
"star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.transcriptome.sorted.bam",
"star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.transcriptome.sorted.bam.bai",
"star_salmon/RAP1_UNINDUCED_REP1/aux_info",
"star_salmon/RAP1_UNINDUCED_REP1/aux_info/ambig_info.tsv",
"star_salmon/RAP1_UNINDUCED_REP1/aux_info/expected_bias.gz",
Expand All @@ -646,6 +653,9 @@
"star_salmon/RAP1_UNINDUCED_REP2",
"star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.sorted.bam",
"star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.sorted.bam.bai",
"star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.transcriptome.bam",
"star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.transcriptome.sorted.bam",
"star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.transcriptome.sorted.bam.bai",
"star_salmon/RAP1_UNINDUCED_REP2/aux_info",
"star_salmon/RAP1_UNINDUCED_REP2/aux_info/ambig_info.tsv",
"star_salmon/RAP1_UNINDUCED_REP2/aux_info/expected_bias.gz",
Expand All @@ -663,6 +673,10 @@
"star_salmon/WT_REP1",
"star_salmon/WT_REP1.umi_dedup.sorted.bam",
"star_salmon/WT_REP1.umi_dedup.sorted.bam.bai",
"star_salmon/WT_REP1.umi_dedup.transcriptome.bam",
"star_salmon/WT_REP1.umi_dedup.transcriptome.filtered.bam",
"star_salmon/WT_REP1.umi_dedup.transcriptome.sorted.bam",
"star_salmon/WT_REP1.umi_dedup.transcriptome.sorted.bam.bai",
"star_salmon/WT_REP1/aux_info",
"star_salmon/WT_REP1/aux_info/ambig_info.tsv",
"star_salmon/WT_REP1/aux_info/expected_bias.gz",
Expand All @@ -680,6 +694,10 @@
"star_salmon/WT_REP2",
"star_salmon/WT_REP2.umi_dedup.sorted.bam",
"star_salmon/WT_REP2.umi_dedup.sorted.bam.bai",
"star_salmon/WT_REP2.umi_dedup.transcriptome.bam",
"star_salmon/WT_REP2.umi_dedup.transcriptome.filtered.bam",
"star_salmon/WT_REP2.umi_dedup.transcriptome.sorted.bam",
"star_salmon/WT_REP2.umi_dedup.transcriptome.sorted.bam.bai",
"star_salmon/WT_REP2/aux_info",
"star_salmon/WT_REP2/aux_info/ambig_info.tsv",
"star_salmon/WT_REP2/aux_info/expected_bias.gz",
Expand Down Expand Up @@ -1261,10 +1279,18 @@
"trimgalore/WT_REP2_trimmed_2.fastq.gz_trimming_report.txt",
"umitools",
"umitools/RAP1_IAA_30M_REP1.umi_extract.log",
"umitools/RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz",
"umitools/RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz",
"umitools/RAP1_UNINDUCED_REP1.umi_extract.fastq.gz",
"umitools/RAP1_UNINDUCED_REP1.umi_extract.log",
"umitools/RAP1_UNINDUCED_REP2.umi_extract.fastq.gz",
"umitools/RAP1_UNINDUCED_REP2.umi_extract.log",
"umitools/WT_REP1.umi_extract.log",
"umitools/WT_REP2.umi_extract.log"
"umitools/WT_REP1.umi_extract_1.fastq.gz",
"umitools/WT_REP1.umi_extract_2.fastq.gz",
"umitools/WT_REP2.umi_extract.log",
"umitools/WT_REP2.umi_extract_1.fastq.gz",
"umitools/WT_REP2.umi_extract_2.fastq.gz"
],
[
"genome_gfp.fasta:md5,e23e302af63736a199985a169fdac055",
Expand Down Expand Up @@ -1467,14 +1493,22 @@
"WT_REP2.umi_dedup.sorted_per_umi_per_position.tsv:md5,6f5656947a7f0076df446e6f40430027",
"WT_REP2.umi_dedup.transcriptome.sorted_edit_distance.tsv:md5,3e3c6a7e8996e566350742e9911366d3",
"WT_REP2.umi_dedup.transcriptome.sorted_per_umi.tsv:md5,0c986c4cb7a77f650a19e2c454b9b179",
"WT_REP2.umi_dedup.transcriptome.sorted_per_umi_per_position.tsv:md5,af9028dbdab81de3854a32cd1d19ac8b"
"WT_REP2.umi_dedup.transcriptome.sorted_per_umi_per_position.tsv:md5,af9028dbdab81de3854a32cd1d19ac8b",
"RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz:md5,e83d7f738fbbfaa541a2e71fe4663447",
"RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz:md5,4f2873cbf584d6e84187238a4ae2b8fa",
"RAP1_UNINDUCED_REP1.umi_extract.fastq.gz:md5,9e42242fd68baac592140f63a8a716ce",
"RAP1_UNINDUCED_REP2.umi_extract.fastq.gz:md5,5a92b642927b8603c4765e5305e23e9c",
"WT_REP1.umi_extract_1.fastq.gz:md5,f312fac9c384a889ae4f959839263604",
"WT_REP1.umi_extract_2.fastq.gz:md5,ffca24924108fd54151620b7538b9e1a",
"WT_REP2.umi_extract_1.fastq.gz:md5,c3180451a24ce51fc35c1684521ae287",
"WT_REP2.umi_extract_2.fastq.gz:md5,067ff23f8d1307ad241cd70bc186b5c1"
]
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.10.2"
"nf-test": "0.9.2",
"nextflow": "24.10.3"
},
"timestamp": "2024-12-11T18:07:55.751564456"
"timestamp": "2024-12-20T00:02:04.611696704"
},
"Params: --aligner hisat2 --umi_dedup_tool 'umicollapse'": {
"content": [
Expand Down Expand Up @@ -2130,13 +2164,13 @@
"hisat2/stringtie/WT_REP2.coverage.gtf",
"hisat2/stringtie/WT_REP2.gene.abundance.txt",
"hisat2/stringtie/WT_REP2.transcripts.gtf",
"hisat2/umitools",
"hisat2/umitools/genomic_dedup_log",
"hisat2/umitools/genomic_dedup_log/RAP1_IAA_30M_REP1.umi_dedup.sorted_UMICollapse.log",
"hisat2/umitools/genomic_dedup_log/RAP1_UNINDUCED_REP1.umi_dedup.sorted_UMICollapse.log",
"hisat2/umitools/genomic_dedup_log/RAP1_UNINDUCED_REP2.umi_dedup.sorted_UMICollapse.log",
"hisat2/umitools/genomic_dedup_log/WT_REP1.umi_dedup.sorted_UMICollapse.log",
"hisat2/umitools/genomic_dedup_log/WT_REP2.umi_dedup.sorted_UMICollapse.log",
"hisat2/umicollapse",
"hisat2/umicollapse/genomic_dedup_log",
"hisat2/umicollapse/genomic_dedup_log/RAP1_IAA_30M_REP1.umi_dedup.sorted_UMICollapse.log",
"hisat2/umicollapse/genomic_dedup_log/RAP1_UNINDUCED_REP1.umi_dedup.sorted_UMICollapse.log",
"hisat2/umicollapse/genomic_dedup_log/RAP1_UNINDUCED_REP2.umi_dedup.sorted_UMICollapse.log",
"hisat2/umicollapse/genomic_dedup_log/WT_REP1.umi_dedup.sorted_UMICollapse.log",
"hisat2/umicollapse/genomic_dedup_log/WT_REP2.umi_dedup.sorted_UMICollapse.log",
"multiqc",
"multiqc/hisat2",
"multiqc/hisat2/multiqc_report.html",
Expand Down Expand Up @@ -2548,10 +2582,18 @@
"trimgalore/WT_REP2_trimmed_2.fastq.gz_trimming_report.txt",
"umitools",
"umitools/RAP1_IAA_30M_REP1.umi_extract.log",
"umitools/RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz",
"umitools/RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz",
"umitools/RAP1_UNINDUCED_REP1.umi_extract.fastq.gz",
"umitools/RAP1_UNINDUCED_REP1.umi_extract.log",
"umitools/RAP1_UNINDUCED_REP2.umi_extract.fastq.gz",
"umitools/RAP1_UNINDUCED_REP2.umi_extract.log",
"umitools/WT_REP1.umi_extract.log",
"umitools/WT_REP2.umi_extract.log"
"umitools/WT_REP1.umi_extract_1.fastq.gz",
"umitools/WT_REP1.umi_extract_2.fastq.gz",
"umitools/WT_REP2.umi_extract.log",
"umitools/WT_REP2.umi_extract_1.fastq.gz",
"umitools/WT_REP2.umi_extract_2.fastq.gz"
],
[
"genome_gfp.fasta:md5,e23e302af63736a199985a169fdac055",
Expand Down Expand Up @@ -2688,14 +2730,22 @@
"cmd_info.json:md5,809380ddce725a8fab75dd7741b64bf6",
"lib_format_counts.json:md5,d231ba7624b67eb654989f69530e2925",
"R_sessionInfo.log:md5,fb0da0d7ad6994ed66a8e68348b19676",
"tx2gene.tsv:md5,0e2418a69d2eba45097ebffc2f700bfe"
"tx2gene.tsv:md5,0e2418a69d2eba45097ebffc2f700bfe",
"RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz:md5,e83d7f738fbbfaa541a2e71fe4663447",
"RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz:md5,4f2873cbf584d6e84187238a4ae2b8fa",
"RAP1_UNINDUCED_REP1.umi_extract.fastq.gz:md5,9e42242fd68baac592140f63a8a716ce",
"RAP1_UNINDUCED_REP2.umi_extract.fastq.gz:md5,5a92b642927b8603c4765e5305e23e9c",
"WT_REP1.umi_extract_1.fastq.gz:md5,f312fac9c384a889ae4f959839263604",
"WT_REP1.umi_extract_2.fastq.gz:md5,ffca24924108fd54151620b7538b9e1a",
"WT_REP2.umi_extract_1.fastq.gz:md5,c3180451a24ce51fc35c1684521ae287",
"WT_REP2.umi_extract_2.fastq.gz:md5,067ff23f8d1307ad241cd70bc186b5c1"
]
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.10.2"
"nf-test": "0.9.2",
"nextflow": "24.10.3"
},
"timestamp": "2024-12-11T18:01:45.228731692"
"timestamp": "2024-12-19T22:33:42.012684597"
},
"--umi_dedup_tool 'umitools - stub": {
"content": [
Expand Down Expand Up @@ -2804,9 +2854,9 @@
]
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.10.2"
"nf-test": "0.9.2",
"nextflow": "24.10.3"
},
"timestamp": "2024-12-11T18:08:48.404716766"
"timestamp": "2024-12-19T23:28:01.570835895"
}
}
}
Loading

0 comments on commit 8033764

Please sign in to comment.