[question] Assembly Spades #513

Open
ingridvanw opened this issue May 6, 2024 · 3 comments
Labels: fixed, question (Further information is requested)


ingridvanw commented May 6, 2024

Hi Robert,

I ran Bactopia v3.0.0 with this command line:
bactopia --samples SAB.txt --ask_merlin --shovill_assembler spades --shovill_opts "--isolate"

SAB.txt looks like
sample runtype genome_size species r1 r2 extra
B12345-1 paired-end 2800000 Staphylococcus aureus B12345-1_WGS_R1.fastq.gz B12345-1_WGS_R2.fastq.gz

This seems to work well.

I then ran Bactopia v3.0.1 with the same command line, but this time no assembly could be made.
Using skesa was no problem.

"main" > " assembler" > assembler_error.txt
B12345-1_WGS assembled successfully, but 0 contigs were formed. Please investigate B12345-1_WGS to determine a cause (e.g. metagenomic, contaminants, etc...) for this outcome. Further assembly-based analysis of B12345-1 will be discontinued.

nf-assembler.log
R1 B12345-1_WGS_R1.fastq.gz
R2 B12345-1_WGS_R2.fastq.gz
SE null
[shovill] Hello ingrid
[shovill] You ran: /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/shovill --R1 B12345-1_WGS_R1.fastq.gz --R2 B12345-1_WGS_R2.fastq.gz --gsize 2800000 --outdir results --assembler spades --opts true --minlen 500 --mincov 2 --force --keepfiles --depth 0 --noreadcorr --namefmt B12345-1_WGS_%05d --cpus 4 --ram 7
[shovill] This is shovill 1.1.0
[shovill] Written by Torsten Seemann
[shovill] Homepage is https://github.com/tseemann/shovill
[shovill] Operating system is linux
[shovill] Perl version is v5.32.1
[shovill] Machine has 32 CPU cores and 62.70 GB RAM
[shovill] Using bwa - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/bwa | Version: 0.7.18-r1243-dirty
[shovill] Using flash - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/flash | FLASH v1.2.11
[shovill] Using java - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/java | openjdk version "17.0.3-internal" 2022-04-19
[shovill] Using kmc - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/kmc | K-Mer Counter (KMC) ver. 3.2.4 (2024-02-09)
[shovill] Using lighter - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/lighter | Lighter v1.1.3
[shovill] Using megahit - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/megahit | MEGAHIT v1.2.9
[shovill] Using megahit_toolkit - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/megahit_toolkit | v1.2.9
[shovill] Using pigz - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/pigz | pigz 2.8
[shovill] Using pilon - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/pilon | Pilon version 1.24 Thu Jan 28 13:00:45 2021 -0500
[shovill] Using samclip - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/samclip | samclip 0.4.0
[shovill] Using samtools - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/samtools | Version: 1.18 (using htslib 1.17)
[shovill] Using seqtk - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/seqtk | Version: 1.4-r122
[shovill] Using skesa - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/skesa | SKESA 2.5.1
[shovill] Using spades.py - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/spades.py | SPAdes genome assembler v3.15.5
[shovill] Using trimmomatic - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/trimmomatic | 0.39
[shovill] Using velvetg - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/velvetg | Version 1.2.10
[shovill] Using velveth - /home/ingrid/.bactopia/conda/bioconda--bactopia-assembler-1.0.4/bin/velveth | Version 1.2.10
[shovill] Found spades version: 003015000
[shovill] Will use spades 003015000 options: --isolate and --merged
[shovill] Using tempdir: /tmp/jHtxvJ2Seb
[shovill] Changing into folder: /disk/ingrid/SAB/work/ec/e3cc440c7bff3057d0caaa34c9bef7/results
[shovill] Collecting raw read statistics with 'seqtk'
[shovill] Running: seqtk fqchk -q3 /disk/ingrid/SAB/work/ea/851a177bcdd4a46551e614e8c26353/results/B13028376_WGS_R1.fastq.gz >/tmp/FtfPpqr7rQ 2>&1 | sed 's/^/[seqtk] /' | tee -a shovill.log
[shovill] Read stats: max_len = 151
[shovill] Read stats: min_len = 25
[shovill] Read stats: total_bp = 280001668
[shovill] Read stats: avg_len = 126
[shovill] Using genome size 2800000 bp
[shovill] Estimated sequencing depth: 100 x
[shovill] No read depth reduction requested or necessary.
[shovill] Appending -Xmx7g to _JAVA_OPTIONS
[shovill] Running: ln -sf /disk/ingrid/SAB/work/ea/851a177bcdd4a46551e614e8c26353/results/B12345-1_WGS_R1.fastq.gz R1.fq.gz 2>&1 | sed 's/^/[ln] /' | tee -a shovill.log
[shovill] Running: ln -sf /disk/ingrid/SAB/work/ea/851a177bcdd4a46551e614e8c26353/results/B12345-1_WGS_R2.fastq.gz R2.fq.gz 2>&1 | sed 's/^/[ln] /' | tee -a shovill.log
[shovill] Average read length looks like 126 bp
[shovill] Setting k-mer range to (31 .. 94)
[shovill] Estimated K-mers: 31 47 63 79 [kn=5, ks=16, kmin=31, kmax=94]
[shovill] Using kmers: 31,47,63,79
[shovill] Enabled --noreadcorr, so no read correction will be performed
[shovill] Overlapping/stitching PE reads with 'FLASH'
[shovill] Running: flash -m 20 -M 151 -d . -o flash -z -t 4 R1.fq.gz R2.fq.gz 2>&1 | sed 's/^//' | tee -a shovill.log
[FLASH] Starting FLASH v1.2.11
[FLASH] Fast Length Adjustment of SHort reads
[FLASH]
[FLASH] Input files:
[FLASH] R1.fq.gz
[FLASH] R2.fq.gz
[FLASH]

[FLASH] Output files:
[FLASH] ./flash.extendedFrags.fastq.gz
[FLASH] ./flash.notCombined_1.fastq.gz
[FLASH] ./flash.notCombined_2.fastq.gz
[FLASH] ./flash.hist
[FLASH] ./flash.histogram
[FLASH]
[FLASH] Parameters:
[FLASH] Min overlap: 20
[FLASH] Max overlap: 151
[FLASH] Max mismatch density: 0.250000
[FLASH] Allow "outie" pairs: false
[FLASH] Cap mismatch quals: false
[FLASH] Combiner threads: 4
[FLASH] Input format: FASTQ, phred_offset=33
[FLASH] Output format: FASTQ, phred_offset=33, gzip
[FLASH]
[FLASH] Starting reader and writer threads
[FLASH] Starting 4 combiner threads
[FLASH] Processed 25000 read pairs
[FLASH] Processed 50000 read pairs
[FLASH] Processed 75000 read pairs
[FLASH] Processed 100000 read pairs
[FLASH] Processed 125000 read pairs
[FLASH] Processed 150000 read pairs
[FLASH] Processed 175000 read pairs
[FLASH] Processed 200000 read pairs
[FLASH] Processed 225000 read pairs
[FLASH] Processed 250000 read pairs
[FLASH] Processed 275000 read pairs
[FLASH] Processed 300000 read pairs
[FLASH] Processed 325000 read pairs
[FLASH] Processed 350000 read pairs
[FLASH] Processed 375000 read pairs
[FLASH] Processed 400000 read pairs
[FLASH] Processed 425000 read pairs
[FLASH] Processed 450000 read pairs
[FLASH] Processed 475000 read pairs
[FLASH] Processed 500000 read pairs
[FLASH] Processed 525000 read pairs
[FLASH] Processed 550000 read pairs
[FLASH] Processed 575000 read pairs
[FLASH] Processed 600000 read pairs
[FLASH] Processed 625000 read pairs
[FLASH] Processed 650000 read pairs
[FLASH] Processed 675000 read pairs
[FLASH] Processed 700000 read pairs
[FLASH] Processed 725000 read pairs
[FLASH] Processed 750000 read pairs
[FLASH] Processed 775000 read pairs
[FLASH] Processed 800000 read pairs
[FLASH] Processed 825000 read pairs
[FLASH] Processed 850000 read pairs
[FLASH] Processed 875000 read pairs
[FLASH] Processed 900000 read pairs
[FLASH] Processed 925000 read pairs
[FLASH] Processed 950000 read pairs
[FLASH] Processed 975000 read pairs
[FLASH] Processed 1000000 read pairs
[FLASH] Processed 1025000 read pairs
[FLASH] Processed 1050000 read pairs
[FLASH] Processed 1075000 read pairs
[FLASH] Processed 1100000 read pairs
[FLASH] Processed 1111471 read pairs
[FLASH]
[FLASH] Read combination statistics:
[FLASH] Total pairs: 1111471
[FLASH] Combined pairs: 1024439
[FLASH] Uncombined pairs: 87032
[FLASH] Percent combined: 92.17%
[FLASH]
[FLASH] Writing histogram files.
[FLASH]
[FLASH] FLASH v1.2.11 complete!
[FLASH] 26.409 seconds elapsed
[shovill] Assembling reads with 'spades'
[shovill] Running: spades.py -1 flash.notCombined_1.fastq.gz -2 flash.notCombined_2.fastq.gz --isolate --threads 4 --memory 7 -o spades --tmp-dir /tmp/jHtxvJ2Seb -k 31,47,63,79 true --merged flash.extendedFrags.fastq.gz 2>&1 | sed 's/^/[spades] /' | tee -a shovill.log
[spades] SPAdes genome assembler v3.15.5
[spades]
[spades] Usage: spades.py [options] -o <output_dir>
[spades] spades.py: error: argument -k: invalid kmers value: 'true'
[shovill] Assembly failed - spades.fasta has zero contigs!
[shovill] Assembly failed - spades.fasta has zero contigs!
removed 'results/flash.extendedFrags.fastq.gz'
removed 'results/flash.notCombined_1.fastq.gz'
removed 'results/flash.notCombined_2.fastq.gz'
removed 'results/R1.fq.gz'
removed 'results/R2.fq.gz'

nf_assembler.out
...........................
[FLASH] Read combination statistics:
[FLASH] Total pairs: 1111471
[FLASH] Combined pairs: 1024439
[FLASH] Uncombined pairs: 87032
[FLASH] Percent combined: 92.17%
[FLASH]
[FLASH] Writing histogram files.
[FLASH]
[FLASH] FLASH v1.2.11 complete!
[FLASH] 26.409 seconds elapsed
[spades] SPAdes genome assembler v3.15.5
[spades]
[spades] Usage: spades.py [options] -o <output_dir>
[spades] spades.py: error: argument -k: invalid kmers value: 'true'
[shovill] Assembly failed - spades.fasta has zero contigs!
removed 'results/flash.extendedFrags.fastq.gz'
removed 'results/flash.notCombined_1.fastq.gz'
removed 'results/flash.notCombined_2.fastq.gz'
removed 'results/R1.fq.gz'
removed 'results/R2.fq.gz'

Not sure if it is a bug or if I am doing something wrong :)

ingridvanw added the question label May 6, 2024

rpetit3 (Member) commented May 6, 2024

Hi @ingridvanw

This looks like it might be a bug

[spades] Usage: spades.py [options] -o <output_dir>
[spades] spades.py: error: argument -k: invalid kmers value: 'true'

For some reason, true is being passed right after the k-mer list: -k 31,47,63,79 true
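A quick way to spot this symptom is to search the assembler log for the shovill invocation. The following is a minimal sketch: `check_opts` is a hypothetical helper name, and the log file name is simply the one quoted in this issue. It only looks for the literal text `--opts true`, which is what the shovill command line above contains.

```shell
# check_opts is a hypothetical helper (not part of Bactopia or shovill).
# It reports whether a log shows shovill being called with "--opts true".
check_opts() {
  # "--" stops grep from treating the pattern as one of its own options.
  if grep -q -- '--opts true' "$1"; then
    echo "shovill received a boolean instead of its options string"
  else
    echo "no '--opts true' found"
  fi
}

# Only run the check if the log from this issue is present in the current directory.
if [ -f nf-assembler.log ]; then
  check_opts nf-assembler.log
fi
```

If the first message is printed, the value meant for --shovill_opts never reached shovill as a string.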

Let me see if I can get this sorted out, will update soon. Thank you very much for letting me know about this!
Robert

rpetit3 (Member) commented May 6, 2024

Ah! Figured it out, @ingridvanw, and I need to document this better.

Can you try rerunning with:

bactopia --samples SAB.txt --ask_merlin --shovill_assembler spades --shovill_opts="--isolate"

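Notice the = sign in --shovill_opts="--isolate". Without the equals sign, Nextflow will interpret --isolate as a separate parameter (due to the double dash) and set it to true. That also leaves --shovill_opts without a string value, which is why shovill was called with --opts true in your log.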

You did nothing wrong, I just need to document the need for the = better in these types of --opts parameters.
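To illustrate the difference between the two forms, here is a toy model of the rule described above. It is only a sketch under stated assumptions, not Nextflow's actual parser: `parse_params` is a hypothetical function, and it assumes that a token starting with a dash is never taken as the value of the preceding parameter.

```shell
# Toy model of the parsing rule described above (NOT Nextflow's real parser).
# parse_params is a hypothetical helper: it prints one "name=value" line for
# each double-dash parameter it sees.
parse_params() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --*=*)
        # "--name=value": the value is attached to the name, so it cannot
        # be mistaken for another parameter.
        arg="${1#--}"
        printf '%s=%s\n' "${arg%%=*}" "${arg#*=}"
        shift
        ;;
      --*)
        name="${1#--}"
        # Assumption: a following token that itself starts with "-" is not
        # used as a value, so this parameter becomes a boolean ("true").
        if [ "$#" -ge 2 ] && [ "${2#-}" = "$2" ]; then
          printf '%s=%s\n' "$name" "$2"
          shift 2
        else
          printf '%s=true\n' "$name"
          shift
        fi
        ;;
      *)
        # Ignore anything that is not a double-dash parameter.
        shift
        ;;
    esac
  done
}

# Space-separated form: the shell strips the quotes, so the parser sees two
# separate tokens, "--shovill_opts" and "--isolate".
parse_params --shovill_opts "--isolate"

# Equals form: the value stays attached to the parameter name.
parse_params --shovill_opts="--isolate"
```

In this sketch the first call yields two boolean parameters (shovill_opts and isolate), while the second yields a single shovill_opts parameter carrying the string --isolate. The quotes make no difference here, because the shell removes them before the program sees its arguments.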

On another note (more important!), you do not need to provide --isolate to spades. Shovill does this automatically, so you can rerun things like so:

bactopia --samples SAB.txt --ask_merlin --shovill_assembler spades 

rpetit3 added the fixed label May 6, 2024

ingridvanw (Author) commented May 15, 2024

Hi Robert,

My apologies, and sorry for the late response. I was out of the office for a couple of days.
I thought I had run --shovill_opts both with and without "=", but it seems I had not. Sorry!

bactopia --samples SAB.txt --ask_merlin --shovill_assembler spades --shovill_opts "--isolate"

0 contigs

bactopia --samples SAB.txt --ask_merlin --shovill_assembler spades --shovill_opts="--isolate"

assembly successful
:)

bactopia --samples SAB.txt --ask_merlin --shovill_assembler spades

assembly successful
:)

You mentioned that it is not necessary to add --isolate for spades, because Shovill does this automatically. Does this also mean that you cannot run spades without --isolate mode when using Shovill in the Bactopia pipeline?
If so, for enriched cultures it would be better to run skesa, right?

Sorry again! And thanks for your reply!

Best,
Ingrid
