Investigate pandora failing to map reads at the edge of a gene #316

leoisl · 2022-12-20T00:39:40Z

This is an example from drprg, Michael's data. Pandora discover and map output here: /hps/nobackup/iqbal/mbhall/drprg/paper/results/amr_predictions/drprg/illumina/PRJNA436454/SAMN08626350/SRR6824314.

We are looking at the pncA gene here. It is a gene that has lots of variants at the start (20+, complex structure on the left), and then gets simple. Most of the variants are alleles with ~500 bps, but there is one that is just 1 bp, i.e. a long deletion with respect to all other alleles. The sample should genotype towards this 1-bp allele.

In the PRG file (omitted here), this complex structure is described in the first site, which spans 10250 bps of the PRG. At position 10251 we are past this complex structure (i.e. we are at the red 29-bp node). Looking at the pandora SAM file (/hps/nobackup/iqbal/mbhall/drprg/paper/results/amr_predictions/drprg/illumina/PRJNA436454/SAMN08626350/SRR6824314/SRR6824314.filtered.fq.filtered.sam), we are able to map 379 reads, but all of the mappings start after position 10250, i.e. past the complex structure. Thus, we don't map any reads to the complex structure, and to the 1-bp deletion we should genotype to.

If we map the sample reads using minimap2 against the gene's denovo sequence (which contains the 1-bp deletion we should find), we are able to map 623 reads, 244 reads more. In the following IGV plot, we can see that the average depth in ~300x, and the first bp, that pandora could not map, has a count of 213 reads, all calling G, no indels or errors:

This is a very good example to improve the mapping on the edge of the genes. We have the SAM files to debug, but it would be even better to have the 4 other mapping debugging files too (we can easily rerun the command line looking at the @PG header line of the pandora SAM). Reads are illumina and they perfectly map to the edge of the gene

The text was updated successfully, but these errors were encountered:

mbhall88 · 2022-12-20T03:49:27Z

Nice work as always @leoisl. FYI, I have moved that directory to /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_without_debug as it will get overwritten next time I run drprg.

I have also copied the current drprg index (which contains PRG) to /hps/nobackup/iqbal/mbhall/pandora_issue316/index as this might also be overwritten soon.

Regarding getting the extra debugging files, I know you can just rerun pandora separately; as you say, you can just use the command in the header. But I have just rerun drprg with the --debug flag, which uses the debugging files flag in pandora, and you can find that in /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug

If you want to reproduce any of this, it was done using this container docker://quay.io/mbhall88/drprg:c7d848e

With command

index=/hps/nobackup/iqbal/mbhall/pandora_issue316/index
reads=/hps/nobackup/iqbal/mbhall/drprg/paper/results/filtered/illumina/PRJNA436454/SAMN08626350/SRR6824314/SRR6824314.filtered.fq.gz
outdir=/hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug
sample=SRR6824314
uri="docker://quay.io/mbhall88/drprg:c7d848e"
cd /hps/nobackup/iqbal/mbhall/pandora_issue316
singularity exec -B /hps/scratch "$uri" drprg predict --sample $sample -v --ignore-synonymous -I -f 0.1 --max-gaps 0.3 -d 3 -b 0.01 -g 5 -K 0.51 -i "$reads" -o "$outdir" -x "$index" -t 2 --debug

Here is the log showing the commands that were run in case you want to reproduce any of them (this is using the pandora executable in the container, which is v0.10.0-alpha.0)

[2022-12-20T03:44:38Z DEBUG] Cli { verbose: true, threads: 2, cmd: Predict(Predict { pandora_exec: None, makeprg_exec: None, mafft_exec: None, index: "/hps/nobackup/iqbal/mbhall/pandora_issue316/index", input: "/hps/nobackup/iqbal/mbhall/drprg/paper/results/filtered/illumina/PRJNA436454/SAMN08626350/SRR6824314/SRR6824314.filtered.fq.gz", outdir: "/hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug", sample: Some("SRR6824314"), is_illumina: true, min_allele_freq: 0.1, max_gaps: 0.3, ignore_synonymous: true, filterer: Filterer { min_covg: 3, max_covg: 2147483647, min_strand_bias: 0.01, min_gt_conf: 5.0, max_indel: None, min_frs: 0.51 }, pandora_min_cluster_size: 10, debug: true }) }
[2022-12-20T03:44:38Z DEBUG] Index is valid
[2022-12-20T03:44:38Z INFO ] Discovering variants...
[2022-12-20T03:44:38Z DEBUG] Running pandora discover...
[2022-12-20T03:46:25Z DEBUG] Ran: /usr/bin/pandora discover -g 4411532 -v -o /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/discover -t 2 -w 14 -k 15 -c 10 -I -K /hps/nobackup/iqbal/mbhall/pandora_issue316/index/dr.prg /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/query.tsv
[2022-12-20T03:46:25Z DEBUG] Updating the PRGs with novel variants
[2022-12-20T03:46:25Z DEBUG] 8 genes to update with novel variants: ["pncA", "inhA", "ethA", "gid", "rpoB", "gyrA", "embA", "embB"]
[2022-12-20T03:46:25Z DEBUG] Updating MSA for pncA
[2022-12-20T03:46:28Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/pncA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/pncA.fa
[2022-12-20T03:46:28Z DEBUG] Updating MSA for inhA
[2022-12-20T03:46:29Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/inhA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/inhA.fa
[2022-12-20T03:46:29Z DEBUG] Updating MSA for ethA
[2022-12-20T03:46:29Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/ethA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/ethA.fa
[2022-12-20T03:46:29Z DEBUG] Updating MSA for gid
[2022-12-20T03:46:31Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/gid.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/gid.fa
[2022-12-20T03:46:31Z DEBUG] Updating MSA for rpoB
[2022-12-20T03:46:32Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/rpoB.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/rpoB.fa
[2022-12-20T03:46:32Z DEBUG] Updating MSA for gyrA
[2022-12-20T03:46:33Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/gyrA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/gyrA.fa
[2022-12-20T03:46:33Z DEBUG] Updating MSA for embA
[2022-12-20T03:46:34Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/embA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/embA.fa
[2022-12-20T03:46:34Z DEBUG] Updating MSA for embB
[2022-12-20T03:46:34Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/embB.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/embB.fa
[2022-12-20T03:46:57Z DEBUG] Ran: /usr/bin/make_prg from_msa -t 2 -L 5 -N 5 -v --force -o updated -i /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/update_msas
[2022-12-20T03:46:57Z DEBUG] make_prg update successfully ran
[2022-12-20T03:46:57Z DEBUG] Indexing updated PRG with pandora index...
[2022-12-20T03:46:58Z DEBUG] Ran: /usr/bin/pandora index -t 2 -w 14 -k 15 /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/updated.dr.prg
[2022-12-20T03:46:58Z INFO ] Genotyping reads against the panel with pandora
[2022-12-20T03:48:17Z DEBUG] Ran: /usr/bin/pandora map --genotype --local -v -o /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug -g 4411532 --vcf-refs /hps/nobackup/iqbal/mbhall/pandora_issue316/index/genes.fa -t 2 -w 14 -k 15 -c 10 -I /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/updated.dr.prg /hps/nobackup/iqbal/mbhall/drprg/paper/results/filtered/illumina/PRJNA436454/SAMN08626350/SRR6824314/SRR6824314.filtered.fq.gz
[2022-12-20T03:48:17Z INFO ] Successfully genotyped reads
[2022-12-20T03:48:17Z INFO ] Making predictions from variants...
[2022-12-20T03:48:18Z DEBUG] Predictions written to VCF "/hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/SRR6824314.drprg.bcf"
[2022-12-20T03:48:18Z DEBUG] Loading the panel...
[2022-12-20T03:48:18Z DEBUG] Loaded panel
[2022-12-20T03:48:18Z INFO ] Prediction report written to JSON file "/hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/SRR6824314.drprg.json"
[2022-12-20T03:48:18Z INFO ] Done!

Let me know if you want/need anything else

leoisl · 2022-12-20T11:59:55Z

Thanks for this Michael! I can see the debugging files are being created when running pandora discover, but it is also important to generate these files when running pandora map post discover, as this is when we would like to debug why we are not mapping to the deletion allele.

I created another dir alongside your predict_with_debug containing the minimal input to reproduce this issue: /hps/nobackup/iqbal/mbhall/pandora_issue316/map_with_debug_minimal_input. It has just the VCF ref for pncA, the update PRG for pncA, and the reads that minimap2 mapped to pncA. I copied the command line to run pandora map from your runs, you can see it in the command_line.sh. 0 reads mapped to the first complex pncA site, but this minimal input is better to run in my local in debug mode as it runs in 0.2 seconds. I will try to understand better why we have this bug, and how hard would be to solve it, but I am a bit on and off in these festive days!

mbhall88 · 2022-12-20T23:32:52Z

Oh my bad, I'll add the debug stuff to map as well for future. No rush, as you say, it's holiday season

This was referenced Jan 6, 2023

Notice partial gene deletion that spans start codon mbhall88/drprg#24

Closed

Expose CLI parameter for maximum coverage on a single k-mer #263

Open

mbhall88 mentioned this issue Jan 19, 2023

Strange genotype selection #322

Closed

leoisl mentioned this issue Feb 6, 2023

Pandora heavily undermapping #325

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate pandora failing to map reads at the edge of a gene #316

Investigate pandora failing to map reads at the edge of a gene #316

leoisl commented Dec 20, 2022

mbhall88 commented Dec 20, 2022

leoisl commented Dec 20, 2022

mbhall88 commented Dec 20, 2022

Investigate pandora failing to map reads at the edge of a gene #316

Investigate pandora failing to map reads at the edge of a gene #316

Comments

leoisl commented Dec 20, 2022

mbhall88 commented Dec 20, 2022

leoisl commented Dec 20, 2022

mbhall88 commented Dec 20, 2022