-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate pandora failing to map reads at the edge of a gene #316
Comments
Nice work as always @leoisl. FYI, I have moved that directory to I have also copied the current drprg index (which contains PRG) to Regarding getting the extra debugging files, I know you can just rerun pandora separately; as you say, you can just use the command in the header. But I have just rerun drprg with the If you want to reproduce any of this, it was done using this container With command
Here is the log showing the commands that were run in case you want to reproduce any of them (this is using the pandora executable in the container, which is v0.10.0-alpha.0)
Let me know if you want/need anything else |
Thanks for this Michael! I can see the debugging files are being created when running I created another dir alongside your |
Oh my bad, I'll add the debug stuff to map as well for future. No rush, as you say, it's holiday season |
This is an example from drprg, Michael's data. Pandora discover and map output here:
/hps/nobackup/iqbal/mbhall/drprg/paper/results/amr_predictions/drprg/illumina/PRJNA436454/SAMN08626350/SRR6824314
.We are looking at the
pncA
gene here. It is a gene that has lots of variants at the start (20+, complex structure on the left), and then gets simple. Most of the variants are alleles with ~500 bps, but there is one that is just 1 bp, i.e. a long deletion with respect to all other alleles. The sample should genotype towards this 1-bp allele.In the PRG file (omitted here), this complex structure is described in the first site, which spans 10250 bps of the PRG. At position 10251 we are past this complex structure (i.e. we are at the red 29-bp node). Looking at the pandora SAM file (
/hps/nobackup/iqbal/mbhall/drprg/paper/results/amr_predictions/drprg/illumina/PRJNA436454/SAMN08626350/SRR6824314/SRR6824314.filtered.fq.filtered.sam
), we are able to map 379 reads, but all of the mappings start after position 10250, i.e. past the complex structure. Thus, we don't map any reads to the complex structure, and to the 1-bp deletion we should genotype to.If we map the sample reads using
minimap2
against the gene's denovo sequence (which contains the 1-bp deletion we should find), we are able to map 623 reads, 244 reads more. In the following IGV plot, we can see that the average depth in ~300x, and the first bp, that pandora could not map, has a count of 213 reads, all callingG
, no indels or errors:This is a very good example to improve the mapping on the edge of the genes. We have the SAM files to debug, but it would be even better to have the 4 other mapping debugging files too (we can easily rerun the command line looking at the
@PG
header line of the pandora SAM). Reads are illumina and they perfectly map to the edge of the geneThe text was updated successfully, but these errors were encountered: