Issue while choosing the reference path for genotyping #329

AmayAgrawal · 2023-05-26T14:48:19Z

Hi,

I am facing an issue with the reference path that pandora uses when genotyping variants. It appears to use the less frequently supported path as the reference instead of the most frequently supported one. Below I try to explain it in a simple way:

Suppose I am using 100 strains for my analysis. First, I did the pan-genome analysis and used the MSAs to build the pan-genome reference graphs (PRGs). Next, I used these PRGs to genotype the variants in these 100 strains using pandora. Now suppose that in the pan-genome graph of a particular locus (say gene A), at a particular position (say 300), 3 different paths are possible. If I understand correctly, the path supported by the majority of the 100 strains should be chosen as the reference, but this was not the case. As a result, for the SNP I was looking for (say C 300 T, where 'C' is the ref and 'T' is the alt allele), pandora actually chooses 'T' as ref and 'C' as alt. I saw in one of the currently open issues that pandora heavily undermaps (#325). Could it be choosing the less frequent path because of this, or am I misunderstanding something?

iqbal-lab · 2023-05-26T14:55:21Z

Yes, this is possible. Pandora needs to make a "global" choice of a path from one end of the gene to the other. Sometimes the data is such that lots of reads force the path one way across the graph, and this takes the path "a long way away vertically" from a bubble deep in the graph where there is a lot of coverage for one allele. If there is no way to make a single path consistent with all of that, it does what it can based on dynamic programming.

Suppose the MSA looks like
xxxxxAxxxxxx
xxxxxCxxxxxx
xxyyyyyyyyxx
If there is very low coverage on the x's and lots on the y's, you get forced onto the bottom path, and the A/C choice becomes irrelevant/ignored.
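
To make the effect concrete, here is a toy sketch in Python. It is not pandora's actual algorithm or scoring: the node names, the coverage numbers, and the "length times coverage" score are all assumptions made up for illustration. It only shows how a dynamic-programming choice of a single best end-to-end path can skip a bubble where one allele has high coverage.

```python
from functools import lru_cache

# Hypothetical graph for the MSA above: node -> (sequence, per-base coverage).
# The coverage values are invented for this example.
NODES = {
    "start": ("xx", 20),
    "top_left": ("xxx", 1),       # low coverage on the x's
    "A": ("A", 30),               # well-supported allele inside the bubble
    "C": ("C", 2),
    "top_right": ("xxxx", 1),     # low coverage on the x's
    "bottom": ("yyyyyyyy", 20),   # lots of coverage on the y's
    "end": ("xx", 20),
}
EDGES = {
    "start": ["top_left", "bottom"],
    "top_left": ["A", "C"],
    "A": ["top_right"],
    "C": ["top_right"],
    "top_right": ["end"],
    "bottom": ["end"],
    "end": [],
}


def node_score(name):
    """Assumed toy score: sequence length times per-base coverage."""
    seq, cov = NODES[name]
    return len(seq) * cov


@lru_cache(maxsize=None)
def best_path(name):
    """Return (score, path) for the best-scoring path from `name` to the sink."""
    successors = EDGES[name]
    if not successors:
        return node_score(name), (name,)
    score, path = max(best_path(s) for s in successors)
    return node_score(name) + score, (name,) + path


score, path = best_path("start")
print(path, score)
```

With these made-up numbers the best global path goes through the y's, so the A/C bubble is never visited, even though 'A' would win if the path were forced through the top branch (`best_path("top_left")` picks 'A').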

It's hard to comment more without concrete data. I expect it's not pandora undermapping, but I can't tell.
Would you like to share more details?

AmayAgrawal · 2023-05-31T18:50:24Z

Hi,
I have uploaded a zip folder at this drive link (https://nubes.helmholtz-berlin.de/s/R8SHBsT8yDmeca4) which contains all the files required to reproduce the issue I am describing. The zip folder includes a 'README' file, which explains all the steps and the files it contains.

Let me know if you have any more questions for me.

iqbal-lab · 2023-12-22T23:34:57Z

Omg, we have not replied to you! So sorry @AmayAgrawal, we will return to this after the Xmas vacation.

AmayAgrawal · 2024-01-03T09:55:49Z

No worries. It would be nice if you could look at this now.
