Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problem with minfreq #74

Open
wantingwei opened this issue Feb 12, 2024 · 5 comments
Open

problem with minfreq #74

wantingwei opened this issue Feb 12, 2024 · 5 comments

Comments

@wantingwei
Copy link

I am recently running snpgenie within pool analysis, using command snpgenie.pl --minfreq=0.01 --snpreport=${sample}_snpgenie.vcf --vcfformat=2 --slidingwindow=30 --fastafile=./seqs/MN908947_3.fasta -- gtffile=./seqs/MN908947_3.gff.txt. I want to exclude all the variants below 1% frequency so using minfreq=0.01, however I realized all the mutations with frequency >0.99 have been excluded. I am not sure why this is happening, can someone help me figure it out. I attached the screen shot of the script out put and my vcf file.
Screenshot 2024-02-12 at 1 16 17 PM
Screenshot 2024-02-12 at 1 17 36 PM

@singing-scientist
Copy link
Contributor

Greetings, Wanting! Thanks for this question. In fact, this is the desired behavior. For example, suppose there is a polymorphic site with C=9999 and T=1. The frequency of T is 1 / (9999 + 1) = 0.0001, or 0.01%, and it doesn't matter whether it's the REF or ALT allele. If C is the REF allele, then the VCF will report ALT as T with AF=0.0001; alternatively, if T if the REF allele, then the VCF will report ALT as C with AF=0.9999. However, in both cases there is an allele (T) with a frequency below your cut-off, and it will be eliminated. Put another way, the frequency filter will eliminate an allele regardless of whether it is REF or ALT. Note that which allele is REF is often arbitrary (e.g., a standard reference sequence against which variants were called). Let me know if this makes sense.

If there's a reason you wanted to filter out low-frequency ALT alleles only (but not low-frequency REF alleles), you could filter your VCF file on your own before submitting it to SNPGenie, and then simply not use --minfreq. However, in most circumstances I don't think it makes sense to give low-frequency REF alleles a 'free pass' but eliminate low-frequency ALT alleles. But, it's possible your situation is an exception!

Hope this helps, let me know!
Chase

@wantingwei
Copy link
Author

Thank you for response so fast. Just want to confirm that I understand correctly if C is the REF with 0.99% and T is ALT 0.01%. The T will be filtered out but C will remain in the calculation.

@singing-scientist
Copy link
Contributor

My pleasure! It's correct that, in this example, T will be filtered out and C will not be filtered out. However, regarding calculation, it depends what you mean. If you mean π, there is no longer a 'calculation', i.e., the site will no longer be polymorphic so its π = 0. Let me know!

@wantingwei
Copy link
Author

Gotcha. I am trying to compare piN/piS to access selective pressure for my sample. I am sequencing omicron samples, that call variants based on wuhan-1, in that case should I consider those "fixed or 99%" mutation. Or it would be more reasonable to filter the vcf without lower variant and using default minfreq in the program.

@singing-scientist
Copy link
Contributor

Understood! If you're interested in constraint then πn/πs is the way to go. To me, your question is more about how to make sure to quality filter variants — using minfreq is just one overly simple way to do that. Indeed, when you use a minfreq cutoff of 0.01, you are effectively saying you believe that anything with AF < 1% isn't real, i.e., is a sequencing error.

I see you're working with Tom — if you have questions about study design and QC that's probably beyond the ambit of SNPGenie and GitHub, but feel free to send an email to us! Of interest, we just published a program called VCFgenie to perform QC filtering of iSNVs that you may wish to use:

Let me know!
C

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants