Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pysam.pileup low max depth leads to FP #63

Closed
ArthurDondi opened this issue Aug 21, 2024 · 5 comments
Closed

pysam.pileup low max depth leads to FP #63

ArthurDondi opened this issue Aug 21, 2024 · 5 comments

Comments

@ArthurDondi
Copy link
Contributor

pysam.pileup max_depth argument is by default 8000, however for RNA-seq there are edge cases as below, where a germline intronic SNP in a strongly expressed gene will appear as a false-positive somatic SNV:

Capture d’écran (170)

Top row is the cancer cells sample, mutated with 14% VAF and 19% CCF (max depth in exon ~17000) and the bottom row is the non-cancer cells sample with 4% VAF and 6% CCF (max depth in exon ~70000). This looks like a germline SNP.

With the current max_depth=8000 I got a PASS mutation because only a fraction of the 475 reads are processed :
chr11 61965525 61965525 G A PASS Cancer AAGGG AACAC 1 25 15 4 4 0.16 0.2667 0.0001 0.0 2 2 1;115;0.1465 1;92;0.1055 . PASS DP|NC|CC|BC|BQ|BCf|BCr 94|81|1:0:0:79:0:0|1:0:0:92:0:0|93:0:0:8556:0:0|0:0:0:14:0:0|1:0:0:78:0:0 25|15|4:0:0:13:0:0|4:0:0:21:0:0|372:0:0:1953:0:0|0:0:0:2:0:0|4:0:0:19:0:0

If I raise it to, say, max_depth=200000 it's a Cell_type_noise, as it should:
chr11 61965525 61965525 G A,A Cell_type_noise NonCancer,Cancer AAGGG AACAC 1 479,44 239,26 19,6 15,5 0.0397,0.1364 0.0628,0.1923 0.0017,0.0 0.0,0.0 2 2 1;498;0.3121 1;245;0.2041 . BetaBin_problem,PASS DP|NC|CC|BC|BQ|BCf|BCr 479|239|15:1:0:233:0:0|19:1:0:453:0:0|1767:93:0:42129:0:0|0:0:0:48:0:0|19:1:0:405:0:0 44|26|5:0:0:25:0:0|6:0:0:37:0:0|558:0:0:3441:0:0|0:0:0:3:0:0|6:0:0:34:0:00

pysam.pileup occurs in BaseCellCounter.py and SingleCellGenotype.py. Let me know if you're interested in a PR. There is also a filter bug in basecallingstep1 as signaled in #16.

@velten-lab
Copy link

Thanks for spotting this! I agree that it's important to document that SComatic uses a default max_depth of 8000, and to add command line options to BaseCellCounter.py and SingleCellGenotype.py that allow to modify max_depth. We're working with (very deeply sequenced) mission bio data, and increasing max depth made a day-and-night difference for us.

@ArthurDondi
Copy link
Contributor Author

For their defense, it's a samtools/pysam default behaviour that is quite unexpected (at least to me).

It can have a big impact on very deep sequencing but also in low-depth + high-cell count when using SingleCellGenotype.py

@isidroc
Copy link
Contributor

isidroc commented Aug 29, 2024

Dear ArthurDondi , Thanks for spotting this issue. Yes, a PR would be great, thanks a lot

@Francesc-Muyas
Copy link
Collaborator

Dear users,
Thanks for this interesting finding. We will accept the PR in the coming days.

Thanks again for your feedback,
Fran

@isidroc
Copy link
Contributor

isidroc commented Sep 15, 2024

We have now accepted the PR. Thanks!

@isidroc isidroc closed this as completed Sep 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants