Local confidence invalid #244

Open
Ln9052 opened this issue Dec 15, 2023 · 3 comments
Comments

@Ln9052
Ln9052 commented Dec 15, 2023

When I use Stitch to read the results.zip of pNovo v3.1.5 and run new_PNovo_FLAG_H_20ppm_ALC90.txt, an error occurs: [image]. It seems that Stitch has trouble recognizing the length of the sequence in the pNovo result. In the segment "15903 │ …4966.14966.4.0.dta cGYWRQRWVVRGFCbLNFSSM 0.155942 9.26071 0.427735,0.266975,0.173732,0.131206,0.101629,0.0852…", the sequence is actually 21 characters long, but Stitch recognized it as 20 and reported an error about a mismatch with the number of local confidences. Could you please help resolve this issue? Thank you.

douweschulte added a commit that referenced this issue Dec 15, 2023
@douweschulte
Member

To be perfectly frank with you, the parsing of pNovo sequences is a bit patchy. I do not know what they mean by their sequences, and the number of local confidence values often does not match the sequence length. Because there is no documentation for their output, I cannot determine the true meaning of the sequences. So the fix I just made downgrades this error to a warning and ignores the local confidence for peptides where the length does not match. This is not a proper fix, but it makes it possible to use the data.
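To illustrate that fallback, here is a minimal Python sketch. It is not Stitch's actual code; the function name `parse_local_confidence` and the assumption that the confidences arrive as one comma-separated field are made up for this example.

```python
import warnings


def parse_local_confidence(sequence_length, field):
    """Parse a comma-separated local confidence field (hypothetical helper).

    Returns the list of per-position confidences, or None when the number of
    values does not match the sequence length. In that case a warning is
    emitted instead of an error, so the peptide itself can still be used.
    """
    values = [float(v) for v in field.split(",") if v.strip()]
    if len(values) != sequence_length:
        warnings.warn(
            f"Local confidence has {len(values)} values but the sequence has "
            f"{sequence_length} positions; ignoring local confidence."
        )
        return None
    return values
```

With this shape, a peptide whose confidence count is off simply ends up without local confidence rather than aborting the whole run.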

Some more insight into how I understand pNovo peptides. If you know their format a bit better, feel free to add to my list (examples use ProForma notation).

  1. The param file defines modified amino acids (each can be an uppercase letter, a lowercase letter, or a digit), e.g. a = A[mod]
  2. Any modified amino acid at the start indicates an N-terminal modification, meaning the modification stays but the amino acid is ignored, e.g. aCC => [mod]-CC
  3. Any modified amino acid at the very end has to be ignored in its entirety, e.g. CCa => CC, CCaa => CCA[mod]
  4. Any modified amino acid in the middle of the sequence is the amino acid and its modification, e.g. CaC => CA[mod]C
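The four rules above can be sketched in Python as follows. This is only an illustration of the list, not Stitch's parser; `pnovo_to_proforma`, the `MODIFIED` table, and the choice to strip a trailing modified symbol before checking the first symbol are all assumptions of the sketch.

```python
# Rule 1: symbols declared in the param file map to a base amino acid plus a
# modification. This table is a made-up example with a single entry.
MODIFIED = {"a": ("A", "mod")}


def pnovo_to_proforma(sequence, modified=MODIFIED):
    """Convert a pNovo sequence to ProForma following rules 1 to 4 above."""
    residues = list(sequence)

    # Rule 3: a modified symbol at the very end is dropped entirely.
    if residues and residues[-1] in modified:
        residues.pop()

    # Rule 2: a modified symbol at the start becomes an N-terminal
    # modification; its amino acid is ignored.
    prefix = ""
    if residues and residues[0] in modified:
        _, mod = modified[residues.pop(0)]
        prefix = f"[{mod}]-"

    # Rule 4: any remaining modified symbol is the amino acid plus its
    # modification; unmodified symbols pass through unchanged.
    body = "".join(
        f"{modified[r][0]}[{modified[r][1]}]" if r in modified else r
        for r in residues
    )
    return prefix + body
```

Run on the examples from the list, this gives `[mod]-CC` for `aCC`, `CC` for `CCa`, `CCA[mod]` for `CCaa`, and `CA[mod]C` for `CaC`.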

The set of rules above parsed the files I made with pNovo correctly, but it does not seem to work for your file. I tried a couple of variations of these rules (not ignoring the amino acid for the N-terminus, not ignoring the C-terminal one), but I could not find rules that work for all peptides.

@MengTingHe2023
Hi, as far as I know, the latest pNovo update (v3.1.5) addressed the issue of fixed modifications accumulating over multiple searches in the GUI. It also fixed the anomaly related to amino acid case sensitivity in searches. The sequences and the local confidences should now match in length. I think you are right to try not ignoring the amino acid at the N-terminus and not ignoring the one at the C-terminus. The documentation for their output is shown below. You can find it in the pNovo installation package: pNovo 3 User Guide.pdf.
[Image attachment: WeChat Image_20230706110140]

@douweschulte
Member

Thanks for your input! I think I would need to take a full day to dive into the format again, because I never got the length of the sequence to match the number of local confidences before (with 3.1.4, I think). So maybe this is fixed in 3.1.5.
