Invalid indices in make_index_files.R assignment of non-HLA transcript sequences #7

Open
jfass opened this issue Jul 5, 2022 · 3 comments


@jfass

jfass commented Jul 5, 2022

I'm working in a conda environment with R 4.0.5. Installing hlaseqlib updates dplyr to 1.0.9, which gives a warning about "group_by" and a change as of dplyr 1.0.0; that may or may not be related to this issue (I'm not well versed in the tidyverse). When stepping through make_index_files.R to troubleshoot, it's the assignment of transcripts_no_hla that fails, with a message about invalid indices (sorry, I don't have the exact text in front of me). I worked around it with base R:

transcripts_no_hla <- transcripts[ which( names( transcripts ) %in% transcripts_db$transcript_id[ which( !( transcripts_db$gene_name %in% hladb_genes ) ) ] ) ]

... hopefully that gives you an idea of what the problem could be? I'm using a GENCODE v21 protein_coding transcripts FASTA and annotation for the main chromosomes. Before isolating the problematic step, I tried truncating the ENS[GT] IDs to remove the ".#" version suffix in both the annotation and the FASTA, and removing all non-protein_coding annotations from the GTF file, but neither changed the error. So I think the problem may be some changed dplyr syntax in the versions I'm using.
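To make the workaround easier to test in isolation, here is a minimal base R sketch of the same filter on toy data. All IDs, gene names, and sequences below are hypothetical, and a named character vector stands in for the transcript sequence set; the real object in make_index_files.R may behave differently.

```r
# Toy stand-ins (all values hypothetical). A named character vector plays
# the role of the transcript sequence set read from the FASTA.
transcripts <- c(
  ENST0001 = "ATGGCC",
  ENST0002 = "ATGTTT",
  ENST0003 = "ATGAAA"
)

# Annotation table; ENST0004 is deliberately absent from the FASTA.
transcripts_db <- data.frame(
  transcript_id = c("ENST0001", "ENST0002", "ENST0003", "ENST0004"),
  gene_name     = c("TP53", "HLA-A", "BRCA1", "MYC"),
  stringsAsFactors = FALSE
)

hladb_genes <- c("HLA-A", "HLA-B")

# IDs of transcripts whose gene is not in the HLA gene list.
keep_ids <- transcripts_db$transcript_id[!(transcripts_db$gene_name %in% hladb_genes)]

# Logical mask over the sequences: IDs missing from the FASTA (ENST0004 here)
# are never matched, so no lookup by name can fail.
transcripts_no_hla <- transcripts[names(transcripts) %in% keep_ids]

print(names(transcripts_no_hla))
```

This is the same logic as the one-liner above without the `which()` calls, which are optional when subsetting with a logical vector.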

@jfass
Author

jfass commented Jul 5, 2022

Here's the warning (doesn't seem to cause a problem once I've substituted the command above):

Processing locus DRB4
Processing locus DRB5
Warning message:
Problem while computing `data = map(locus, ~hla_compile_index(., imgt_db))`.
ℹ The `...` argument of `group_indices()` is deprecated as of dplyr 1.0.0.
  Please `group_by()` first 
Reading transcript annotations...
writing index files...
Done!
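For reference, the warning asks for grouping to happen before `group_indices()` is called. A minimal sketch of that pattern, assuming dplyr >= 1.0.0 and a hypothetical toy table (the actual call inside hlaseqlib is not shown in this thread and may look different):

```r
library(dplyr)

# Hypothetical toy table; column names are illustrative only.
loci <- tibble(
  locus  = c("DRB4", "DRB5", "DRB4"),
  allele = c("a1", "b1", "a2")
)

# Deprecated since dplyr 1.0.0: passing grouping columns through `...`
#   group_indices(loci, locus)

# Pattern suggested by the warning: group first, then ask for the indices.
idx <- loci %>%
  group_by(locus) %>%
  group_indices()

# Alternative available since dplyr 1.0.0: cur_group_id() inside a grouped mutate().
loci_with_id <- loci %>%
  group_by(locus) %>%
  mutate(locus_id = cur_group_id()) %>%
  ungroup()

print(idx)
```

Since this is a deprecation warning rather than an error, it is probably not what stops the script.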

@youknow16

I'm getting a similar error, except that it just exits the program:

  Warning message:
  Problem while computing `data = map(locus, ~hla_compile_index(., imgt_db))`.
  ℹ The `...` argument of `group_indices()` is deprecated as of dplyr 1.0.0. Please `group_by()` first 
  Reading transcript annotations...
  Error: subscript contains invalid names
  Execution halted
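One plausible reading of "subscript contains invalid names" is that the script subsets the sequences by a vector of names, and at least one requested name does not exist in the FASTA. This is an assumption, not something confirmed in this thread. A quick base R check along those lines, with hypothetical IDs:

```r
# Hypothetical IDs: names as they appear in the FASTA vs. IDs requested
# from the annotation.
fasta_names   <- c("ENST0001.1", "ENST0002.3", "ENST0003.2")
requested_ids <- c("ENST0001.1", "ENST0003.2", "ENST0004.1")

# IDs that the annotation asks for but the FASTA does not contain.
missing_ids <- setdiff(requested_ids, fasta_names)

if (length(missing_ids) > 0) {
  message(
    length(missing_ids), " requested ID(s) not found in the FASTA names, e.g. ",
    paste(head(missing_ids, 5), collapse = ", ")
  )
}
```

If the real data produce a non-empty `missing_ids`, comparing a few FASTA names against the annotation IDs should show where the two formats diverge.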

@jfass
Author

jfass commented Aug 17, 2022

Yeah, I think I was seeing execution die as well, with the "invalid indices" message. Have you had a chance to try my fix above? There's probably a better (older) tidyverse way to do it, but ...
