MetaEuk doesn't work with databases created using the commands metaeuk createdb or mmseqs createdb #721

Closed
amizeranschi opened this issue Dec 2, 2024 · 3 comments

@amizeranschi
Contributor

amizeranschi commented Dec 2, 2024

The current pipeline version (3.2.1) doesn't support running MetaEuk when supplying a database (via ) created with the commands metaeuk createdb or mmseqs createdb run on a manually downloaded FASTA file of protein sequences. The MetaEuk module searches for a file called <prefix>.version, which those commands don't create automatically.
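To confirm whether a given database is affected, one option is to check for the `<prefix>.version` file directly. The following is a minimal sketch, not part of the pipeline: the function names and the example path are hypothetical, and the only file name it relies on is the `<prefix>.version` mentioned above.

```python
from pathlib import Path


def db_files(prefix):
    """List files next to the database prefix whose names start with the prefix name."""
    prefix = Path(prefix)
    return sorted(
        p.name
        for p in prefix.parent.iterdir()
        if p.is_file() and p.name.startswith(prefix.name)
    )


def has_version_file(prefix):
    """Return True if <prefix>.version exists (the file the MetaEuk module looks for)."""
    prefix = Path(prefix)
    return prefix.with_name(prefix.name + ".version").is_file()


# Example usage (hypothetical path):
#   prefix = "/data/db/my_proteins"
#   print(db_files(prefix))
#   print("version file present:", has_version_file(prefix))
```

If `has_version_file` returns False for a database built with `metaeuk createdb` or `mmseqs createdb`, that matches the behaviour described in this report.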

Relevant discussion on Slack: https://nfcore.slack.com/archives/CE9MS66BS/p1733150568097039?thread_ts=1733144901.017309&cid=CE9MS66BS (on #mag, 2nd Dec)

@amizeranschi amizeranschi added the bug Something isn't working label Dec 2, 2024
@jfy133
Member

jfy133 commented Dec 3, 2024

Ugh this looks like it's related to mmseqs which I really don't like atm for similar reasons 😬 , I will investigate ASAP

@jfy133 jfy133 self-assigned this Dec 3, 2024
@jfy133
Member

jfy133 commented Jan 22, 2025

Reading through the Slack thread, this seems to be more of an mmseqs2 issue than a mag one, but I will test it anyway to see if I can reproduce:

  1. ✅ Running test_adapterremoval, which includes auto-download of MetaEuk databases (giving a FASTA file of protein sequences to --metaeuk_db)
  2. ✅ Downloading an mmseqs2 database with mmseqs2 databases Kalamari kalamari/kalamari tmp, and giving that to --metaeuk_db -> Downloading results in Cannot open kalamari/kalamari.source for writing, I hate mmseqs so much... will finish it tomorrow -> OK, this worked for me with no issues: nextflow run ../main.nf -profile test_adapterremoval,docker --outdir ./results-test_adapterremoval-downloaddb --metaeuk_db ~/cache/databases/mmseqs/kalamari/ -resume
  3. Specifying the database name with --metaeuk_mmseqs_db -> nextflow run ../main.nf -profile test_adapterremoval,docker --outdir ./results-test_adapterremoval-downloaddb --metaeuk_mmseqs_db Kalamari -resume --metaeuk_db false

All of the tests above worked. Admittedly I've used the smallest database (Kalamari) offered by MMseqs and the yeast genome, but all database variants worked fine.
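For anyone wanting to rerun the three checks above, here is a rough sketch that assembles the corresponding command lines. It only uses flags that appear in this thread. The variant names, the output directory names, and the local database path are placeholders, not values from the tests above.

```python
import shlex

# Arguments shared by all three test runs, taken from the commands in this comment.
BASE = ["nextflow", "run", "../main.nf", "-profile", "test_adapterremoval,docker", "-resume"]

# Extra arguments per variant; the local database path is a placeholder.
VARIANTS = {
    "profile_default": [],
    "local_db": ["--metaeuk_db", "/path/to/mmseqs/kalamari/"],
    "named_db": ["--metaeuk_mmseqs_db", "Kalamari", "--metaeuk_db", "false"],
}


def build_command(outdir, extra_args=()):
    """Return the argument list for one run, writing results to outdir."""
    return BASE + ["--outdir", outdir, *extra_args]


def all_commands():
    """Map each variant name to its full argument list."""
    return {name: build_command(f"./results-{name}", args) for name, args in VARIANTS.items()}


# Print shell-ready command lines; nothing is executed here.
for name, cmd in all_commands().items():
    print(f"# {name}")
    print(shlex.join(cmd))
```

The commands are printed rather than run, so they can be inspected or pasted into a shell one at a time.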

So I suspect this is something problematic with the mmseqs download and/or UniRef100, and not something to do with the pipeline.

@jfy133
Member

jfy133 commented Jan 23, 2025

Final test completed with ✅ - so unable to replicate.

If it happens again please reopen @amizeranschi

@jfy133 jfy133 closed this as completed Jan 23, 2025