Missing COBRA_retrieved_for_joining contig file #30

Open
Vini2 opened this issue Mar 15, 2024 · 8 comments

Comments

@Vini2
Vini2 commented Mar 15, 2024

Hello Cobra authors!

Thanks for building this tool.

I keep getting the following error about a contig file not being created.

Traceback (most recent call last):
  File "/home/mall0133/miniconda3/envs/cobra/bin/cobra-meta", line 10, in <module>
    sys.exit(main())
  File "/home/mall0133/miniconda3/envs/cobra/lib/python3.8/site-packages/cobra.py", line 1747, in main
    '\t'.join([contig, str(header2len[contig]), summarize(contig), query2current[contig]]) + '\n')
  File "/home/mall0133/miniconda3/envs/cobra/lib/python3.8/site-packages/cobra.py", line 575, in summarize
    b = count_seq('COBRA_retrieved_for_joining/{0}_retrieved.fa'.format(item))  # number of retrieved contigs
  File "/home/mall0133/miniconda3/envs/cobra/lib/python3.8/site-packages/cobra.py", line 482, in count_seq
    a = open(fasta_file, 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'COBRA_retrieved_for_joining/NODE_1990_length_17225_cov_13.396421_retrieved.fa'

Every time I try re-running, it fails at this stage with a FileNotFoundError for a different contig.

This is my Cobra command. I'm using the latest version (version 1.2.3).

cobra-meta -f contigs.fasta -q query_contigs.fasta -c coverage.tsv -m sorted_reads.bam -a metaspades -mink 21 -maxk 127

My query_contigs.fasta file contains about 2300 contig sequences. I've also attached the log file of the run.
log.txt

Any advice on how to fix this error and get Cobra running will be appreciated.

Thanks!

@linxingchen
Owner

Hi,

Sorry to hear that you are having problems running COBRA.

We have noticed this issue and have been working on it.

Could you please let me know: (1) did you re-run on the same sample? (2) is the FileNotFoundError for a different contig every time? (3) have you seen this error with other samples?

Thank you.

Best,
LINXING

@Vini2
Author

Vini2 commented Mar 15, 2024

Hi @linxingchen,

Thanks for the quick reply!

(1) I removed the Cobra output folder and re-ran on the same sample.
(2) Yes, I've run Cobra 3 times and each time it reports a different contig: NODE_21535_length_4721_cov_9.622333, NODE_1990_length_17225_cov_13.396421 and NODE_13930_length_6078_cov_11.714670.
(3) I haven't tested on any other sample yet.

Thanks!

@linxingchen
Owner

Hi @Vini2, thanks for the information.

If possible, please share the "potential_join_path" file with me so I can check whether these three contigs are in the same path. We have been working on this, but please give us more time to fix it. Thank you.

@Vini2
Author

Vini2 commented Mar 15, 2024

Hi @linxingchen,

Here is the file you requested.
COBRA_potential_joining_paths.txt

Let me know if you need further details.

Thanks for taking a look at this issue.

@linxingchen
Owner

Thank you.

These three contigs are not in the same path, so I have no idea why the files do not exist. Could you run another sample and see whether it happens again?
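
For anyone who wants to repeat the "same path" check on their own run, here is a rough sketch. It assumes that each line of COBRA_potential_joining_paths.txt describes one path and that contig names appear in it verbatim (possibly with suffixes); the actual file layout is not described in this thread, and `paths_sharing_contigs` is a hypothetical helper, not part of COBRA.

```python
def paths_sharing_contigs(lines, contigs):
    """Return (line_number, matched_contigs) for lines naming 2+ of `contigs`.

    Assumption: one joining path per line, contig names written verbatim.
    Matching is by substring, so a name that is a prefix of another name
    could give a false positive.
    """
    shared = []
    for number, line in enumerate(lines, start=1):
        matched = [name for name in contigs if name in line]
        if len(matched) >= 2:
            shared.append((number, matched))
    return shared
```

Passing the open file handle together with the three contig names reported above would return an empty list if no single line mentions more than one of them.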

@Vini2
Author

Vini2 commented Mar 18, 2024

Hi @linxingchen,

I did a bit of debugging and I think some edge cases cause the errors.

This time the file that could not be found was NODE_30571_length_3853_cov_16.548864_retrieved.fa.

When line 1747 calls summarize(contig), the error is raised at line 575, which is in the inner else block of the nested else. There you have

item = is_subset_of(contig)

Then it tries to count sequences from item.

Here contig is NODE_111132_length_1779_cov_9.005448, which is a subset of NODE_30571_length_3853_cov_16.548864 (or the other way around). So item is NODE_30571_length_3853_cov_16.548864, which is an extended partial query. Hence it is not retrieved for joining, and its _retrieved.fa file is never created.
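
To illustrate the edge case, here is a minimal sketch of a guard around the file lookup. It is not the fix adopted in COBRA. The name `count_seq` and the path pattern `COBRA_retrieved_for_joining/{0}_retrieved.fa` come from the traceback; the body of `count_seq` is an assumption (it is assumed to count FASTA header lines), and `count_seq_or_zero` with its `directory` parameter is a hypothetical helper introduced only for this example.

```python
import os


def count_seq(fasta_file):
    """Count FASTA records by counting header lines (assumed behaviour)."""
    with open(fasta_file, "r") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_seq_or_zero(item, directory="COBRA_retrieved_for_joining"):
    """Hypothetical guard: return 0 when no retrieved file exists for `item`."""
    path = os.path.join(directory, "{0}_retrieved.fa".format(item))
    if not os.path.isfile(path):
        # For example, `item` is an extended partial query that was never
        # retrieved for joining, so no file was written for it.
        return 0
    return count_seq(path)
```

Whether reporting 0 is the right summary value for such a contig is a design decision for the authors; the sketch only shows where the missing file could be detected before open() fails.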

I came across some more edge cases which I couldn't test out in detail.

How would you recommend running these contigs? I'm not sure how to check if a contig is a subset or not. Would running them one by one be better? Appreciate any suggestions.

I'll try running on another sample as well.

Thanks!

@linxingchen
Owner

Hi @Vini2,

Thanks for taking a deep look at the issue. Sorry for my delayed reply.

Were you still running the same sample as before?

Running them one by one will avoid the error. However, I still have to fix this issue properly, hopefully next week (I am tied up with a grant proposal for now).
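
As a rough sketch of the one-by-one workaround, the query FASTA can be split into single-contig files and one command prepared per file. The file names and flags are copied from the command posted at the top of this issue; `split_fasta` and `build_commands` are hypothetical helpers, and the commands are only built as strings, not executed. How COBRA names its output folder for each run is not covered in this thread, so check that before launching several runs from the same working directory.

```python
import os
import shlex


def split_fasta(query_fasta, out_dir):
    """Write each record of `query_fasta` to its own file; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    try:
        with open(query_fasta, "r") as handle:
            for line in handle:
                if line.startswith(">"):
                    if out is not None:
                        out.close()
                    # Name the file after the first token of the header line.
                    tokens = line[1:].split()
                    name = tokens[0] if tokens else "record_{0}".format(len(paths) + 1)
                    path = os.path.join(out_dir, name + ".fasta")
                    out = open(path, "w")
                    paths.append(path)
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths


def build_commands(query_paths):
    """Build one cobra-meta command string per single-contig query file."""
    base = ["cobra-meta", "-f", "contigs.fasta", "-c", "coverage.tsv",
            "-m", "sorted_reads.bam", "-a", "metaspades",
            "-mink", "21", "-maxk", "127"]
    return [" ".join(shlex.quote(part) for part in base + ["-q", path])
            for path in query_paths]
```

For example, `build_commands(split_fasta("query_contigs.fasta", "single_queries"))` gives a list of command lines that can be reviewed and then run in turn.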

Best,
LINXING

@linxingchen
Owner

Hi @Vini2,

Sorry for my delayed reply on this. @Hocnonsense rewrote most parts of the script; it would be great if you could try it (attached) and see whether your issue has been resolved.

Thank you.

cobra_Hocnonsense.py.zip
