The role of cobra in classic virome pipeline #27

Open
xjhzjucas opened this issue Mar 3, 2024 · 4 comments

@xjhzjucas

Hi Linxing:
Thank you for developing this nice tool! Since COBRA is new software, I am curious about what role it plays in the classic virome pipeline.
For example, I used MEGAHIT to assemble contigs from metagenomic reads and then used geNomad to identify the viral contigs among them. If I then run COBRA as the next step (i.e., use the geNomad output <prefix>_summary/<prefix>_virus.fna as the input of COBRA), will COBRA bin the identified viral contigs together to reach higher completeness? Does it work well before or after geNomad? I read that COBRA can recover more circular viral genomes and huge phages. Can I consider COBRA a binning tool or an identifier of circular/huge phages?
Thanks!

@linxingchen
Owner

linxingchen commented Mar 3, 2024

Hi, thank you for your interest in COBRA. COBRA does not bin contigs/scaffolds; instead, it joins contigs/scaffolds into longer sequences (and thus higher completeness). You can use the predicted viral contigs/scaffolds (for example, from geNomad) as the queries for COBRA to work on, but keep in mind that these queries must come from the same sample, and the -f/--fasta input of COBRA must contain all the contigs/scaffolds from the corresponding assembly (no length filtering). Please let me know if you have other concerns.

Cheers,
LINXING

@xjhzjucas
Author

Thank you for your help. Based on your suggestions and my understanding, COBRA takes the viral contigs that geNomad identified within a single sample and joins them to reach higher completeness, so that more huge/circular viral contigs are recovered. Is that right? I am also a little confused about "all the contigs/scaffolds from the corresponding assembly (no length filtering)". Does this mean I should provide all of the geNomad virus.fna sequences from a sample without length filtering? Thanks!

@linxingchen
Owner

You misunderstood. -q/--query is for queries, -f/--fasta is for all the contigs/scaffolds. The queries are those contigs/scaffolds you want COBRA to join. You should not filter length for -f/--fasta. Check "Input files" here for details.
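
To make the distinction between the two inputs concrete, here is a minimal sketch of a pre-flight check that every query ID also appears in the whole-assembly FASTA. It is not part of COBRA; the helper names `fasta_ids` and `missing_queries` and the toy contig names are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token of each header line.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first token after '>') in FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def missing_queries(query_lines, assembly_lines):
    """Return the sorted query IDs that are absent from the whole assembly."""
    return sorted(fasta_ids(query_lines) - fasta_ids(assembly_lines))


# Toy data standing in for the two input files (hypothetical contig names).
# 'assembly' plays the role of the unfiltered -f/--fasta file,
# 'queries' plays the role of the -q/--query file.
assembly = [
    ">k141_1 flag=1 len=500", "ACGT",
    ">k141_2 flag=0 len=1200", "GGCC",
    ">k141_3", "TTAA",
]
queries = [">k141_2", "GGCC", ">k141_9", "AAAA"]

print(missing_queries(queries, assembly))  # ['k141_9']
```

With real files, the same functions can be fed open file handles, for example `missing_queries(open(query_path), open(assembly_path))`, where both paths are placeholders for your own files. An empty result means every query is a contig of the assembly.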

@yan1365

yan1365 commented May 25, 2024

Hi Linxing,

Thank you for your excellent tool. I have a question: do the query sequences need to have the same names as the original contigs in the contig file? I am asking because viral identification tools usually rename the contigs, and when I ran COBRA, it reported that none of my query contigs are in the whole contig file. For example, it says "Query k141_148452||full is not in your whole contig fasta file, please check!". However, I checked and found that the contig "k141_148452" is indeed in my whole contig file.

If I need to rename the viral contigs to their original names, what should I do if two or more viral sequences were identified from the same contig? Or should we use the original contig from which the sequences were identified as the input?

It would be nice if you could provide more detailed examples of how COBRA is used for virome analysis.

Thanks,
Ming
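
One possible way to handle the renaming described above, sketched under assumptions and not confirmed by the maintainer in this thread: strip the suffix that the viral identification tool appended and keep each original contig name once, so that two viral regions from the same contig collapse into a single query. The function name `original_contig_names` is hypothetical, the `_partial` IDs are made-up examples, and the sketch assumes the tool's suffix starts at the first `|` and that original contig names contain no `|`. Whether a whole contig is the right query when only part of it was called viral remains a question for the maintainer.

```python
def original_contig_names(viral_ids, assembly_ids, sep="|"):
    """Map tool-renamed viral IDs back to unique original contig names.

    Returns (kept, unmatched): 'kept' lists each matching original contig
    once, in first-seen order; 'unmatched' lists viral IDs whose base name
    is not in the assembly.
    """
    seen = set()
    kept, unmatched = [], []
    for vid in viral_ids:
        # Prefer an exact match; otherwise drop everything from the first separator.
        base = vid if vid in assembly_ids else vid.split(sep, 1)[0]
        if base not in assembly_ids:
            unmatched.append(vid)
        elif base not in seen:
            seen.add(base)
            kept.append(base)
    return kept, unmatched


# Toy example: the first ID is the one quoted in the error message above,
# the others are hypothetical.
assembly_ids = {"k141_148452", "k141_7", "k141_20"}
viral_ids = [
    "k141_148452||full",
    "k141_7||0_partial",
    "k141_7||1_partial",
    "k141_99||full",
]

kept, unmatched = original_contig_names(viral_ids, assembly_ids)
print(kept)       # ['k141_148452', 'k141_7']
print(unmatched)  # ['k141_99||full']
```

Anything reported in `unmatched` points to a query that comes from a different assembly or that was renamed in a way this sketch does not anticipate.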
