option to define a3m file name based on protein header? #219

Open
flyark opened this issue Feb 28, 2024 · 1 comment

flyark commented Feb 28, 2024

I am currently using localcolabfold 1.5.2. After running colabfold_search, I found that all of the output a3m files have numeric names:

0.a3m
1.a3m
2.a3m
...
x.a3m

The protein name is only visible inside each a3m file, for example by running "head" in the terminal:

ak $ head *.a3m -n 2
==> 0.a3m <==
#124,403	1,1
>Protein_1	Protein_1

==> 2.a3m <==
#124,729	1,1
>Protein_3	Protein_3

The FASTA file I used for the sequence search looks like this:

>Protein_1 & Protein_2
MTKTKGEKINKSAINEVVTRECTIHLAKRVHNIGFKKRAPRAIKEIRKFAEREMGTTDVRIDTRLNKHIWSKGIRSTPFRIRVRLARRRNDDEDSPNKLYTYVTYVPVSTFKNLQTENVESSDD:MDNSGNNRYELLFM
>Protein_3 & Protein_4
MVKVKCSELRIKDKKELTKQLDELKNELLSLRVAKVTGGAPSKLSKIRVVRKAIARVYIVMHQKQKENLRKVFKNKKYKPLDLRKKKTRAIRKALSPRDANRKTLKEIRKRSVFPQRKFAVKA:MQLAEKHIVALMFPAIDVSTFTFVSGVKFYIMDNSGNNRYELLFMDDDDSSGLAQPQIAAVVAAPKKPEPAKAPKAPKSKSEKENKPVVAARKANAPVAKNASPVKGG

When a protein header in the FASTA file contains "&", it seems that only the first protein/gene name is saved as the header in the a3m file.

If I use the same FASTA file for the MSA via colabfold_batch (remote server), I get file names like these:
Protein_1___Protein_2.a3m
Protein_3___Protein_4.a3m

Is there a way to get file names like these when colabfold_search is used for the MSA?

Also, is there an example FASTA file for multimer analysis? If my protein headers are unusual, I could use the example file as a reference.

@huiwenke

You can use the MineProt toolkit (https://github.com/huiwenke/MineProt/wiki/Toolkit-manual#colabfoldtransformpy) to rename them.
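
For anyone who would rather not add a dependency, the rename can also be sketched by hand. The following is a minimal, untested-against-ColabFold sketch, not part of ColabFold or MineProt. It assumes what the output above shows: the first ">" line of each numbered a3m file starts with the first whitespace-delimited token of the matching FASTA header. The helper names (safe_name, read_fasta_headers, a3m_query_id, plan_renames, apply_renames) are hypothetical, and the sanitizing rule is only chosen to reproduce the names quoted in this issue (Protein_1___Protein_2.a3m); it is not necessarily the exact rule colabfold_batch uses.

```python
import re
from pathlib import Path
from typing import Dict, Optional


def safe_name(header: str) -> str:
    """Replace every character that is not a word character or '-' with '_'.

    Assumption: chosen only to reproduce the file names quoted in the issue.
    """
    return re.sub(r"[^\w-]", "_", header)


def read_fasta_headers(fasta_path) -> Dict[str, str]:
    """Map the first whitespace-delimited token of each FASTA header to the full header."""
    headers: Dict[str, str] = {}
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                full = line[1:].strip()
                if full:
                    # Later duplicates of the same first token overwrite earlier ones.
                    headers[full.split()[0]] = full
    return headers


def a3m_query_id(a3m_path) -> Optional[str]:
    """Return the first token of the first '>' line of an a3m file, or None."""
    with open(a3m_path) as fh:
        for line in fh:
            if line.startswith(">"):
                tokens = line[1:].split()
                return tokens[0] if tokens else None
    return None


def plan_renames(fasta_path, a3m_dir) -> Dict[Path, Path]:
    """Build a {current_path: new_path} mapping without touching any file."""
    headers = read_fasta_headers(fasta_path)
    plan: Dict[Path, Path] = {}
    for a3m in sorted(Path(a3m_dir).glob("*.a3m")):
        query = a3m_query_id(a3m)
        if query in headers:
            plan[a3m] = a3m.with_name(safe_name(headers[query]) + ".a3m")
    return plan


def apply_renames(plan: Dict[Path, Path], dry_run: bool = True) -> None:
    """Print the planned renames; perform them only when dry_run is False."""
    for src, dst in plan.items():
        print(f"{src.name} -> {dst.name}")
        if not dry_run and not dst.exists():
            src.rename(dst)
```

Calling apply_renames(plan_renames("input.fasta", "msas/")) prints the mapping without changing anything; passing dry_run=False performs the renames. The paths here are placeholders. Matching on the header inside the a3m file rather than on the numeric file name is deliberate: in the output above the files are 0.a3m and 2.a3m, so the number does not simply follow the order of the FASTA entries.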
