Support for AlphaFold database #438

Open
sirius777coder opened this issue Nov 13, 2022 · 14 comments

Comments

@sirius777coder

Hi,

I think Biotite is a really useful tool for dealing with large amounts of biological data. Does your team have any plans to integrate the AlphaFold database into the Biotite API?

@padix-key
Member

Hi, there are currently no plans to add a direct interface to the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/). To my knowledge, this database has no documented REST API for searching and fetching AlphaFold structures. The RCSB PDB has started to integrate AlphaFold structures itself (https://www.rcsb.org/docs/general-help/computed-structure-models-and-rcsborg), so downloading AlphaFold structures will probably be handled via biotite.database.rcsb in the foreseeable future.

@sirius777coder
Author

OK, looking forward to seeing this!

@BradyAJohnston

BradyAJohnston commented Jun 25, 2023

AlphaFoldDB officially released docs and better support for their API a few days ago. I was going to have a go at it, but I don't really know my way around APIs much at all. Thought I would share the links to potentially have it integrated into biotite.

https://www.ebi.ac.uk/about/news/updates-from-data-resources/alphafold-database-ux-update/

https://alphafold.ebi.ac.uk/api-docs

@dacarlin

I would be happy to contribute an interface to the AlphaFold DB, as I am familiar with the APIs. Is this something the project would welcome? I would be a first-time contributor to Biotite (and a long-time user).

@padix-key
Member

padix-key commented Jul 11, 2023

Yes, indeed! At the moment there is already a contributor who was working on this in #465. However, the last commit was several months ago, so I expect the contributor has abandoned the project. I would like to give the author another week to respond. Otherwise I would close his PR and make way for you to work on this feature.

I would imagine an interface that is similar to biotite.database.pdb (at least for fetch()). Although for license reasons you cannot simply copy the code from #465, you can still use the discussion there as a guideline.
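
A minimal sketch of what such a fetch() could look like, loosely following the fetch() style of Biotite's existing database subpackages. This is not code from #465 or from Biotite: the URL template, the default model version, the download parameter and the _download helper are assumptions made for illustration, and the actual file layout should be checked against the AlphaFold DB API docs.

```python
import os
from io import BytesIO, StringIO

# Assumed URL layout of AlphaFold DB model files (not confirmed in this thread);
# verify it against https://alphafold.ebi.ac.uk/api-docs before relying on it.
URL_TEMPLATE = (
    "https://alphafold.ebi.ac.uk/files/"
    "AF-{uniprot_id}-F1-model_v{version}.{suffix}"
)
SUPPORTED_FORMATS = ("pdb", "cif", "bcif")
BINARY_FORMATS = ("bcif",)


def _download(url):
    """Hypothetical default downloader: return the raw bytes behind ``url``."""
    import requests  # imported lazily so the sketch can be tested offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content


def fetch(uniprot_ids, format="pdb", target_path=None, overwrite=False,
          version=4, download=_download):
    """Fetch one or several predicted structures by UniProt accession.

    Returns file paths if ``target_path`` is given, else file-like objects
    (``StringIO`` for text formats, ``BytesIO`` for binary ones).
    A single ID yields a single result, a list of IDs yields a list.
    """
    if format not in SUPPORTED_FORMATS:
        raise ValueError(f"Format '{format}' is not supported")
    single = isinstance(uniprot_ids, str)
    ids = [uniprot_ids] if single else list(uniprot_ids)

    results = []
    for uniprot_id in ids:
        url = URL_TEMPLATE.format(
            uniprot_id=uniprot_id, version=version, suffix=format
        )
        if target_path is None:
            data = download(url)
            if format in BINARY_FORMATS:
                results.append(BytesIO(data))
            else:
                results.append(StringIO(data.decode("utf-8")))
        else:
            os.makedirs(target_path, exist_ok=True)
            path = os.path.join(target_path, f"{uniprot_id}.{format}")
            # Skip the download if the file is already cached
            if overwrite or not os.path.isfile(path):
                with open(path, "wb") as file:
                    file.write(download(url))
            results.append(path)
    return results[0] if single else results
```

Passing the downloader as a parameter is only a convenience for testing without network access; a real implementation would more likely call requests directly.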

@dacarlin

That all sounds good to me 👍 I agree that the interface of biotite.database.pdb is a perfect model. How about I check back next week for an updated status?

@padix-key
Member

There has been no response yet, so I closed #465. Hence, you can start if you like.

@padix-key
Member

@jonfunk21 has now also approved that you take on the issue.

@dacarlin

dacarlin commented Jul 21, 2023 via email

@dacarlin

I’ve implemented fetching and included a few basic tests so far (see here https://github.com/dacarlin/biotite/blob/add-alphafold-db/tests/database/test_alphafold.py).

Shall I open a WIP merge request while I wrap up the changes and get the code ready for your comments? Currently, my changes are in a branch of a recent fork.

@padix-key
Member

padix-key commented Jul 28, 2023

It already looks quite good, feel free to open a PR. I wonder whether the tests pass, though, since there are two issues, if I am not mistaken.

content = file_response.text should not work for bcif because it is a binary format.

from .check import assert_valid_response should give an error because there is no check.py module in this subpackage.
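
To illustrate the first point, a minimal sketch of how the response body could be handled per format. The helper names response_payload and write_payload are hypothetical and not part of the branch under review; only the .content (bytes) and .text (decoded string) attributes of a requests response are relied upon.

```python
BINARY_FORMATS = ("bcif",)


def response_payload(response, format):
    """Return the body of a ``requests`` response in the appropriate type.

    ``response.text`` decodes the body into a string, which would corrupt a
    binary format such as BinaryCIF, so ``response.content`` (raw bytes) is
    used for those formats instead.
    """
    if format in BINARY_FORMATS:
        return response.content
    return response.text


def write_payload(path, payload):
    """Write ``payload`` to ``path`` in binary or text mode, as appropriate."""
    if isinstance(payload, bytes):
        with open(path, "wb") as file:
            file.write(payload)
    else:
        with open(path, "w", encoding="utf-8") as file:
            file.write(payload)
```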

@dacarlin

dacarlin commented Aug 17, 2023

Yes, thanks! You are definitely correct. I ran into issues before I finished writing the tests, which led me down a rabbit hole.

Briefly, I created a new Conda environment using the environment.yml, but I still got some errors from Cython that prevented me from installing with pip install . and pip install -e .. I put the error log in this gist https://gist.github.com/dacarlin/bac823ef62aef6d3df7261ad8dd6d76a in case it's helpful. I ended up adding explicit casts to int in 4-5 lines of src/biotite/structure/io/pdb/hybrid36.pyx to get the install to work.

The good news is that I can now write proper tests!

@padix-key
Member

I suppose you have the brand-new Cython 3.0 installed? I just saw that the environment.yml does not set an upper bound on the Cython version, so you probably installed version 3.0, which introduced some breaking language changes. I am quite relieved to hear that only a few lines of code seem to require fixing. I will create an issue for the changes introduced by Cython 3.0.

@padix-key mentioned this issue Aug 19, 2023

@dacarlin

Glad that it's an easy fix. This is the output of conda env export --from-history, in case it helps with troubleshooting:

name: biotite-dev
channels:
  - defaults
dependencies:
  - sphinx-gallery=0.11.1
  - tantan
  - matplotlib[version='>=3.3']
  - networkx[version='>=2.0']
  - muscle=3
  - setuptools[version='>=30.0']
  - mafft
  - sphinx[version='>=5.0']
  - numpy[version='>=1.15']
  - numpydoc[version='>=0.8']
  - requests[version='>=2.12']
  - sra-tools
  - dssp
  - wheel[version='>=0.30']
  - autodock-vina
  - scipy[version='>=1.8.0']
  - pytest[version='>=5.2']
  - scikit-learn[version='>=0.18']
  - pydot[version='>=1.4']
  - mdtraj[version='>=1.9.3']
  - msgpack-python[version='>=0.5.6']
  - clustalo
  - cython[version='>=0.29']
  - viennarna[version='>=2.5.0']
  - python=3.9
  - sphinxcontrib-bibtex[version='>=2.3']
  - pip[version='>=10.0']
  - cmake

Meanwhile, I'll create a PR for this feature branch 😄
