modularized go ingest
- Python >= 3.10
- Poetry
Upon creating a new project from the cookiecutter-monarch-ingest
template, you can install and test the project:
cd go-ingest
make install
make test
There are a few additional steps to complete before the project is ready for use.
- Create a new repository on GitHub.
- Enable GitHub Actions to read and write to the repository (required to deploy the project to GitHub Pages).
  - In GitHub, go to Settings -> Actions -> General -> Workflow permissions and choose read and write permissions.
- Initialize the local repository and push the code to GitHub. For example:
cd go-ingest
git init
git remote add origin https://github.com/<username>/<repository>.git
git add -A && git commit -m "Initial commit"
git push -u origin main
- Edit the download.yaml, transform.py, transform.yaml, and metadata.yaml files to suit your needs.
  - For more information, see the Koza documentation and kghub-downloader.
- Add any additional dependencies to the pyproject.toml file.
- Adjust the contents of the tests directory to test the functionality of your transform.
- Update this README.md file with any additional information about the project.
- Add any appropriate documentation to the docs directory.
Note: After the GitHub Actions workflow for deploying documentation runs, the documentation will be automatically deployed to GitHub Pages.
However, you will need to go to the repository settings and set the GitHub Pages source to the gh-pages branch, using the /docs directory.
This project is set up with several GitHub Actions workflows.
You should not need to modify these workflows unless you want to change the behavior.
The workflows are located in the .github/workflows directory:
- test.yaml: Run the pytest suite.
- create-release.yaml: Create a new release once a week, or manually.
- deploy-docs.yaml: Deploy the documentation to GitHub Pages (on pushes to main).
- update-docs.yaml: After a release, update the documentation with node/edge reports.
Once you have completed these steps, you can remove the Setting Up a New Project section from this README.md file.
cd go-ingest
make install
# or
poetry install
Note that the make install command is just a convenience wrapper around poetry install.
Once installed, you can check that everything is working as expected:
# Run the pytest suite
make test
# Download the data and run the Koza transform
make download
make run
This project is set up with a Makefile for common tasks.
To see available options:
make help
Download the data for the go_ingest transform:
poetry run go_ingest download
To run the Koza transform for go-ingest:
poetry run go_ingest transform
To see available options:
poetry run go_ingest download --help
# or
poetry run go_ingest transform --help
To run the test suite:
make test
The Gene Ontology Annotation Database compiles high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB), RNA molecules from RNACentral and protein complexes from the Complex Portal.
Manual annotation is the direct assignment of GO terms to proteins, ncRNA and protein complexes by curators from evidence extracted during the review of published scientific literature, with an appropriate evidence code assigned to give an assessment of the strength of the evidence. GOA files contain a mixture of manual annotation supplied by members of the Gene Ontology Consortium and computationally assigned GO terms describing gene products. Annotation type is clearly indicated by associated evidence codes and there are links to the source data.
There is a ReadMe.txt file that explains the different annotation files available. The ingested Gene Annotation File (GAF) is a 17-column tab-delimited file. The file format conforms to the specifications required by the GO Consortium, so GO IDs are shown rather than GO term names.
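To make the 17-column layout concrete, here is a minimal sketch of splitting one GAF line into named fields. It is not the transform used by this project (that lives in transform.py and runs through Koza). The function name parse_gaf_line and the snake_case column names are illustrative assumptions. The column order follows the published GAF 2.x specification, in which header and comment lines start with "!".

```python
from typing import Optional

# Column order per the GAF 2.x specification; the snake_case names are illustrative.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def parse_gaf_line(line: str) -> Optional[dict]:
    """Return a column-name -> value dict, or None for blank and comment lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("!"):
        # GAF header and comment lines begin with "!"
        return None
    fields = line.split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(
            f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}"
        )
    return dict(zip(GAF_COLUMNS, fields))
```

Only the GO ID is present in each row (column 5), so term names would have to be looked up separately from the ontology.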
Biolink captured
- biolink:Gene
  - id (NCBIGene Entrez ID)
- biolink:MolecularActivity
  - id (GO ID)
- biolink:BiologicalProcess
  - id (GO ID)
- biolink:CellularComponent
  - id (GO ID)
- biolink:Pathway
  - id (GO ID)
- biolink:PhysiologicalProcess
  - id (GO ID)
Associations
- biolink:FunctionalAssociation
  - id (random uuid)
  - subject (gene.id)
  - predicate (related_to)
  - object (go_term.id)
  - negated
  - has_evidence
  - publications
  - aggregating_knowledge_source (["infores:monarchinitiative"])
  - primary_knowledge_source

OR

- biolink:MacromolecularMachineToMolecularActivityAssociation
  - id (random uuid)
  - subject (gene.id)
  - predicate (related_to)
  - object (go_term.id)
  - negated
  - has_evidence
  - publications
  - aggregating_knowledge_source (["infores:monarchinitiative"])
  - primary_knowledge_source
- biolink:MacromolecularMachineToBiologicalProcessAssociation
  - id (random uuid)
  - subject (gene.id)
  - predicate (participates_in)
  - object (go_term.id)
  - negated
  - has_evidence
  - publications
  - aggregating_knowledge_source (["infores:monarchinitiative"])
  - primary_knowledge_source
- biolink:MacromolecularMachineToCellularComponentAssociation
  - id (random uuid)
  - subject (gene.id)
  - predicate (located_in)
  - object (go_term.id)
  - negated
  - has_evidence
  - publications
  - aggregating_knowledge_source (["infores:monarchinitiative"])
  - primary_knowledge_source
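The three MacromolecularMachine association types above differ only in category and predicate, so one way to picture the mapping is a lookup keyed on the GAF aspect column (F, P, or C). The sketch below is a hypothetical illustration, not the project's transform.py. The function name build_association, the "uuid:" id prefix, the "biolink:" prefix on predicates, and deriving negated from a NOT qualifier are all assumptions.

```python
import uuid

# GAF aspect code -> (association category, predicate), following the list above.
# Assumption: predicates carry a "biolink:" prefix; the list gives bare names.
ASPECT_TO_ASSOCIATION = {
    "F": ("biolink:MacromolecularMachineToMolecularActivityAssociation",
          "biolink:related_to"),
    "P": ("biolink:MacromolecularMachineToBiologicalProcessAssociation",
          "biolink:participates_in"),
    "C": ("biolink:MacromolecularMachineToCellularComponentAssociation",
          "biolink:located_in"),
}


def build_association(gene_id, go_id, aspect, qualifier,
                      evidence, publications, primary_source):
    """Assemble one association record as a plain dict (hypothetical helper)."""
    category, predicate = ASPECT_TO_ASSOCIATION[aspect]
    return {
        "id": f"uuid:{uuid.uuid4()}",  # random uuid; the prefix is an assumption
        "category": category,
        "subject": gene_id,
        "predicate": predicate,
        "object": go_id,
        # Assumption: a "NOT" entry in the pipe-separated qualifier means negated
        "negated": "NOT" in qualifier.split("|"),
        "has_evidence": list(evidence),
        "publications": list(publications),
        "aggregating_knowledge_source": ["infores:monarchinitiative"],
        "primary_knowledge_source": primary_source,
    }
```

An unrecognized aspect code raises KeyError here. A real transform would more likely log and skip such rows.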
Possible Additional Gene to Gene Ontology Term Association?
- biolink:GeneToGoTermAssociation
  - id (random uuid)
  - subject (gene.id)
  - predicate (related_to)
  - object (go_term.id)
  - negated
  - has_evidence
  - publications
  - aggregating_knowledge_source (["infores:monarchinitiative"])
  - primary_knowledge_source
- Ashburner et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000 May;25(1):25-9.
- The Gene Ontology Consortium. The Gene Ontology knowledgebase in 2023. Genetics. 2023 May 4;224(1):iyad031.
This project was generated using monarch-initiative/cookiecutter-monarch-ingest.
Keep this project up to date using cruft by occasionally running the following in the project directory:
cruft update
For more information, see the cruft documentation.