Functional Annotation of MAGs (or Contigs)
Here we will assume that you already have a mags.qza artifact to work with. The run-time estimates are based on the dereplicated (TODO: link to dereplication tutorial) version of the mags.qza artifact from the Generate MAGs from Reads tutorial.
Functionally annotating your MAGs or contigs involves several actions; how many depends on which reference databases you want to use. The process can be broadly divided into 3 stages:
- Downloading or building the reference database
- Searching your contigs or MAGs for homologs against a reference database
- Functionally annotating the search hits against the eggNOG database
Homologs are found by comparing your sequences against a reference database with a search tool (e.g. Diamond, HMMER, or MMseqs2). Below you will find several options for fetching or constructing a reference database for these tools. Feel free to choose the one that best fits your needs.
For Diamond, you must choose whether to download the complete reference database, build a database for only a portion of it (i.e. for a given taxon), or create a custom database from user-provided protein sequences.
Use the fetch-diamond-db action to download and save the full Diamond reference database.
⚠️ At least 18 GB of free storage space is required to run this action. Runtime: 17 minutes
qiime moshpit fetch-diamond-db \
--o-diamond-db diamond_db.qza
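Several actions in this tutorial come with a free-storage warning. The following is a minimal sketch of a pre-flight check that could be run before any of them. The helper name require_free_gb is hypothetical (it is not part of QIIME 2 or MOSHPIT), and it treats 1 GB as 1024^3 bytes.

```shell
# Hypothetical helper, not part of QIIME 2 or MOSHPIT.
# Usage: require_free_gb MIN_GB [DIR]
# Succeeds only if the filesystem holding DIR (default: current directory)
# has at least MIN_GB gigabytes available.
require_free_gb() {
    min_gb=$1
    dir=${2:-.}
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    need_kb=$((min_gb * 1024 * 1024))
    if [ "$avail_kb" -lt "$need_kb" ]; then
        echo "Need ${min_gb} GB free in ${dir}, only $((avail_kb / 1024 / 1024)) GB available" >&2
        return 1
    fi
}

# Example (commented out so the sketch has no side effects):
# require_free_gb 18 . && qiime moshpit fetch-diamond-db --o-diamond-db diamond_db.qza
```

The same check can be reused with the larger thresholds quoted for the other fetch actions.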
First, download and save the eggNOG protein sequence database with the fetch-eggnog-proteins action.
⚠️ At least 18 GB of free storage space is required to run this action.
qiime moshpit fetch-eggnog-proteins \
--o-eggnog-proteins eggnog_proteins.qza
Now we can use this database to construct a Diamond database for a specific taxon, using the build-eggnog-diamond-db action. The --p-taxon parameter specifies the taxon ID number for which to build the database (here 2 = Bacteria).
qiime moshpit build-eggnog-diamond-db \
--i-eggnog-proteins eggnog_proteins.qza \
--p-taxon 2 \
--o-diamond-db diamond_db.qza \
--verbose
If you want the resulting Diamond database to have taxonomy features, first download the NCBI taxonomy database using the fetch-ncbi-taxonomy action.
⚠️ At least 30 GB of free storage space is required to run this action.
qiime moshpit fetch-ncbi-taxonomy \
--o-taxonomy taxonomy.qza \
--verbose
If you don't want taxonomy features, skip this step.
This option is for when you already have a protein reference database that you would like to use to construct the Diamond database. Collect (if you have not already) all of your sequences in a single FASTA file and import it into a QIIME 2 artifact with the FeatureData[ProteinSequence] semantic type.
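If the protein sequences are spread across several files, they can be merged before the import. The sketch below assumes the files end in .fasta and sit in one directory; the helper name combine_fasta is hypothetical. It prints the number of sequence records (header lines starting with ">") so the result can be sanity-checked.

```shell
# Hypothetical helper, not part of QIIME 2 or MOSHPIT.
# Usage: combine_fasta DIR OUT
# Concatenates every *.fasta file in DIR into OUT and prints the record count.
# OUT should not itself match DIR/*.fasta, or it would be read as an input.
combine_fasta() {
    dir=$1
    out=$2
    # "awk 1" copies every line and adds a trailing newline where a file
    # lacks one, so the last sequence of one file cannot run into the
    # first header of the next.
    awk 1 "$dir"/*.fasta > "$out"
    # Count header lines; each FASTA record starts with ">".
    grep -c '^>' "$out"
}

# Example (commented out): combine_fasta protein_files my_proteins.fasta
```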
qiime tools import \
--input-path my_proteins.fasta \
--output-path my_proteins.qza \
--type "FeatureData[ProteinSequence]"
Now, construct a Diamond reference database using the build-custom-diamond-db action.
If you decided to include taxonomy information in your database (i.e. the optional step above), don't forget to include a --i-taxonomy taxonomy.qza line in the command below.
qiime moshpit build-custom-diamond-db \
--i-seqs my_proteins.qza \
--o-diamond-db diamond_db.qza \
--verbose
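Since the --i-taxonomy line is optional, one way to avoid forgetting it is to assemble the command conditionally. The following is only a sketch: the helper name build_custom_db_cmd is hypothetical, and it prints the command for review instead of running it. Only flags that appear in this tutorial are used.

```shell
# Hypothetical helper, not part of QIIME 2 or MOSHPIT.
# Usage: build_custom_db_cmd SEQS OUT [TAXONOMY]
# Prints the build-custom-diamond-db command, adding --i-taxonomy only
# when a taxonomy artifact is given and the file exists.
build_custom_db_cmd() {
    seqs=$1
    out=$2
    taxonomy=${3:-}
    set -- qiime moshpit build-custom-diamond-db \
        --i-seqs "$seqs" \
        --o-diamond-db "$out"
    if [ -n "$taxonomy" ] && [ -f "$taxonomy" ]; then
        set -- "$@" --i-taxonomy "$taxonomy"
    fi
    echo "$@" --verbose
}

# Example (commented out):
# build_custom_db_cmd my_proteins.qza diamond_db.qza taxonomy.qza
```

Printing the command first makes it easy to confirm whether the taxonomy input was picked up before starting the build.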
Use the fetch-eggnog-hmmer-db action to construct a HMMER database for a specific taxon. The --p-taxon-id parameter specifies the taxon ID number for which to build the database (here 2 = Bacteria).
⚠️ At least 80 GB of free storage space is required to run this action. Runtime: TODO
qiime moshpit fetch-eggnog-hmmer-db \
--p-taxon-id 2 \
--output-dir hmmr_db_2 \
--verbose
Search for homologs by checking your sequences against a reference database.
Approximate runtime: 60 minutes
Search for homologs in your MAGs or contigs by comparing them against a Diamond database. Do this by using the eggnog-diamond-search action.
qiime moshpit eggnog-diamond-search \
--i-sequences mags.qza \
--i-diamond-db diamond_db.qza \
--o-eggnog-hits hits.qza \
--o-table table.qza \
--parallel \
--p-num-partitions 5 \
--p-num-cpus 7 \
--verbose
Search for homologs in your MAGs or contigs by comparing them against a HMMER database. Do this by using the eggnog-hmmer-search action.
qiime moshpit eggnog-hmmer-search \
--i-fastas hmmr_db_2/fastas.qza \
--i-idmap hmmr_db_2/idmap.qza \
--i-pressed-hmm-db hmmr_db_2/pressed_hmm_db.qza \
--i-sequences mags.qza \
--o-eggnog-hits hits.qza \
--o-table table.qza \
--p-num-cpus 7 \
--p-num-partitions 4 \
--parallel \
--verbose
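The HMMER search reads three artifacts from the directory written by fetch-eggnog-hmmer-db. Before launching a long search, it may be worth checking that all three are present. The sketch below takes the file names from the search command above; the helper name check_hmmer_db_dir is hypothetical.

```shell
# Hypothetical helper, not part of QIIME 2 or MOSHPIT.
# Usage: check_hmmer_db_dir DIR
# Succeeds only if DIR holds the three artifacts that the
# eggnog-hmmer-search command in this tutorial takes as inputs.
check_hmmer_db_dir() {
    dir=$1
    missing=0
    for name in fastas.qza idmap.qza pressed_hmm_db.qza; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (commented out): check_hmmer_db_dir hmmr_db_2
```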
Fetch the eggNOG database with the fetch-eggnog-db action.
⚠️ At least 80 GB of storage space is required to run this action.
qiime moshpit fetch-eggnog-db \
--o-eggnog-db eggnog_db.qza \
--verbose
Annotate the hits from the previous stage against the eggNOG database with the eggnog-annotate action.
qiime moshpit eggnog-annotate \
--i-eggnog-hits hits.qza \
--i-eggnog-db eggnog_db.qza \
--o-ortholog-annotations annotations.qza \
--verbose
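To recap, the three stages can be chained in a single script. The sketch below follows the full Diamond database route and uses only commands and flags from this tutorial, with the parallel options of the search step left out for brevity. The run wrapper, the annotate_pipeline function, and the DRY_RUN variable are hypothetical names introduced here; with DRY_RUN=1 the commands are printed but not executed.

```shell
# Hypothetical wrapper: print each command, and run it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Hypothetical pipeline function; each step runs only if the previous
# one succeeded.
annotate_pipeline() {
    # Stage 1: reference database
    run qiime moshpit fetch-diamond-db \
        --o-diamond-db diamond_db.qza &&
    # Stage 2: homolog search
    run qiime moshpit eggnog-diamond-search \
        --i-sequences mags.qza \
        --i-diamond-db diamond_db.qza \
        --o-eggnog-hits hits.qza \
        --o-table table.qza \
        --verbose &&
    # Stage 3: annotation against the eggNOG database
    run qiime moshpit fetch-eggnog-db \
        --o-eggnog-db eggnog_db.qza \
        --verbose &&
    run qiime moshpit eggnog-annotate \
        --i-eggnog-hits hits.qza \
        --i-eggnog-db eggnog_db.qza \
        --o-ortholog-annotations annotations.qza \
        --verbose
}

# Example (commented out): DRY_RUN=1 annotate_pipeline
```

A dry run is a cheap way to review the artifact names passed between stages before committing to the downloads and the search.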