A pipeline that assigns body length-dry biomass allometry equations and other functional traits to freshwater invertebrate taxon names based on the taxonomic hierarchy. The name-matching part of the pipeline can, however, be used for any taxonomic group.
The package is a work in progress (WIP). Hopefully, it will be available on CRAN soon.
You can try to install it from GitHub, but as it's a WIP, it may or may not work depending on the tides, the temperature, the color of your socks and what you had for dinner the day before yesterday.
Currently, we depend transitively on terra, which may require additional installation steps (GDAL) on your OS. Please see the installation section of terra's README first.
Now, if you feel lucky, try:
devtools::install_github("haganjam/InvTraitR")
The package exports a single function, get_trait_from_taxon():
get_trait_from_taxon(
  data, # data.frame with columns for target taxon, life stage, latitude (dd) and longitude (dd), plus body size (mm) if trait = "equation"
target_taxon, # character string with the column name containing the taxon names
life_stage, # character string with the column name containing the life stages
latitude_dd, # character string with the column name containing the latitude in decimal degrees
longitude_dd, # character string with the column name containing the longitude in decimal degrees
body_size, # character string with the column name containing the body size data if trait = "equation"
  workflow = "workflow2", # options are "workflow1" or "workflow2" (default = "workflow2")
max_tax_dist = 3, # maximum taxonomic distance acceptable between the target and the taxa in the database (default = 3)
trait = "equation", # trait to be searched for (default = "equation")
  gen_sp_dist = 0.5 # taxonomic distance between a genus and a species (default = 0.5)
)
See the docs for more details.
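As a rough illustration, a call on a small data frame might look like the following. The column names, taxa, coordinates, life-stage labels and body sizes are all made up for the example; in particular, we have not checked which life-stage labels the package accepts, so treat "adult" as an assumption. The call itself only runs if InvTraitR is installed.

```r
# Hypothetical input: every value here is illustrative, not taken from the
# package or its database. Column names are passed to the function as strings.
toy_data <- data.frame(
  taxon     = c("Daphnia magna", "Gammarus pulex"),
  stage     = c("adult", "adult"),  # assumed life-stage labels
  lat       = c(52.5, 47.4),        # decimal degrees
  lon       = c(13.4, 8.5),         # decimal degrees
  length_mm = c(2.1, 9.0),          # body size in mm (needed for trait = "equation")
  stringsAsFactors = FALSE
)

if (requireNamespace("InvTraitR", quietly = TRUE)) {
  result <- InvTraitR::get_trait_from_taxon(
    data = toy_data,
    target_taxon = "taxon",
    life_stage = "stage",
    latitude_dd = "lat",
    longitude_dd = "lon",
    body_size = "length_mm",
    workflow = "workflow2",
    max_tax_dist = 3,
    trait = "equation",
    gen_sp_dist = 0.5
  )
  print(result)
}
```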
companion_scripts
This folder contains all the scripts used to create, access and analyse the database. Scripts for different tasks are kept in different folders, and the numbers of the folders and of the scripts within them indicate the order in which the scripts should be run.
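Because the run order is encoded in the numeric prefixes, it can be recovered mechanically. The helper ordered_scripts() below is hypothetical (it is not part of the repository); it assumes zero-padded prefixes such as "01_" on both the folders and the scripts, and it skips the unnumbered helper scripts and the .stan model.

```r
# Hypothetical helper: list numbered .R scripts below `root` in run order.
# Assumes zero-padded numeric prefixes ("01_", "02_", ...) on folders and files.
ordered_scripts <- function(root = "companion_scripts") {
  # `pattern` is matched against file names only, so helper-*.R files and
  # non-R files (e.g. the .stan model) are excluded
  files <- list.files(root, pattern = "^[0-9]+_.*\\.R$", recursive = TRUE)
  # radix sort compares strings in C-locale order, independent of the OS locale
  sort(files, method = "radix")
}
```

Under the same assumptions, the scripts could then be run in turn with source(), keeping in mind that individual scripts may expect a particular working directory.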
One helper script contains a customised plotting theme that is used throughout the analyses:
helper-plot-theme.R
The data-cleaning folder holds scripts used to clean the raw data, which were compiled in Excel files. The resulting data files are then saved as .rds files and stored in the database folder.
There are three scripts in this folder. The first creates the higher-level taxonomic graphs. It starts by harmonising all taxon names in the equation database against three taxonomic backbones: COL, GBIF and ITIS. Once the names are harmonised, we extract the family or order of each taxon name. Descendant taxa of all unique families and orders are then extracted and compiled into igraph objects that describe how the different taxon names (i.e., species, genera, families, orders, etc.) relate to each other. These igraph objects are exported as .rds files and stored in the database folder.
01_create_taxon_databases.R
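The taxonomic graphs are what make a distance between two names computable, which is presumably what the max_tax_dist and gen_sp_dist arguments of get_trait_from_taxon() refer to. The base-R sketch below shows the idea on a made-up four-edge hierarchy: the distance between two taxa is the summed edge weight of the path through their lowest shared ancestor, with a species-genus step weighted by gen_sp_dist (0.5) and every other step weighted 1. That weighting is our reading of the argument descriptions, not the package's code, which works on igraph objects rather than data frames; path_to_root() and taxon_distance() are hypothetical names.

```r
# Toy hierarchy (hypothetical): each row links a child taxon to its parent.
# Assumed weighting: species-to-genus steps cost gen_sp_dist, other steps 1.
gen_sp_dist <- 0.5
toy_tree <- data.frame(
  child  = c("Daphnia magna", "Daphnia pulex", "Daphnia", "Ceriodaphnia"),
  parent = c("Daphnia", "Daphnia", "Daphniidae", "Daphniidae"),
  weight = c(gen_sp_dist, gen_sp_dist, 1, 1),
  stringsAsFactors = FALSE
)

# Walk upwards from `taxon`; return the cumulative distance to each ancestor,
# named by taxon and starting with `taxon` itself at distance 0.
path_to_root <- function(taxon, tree) {
  cum <- setNames(0, taxon)
  current <- taxon
  while (current %in% tree$child) {
    row <- match(current, tree$child)
    current <- tree$parent[row]
    cum[current] <- cum[length(cum)] + tree$weight[row]
  }
  cum
}

# Distance = sum of both paths up to the lowest common ancestor (Inf if none).
taxon_distance <- function(a, b, tree) {
  pa <- path_to_root(a, tree)
  pb <- path_to_root(b, tree)
  shared <- intersect(names(pa), names(pb))  # keeps the order of a's path
  if (length(shared) == 0) return(Inf)
  lca <- shared[1]  # first shared taxon on a's upward path is the lowest one
  unname(pa[lca] + pb[lca])
}

taxon_distance("Daphnia magna", "Daphnia pulex", toy_tree)  # 1
taxon_distance("Daphnia magna", "Daphnia", toy_tree)        # 0.5
taxon_distance("Daphnia magna", "Ceriodaphnia", toy_tree)   # 2.5
```

Under this reading, all three matches would fall within the default max_tax_dist = 3.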
The second script is used to add biogeographical realm, major habitat type and ecoregion information to each equation in the database using the latitude and longitude data associated with each equation and Abell et al.'s (2008) global ecoregion map:
02_set_freshwater_ecoregion_data.R
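Assigning realm, habitat type and ecoregion from coordinates is a point-in-polygon lookup. To show the idea without GIS dependencies, the sketch below uses the classic ray-casting test in base R on two made-up rectangular regions. The script itself works with Abell et al.'s ecoregion polygons, where a GIS library such as terra would normally do this job; point_in_polygon() and assign_region() are hypothetical helpers.

```r
# Ray-casting test: count how many polygon edges a ray running east from
# (x, y) crosses; an odd number of crossings means the point is inside.
point_in_polygon <- function(x, y, px, py) {
  inside <- FALSE
  j <- length(px)
  for (i in seq_along(px)) {
    crosses <- (py[i] > y) != (py[j] > y) &&
      x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i]
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

# Return the name of the first region containing each point (NA if none).
assign_region <- function(lat, lon, regions) {
  vapply(seq_along(lat), function(k) {
    for (r in regions) {
      if (point_in_polygon(lon[k], lat[k], r$lon, r$lat)) return(r$name)
    }
    NA_character_
  }, character(1))
}

# Made-up rectangular "regions"; real ecoregions would come from a map layer.
toy_regions <- list(
  list(name = "Region A", lon = c(0, 10, 10, 0),   lat = c(40, 40, 50, 50)),
  list(name = "Region B", lon = c(10, 20, 20, 10), lat = c(40, 40, 50, 50))
)

assign_region(lat = c(45, 45, 30), lon = c(5, 15, 5), regions = toy_regions)
# "Region A" "Region B" NA
```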
The third script contains a helper function used to generate the higher-level taxonomic graphs:
helper-taxon-matrix-function.R
This folder contains the scripts used to test how accurately our method matches names to appropriate equations. Specifically, we compare biomass estimated with equations selected from the database against measured biomass compiled from the literature, and against biomass estimated with equations selected by experts.
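Many length-dry biomass equations take the power-law form W = a L^b. Assuming that form, the comparison described above reduces to predicting dry mass from body length and measuring how far the prediction is from the measured value. The coefficients, lengths and measurements below are invented, and percentage error is just one possible metric; we do not claim it is the one used in these scripts.

```r
# Assumed power-law allometry: dry mass (mg) = a * length (mm)^b.
# a and b below are made-up coefficients, not values from the database.
predict_dry_mass <- function(length_mm, a, b) {
  a * length_mm^b
}

# Percentage error of a predicted value relative to the measured value.
percent_error <- function(predicted, measured) {
  100 * (predicted - measured) / measured
}

lengths   <- c(1, 2)         # toy body lengths (mm)
measured  <- c(0.012, 0.07)  # toy measured dry masses (mg)
predicted <- predict_dry_mass(lengths, a = 0.01, b = 3)
round(percent_error(predicted, measured), 1)
# -16.7 14.3
```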
First, we prepare the test data compiled from the literature (all of these files are stored in the database folder):
01_prep_test_data.R
Next, we use these test data to test the accuracy of workflow2, our automated method for selecting appropriate equations based on a taxonomic name and on the geographic and habitat similarity to the equations in the database.
Third, scripts 3 and 4 are used to examine the sources of variation in the error produced by workflow2:
03_analyse_error_variation.R
04_model_error_variation.stan
The final script contains helper functions used in the analyses:
helper-miscellaneous.R
This script is used to examine the taxonomic and geographical coverage of the equations in our database.
01_descrobe_database.R
A devcontainer setup is included. If you use VS Code, you should be prompted automatically to open the project in a container.
devtools is bundled with the devcontainer. Load it with library(devtools) and you have load_all(), test() and check() ready at hand.
We use renv to provide reproducibility as far as that is possible with R. Use renv::snapshot() after changing dependencies, renv::restore() to install the declared versions of the dependencies and renv::update() to update them to the latest CRAN versions (before pushing to CRAN).
The database files are copied into an app data directory (given by rappdirs) when tests are run or when users load the package. If you have changed the DB files and need to update the copies in the app data directory, use the utility function update_user_db().
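To inspect the copies that live in the app data directory, something like the following should work. The app name "InvTraitR" passed to rappdirs::user_data_dir() is an assumption about how the package builds the path, so check the package source if the directory turns out to be empty. list_user_db_files() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: list the .rds database files in the per-user app data
# directory. The app name is an assumption; returns character(0) if rappdirs
# is missing or the directory does not exist yet.
list_user_db_files <- function(appname = "InvTraitR") {
  if (!requireNamespace("rappdirs", quietly = TRUE)) {
    return(character(0))
  }
  db_dir <- rappdirs::user_data_dir(appname)
  list.files(db_dir, pattern = "\\.rds$")
}

list_user_db_files()
# After editing the DB files, refresh these copies with update_user_db()
# (for example after devtools::load_all()).
```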