GPF: Genotypes and Phenotypes in Families

The Genotype and Phenotype in Families (GPF) system manages large databases of genetic variants and phenotypic measurements obtained from collections of families and individual family members.

The main application of the system has been in managing the data gathered from the Simons Simplex Collection, a collection of ~2,600 families with one child diagnosed with autism.

Information on how to use GPF can be found in the GPF documentation.

Development

We recommend using an Anaconda environment for GPF development. The steps below use the mamba package manager.

Install GPF dependencies

Create a conda environment named gpf with all of the conda package dependencies from the environment.yml and dev-environment.yml files. From the gpf root directory, run:

mamba env create --name gpf --file ./environment.yml
mamba env update --name gpf --file ./dev-environment.yml

To use this environment, you need to activate it using the following command:

conda activate gpf
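Because the later install steps assume the gpf environment is active, it can help to check that before continuing. The following is a minimal sketch, not part of GPF: require_conda_env is a hypothetical helper name, and it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```shell
# Hypothetical helper (not part of GPF): fail early if the expected
# conda environment is not the active one.
# Assumes conda exports CONDA_DEFAULT_ENV on activation.
require_conda_env() {
    expected_env=$1
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected_env" ]; then
        echo "Expected conda environment '$expected_env', found '${CONDA_DEFAULT_ENV:-none}'." >&2
        echo "Run: conda activate $expected_env" >&2
        return 1
    fi
}

# Usage:
#   require_conda_env gpf && echo "gpf environment is active"
```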

The following command installs the GPF dae, wdae, and dae_conftests packages for development use. (Install the GPF packages inside the activated gpf conda environment.)

for d in dae wdae dae_conftests; do (cd "$d" && pip install -e .); done
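The one-line loop keeps going even if one package fails to install. A slightly more defensive variant, sketched here under the assumption that you run it from the gpf root directory, stops at the first missing directory or failed install. The name install_editable is hypothetical and not part of GPF.

```shell
# Hypothetical helper (not part of GPF): install each given package
# directory in editable mode and stop at the first failure.
install_editable() {
    for pkg_dir in "$@"; do
        if [ ! -d "$pkg_dir" ]; then
            echo "Package directory not found: $pkg_dir" >&2
            return 1
        fi
        # The subshell keeps the caller's working directory unchanged.
        (cd "$pkg_dir" && pip install -e .) || return 1
    done
}

# Usage, from the gpf root directory:
#   install_editable dae wdae dae_conftests
```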

Additional GPF genotype storages

Some genotype storages are not included in the default GPF installation. If you plan to use or develop features for them, you need to install their dependencies.

Apache Impala genotype storage

To use or develop features for the GPF Impala genotype storage, you need some additional dependencies. From the gpf root directory, update your gpf conda environment using:

mamba env update --name gpf --file ./impala_storage/impala-environment.yml

and install the gpf_impala_storage package using:

pip install -e impala_storage

Apache Impala2 genotype storage

To use or develop features for the GPF Impala2 genotype storage, you need some additional dependencies. From the gpf root directory, update your gpf conda environment using:

mamba env update --name gpf --file ./impala2_storage/impala2-environment.yml

and install the gpf_impala2_storage package using:

pip install -e impala2_storage
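Each optional storage is set up with the same two steps: update the gpf conda environment from the storage's environment file, then install the storage package in editable mode. As an illustration only, the pair can be wrapped in one function. install_storage is a hypothetical helper, not part of GPF, and it assumes it is run from the gpf root directory with the gpf environment active.

```shell
# Hypothetical helper (not part of GPF): update the gpf conda environment
# from a storage's environment file, then install the storage package.
install_storage() {
    storage_dir=$1
    env_file=$2
    if [ ! -f "$env_file" ]; then
        echo "Environment file not found: $env_file" >&2
        return 1
    fi
    # Skip the editable install if the environment update fails.
    mamba env update --name gpf --file "$env_file" || return 1
    pip install -e "$storage_dir"
}

# Usage, from the gpf root directory:
#   install_storage impala_storage ./impala_storage/impala-environment.yml
#   install_storage impala2_storage ./impala2_storage/impala2-environment.yml
```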

GCP genotype storage

If you want support for genotype storage on Google Cloud Platform (GCP), which uses Google BigQuery for querying variants, you need to install more dependencies in your development environment:

mamba env update --name gpf --file ./gcp_storage/gcp-environment.yml

and install the gcp_genotype_storage package using:

pip install -e gcp_storage

To run the tests, you need to be authenticated for the seqpipe-gcp-storage-testing project. Check that it is your active project:

gcloud config list project

[core]
project = seqpipe-gcp-storage-testing

Your active configuration is: [default]

and then authenticate using:

gcloud auth application-default login
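The project check can also be scripted so that the tests are not started against the wrong project. This is a minimal sketch: check_gcp_project is a hypothetical helper, not part of GPF, and it uses the gcloud config get-value and gcloud config set commands from the Google Cloud CLI.

```shell
# Hypothetical helper (not part of GPF): verify that the active gcloud
# project matches the one the tests expect.
check_gcp_project() {
    expected_project=$1
    # Prints the project ID of the active gcloud configuration.
    active_project=$(gcloud config get-value project 2>/dev/null)
    if [ "$active_project" != "$expected_project" ]; then
        echo "Active project is '${active_project:-unset}', expected '$expected_project'." >&2
        echo "Run: gcloud config set project $expected_project" >&2
        return 1
    fi
}

# Usage:
#   check_gcp_project seqpipe-gcp-storage-testing && gcloud auth application-default login
```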

To run the GCP storage tests, change into the gpf/gcp_storage directory and run:

py.test -v gcp_storage/tests/

To run the integration tests, use:

py.test -v ../dae/tests/ --gsf gcp_storage/tests/gcp_storage.yaml

Pre-commit lint check hook

A git pre-commit hook for lint checking with Ruff is included. To install it, run the following command from the repository's directory:

cp pre-commit .git/hooks

To bypass the pre-commit hook, use the following flag when committing:

git commit --no-verify
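Git only runs a hook that is executable, so a plain copy may silently do nothing if the file mode is lost. The sketch below copies the hook and sets the executable bit. install_pre_commit_hook is a hypothetical helper, not part of GPF, and it assumes a regular clone in which .git is a directory (not a worktree or submodule checkout).

```shell
# Hypothetical helper (not part of GPF): copy the repository's pre-commit
# script into .git/hooks and make sure it is executable.
install_pre_commit_hook() {
    repo_root=${1:-.}
    hook_src="$repo_root/pre-commit"
    hook_dst="$repo_root/.git/hooks/pre-commit"
    if [ ! -d "$repo_root/.git" ]; then
        echo "No .git directory in $repo_root" >&2
        return 1
    fi
    if [ ! -f "$hook_src" ]; then
        echo "pre-commit script not found in $repo_root" >&2
        return 1
    fi
    mkdir -p "$repo_root/.git/hooks" || return 1
    cp "$hook_src" "$hook_dst" || return 1
    chmod +x "$hook_dst"
}

# Usage, from the repository's directory:
#   install_pre_commit_hook .
```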
