Bystro Open Source

At Bystro, we believe natural language is the right interface for genetic and proteomic analysis. We are building the world's first LLM-powered natural language analysis engine, which takes your questions about complex genetic and proteomic datasets and converts them into statistical answers with easy-to-understand summaries and visualizations.

This is our open-source repo of machine learning methods for high dimensional statistics, as well as some applications in genomics and proteomics.

This work is the basis for the Bystro natural language analysis platform for genetics & proteomics. See https://bystro.io

Machine Learning Methods

We are working hard on cutting-edge algorithms and haven't found much time for documentation. More detailed descriptions are coming soon; until then, a brief summary follows:

Covariance Matrix Estimation and Hypothesis Testing

from bystro.covariance import *

Regularized covariance matrix estimation methods, well suited to small-sample regimes where n << p
Covariance matrix hypothesis tests, such as the two-sample covariance test (from bystro.random_matrix_theory.rmt4ds_cov_test import two_sample_cov_test)
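To illustrate why regularization matters when n << p, here is a minimal NumPy sketch of linear shrinkage toward a scaled identity. It does not use the bystro.covariance API; the function name linear_shrinkage_covariance and the fixed weight alpha are assumptions made for this example, and the estimators in the package may choose their shrinkage differently.

```python
import numpy as np


def linear_shrinkage_covariance(X, alpha=0.5):
    """Shrink the sample covariance toward a scaled identity target.

    X is an (n, p) data matrix. alpha in [0, 1] is a hypothetical,
    user-chosen weight: 0 returns the sample covariance, 1 returns
    the scaled identity.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center each feature
    S = Xc.T @ Xc / (n - 1)          # unbiased sample covariance
    mu = np.trace(S) / p             # average variance sets the target scale
    return (1.0 - alpha) * S + alpha * mu * np.eye(p)


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))   # n = 20 samples, p = 100 features

S_sample = np.cov(X, rowvar=False)
S_shrunk = linear_shrinkage_covariance(X, alpha=0.5)

# The sample covariance is rank deficient when n < p, so it cannot be
# inverted; the shrunk estimate has strictly positive eigenvalues.
rank_sample = np.linalg.matrix_rank(S_sample)
min_eig_shrunk = np.linalg.eigvalsh(S_shrunk).min()
print(rank_sample, min_eig_shrunk > 0)
```

With fewer samples than features, the sample covariance has at most n - 1 nonzero eigenvalues, which is the situation regularized estimators are meant to handle.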

Random Matrix Theory Methods

from bystro.random_matrix_theory import *

Random Matrix Theory modules that are foundational for significance tests, such as our two_sample_cov_test
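As a rough illustration of how random matrix theory supports significance testing, the sketch below compares sample covariance eigenvalues against the Marchenko-Pastur bulk edges expected for pure noise. It is written in plain NumPy and is not the bystro.random_matrix_theory API; marchenko_pastur_edges is a hypothetical helper, and the simulated data are an assumption of this example.

```python
import numpy as np


def marchenko_pastur_edges(n, p, sigma2=1.0):
    """Asymptotic support of the eigenvalues of (1/n) X^T X when the
    entries of the (n, p) matrix X are i.i.d. with variance sigma2."""
    gamma = p / n
    lower = sigma2 * (1.0 - np.sqrt(gamma)) ** 2
    upper = sigma2 * (1.0 + np.sqrt(gamma)) ** 2
    return lower, upper


rng = np.random.default_rng(0)
n, p = 2000, 200
X = rng.standard_normal((n, p))
X[:, 0] *= 5.0                        # plant one high-variance direction

eigvals = np.linalg.eigvalsh(X.T @ X / n)
lower, upper = marchenko_pastur_edges(n, p)

# Eigenvalues far above the upper edge suggest structure beyond noise.
# In finite samples the largest noise eigenvalue can sit slightly above
# the asymptotic edge, so this count is a heuristic, not a formal test.
n_signal = int(np.sum(eigvals > upper))
print(lower, upper, eigvals.max(), n_signal)
```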

Stochastic Gradient Langevin

from bystro.stochastic_gradient_langevin import *

Implementation of the Stochastic Gradient Langevin algorithm described in https://www.ics.uci.edu/~welling/publications/papers/stoclangevin
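For readers new to the method, here is a minimal NumPy sketch of the stochastic gradient Langevin update on a toy problem: sampling the posterior of a Gaussian mean with known unit variance. It does not use the bystro.stochastic_gradient_langevin API. The function name sgld_gaussian_mean, the constant step size (the paper uses a decreasing schedule), and the N(0, prior_var) prior are assumptions chosen to keep the example short.

```python
import numpy as np


def sgld_gaussian_mean(x, n_steps=5000, batch_size=32, step_size=1e-4,
                       prior_var=100.0, seed=0):
    """Draw approximate posterior samples of the mean theta of x ~ N(theta, 1)."""
    rng = np.random.default_rng(seed)
    N = len(x)
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        batch = rng.choice(x, size=batch_size, replace=False)
        # Gradient of the log prior N(0, prior_var).
        grad_log_prior = -theta / prior_var
        # Minibatch gradient of the log likelihood, rescaled to the full data.
        grad_log_lik = (N / batch_size) * np.sum(batch - theta)
        # Half-step along the stochastic gradient plus Gaussian noise
        # whose variance equals the step size.
        theta += 0.5 * step_size * (grad_log_prior + grad_log_lik)
        theta += rng.normal(0.0, np.sqrt(step_size))
        samples[t] = theta
    return samples


rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=1000)
samples = sgld_gaussian_mean(x)
posterior_mean_estimate = samples[1000:].mean()   # discard burn-in
print(posterior_mean_estimate, x.mean())
```

With a weak prior, the posterior mean is close to the sample mean of x, which gives a simple sanity check on the sampler.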

Fair Machine Learning and Supervised PPCA / Variational Principal Component Regression

from bystro.supervised_ppca import *

supervised_ppca is a collection of generative methods:

Probabilistic PCA (PPCA)
Supervised PPCA (also known as Variational Principal Component Regression): a novel method for network analysis that can pick up dynamics of interest in low-variance components. It is also competitive with Elastic Net in a regression context without shrinking covariates (it shrinks them in latent space instead). See our recent publication: https://arxiv.org/abs/2409.02327
Adversarial Probabilistic PCA: a fair ML method that removes the influence of M sensitive variables from high-dimensional data
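As background for the generative models listed above, the following NumPy sketch fits plain Probabilistic PCA using the closed-form maximum-likelihood solution of Tipping and Bishop. It is not the bystro.supervised_ppca API and does not cover the supervised or adversarial variants; fit_ppca and the simulated data are assumptions made for this example.

```python
import numpy as np


def fit_ppca(X, n_components):
    """Closed-form maximum-likelihood PPCA.

    Model: x = W z + mean + noise, with z ~ N(0, I) and noise ~ N(0, sigma2 * I).
    Returns the loadings W (p, q), the noise variance sigma2, and the mean.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # ML covariance (divides by n)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]             # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance is the average of the discarded eigenvalues.
    sigma2 = eigvals[n_components:].mean()
    # Loadings are the top eigenvectors scaled by the excess variance.
    scale = np.sqrt(np.maximum(eigvals[:n_components] - sigma2, 0.0))
    W = eigvecs[:, :n_components] * scale
    return W, sigma2, mean


rng = np.random.default_rng(0)
n, p, q = 5000, 10, 2
W_true = rng.standard_normal((p, q))
Z = rng.standard_normal((n, q))
X = Z @ W_true.T + 0.5 * rng.standard_normal((n, p))   # true noise variance 0.25

W, sigma2, mean = fit_ppca(X, n_components=q)
model_cov = W @ W.T + sigma2 * np.eye(p)               # implied covariance
print(W.shape, sigma2)
```

The loadings are identified only up to a rotation of the latent space, so the comparison with the generating model is best made through the implied covariance W W^T + sigma2 I rather than W itself.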

Applications in Proteomics

Description coming soon

Applications in Genetics

Description coming soon

Publications

Talbot et al. arXiv, 2024

Kotlar et al. Genome Biology, 2018

Installing Bystro Python library

To install the Bystro Python package, run:

pip install --pre bystro
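One way to confirm that the package is visible to your Python environment is to query its installed version with the standard library. This is a small sketch; installed_version is a hypothetical helper, not part of Bystro.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("bystro"))
```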

The Bystro ancestry CLI score tool (bystro-api ancestry score) parses VCF files to generate dosage matrices. This requires bystro-vcf, a Go program which can be installed with:

# Requires Go: install from https://golang.org/doc/install
# Replace <version> with the bystro-vcf release you want to install
go install github.com/bystrogenomics/bystro-vcf@<version>

Bystro is compatible with Linux and macOS. Windows support is experimental. If you are installing on macOS as a native binary (Arm), you will need to install the following additional dependency:

brew install cmake

Please refer to INSTALL.md for more details.

Installing the Bystro Annotator

Please refer to INSTALL.md for instructions on how to install the Bystro annotator.
