ArtifactDB

Metadata and file store for arbitrary data objects

Storage for analysis-ready artifacts

About

This organization contains various repositories for the ArtifactDB project, a storage system for analysis-ready artifacts. It aims to provide easy access to datasets and analysis results across multiple programming frameworks such as R and Python. The ArtifactDB system was originally developed at @Genentech to store the outputs of various genomics analysis pipelines (plus the associated metadata); scientists can then pull these artifacts into their analysis environments for further processing.

For users

R

The alabaster suite implements functions to save and read Bioconductor objects via language-agnostic file formats. This is the workhorse of our R-based data serialization pipelines, managing the conversion of various objects into files for long-term storage.

The gypsum R package implements an interface to the gypsum REST API. This handles the upload of files and associated metadata to cloud storage for large-scale distribution.

Python

The dolomite suite implements functions to save and read Bioconductor objects via language-agnostic file formats. This is equivalent to alabaster for Python and is based heavily on the classes from the BiocPy project.

The gypsum Python package implements an interface to the gypsum REST API. This is equivalent to the R package of the same name.

For developers

The gypsum worker implements a REST API for storing and serving artifacts via the Cloudflare stack. This uses R2 for storage and Workers to handle authenticated uploads via flexible permission schemes.

The Gobbler manages artifacts across users on a shared filesystem such as those used in HPC clusters. This is effectively an on-premise version of gypsum that is simpler and more efficient for local applications.

The SewerRat provides a search index for metadata on a shared filesystem. This enables HPC users to find and share objects within an organization.
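The idea of a metadata search index can be sketched as a toy inverted index over JSON metadata files in a directory tree. Everything here (the `metadata.json` file name, the tokenizer, and the `build_index` and `search` helpers) is a hypothetical stand-in for illustration; it does not reflect how SewerRat itself is implemented or queried.

```python
import json
import os
import re
import tempfile
from collections import defaultdict


def tokenize(text):
    """Lower-case a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def iter_strings(value):
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def build_index(root, filename="metadata.json"):
    """Map each token to the set of directories whose metadata mentions it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename not in filenames:
            continue
        with open(os.path.join(dirpath, filename)) as handle:
            metadata = json.load(handle)
        for text in iter_strings(metadata):
            for token in tokenize(text):
                index[token].add(dirpath)
    return index


def search(index, query):
    """Return the directories whose metadata contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = [index.get(token, set()) for token in tokens]
    return set.intersection(*hits)


if __name__ == "__main__":
    # Hypothetical example data: two directories, each with a metadata file.
    root = tempfile.mkdtemp()
    for name, title in [("a", "Mouse brain atlas"), ("b", "Human liver atlas")]:
        os.makedirs(os.path.join(root, name))
        with open(os.path.join(root, name, "metadata.json"), "w") as handle:
            json.dump({"title": title}, handle)
    index = build_index(root)
    print(sorted(os.path.basename(p) for p in search(index, "atlas mouse")))
```

A real index on a shared filesystem would also need to handle re-scanning, permissions and persistence, which this sketch leaves out.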

The takane library contains language-agnostic specifications for all Bioconductor object types. These are enforced by validator functions written in C++, which are used by both alabaster and dolomite to verify compliance.
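As a rough illustration of what a language-agnostic specification plus a validator means in practice, the sketch below saves a small table to a directory of plain files and checks those files before reading them back. The `OBJECT` file name, the `toy_data_frame` type and the three helper functions are assumptions made up for this example; they are not the takane specification, nor the alabaster or dolomite API.

```python
import csv
import json
import os
import tempfile


def save_toy_frame(columns, path):
    """Write a dict of equal-length columns as OBJECT (JSON) plus data.csv."""
    os.makedirs(path)
    names = list(columns)
    nrow = len(columns[names[0]]) if names else 0
    with open(os.path.join(path, "OBJECT"), "w") as handle:
        json.dump({"type": "toy_data_frame", "columns": names, "rows": nrow}, handle)
    with open(os.path.join(path, "data.csv"), "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(names)
        writer.writerows(zip(*(columns[n] for n in names)))


def validate_toy_frame(path):
    """Raise ValueError if the files on disk disagree with the OBJECT metadata."""
    with open(os.path.join(path, "OBJECT")) as handle:
        meta = json.load(handle)
    if meta.get("type") != "toy_data_frame":
        raise ValueError("unexpected object type: %r" % meta.get("type"))
    with open(os.path.join(path, "data.csv"), newline="") as handle:
        rows = list(csv.reader(handle))
    if not rows or rows[0] != meta["columns"]:
        raise ValueError("CSV header does not match the declared columns")
    if len(rows) - 1 != meta["rows"]:
        raise ValueError("row count does not match the declared number of rows")
    return meta


def read_toy_frame(path):
    """Validate first, then load the columns back as lists of strings."""
    meta = validate_toy_frame(path)
    with open(os.path.join(path, "data.csv"), newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        body = list(reader)
    return {name: [row[i] for row in body] for i, name in enumerate(meta["columns"])}


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "my_frame")
    save_toy_frame({"gene": ["A", "B"], "count": [1, 2]}, target)
    print(read_toy_frame(target))
```

Because the on-disk layout is only JSON and CSV, a reader in another language could apply the same checks. Note that this toy format does not record column types, so values come back as strings.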

Other links

The scRNAseq R package and its Python counterpart use gypsum to store takane representations of various single-cell datasets.

The celldex R package and its Python counterpart use gypsum to store takane representations of cell type reference datasets.

Popular repositories

  1. uzuki Public archive

    Safely saving R lists to JSON

    C++ 4

  2. alabaster.base Public

    Base methods for the alabaster client framework

    R 3

  3. alabaster.spatial Public

    Save and load SpatialExperiment objects to file

    R 2

  4. zircon-R Public

    R interface for ArtifactDB APIs

    R 2

  5. dolomite-base Public

    Save Bioconductor objects in Python.

    Python 2

  6. BiocObjectSchemas Public archive

    JSON schemas for Bioconductor objects

    1 1

Repositories

Showing 10 of 72 repositories
  • SewerRat Public

    Indexing user-defined directories on a shared filesystem

    Go 0 MIT 0 0 0 Updated Jan 19, 2025
  • .github Public

    Profile page for the organization.

    0 0 0 0 Updated Jan 17, 2025
  • dolomite-mae Public
    Python 0 MIT 0 0 0 Updated Dec 24, 2024
  • dolomite-sce Public

    Save and load SingleCellExperiments from file.

    Python 0 MIT 0 0 0 Updated Dec 24, 2024
  • dolomite-se Public

    Save and load SummarizedExperiments in Python

    Python 0 MIT 0 0 0 Updated Dec 24, 2024
  • dolomite-ranges Public

    Save and load genomic ranges in Python

    Python 0 MIT 0 0 0 Updated Dec 24, 2024
  • dolomite-matrix Public

    Save and load matrices in Python

    Python 0 MIT 0 0 0 Updated Dec 24, 2024
  • dolomite-base Public

    Save Bioconductor objects in Python.

    Python 2 MIT 0 1 0 Updated Dec 24, 2024
  • prebuilt-hdf5 Public

    Prebuilt HDF5 binaries for static linking

    Shell 0 0 0 0 Updated Dec 24, 2024
  • dolomite-schemas Public

    Vendored schemas for Bioconductor objects

    Python 0 MIT 0 0 0 Updated Dec 23, 2024
