ELIXIR Crate DB

Introduction

Various research artifacts are produced daily around the world, but managing them is a tedious task for researchers. Researchers prefer to spend more time on new discoveries rather than finding efficient ways to store, retrieve, or share their data. This project aims to alleviate these challenges by providing a robust backend system for managing RO-Crates—standardized packages for research data and metadata.

User Story

The end user (research publisher/scientist) will be able to manage any of the RO-Crates developed for personal or public research. The main functions include CRUD operations on RO-Crates and publishing them to Zenodo. This project can be summarized as an "Efficient RO-Crate Management System". Future enhancements include developing a frontend component to further simplify these operations.

Project Goals & Milestones

Goals

  • Develop a scalable and extensible Flask app using FOCA.
  • Implement efficient search functionality using Elasticsearch.
  • Enable access controls over the application.
  • Create endpoints for publishing/unpublishing RO-Crates to/from Zenodo.
  • Package the project as a Python dependency.
  • Write examples, tests, and documentation.

Installation

Requirements

Ensure you have the following software installed:

  • Docker (19.03.4, build 9013bf583a)
  • docker-compose (1.25.5)
  • Git (2.17.1)

Indicated versions were used for developing/testing. Other versions may or may not work. Please let us know if you encounter any issues with versions newer than the listed ones.

Deploy app

Clone repository:

git clone https://github.com/elixir-cloud-aai/crate-db.git

Change into the repository directory:

cd crate-db

Build and run services in detached/daemonized mode:

docker-compose up -d --build

Implementation Details

Abstract

The project focuses on developing a backend system for managing RO-Crates via API endpoints. The system provides functionalities for storing, retrieving, publishing, and unpublishing RO-Crates.

Flow

  • Storage: Researchers submit RO-Crate data and metadata via API endpoints. The system validates and stores the data securely.
  • Retrieval: Researchers query the system via API endpoints to locate specific RO-Crates. The system returns the data in the specified format.
  • Publishing/Unpublishing: Integration with Zenodo allows researchers to publish/unpublish RO-Crates. API endpoints facilitate these processes.
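The validation step in the storage flow can be sketched as below. This is a minimal illustration, not the project's implementation: it assumes crates arrive as ZIP archives and only checks that the archive contains a parseable `ro-crate-metadata.json` (the file name defined by the RO-Crate 1.1 specification) with `@context` and `@graph` keys. The function name `validate_crate_zip` is hypothetical.

```python
import io
import json
import zipfile

# File name defined by the RO-Crate 1.1 specification.
METADATA_FILE = "ro-crate-metadata.json"


def validate_crate_zip(payload: bytes) -> dict:
    """Return the parsed metadata document of a zipped RO-Crate.

    Hypothetical helper: raises ValueError if the payload is not a ZIP
    archive or lacks a well-formed metadata file at the archive root.
    """
    try:
        archive = zipfile.ZipFile(io.BytesIO(payload))
    except zipfile.BadZipFile as exc:
        raise ValueError("payload is not a ZIP archive") from exc

    with archive:
        if METADATA_FILE not in archive.namelist():
            raise ValueError(f"{METADATA_FILE} missing from archive root")
        try:
            metadata = json.loads(archive.read(METADATA_FILE))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            raise ValueError(f"{METADATA_FILE} is not valid JSON") from exc

    # An RO-Crate metadata document is JSON-LD with @context and a flat @graph.
    if (
        not isinstance(metadata, dict)
        or "@context" not in metadata
        or not isinstance(metadata.get("@graph"), list)
    ):
        raise ValueError(f"{METADATA_FILE} lacks @context or @graph")
    return metadata
```

A real service would likely validate more (for example, the root data entity), but rejecting malformed uploads before anything is written keeps the stores consistent.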

Architecture

Three main components will be used:

  1. MinIO: Serves as an object storage system for storing zipped RO-Crates.
  2. MongoDB: Stores metadata and zipped RO-Crates.
  3. RO-Crate Microservice: Central package that interacts with the other components.
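How the microservice might coordinate the two stores can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: the class names, the `crates/<id>.zip` key layout, and the choice to keep the archive bytes in the object store while the metadata store holds a reference to them. The real service would talk to MinIO and MongoDB through their client libraries.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryObjectStore:
    """Stand-in for MinIO: maps object keys to raw bytes."""

    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


@dataclass
class InMemoryMetadataStore:
    """Stand-in for MongoDB: maps crate IDs to metadata documents."""

    documents: dict[str, dict] = field(default_factory=dict)

    def insert(self, crate_id: str, document: dict) -> None:
        self.documents[crate_id] = document

    def find(self, crate_id: str) -> dict:
        return self.documents[crate_id]


class CrateService:
    """Hypothetical coordinator between the object and metadata stores."""

    def __init__(self, objects: InMemoryObjectStore, metadata: InMemoryMetadataStore):
        self.objects = objects
        self.metadata = metadata

    def store(self, name: str, zipped: bytes) -> str:
        """Write the archive first, then the document that points at it."""
        crate_id = uuid.uuid4().hex
        key = f"crates/{crate_id}.zip"  # assumed key layout
        self.objects.put(key, zipped)
        self.metadata.insert(
            crate_id, {"name": name, "object_key": key, "size": len(zipped)}
        )
        return crate_id

    def retrieve(self, crate_id: str) -> tuple[dict, bytes]:
        """Look up the metadata document, then fetch the archive it references."""
        document = self.metadata.find(crate_id)
        return document, self.objects.get(document["object_key"])
```

Writing the object before the metadata document means a failed upload never leaves a metadata entry pointing at a missing archive.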

Why MinIO?

MinIO is chosen for its high performance, scalability, cloud-native compatibility, security features, and cost-effectiveness. It supports object storage, horizontal scaling, and built-in encryption, making it ideal for our project's needs.

Technical Details

Technologies

  • FOCA: Enables fast development of OpenAPI-based HTTP API microservices in Flask.
  • MongoDB and MinIO: Used for storage.
  • Zenodo APIs: Used for publishing research artifacts.
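As a rough sketch of the publishing side, the three HTTP requests of a Zenodo deposit (create a deposition, upload a file to its bucket, publish) can be prepared with the standard library. The helper names are hypothetical, the sandbox base URL is used on the assumption that testing happens there, and the requests are only constructed, not sent; nothing here reflects how this project actually integrates with Zenodo.

```python
import json
import urllib.parse
import urllib.request

# Zenodo's sandbox instance; production is https://zenodo.org/api.
ZENODO_BASE = "https://sandbox.zenodo.org/api"


def _auth_header(token: str) -> dict:
    """Zenodo accepts a personal access token as a Bearer token."""
    return {"Authorization": f"Bearer {token}"}


def create_deposition_request(
    token: str, title: str, description: str, creator: str
) -> urllib.request.Request:
    """Prepare the POST that creates a new, unpublished deposition."""
    body = {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": creator}],
        }
    }
    return urllib.request.Request(
        f"{ZENODO_BASE}/deposit/depositions",
        data=json.dumps(body).encode("utf-8"),
        headers={**_auth_header(token), "Content-Type": "application/json"},
        method="POST",
    )


def upload_file_request(
    token: str, bucket_url: str, filename: str, data: bytes
) -> urllib.request.Request:
    """Prepare the PUT that uploads a file into the deposition's bucket."""
    return urllib.request.Request(
        f"{bucket_url}/{urllib.parse.quote(filename)}",
        data=data,
        headers={**_auth_header(token), "Content-Type": "application/octet-stream"},
        method="PUT",
    )


def publish_request(token: str, deposition_id: int) -> urllib.request.Request:
    """Prepare the POST that publishes a deposition."""
    return urllib.request.Request(
        f"{ZENODO_BASE}/deposit/depositions/{deposition_id}/actions/publish",
        headers=_auth_header(token),
        method="POST",
    )
```

Each request can be sent with `urllib.request.urlopen`. The response to the create call carries the deposition `id` and a `links.bucket` URL, which feed the upload and publish steps.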

Challenges

  1. Complexity of RO-Crate Structure: Designing efficient storage and retrieval mechanisms.
  2. Integration with External Services: Handling authentication, authorization, and error handling.
  3. Optimizing Performance: Ensuring efficient uploads, downloads, and search operations.
  4. Security and Data Integrity: Protecting sensitive data and ensuring integrity during transmission and storage.
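One common way to address the integrity challenge is to record a checksum at upload time and compare it on download. The sketch below assumes SHA-256 and hypothetical helper names; the project may choose a different mechanism.

```python
import hashlib
import hmac
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file-like object in chunks so large crates never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(stream: BinaryIO, expected_hex: str) -> bool:
    """Return True if the stream's SHA-256 digest matches the recorded one."""
    return hmac.compare_digest(sha256_of_stream(stream), expected_hex.lower())
```

For example, `verify_checksum(open(path, "rb"), stored_digest)` could run after a download, where `stored_digest` is whatever was saved alongside the crate's metadata.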

Future Prospects

Future development will focus on thorough implementation of CRUD operations, emphasizing simplicity and robustness. Stretch goals include advanced operations and developing a frontend component for enhanced user interaction and accessibility. This project aims to evolve into a comprehensive solution for managing RO-Crates efficiently and effectively.

Contributing

This project is a community effort and lives off your contributions, be it in the form of bug reports, feature requests, discussions, ideas, fixes, or other code changes. Please read these guidelines if you want to contribute. And please mind the code of conduct for all interactions with the community.

Versioning

The project adopts semantic versioning. The service is currently in the alpha stage, so the API may change, and even break, without further notice.

License

This project is covered by the Apache License 2.0, a copy of which is shipped with this repository.

Contact

Crate-DB is part of ELIXIR Cloud & AAI, a multinational effort at establishing and implementing FAIR data sharing and promoting reproducible data analyses and responsible data handling in the life sciences.

If you have suggestions for this app or find issues with it, please use the issue tracker. If you would like to reach out to us for anything else, you can join our Slack board, start a thread in our Q&A forum, or send us an email.
