Project 9: Disseminating FAIR Machine Learning Models via BioModels

Abstract

Machine learning (ML) models are widely used as tools in life science and medical research. However, ML models are scattered across various resources, including personal websites, GitHub, Bitbucket, and supplementary material, making it difficult to find, access, and reuse them. We propose to extend BioModels (https://www.ebi.ac.uk/biomodels) to support FAIR dissemination of ML models in the biomedical sciences. BioModels is an ELIXIR deposition database of biomedical mechanistic models, hosted at EMBL-EBI and accessed by about 51,000 unique users (IPs) annually. The BioModels infrastructure was recently enhanced to support version-controlled dissemination and curation of a wide range of modelling frameworks and formats, which gives it the capability to host and disseminate ML models. We propose to engage with ML modellers during the BioHackathon to support the dissemination of their models through BioModels. We will semantically enrich models with controlled vocabularies such as the Disease and Gene Ontologies, adapting the existing metadata support and curation guidelines in BioModels. ML models can be linked with the model data hosted within EMBL-EBI and other ELIXIR nodes through cross-references using BioModels qualifiers. Using this metadata, the BioModels search engine will allow users to easily find and download ML models.

We will use this BioHackathon to carry out pilot work on FAIR model dissemination via BioModels. First, we will engage with ML modellers to identify minimal and essential metadata standards for ML models, and adapt the existing interoperable COMBINE metadata framework to implement them. We will semantically enrich the existing 16 ML models in BioModels (https://www.ebi.ac.uk/biomodels/search?query=submitter_keywords%3AMachine+Learning+Model&domain=biomodels_all). Following on, we will solicit ML model submissions from the ELIXIR ML modelling community. We will also import and annotate key publicly available ML models to extend the collection in BioModels. Through this pilot work, we will demonstrate a proof of concept for disseminating metadata-rich FAIR ML models, cross-referenced to data and tools, via BioModels.
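
The search link quoted in the abstract can be rebuilt programmatically, which may help when scripting lookups of the existing ML models. The following is a minimal sketch using only the Python standard library; the helper name `build_search_url` is hypothetical, and the query syntax (`submitter_keywords:` plus the `domain` parameter) is taken from the URL above rather than from any API documentation.

```python
from urllib.parse import urlencode

# Base search endpoint, as it appears in the abstract's link.
BIOMODELS_SEARCH = "https://www.ebi.ac.uk/biomodels/search"


def build_search_url(keyword: str, domain: str = "biomodels_all") -> str:
    """Return a BioModels search URL filtering on a submitter keyword.

    Hypothetical helper: it only assembles the URL and does not contact
    the server.
    """
    params = {"query": f"submitter_keywords:{keyword}", "domain": domain}
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{BIOMODELS_SEARCH}?{urlencode(params)}"


url = build_search_url("Machine Learning Model")
print(url)
```

Building the URL this way keeps the encoding of the colon and spaces consistent with the link in the abstract, so the same helper could be reused for other submitter keywords.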

Topics

Bioschemas, Data Platform, Machine learning, Tools Platform

Project Number: 9

Lead(s)

Rahuman S Malik Sheriff ([email protected])

Expected outcomes

- Establish a minimal metadata standard (Version 1) to enhance the findability of ML models, based on Bioschemas and/or BioModels Qualifiers. The minimal metadata will cover broad aspects including the biology of the model, the ML method, the data and tools used, and the features, inputs, and outputs of the model.
- Identification of key ontologies, and extension of the COMBINE standard and the BioModels SOP to annotate ML models.
- Annotation of the existing ML models in BioModels using the established metadata standards.
- External submission of ML models from the ELIXIR community to BioModels.
- A publicly available pilot collection of FAIR ML models in BioModels.
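
To make the scope of the minimal metadata concrete, here is a rough sketch of what one record might look like as a Python data structure. Everything in it is an assumption for illustration: the class name `MLModelMetadata`, the field names, and the placeholder values are hypothetical and are not part of any agreed standard, Bioschemas profile, or BioModels schema. The fields simply mirror the aspects listed above (biology, ML method, data, tools, features, inputs, outputs).

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class MLModelMetadata:
    """Hypothetical minimal metadata record for one ML model."""

    name: str
    biology: List[str] = field(default_factory=list)       # ontology term IDs
    ml_method: str = ""                                     # e.g. "random forest"
    data_sources: List[str] = field(default_factory=list)  # cross-references to data
    tools: List[str] = field(default_factory=list)         # software used
    features: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [key for key, value in asdict(self).items() if not value]


# Placeholder values only; they do not describe a real BioModels entry.
record = MLModelMetadata(
    name="Example classifier",
    biology=["DOID:0000000"],  # placeholder ontology identifier
    ml_method="random forest",
    inputs=["gene expression matrix"],
    outputs=["class label"],
)
print(record.missing_fields())
```

A completeness check like `missing_fields` is one possible way a curator could see at a glance which parts of a submission still need annotation.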

Expected audience

Skill sets in participants*: machine learning modelling (any approach), ontologies, metadata, Bioschemas, coding (R, Python, etc.) (*at least two of these skills)

Number of expected hacking days: 4