Project 6: Building a robust and reproducible assembly and annotation pipeline for non-model eukaryote genomes

BioHackrXiv repo: https://github.com/abueg/2022-biohackathon-bhxiv

Abstract

The European Reference Genome Atlas (ERGA) has gathered a wide community to generate reference genome assemblies for diverse eukaryote species. To this end, sequencing platforms have already generated large datasets for several species, which now require extensive bioinformatic analyses. Our project aims to build an assembly and annotation pipeline, in collaboration with the Vertebrate Genomes Project (VGP), that enables newcomers to the field to integrate heterogeneous sequencing datasets (PacBio HiFi, Nanopore, Illumina and Hi-C reads) and generate high-quality chromosome-level assemblies and gene sets. In addition, we will test new tools to identify efficient assembly and annotation strategies. Implementing this pipeline within the Galaxy framework will help streamline the process and make it accessible to biologists with limited access to high-performance computing resources, which matters because eukaryote genomes typically require large computational resources. The pipeline will also serve as a tutorial to convey technical skills and good practices in genome assembly and annotation. Establishing these pipelines will also benefit the community by providing standardized and reproducible protocols.

Topics

  • Biodiversity
  • Galaxy
  • Tools Platform

Project Number: 6

Lead(s)

Expected outcomes

Expected audience

  • Researchers working on genome assembly, annotation, or quality control
  • Participants interested in creating Galaxy tool wrappers / bioconda packages

Number of expected hacking days: 4