DataScience.md

NB :: Not all of the datasets are freely available. This list also covers data management (research data management, clinical research data, metadata, library data, computational reproducibility, etc.).


SOFTWARE

NOTE: This is a list of Julia language packages that automate the loading process for specific datasets. To use the datasets, you can use these packages, modify an existing one, or write your own Julia package.

  • CommonCrawl.jl :: Interface to the Common Crawl dataset on Amazon S3.
  • DataDeps.jl :: BinDeps for data. Read the demo blog post.
  • Faker.jl :: A package that generates fake data.
  • FaceDatasets.jl :: A package for easy access to face-related datasets.
  • Maker.jl :: A tool like make for data analysis in Julia.
  • ModelerToolbox.jl :: Utilities for working with many different versions/parameterizations of models.
  • NetflixPrize.jl :: Julia package for handling the Netflix Prize data set of 2006.
  • PublicSuffix.jl :: Julia Interface for working with the Public Suffix List.
  • PubMedMiner.jl :: Return and analyze a PubMed/Medline search using MeSH descriptors and their corresponding UMLS concepts.
  • RDatasets.jl :: Julia package for loading many of the datasets available in R.
  • Socrata.jl :: An API wrapper for accessing the Socrata Open Data API and importing data into a DataFrame. Socrata is an open data platform used by many local and state governments as well as by the federal government in the USA.
  • UCIMLRepo.jl :: A small package for easy access to and download of datasets from the UCI ML repository.
  • WorldBankData.jl :: The World Bank provides free access to data about development at data.worldbank.org.

Data Science

  • jplyr.jl :: Data manipulation facilities for Julia.
  • Julia-data-science :: Notebooks on DS basics with Julia and why it is suitable for data science.

Research Data Management

Biomedical Research

  • REDCap.jl :: A Julia frontend for the REDCap API available under the MIT license, that supports both importing and exporting records, as well as deletion from the REDCap Database. It also includes functions for surveys and report generation.

ACTUARIAL SCIENCE

Econometrics

Finance


AstroPhysics

  • sndatasets :: Download and normalize published supernova photometric data.

BIOLOGY

Genome

  • ChromosomeMappings :: This repository contains chromosome/contig name mappings between UCSC <-> Ensembl <-> Gencode for a variety of genomes.
  • Download Gene data (via ftp) which integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.
  • Saccharomyces Genome Database
  • Genome Project Database.
  • RefSeqGene defines genomic sequences to be used as reference standards for well-characterized genes and is part of the Locus Reference Genomic (LRG) Project.
  • The 3000 Rice Genomes Project Data, GigaScience Database and Journal. See also the blog article in BMC: http://blogs.biomedcentral.com/gigablog/2014/05/29/publish-data-fight-world-hunger/
  • NCBI's Sequence Read Archive (SRA)
  • DataLad :: Aims to provide access to scientific data available from various sources (e.g. lab or consortium websites such as the Human Connectome; data sharing portals such as OpenFMRI and CRCNS) through a single convenient interface, integrated with your software package managers (such as APT in Debian). Although it initially targets neuroimaging and neuroscience data in general, it will not be limited to that domain, and a wide range of contributions are welcome. Get the source code on GitHub.

Worms, Virus, Nematodes

  • The central MANUELA database (Meiobenthic And Nematode biodiversity Unravelling Ecological and Latitudinal Aspects) is compiled by capturing the available data on meiobenthos on a broad European scale.
  • Nematodes DB from the Blaxter Lab, based on analyses of ESTs or GSSs from neglected taxa using the PartiGene suite of programmes.
  • Nematode Transcriptome Analyses.
  • WormBase :: Species genomes with standardized sequence and annotations.

Genetics-Medicine

  • NCBI Resources for Genetics and Medicine.
  • HIV-1, Human Protein Interaction Database :: A database of known interactions of HIV-1 proteins with proteins from human hosts. It provides annotated bibliographies of published reports of protein interactions, with links to the corresponding PubMed records and sequence data.

Medical Imaging

Molecular Biology

  • SASBDB :: Small Angle Scattering Biological Data Bank.

Neuroscience

  • Codeneuro-Datasets :: Shared data sets for collaborating, testing, and benchmarking.
  • MindResearchRepository.jl :: Access data sets from the Mind Research Repository.
  • OpenfMRI.org :: A project dedicated to the free and open sharing of functional magnetic resonance imaging (fMRI) datasets, including raw data.
  • Neuroscience Databases list.
  • Neurovault :: A place where researchers can publicly store and share unthresholded statistical maps produced by MRI and PET studies.

Pharma


CHEMISTRY

Crystallography


DATA

DATA-DataScience

DATA-General

  • awesome-public-datasets :: A collection of large-scale public datasets on the Internet.

  • common-workflow-language :: Repository for CWL Specifications.

  • datasets :: Original data or Aggregated / cleaned / restructured existing datasets. Released under Creative Commons Attribution-ShareAlike 4.0 International License.

  • Freebase :: A community-curated database of well-known people, places, and things.

  • Wikidata :: A free linked database that acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others, that can be read and edited by both humans and machines.

  • World Bank Open Data :: Free and open access to data about development in countries around the globe.

DATA-Research

  • Registry of Research Data Repositories :: Provides researchers, funding organisations, libraries and publishers with over 1,000 listed research data repositories from all over the world, making it the largest and most comprehensive online catalog of research data repositories on the web.

DATA-Scientific



Gender Violence


MACHINE LEARNING

  • Machine learning datasets :: A list of the biggest machine learning datasets from across the web.
  • Celeb-DF :: A dataset for DeepFake forensics that contains real and DeepFake-synthesized videos with visual quality on par with those circulated online. The Celeb-DF dataset includes 408 original videos collected from YouTube with subjects of different ages, ethnic groups and genders, and 795 DeepFake videos synthesized from these real videos.
  • UCI Machine Learning Repository

MATH

  • Juliaset.jl :: Generate Julia set images. Created primarily as an example for JuliaBox-hosted REST APIs.

PHYSICS


VIDEO

  • Databrary :: A video data library for developmental science. Share videos, audio files, and related metadata. The source code is on GitHub.