Reproducible research and computing

Best practices for quantitative research including management and sharing of data and computational methods. This repo helps to understand the foundational skills that everyone using computers in the pursuit of scientific research should have. In recent years, the concern for reproducibility in computational science has gained traction. Research group has been pushing for several years the adoption of better standards for reproducible research. These standards include treating scientific software as a core intellectual product, adding automation to our data handling and analysis, and the open sharing of our digital objects.

This repo introduce to the tools and techniques that is consider fundamental for responsible use of computers in scientific research.

Data Acquisition

Choose an experimental design (if appropriate)
Consider population structure and confounding
Ensure you’ll have data available for validation in the future
Balance time and monetary budgets against the number of data points generated or conditions tested

Carry out power calculations

If limited to a particular data set(s) that don’t provide information needed to answer your research question, you may need to change your research question or hypothesis

Data Provenance

Data Source(s)

Record who carried out each experiment, at what date and time, how to contact them
Record the model and make information for any equipment used

Data Storage and Management

If you expect your project to include big data, by any definition - volume, velocity, or variety - plan up front for how you’ll manage it
If working with big data, consider a distributed file system and parallel processing, as well as streamed processing that can save progress, start, and stop as needed
If your data change frequently, include an explicit annotation process, automated if possible, that captures where new files are coming from, at what time, and who’s responsible for them
If you’re working with many different kinds of data, consider using a data store such as the Open Science Data Framework that provides detailed typing, metadata, and cross-file integration

Analysis Plan

Consider what could go wrong and how can you avoid it
Ensure you have the resources in place (necessary software, sufficient computing power) to carry out the whole project
Ensure re-running an entire analysis will not be difficult to carry out

Analysis Infrastructure

Choose a formal scientific workflow environment
Find and install software (text editor, development environment, task tracking,documentation, and communication tools) that will facilitate your day-to-day work
Set up a revision control repository for the project (GitHub, BitBucket, etc.)
Create a directory and folder structure
Store files in a consistent and intuitive manner
Include a README file that contains information about all other files in your repository

General guidelines

Command line utilities in Unix/Linux
An open-source scientific software ecosystem (personal favorite is Python's)
Software version control (The distributed kind: git)
Good practices for scientific software development: code hygiene and testing
Knowledge of licensing options for sharing software

Data Analysis

Provenance Tracking Infrastructure
Ensure you have infrastructure for at least the following three aspects of your research
Analysis commands you can run as part of your main workflow
Data that is produced by these, or other, commands
Lab notebook for everything else
Use a supplemental provenance annotation system and/or environment for formal data provenance

JSON, XML, W3C PROV, Open Provenance Model
Taverna, Open Science Data, MyGrid

Workflow Documentation

Practice literate programming and extensively document all code
Choose easily interpretable variable names
Make use of version control
GitHub, Google Docs, Etherpad, etc.
Make code and workflow as automated as possible
Use driver scripts
Use platforms like R Markdown, Jupyter notebook, etc.
Incorporate positive and negative controls throughout the analysis
Consider using a shared file service ( Dropbox, Google Drive, OneDrive, amozon webservices etc.)

Reproducible research:

Authors provide all the necessary data and the computer codes to run the analysis again, re-creating the results.

Replication:

Arriving at the same scientific findings as another study, collecting new data (possibly with different methods) and completing new analyses. A full replication study is sometimes impossible to do, but reproducible research is only limited by the time and effort we are willing to invest.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
image		image
version_control		version_control
Branching.md		Branching.md
Design_workflow.rst		Design_workflow.rst
Git_Basic.rst		Git_Basic.rst
Git_and_Github.ipynb		Git_and_Github.ipynb
Literate_programming.rst		Literate_programming.rst
README.mdown		README.mdown
Workflow_Management.rst		Workflow_Management.rst

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

image

image

version_control

version_control

Branching.md

Branching.md

Design_workflow.rst

Design_workflow.rst

Git_Basic.rst

Git_Basic.rst

Git_and_Github.ipynb

Git_and_Github.ipynb

Literate_programming.rst

Literate_programming.rst

README.mdown

README.mdown

Workflow_Management.rst

Workflow_Management.rst

Repository files navigation

Reproducible research and computing

Data Acquisition

Carry out power calculations

Data Provenance

Data Source(s)

Data Storage and Management

Analysis Plan

Analysis Infrastructure

General guidelines

Data Analysis

Workflow Documentation

Reproducible research:

Replication:

About

Releases

Packages

Languages

sudpaul/Reproducible_research

Folders and files

Latest commit

History

Repository files navigation

Reproducible research and computing

Data Acquisition

Carry out power calculations

Data Provenance

Data Source(s)

Data Storage and Management

Analysis Plan

Analysis Infrastructure

General guidelines

Data Analysis

Workflow Documentation

Reproducible research:

Replication:

About

Topics

Resources

Stars

Watchers

Forks

Languages