Before running the data analysis pipeline, we need to install the following tools:
- fastqc: A quality control tool for high-throughput sequence data.
  - Mac: Install using Homebrew: brew install fastqc
- trimmomatic: A tool for trimming and filtering raw sequencing data.
  - Download from: Trimmomatic Website
- hisat2: A fast and sensitive alignment program for mapping next-generation sequencing reads to a reference genome.
  - Download from: Hisat2 Website
- sra toolkit: A suite of tools for accessing and working with data in the Sequence Read Archive (SRA).
  - Install according to instructions from: SRA Toolkit GitHub
- sam toolkit (samtools): Tools for reading, writing, editing, indexing, and viewing files in SAM/BAM/CRAM format.
  - Download from: Sam Toolkit
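
Once everything is installed, it can help to confirm that each tool is reachable from the shell before starting the pipeline. The snippet below is a minimal sketch, not part of the pipeline itself. The helper name check_tools is hypothetical, and the executable names passed to it are assumptions about how the tools were installed (Trimmomatic in particular is often run as a .jar file rather than through a trimmomatic wrapper command).

```shell
# Report which of the given command-line tools can be found on PATH.
# check_tools is a hypothetical helper; it returns non-zero if any tool is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The executable names below are assumptions; adjust them to your installation.
check_tools fastqc trimmomatic hisat2 prefetch fastq-dump samtools \
    || echo "Install the missing tools before running the pipeline."
```

Any tool reported as missing is either not installed or not on your PATH.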
To use the data analysis pipeline, follow these steps to download and prepare the data:
- Go to the NCBI SRA website and find the datasets of interest.
- Download the data in SRA format. For example, if the accession number is SRR23683396, you can download it and convert it to FASTQ by running the following commands in the directory where you want to store the data:
  prefetch SRR23683396
  fastq-dump SRR23683396.sra
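
If you need several datasets, the same two commands can be wrapped in a loop. The following is a rough sketch under a few assumptions: fetch_accessions is a hypothetical helper, and PREFETCH and FASTQ_DUMP are hypothetical variables that only exist so the command names can be overridden. Unlike the example above, it passes the accession itself to fastq-dump instead of a .sra path, so that the toolkit locates the downloaded file on its own.

```shell
# Command names; override these if the SRA Toolkit binaries are not on PATH.
# (PREFETCH and FASTQ_DUMP are hypothetical variables used only in this sketch.)
PREFETCH="${PREFETCH:-prefetch}"
FASTQ_DUMP="${FASTQ_DUMP:-fastq-dump}"

# Download every accession given as an argument and convert it to FASTQ.
# Stops and returns non-zero at the first command that fails.
fetch_accessions() {
    local acc
    for acc in "$@"; do
        echo "Fetching $acc"
        "$PREFETCH" "$acc" || return 1
        "$FASTQ_DUMP" "$acc" || return 1
    done
}

# Example call (commented out so that sourcing this file downloads nothing):
# fetch_accessions SRR23683396
```

For paired-end runs you will probably want to add the --split-files option to the fastq-dump call so that the two mates are written to separate files.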
Finally, adjust the variable names and the commands in the script, and run
./scripts/pipeplie.sh