add pig fasta and gtf
hoelzer committed Feb 1, 2024
1 parent 094be4d commit bfce097
Showing 4 changed files with 23 additions and 4 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -194,7 +194,7 @@ nextflow pull hoelzer-lab/rnaflow -r <RELEASE>
nextflow run hoelzer-lab/rnaflow --reads input.csv --autodownload hsa --pathway hsa --max_cores 6 --cores 2
```

-with `--autodownload <hsa|mmu|mau|eco>` [build-in species](#build-in-species), or define your own genome reference and annotation files in CSV files:
+with `--autodownload <hsa|mmu|ssc|mau|eco>` [build-in species](#build-in-species), or define your own genome reference and annotation files in CSV files:

```bash
nextflow run hoelzer-lab/rnaflow --reads input.csv --genome fastas.csv --annotation gtfs.csv --max_cores 6 --cores 2
@@ -258,10 +258,11 @@ You can add a [build-in species](#build-in-species) to your defined genomes and
We provide a small set of build-in species for which the genome and annotation files are automatically downloaded from [Ensembl](https://www.ensembl.org/index.html) with `--autodownload xxx`. Please let us know, we can easily add other species.
-| Species | three-letter shortcut | Genome | Annotation |
+| Species | three-letter shortcut | Annotation | Genome |
| ------------ | --------------------- | ----------------------------------- | --------------------------------------------- |
| Homo sapiens | `hsa` <sup>*</sup> | Homo_sapiens.GRCh38.98 | Homo_sapiens.GRCh38.dna.primary_assembly |
| Mus musculus | `mmu` <sup>*</sup> | Mus_musculus.GRCm38.99 | Mus_musculus.GRCm38.dna.primary_assembly |
+| Sus scrofa | `ssc` <sup>*</sup> | Sus_scrofa.Sscrofa11.1.111 | Sus_scrofa.Sscrofa11.1.dna.toplevel |
| Mesocricetus auratus | `mau` <sup>*</sup> | Mesocricetus_auratus.MesAur1.0.100 | Mesocricetus_auratus.MesAur1.0.dna.toplevel |
| Escherichia coli | `eco` | Escherichia_coli_k_12.ASM80076v1.45 | Escherichia_coli_k_12.ASM80076v1.dna.toplevel |
@@ -518,6 +519,7 @@ Input:
- hsa [Ensembl: Homo_sapiens.GRCh38.dna.primary_assembly | Homo_sapiens.GRCh38.98]
- eco [Ensembl: Escherichia_coli_k_12.ASM80076v1.dna.toplevel | Escherichia_coli_k_12.ASM80076v1.45]
- mmu [Ensembl: Mus_musculus.GRCm38.dna.primary_assembly | Mus_musculus.GRCm38.99.gtf]
+- ssc [Ensembl: Sus_scrofa.Sscrofa11.1.dna.toplevel | Sus_scrofa.Sscrofa11.1.111 ]
- mau [Ensembl: Mesocricetus_auratus.MesAur1.0.dna.toplevel | Mesocricetus_auratus.MesAur1.0.100]
--species Specifies the species identifier for downstream path analysis. (DEPRECATED)
If `--include_species` is set, reference genome and annotation are added and automatically downloaded. [default: ]
5 changes: 3 additions & 2 deletions main.nf
@@ -85,8 +85,8 @@ if (params.nanopore) {
}


-Set species = ['hsa', 'eco', 'mmu', 'mau']
-Set autodownload = ['hsa', 'eco', 'mmu', 'mau']
+Set species = ['hsa', 'eco', 'mmu', 'mau', 'ssc']
+Set autodownload = ['hsa', 'eco', 'mmu', 'mau', 'ssc']
Set pathway = ['hsa', 'mmu', 'mau']

if ( params.profile ) { exit 1, "--profile is WRONG use -profile" }
@@ -924,6 +924,7 @@ def helpMSG() {
- hsa [Ensembl: Homo_sapiens.GRCh38.dna.primary_assembly | Homo_sapiens.GRCh38.98]
- eco [Ensembl: Escherichia_coli_k_12.ASM80076v1.dna.toplevel | Escherichia_coli_k_12.ASM80076v1.45]
- mmu [Ensembl: Mus_musculus.GRCm38.dna.primary_assembly | Mus_musculus.GRCm38.99.gtf]
+- ssc [Ensembl: Sus_scrofa.Sscrofa11.1.dna.toplevel | Sus_scrofa.Sscrofa11.1.111 ]
- mau [Ensembl: Mesocricetus_auratus.MesAur1.0.dna.toplevel | Mesocricetus_auratus.MesAur1.0.100]${c_reset}
${c_dim}--species Specifies the species identifier for downstream path analysis. (DEPRECATED)
If `--include_species` is set, reference genome and annotation are added and automatically downloaded. [default: $params.species]
6 changes: 6 additions & 0 deletions modules/annotationGet.nf
@@ -27,6 +27,12 @@ process annotationGet {
gunzip -f Mus_musculus.GRCm38.99.gtf.gz
mv Mus_musculus.GRCm38.99.gtf ${species}.gtf
"""
+else if (species == 'ssc')
+"""
+wget ftp://ftp.ensembl.org/pub/release-111/gtf/sus_scrofa/Sus_scrofa.Sscrofa11.1.111.gtf.gz
+gunzip -f Sus_scrofa.Sscrofa11.1.111.gtf.gz
+mv Sus_scrofa.Sscrofa11.1.111.gtf ${species}.gtf
+"""
else if (species == 'eco')
"""
wget ftp://ftp.ensemblgenomes.org/pub/release-45/bacteria//gtf/bacteria_90_collection/escherichia_coli_k_12/Escherichia_coli_k_12.ASM80076v1.45.gtf.gz
10 changes: 10 additions & 0 deletions modules/referenceGet.nf
@@ -27,6 +27,16 @@ process referenceGet {
gunzip -f Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
mv Mus_musculus.GRCm38.dna.primary_assembly.fa ${species}.fa
"""
+else if (species == 'ssc')
+"""
+# Primary assembly contains all toplevel sequence regions excluding haplotypes and patches.
+# This file is best used for performing sequence similarity searches where patch and haplotype
+# sequences would confuse analysis. If the primary assembly file is not present, that
+# indicates that there are no haplotype/patch regions, and the 'toplevel' file is equivalent.
+wget ftp://ftp.ensembl.org/pub/release-111/fasta/sus_scrofa/dna/Sus_scrofa.Sscrofa11.1.dna.toplevel.fa.gz
+gunzip -f Sus_scrofa.Sscrofa11.1.dna.toplevel.fa.gz
+mv Sus_scrofa.Sscrofa11.1.dna.toplevel.fa ${species}.fa
+"""
else if (species == 'eco')
"""
wget ftp://ftp.ensemblgenomes.org/pub/release-45/bacteria//fasta/bacteria_90_collection/escherichia_coli_k_12/dna/Escherichia_coli_k_12.ASM80076v1.dna.toplevel.fa.gz
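
Note on the download steps in `annotationGet.nf` and `referenceGet.nf`: `gunzip -f` deletes the `.gz` archive and leaves a file with that suffix stripped, so the following `mv` has to name the decompressed file, not the archive. Below is a minimal sketch of that pattern. The helper names `decompressed_name` and `unpack_as` are hypothetical and not part of rnaflow; the sketch works on a local archive and leaves the `wget` step out.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; they are not part of rnaflow.

# Print the file name that gunzip leaves behind for a given .gz URL or path.
decompressed_name() {
    local base
    base="$(basename "$1")"
    printf '%s\n' "${base%.gz}"
}

# Decompress a local .gz archive, then rename the result to the target name.
# Returns non-zero if gunzip fails (for example, when the archive is missing).
unpack_as() {
    local archive="$1" target="$2"
    gunzip -f "$archive" || return 1
    mv "${archive%.gz}" "$target"
}
```

Under these assumptions, a process script could call `unpack_as Sus_scrofa.Sscrofa11.1.111.gtf.gz ssc.gtf` after the download, which keeps the archive name and the renamed file in step with each other.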