Documentation update: Running the rnaseq pipeline on prokaryotic samples #1084

MatthiasZepper · 2023-10-05T11:11:35Z

Description of feature

The documentation of the rnaseq pipeline that refers to running the pipeline on prokaryotic samples is unfortunately completely outdated. It specifies the required settings for featureCounts, even though that tool has been superseded by salmon for transcript quantification about 10 pipeline releases ago!

On Slack, Marine Cambon has already kindly written up what is needed to run the more recent versions of the pipeline successfully with prokaryotic RNA-seq. However, somebody needs to update the pipeline documentation accordingly. Since this requires only Markdown edits, I think it is a suitable task for the Hackathon?

For some general recommendations on how to write good technical documentation, see the website of the Diátaxis framework.


d4straub · 2025-01-24T10:01:28Z

As of now (3.18.0), the documentation hasn't changed much.
Also, I recently couldn't run the pipeline by following these suggestions. I was using data for E. coli strain BW25113 (4522 genes, 4419 CDS, 121 exon) with the NCBI RefSeq annotation from https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000750555.1/.

The following seems important:

only "exon" features are measured, therefore every feature of interest needs an "exon" entry
column 2 must not contain special characters, and possibly not even spaces (it often does)
a gene_id is necessary for each "exon" entry
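
A quick way to see whether an annotation meets these three points is to count the relevant lines before starting the pipeline. The following is only a minimal sketch and not part of the pipeline: `check_gtf` is a hypothetical helper name, the sample lines are made up, and the set of characters accepted in column 2 (letters, digits, `_`, `.`, `-`) is an assumption about what counts as "special".

```shell
# Hypothetical helper: summarise the three points above for one GTF file.
check_gtf() {
  awk -F'\t' '
    /^#/ { next }                                # skip comment lines
    $3 == "exon" {
      exons++                                    # only "exon" lines are measured
      if ($9 !~ /gene_id "[^"]+"/) missing++     # "exon" entry without a gene_id
    }
    $2 !~ /^[A-Za-z0-9_.-]+$/ { badsrc++ }       # assumed definition of "special" in column 2
    END {
      printf "exon=%d exon_without_gene_id=%d unusual_source=%d\n", exons, missing, badsrc
    }
  ' "$1"
}

# Made-up sample: one good exon, one CDS with a space in column 2,
# and one exon that lacks a gene_id.
tmp=$(mktemp)
printf 'chr\tRefSeq\texon\t1\t100\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";\n' > "$tmp"
printf 'chr\tProtein Homology\tCDS\t200\t300\t.\t+\t0\ttranscript_id "t2";\n' >> "$tmp"
printf 'chr\tRefSeq\texon\t400\t500\t.\t+\t.\ttranscript_id "t3";\n' >> "$tmp"
check_gtf "$tmp"   # prints: exon=2 exon_without_gene_id=1 unusual_source=1
rm -f "$tmp"
```

If the last two counters are both zero and the exon count is close to the number of genes of interest, the annotation at least passes these three checks.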

The simplest solution I have found so far is to take the annotation in GFF format and convert and simplify it with gffread. A gene_id then has to be added to every entry that lacks one. This can be achieved as below.

gffread genomic.gff --keep-exon-attrs -F -T --force-exons | sed '/gene_id/!s/transcript_id\([^;]*\)/&; \0/;/transcript_id.*transcript_id/s/transcript_id/gene_id/' > genomic_gffread_force-exons_geneid.gtf
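
To make the sed step easier to follow, here is a small sketch that applies the same program to two made-up GTF lines, without needing gffread. `add_gene_id` is a hypothetical function name, and the program is written with `&` in place of `\0`; GNU sed appears to treat both as the whole match, but that equivalence is an assumption worth checking on other sed implementations.

```shell
# Hypothetical wrapper around the sed program from the command above.
# Step 1: on lines without gene_id, duplicate the transcript_id attribute.
# Step 2: on lines that now hold two transcript_id attributes, rename the first to gene_id.
add_gene_id() {
  sed '/gene_id/!s/transcript_id\([^;]*\)/&; &/;/transcript_id.*transcript_id/s/transcript_id/gene_id/'
}

# A line without gene_id gains one that reuses the transcript_id value;
# column 9 becomes: gene_id "rna-b0001"; transcript_id "rna-b0001";
printf 'chr\tRefSeq\texon\t1\t100\t.\t+\t.\ttranscript_id "rna-b0001";\n' | add_gene_id

# A line that already has a gene_id passes through unchanged.
printf 'chr\tRefSeq\texon\t1\t100\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";\n' | add_gene_id
```

Note that entries fixed this way get a gene_id identical to their transcript_id, so for them gene-level and transcript-level identifiers coincide.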

A similar discussion of the problem can be found on Slack: https://nfcore.slack.com/archives/CE8SSJV3N/p1679577061847429?thread_ts=1677835193.447669&cid=CE8SSJV3N

This solution should be broadly applicable, though I am not sure. If it is indeed helpful, it might make sense to add this information to the docs.

MatthiasZepper added first-timers-only Good for newcomers enhancement labels Oct 5, 2023

MatthiasZepper added this to Hackathon: November 2023 Oct 5, 2023

MatthiasZepper mentioned this issue Oct 5, 2023

Facilitate running the rnaseq pipeline on prokaryotic samples #1085

Open

JGawra assigned JGawra and unassigned JGawra Oct 16, 2023

MatthiasZepper removed this from Hackathon: November 2023 May 16, 2024

MatthiasZepper added this to Hackathon: May 2024 May 16, 2024

MatthiasZepper moved this to To do in Hackathon: May 2024 May 16, 2024

MatthiasZepper added this to Hackathon October 2024 Oct 21, 2024

MatthiasZepper moved this to Todo in Hackathon October 2024 Oct 21, 2024
