docs: updated info on DMP and folder structure

brickmanlab · Jul 12, 2023 · 071a967
1 parent d76f71b
commit 071a967

Showing 2 changed files with 36 additions and 26 deletions.
diff --git a/docs/assets/file_naming_convention.tsv b/docs/assets/file_naming_convention.tsv
@@ -11,8 +11,8 @@ Multiqc report	QC aggregated report	<assayID\>_YYYYMMDD.multiqc	multiqc	RNA_2020
 Count matrix	final count matrix	<assayID\>_cm_aligner_YYYYMMDD.tsv	tsv	RNA_cm_salmon_20200101.tsv
 DEA	differential expression analysis results	DEA_<condition1-condition2\>_LFC<absolute_threshold\>_p<pvalue decimals\>_YYYYMMDD.tsv	tsv	DEA_treat-untreat_LFC1_p01_20200101.tsv
 DBA	differential binding analysis results	DBA_<condition1-condition2\>_LFC<absolute_threshold\>_p<pvalue decimals\>_YYYYMMDD.tsv	tsv	DBA_treat-untreat_LFC1_p01_20200101.tsv
-MAplot	MA plot	MAplot_<condition1-condition2\>_YYYYMMDD.jpeg	jpeg	MAplot_treat-untreat_20200101.tsv
-Heatmap plot	Heatmap plot of anything	heatmap_<type\>_YYYYMMDD.jpeg	jpeg	Heatmap_sample-cor_20200101.tsv
-Volcano plot	Volcano plot	volcano_<condition1-condition2\>_YYYYMMDD.jpeg	jpeg	volcano_treat-untreat_20200101.tsv
+MAplot	MA plot	MAplot_<condition1-condition2\>_YYYYMMDD.jpeg	jpeg	MAplot_treat-untreat_20200101.jpeg
+Heatmap plot	Heatmap plot of anything	heatmap_<type\>_YYYYMMDD.jpeg	jpeg	Heatmap_sampleCor_20200101.jpeg
+Volcano plot	Volcano plot	volcano_<condition1-condition2\>_YYYYMMDD.jpeg	jpeg	volcano_treat-untreat_20200101.jpeg
 Venn diagram	Venn diagram	venn_<type\>_YYYYMMDD.jpeg	jpeg	venn_consensus_20200101.jpeg
 Enrichment table	Enrichment results		tsv
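
Review note: the DEA/DBA pattern in this table can be assembled programmatically, which avoids hand-typed mismatches like the `.tsv` extensions corrected above. The following is only a minimal Python sketch. The helper name `dea_filename` and its parameters are hypothetical, and the p-value is passed as an already formatted string (e.g. `"01"`) because the table does not state how `<pvalue decimals>` is derived.

```python
from datetime import date


def dea_filename(condition1: str, condition2: str, lfc_threshold: int,
                 pvalue_decimals: str, run_date: date, prefix: str = "DEA") -> str:
    """Build <prefix>_<c1-c2>_LFC<threshold>_p<decimals>_YYYYMMDD.tsv."""
    stamp = run_date.strftime("%Y%m%d")  # date format used throughout: YYYYMMDD
    return (f"{prefix}_{condition1}-{condition2}"
            f"_LFC{lfc_threshold}_p{pvalue_decimals}_{stamp}.tsv")


# Reproduces the example given in the table for the DEA row.
print(dea_filename("treat", "untreat", 1, "01", date(2020, 1, 1)))
```

Passing `prefix="DBA"` would give the differential binding variant of the same pattern.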
diff --git a/docs/rdm-guidelines.md b/docs/rdm-guidelines.md
@@ -1,16 +1,15 @@
-# Research Data Management Guidelines
+# Research Data Management Guidelines for NGS
 
 This section provides guidelines for effective research data management within our lab. By adopting these guidelines, we aim to improve data organization and naming conventions, leading to enhanced data governance and research efficiency. The guidelines include the following steps:
 
 1. Adhere to folder structure and naming conventions for `Assays` and `Projects` folders.
 2. Add relevant metadata to a `metadata.yml` in each folder
-3. Create a database from metadata files in `Assays` and `Projects` folders.
-4. Visualize database with a [Panel python app](https://panel.holoviz.org/).
-5. `Projects` folders will be version controlled with Github and the [Brickman organization](https://github.com/brickmanlab).
-6. `Projects` reports will be displayed under the [Brickman organization GitHub Pages](https://gbrickmanlab.github.io).
-7. `Projects` will be syncronized and archived in [Zenodo](https://zenodo.org/), which will give a DOI that can be used in a publication.
-8. NGS `Assays` folder will be uploaded to GEO, with the information provided in the metadata file.
-9. Create a Data Management Plan template that it is prefilled with repetitive information using [DMPonline](https://dmponline.deic.dk/)
+3. Create a database from metadata files in `Assays` and `Projects` folders and browse it with a [Panel python app](https://panel.holoviz.org/).
+4. `Projects` folders will be version controlled with GitHub under the [Brickman organization](https://github.com/brickmanlab).
+5. `Projects` reports will be displayed under the [Brickman organization GitHub Pages](https://gbrickmanlab.github.io).
+6. `Projects` will be synchronized and archived in [Zenodo](https://zenodo.org/), which will give a DOI that can be used in a publication.
+7. NGS `Assays` folder will be uploaded to GEO, with the information provided in the metadata file.
+8. Create a Data Management Plan template that is prefilled with repetitive information using [DMPonline](https://dmponline.deic.dk/).
 
 ## 1. Folder structure and organization
 
@@ -87,7 +86,7 @@ The project folder should be named after a unique identifier, such as:
 <Project-ID>_YYYYMMDD
 ```
 
-`<Project-ID>` should be the surname of the owner of the project folder and the publication year, e.g. `JARH_etal_20230101`.
+`<Project-ID>` should be the initials of the owner of the project folder and the publication year, e.g. `JARH_et_al_20230101`.
 
 #### **Folder structure**
 
@@ -98,23 +97,33 @@ The project folder should be named after a unique identifier, such as:
 │  ├── external
 │  └── processed
 ├── documents
+│  └── Non-sensitive_NGS_research_project_template.docx
 ├── notebooks
+│  └── 01_data_analysis.rmd
 ├── README.md
 ├── reports
-│  └── figures
+│  ├── figures
+│  │  └── 01_data_analysis
+│  └── 01_data_analysis.html
 ├── requirements.txt
 ├── results
-└── scripts
+│  └── 01_data_analysis/
+├── scripts
+├── description.yml
+└── metadata.yml
 ```
 
 - **data**: folder that contains symlinks or shortcuts to where the data is, avoiding copying and modification of original files.
-- **documents**: folder containing word documents, slides or pdfs related to the project, such as explanations of the data or project, papers, etc.
+- **documents**: folder containing word documents, slides or pdfs related to the project, such as explanations of the data or project, papers, etc. It also contains your [Data Management Plan](#8-create-a-data-management-plan).
+    - *Non-sensitive_NGS_research_project_template.docx*. This is a prefilled Data Management Plan based on the Horizon Europe guidelines.
 - **notebooks**: folder containing Jupyter, R markdown or Quarto notebooks with the actual data analysis. Using annotated notebooks is ideal for reproducibility and readability purposes. Notebooks should be numbered in the order they were created, e.g. `00_preprocessing`.
 - **README.md**: detailed description of the project in markdown format.
 - **reports**: notebooks rendered as html/docx/pdf versions, ideal for sharing with colleagues and also as a formal report of the data analysis procedure.
-- *figures*: figures produced upon rendering notebooks. The figures will be saved under a subfolder named after the notebook that created them. This is for provenance purposes so we know which notebook created which figures.
+    - *figures*: figures produced upon rendering notebooks. The figures will be saved under a subfolder named after the notebook that created them. This is for provenance purposes so we know which notebook created which figures.
 - **results**: results from the data analysis, such as tables with differentially expressed genes, enrichment results, etc. These results should be saved under a subfolder named after the notebook that created them. This is for provenance purposes so we know which notebook created which results.
 - **scripts**: folder containing helper scripts needed to run data analysis or reproduce the work of the folder
+- **description.yml**: short description of the project.
+- **metadata.yml**: metadata file for the project describing different keys ([see below](#22-project-metadata-fields)).
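
Review note: the layout above can be reproduced with a few lines of Python, which may help when checking that a folder matches the guideline. This is an illustrative sketch only. The lab's actual folders come from its cookiecutter template, the function name `create_project` is hypothetical, and the lists below contain only the entries visible in this hunk, not any part of the tree that lies outside it.

```python
from pathlib import Path
import tempfile

# Entries visible in the tree shown in this hunk (assumption: not exhaustive).
DIRECTORIES = ["data/external", "data/processed", "documents", "notebooks",
               "reports/figures", "results", "scripts"]
FILES = ["README.md", "requirements.txt", "description.yml", "metadata.yml"]


def create_project(root: Path, project_id: str, date_stamp: str) -> Path:
    """Create an empty <Project-ID>_YYYYMMDD skeleton under root."""
    project = root / f"{project_id}_{date_stamp}"
    for directory in DIRECTORIES:
        (project / directory).mkdir(parents=True, exist_ok=True)
    for filename in FILES:
        (project / filename).touch()  # empty placeholder files
    return project


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(Path(tmp), "JARH_et_al", "20230101")
    print(sorted(p.name for p in project.iterdir()))
```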
 
 <!-- **Note**: maybe we can make an environment folder or put inside the scripts folder a Dockerfile that should give you an environment where you can reproduce the results of the folder? -->
 
@@ -131,6 +140,8 @@ After project is done and published, it will be moved to `NGS_data`.
 ### 1.5 General naming conventions and more info
 
 - date format: `YYYYMMDD`
+- authors: initials
+- file and folder names: **No use of spaces**. Field sections are separated by underscores `_`. Words in each section are written in camelCase. For example: `field1_word1Word2.txt`.
 
 [Transcriptomics metadata standards and fields](https://faircookbook.elixir-europe.org/content/recipes/interoperability/transcriptomics-metadata.html#analysis-metadata)
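
Review note: the file and folder naming rule added in this hunk (no spaces, sections joined by underscores, camelCase inside a section) is easy to apply mechanically. The sketch below is a hedged illustration in Python. `camel_case` and `build_name` are hypothetical helpers, not part of any lab tooling, and the first word of each section is kept as given so that names such as `MAplot` are not altered.

```python
def camel_case(words):
    """Join words as camelCase: first word as given, later words capitalised."""
    first, *rest = words
    return first + "".join(w[:1].upper() + w[1:] for w in rest)


def build_name(sections, extension):
    """Join camelCase sections with underscores and reject any whitespace."""
    name = "_".join(camel_case(words) for words in sections) + "." + extension
    if any(ch.isspace() for ch in name):
        raise ValueError(f"spaces are not allowed in names: {name!r}")
    return name


# Reproduces the example from the guideline: field1_word1Word2.txt
print(build_name([["field1"], ["word1", "word2"]], "txt"))
```

Each section is passed as a list of words, so the caller decides where the underscores go and the helper only handles capitalisation.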
 
@@ -155,13 +166,11 @@ In development.
 
 {{ read_table('./assets/project_metadata.tsv') }}
 
-## 3. Data management catalogue
+## 3. Data catalogue and browser
 
 @SLundregan is in the process of building a prototype for `Assay`, using the metadata contained in all `description.yml` and `metadata.yml` files in the assay folder.
 This will be in the form of an SQLite database that is easily updatable by running a helper script.
 
-## 4. Database browser
-
 @SLundregan is also working on a browsable database using [Panel python app](https://panel.holoviz.org/).
 The app will display the latest version of the SQLite database. Clicking on an item from the database
 will open a tab containing all available metadata for the assay.
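
Review note: since the prototype is still in development, the following is only a rough sketch of how metadata could be collected into SQLite, not the actual helper script. It assumes the `metadata.yml` files have already been parsed into dictionaries, and it uses a generic (folder, key, value) table because the final metadata fields are not fixed here. The folder name and keys in the example are hypothetical.

```python
import sqlite3


def build_catalogue(records, db_path=":memory:"):
    """Store metadata as (folder, key, value) rows.

    records maps a folder name to its already parsed metadata dictionary.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "folder TEXT, key TEXT, value TEXT, PRIMARY KEY (folder, key))"
    )
    rows = [(folder, key, str(value))
            for folder, meta in records.items()
            for key, value in meta.items()]
    # INSERT OR REPLACE lets the helper be re-run to refresh existing entries.
    conn.executemany("INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


# Hypothetical content, as it might look after parsing one metadata.yml file.
records = {"RNA_20200101": {"owner": "JARH", "technology": "RNAseq"}}
conn = build_catalogue(records)
print(conn.execute("SELECT key, value FROM metadata WHERE folder = ?",
                   ("RNA_20200101",)).fetchall())
```

A key/value layout tolerates fields being added or renamed while the metadata schema is still changing. A fixed-column table would be a reasonable replacement once the fields are settled.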
@@ -171,29 +180,29 @@ making it easy to fill up the info for the metadata and GEO submission ([see bel
 
 In the future, you could ideally visualize an analysed single cell RNAseq dataset by opening a [Cirrocumulus](https://cirrocumulus.readthedocs.io/en/latest/) session.
 
-## 5. `Projects` version control
+## 4. `Projects` version control
 
 All projects should be version controlled using GitHub under the [Brickman organization](https://github.com/brickmanlab/). After creating the project folder from the cookiecutter template, initialize a Git repository in that folder. The Git repository can stay private until it is ready for publication.
 
-## 6. `Projects` GitHub pages
+## 5. `Projects` GitHub pages
 
 Using GitHub Pages, it is possible to display your data analyses (or anything related to the project) inside the `Projects` folder so that they are open to the public in HTML format.
 This is great for transparency and reproducibility purposes. This can be done after the paper has been made public (it is not possible with a private repository without paying).
 
 **Info on how this is done should be put here**
 
-## 7. `Project` archiving in Zenodo
+## 6. `Project` archiving in Zenodo
 
 Before submitting, link the repository to [Zenodo](https://zenodo.org/) and then create a Git release. This release will be caught by Zenodo and will give you a DOI that you can submit along with the manuscript.
 
-## 8. Data upload to GEO
+## 7. Data upload to GEO
 
 The raw data from NGS experiments will be uploaded to the [Gene Expression Omnibus (GEO)](https://www.ncbi.nlm.nih.gov/geo/).
 Whenever a new Assay folder is created, the data owner must fill in the required documentation and information needed to make the GEO submission as smooth as possible.
 
 <!-- **Note**: Probably we should start using Annotare  https://www.ebi.ac.uk/fg/annotare/login/ -->
 
-## 9. Create a Data Management Plan
+## 8. Create a Data Management Plan
 
 !!! quote "From the University of Copenhagen RDM team"
     A Data Management Plan (DMP) is a planning tool that helps researchers to establish good practices for working with physical material and data in a research project. A DMP covers all relevant aspects of research data management throughout the project.
@@ -203,5 +212,6 @@ Whenever a new Assay folder is created, the data owner must fill up the required
     - comply with relevant legislation, policies, and funder requirements.
     - document agreements related to the collection, usage, and dissemination of research data between project partners or between student and supervisor.
 
-We are currently working on a DMP template that it is prefilled with repetitive information using [DMPonline](https://dmponline.deic.dk/) and the Horizon Europe guidelines.
-This template will contain all the necessary information regarding common practices that we will use, the repositories we use for NGS, etc.
+We have written a DMP template that is prefilled with repetitive information using [DMPonline](https://dmponline.deic.dk/) and the Horizon Europe guidelines. This template contains all the necessary information regarding common practices that we will use, the repositories we use for NGS, etc. The template is part of the `project` folder template, under `documents`. You can check the file [here](https://github.com/brickmanlab/ngs-template/blob/master/project/%7B%7B%20cookiecutter.project%20%7D%7D/documents/Non-sensitive_NGS_research_project_template.docx).
+
+The Horizon Europe template focuses mostly on digital data, so it may not be the best fit for the Brickman Lab, which is mostly a wet lab with some bioinformatics. We will start working on another DMP based on the KU template, which is designed for both physical and digital data.