Commit

modified preamble section.
Seo Kim committed Jan 17, 2025
1 parent 53ae3e4 commit d97af95
Showing 1 changed file with 23 additions and 18 deletions.
41 changes: 23 additions & 18 deletions jupyter-book/preamble.md
@@ -12,53 +12,53 @@ Interested in contributing? Please have a look at https://github.com/theislab/si

## Introduction

The human body is a complex machine that heavily relies on the basic units of life - cells. Cells can be separated into different types, which even undergo transitions during development, under disease or when regenerating. This cellular heterogeneity is reflected in their morphology, function, and gene expression profiles. Strong disruptions causing deregulations of the cell types influence the entire system causing potentially even serious diseases like cancer{cite}`Macaulay2017`. It is therefore vital to understand how cells behave in a normal state and under perturbations to improve our understanding of the entire cellular systems.
The human body is a complex machine that heavily relies on the basic units of life - cells. These cells exhibit remarkable diversity, varying in type and function, and can undergo significant transitions during development, in response to disease, or during regenerative processes. This cellular heterogeneity manifests in their structure, function, and gene expression profiles. Disruptions to this delicate balance can lead to systemic dysregulation, contributing to severe conditions such as cancer{cite}`Macaulay2017`. Therefore, understanding how cells behave in normal and perturbed states is critical for improving our knowledge of entire cellular systems.

This monumental task is approached in different ways of which the most promising one is to profile cells at the individual level. So far, each cells' transcriptome was primarily examined in a process known as single-cell RNA sequencing. With recent advances in single-cell genomics, it is now possible to enrich the transcriptome information with spatial, chromatin accessibility or protein information. These advances generate not only insight into complex regulatory mechanisms, but also go along with additional complexity for data analysts.
To address this challenge, researchers employ a variety of strategies, one of the most promising of which is to profile cells at the individual level. Traditionally, each cell's transcriptome was primarily examined in a process known as single-cell RNA sequencing. However, recent advancements in single-cell genomics now enable the integration of transcriptome data with spatial, chromatin accessibility, or protein-level information. These developments not only enhance our understanding of complex regulatory mechanisms but also introduce additional challenges in data analysis.

Nowadays, data analysts are facing a vast analysis tool landscape with a collection of more than 1000 computational single-cell analysis methods. It is becoming increasingly challenging to navigate this large range of different tools to generate sound results which are at the forefront of science.
Currently, analysts are faced with an overwhelming array of computational tools—over 1,000 methods dedicated to single-cell analysis alone. Navigating this extensive landscape to produce reliable, cutting-edge results poses a significant challenge.

## What this book covers

The goal of this book is to teach newcomers and advanced professionals alike, the **best practices** of single-cell sequencing analysis.
This book will teach you the most common analysis steps ranging from preprocessing to visualization to statistical evaluation and beyond. A read through the entire book will enable you to analyze unimodal and multimodal single-cell sequencing data on your own. The guidelines and recommendations in this book are not only tailored to teach you how to do single-cell analysis in general, but how to do them right. We base our suggestions on external benchmarks and reviews whenever possible. Finally, we consider this book to be a living resource for single-cell data analysts which can easily be updated when the recommendations change.
This book aims to guide both beginners and experienced professionals in the **best practices** of single-cell sequencing analysis. It provides a comprehensive overview of essential analytical steps, from preprocessing through visualization to statistical evaluation and beyond. By following this book, readers will acquire the skills to independently analyze unimodal and multimodal single-cell sequencing data.

The recommendations presented are grounded in external benchmarks and reviews wherever possible, ensuring the approaches taught are both effective and reliable. Additionally, this book is designed to serve as a dynamic resource, evolving with the field to reflect new insights and updated best practices.

## What this book does not cover

This book does not cover fundamental basics of biology or computer science, including programming. Furthermore, this book does not function as a complete collection of all analysis tools designed for a specific tasks. We especially highlight externally verified tools, which work best for the data at hand or methods which proved to be community-verified best practices. Whenever this is not possible, we only recommend workflows based on our extensive experience.
This book does not cover fundamental concepts of biology or computer science, including basic programming skills. Nor does it serve as an exhaustive catalog of all available tools for specific tasks. Instead, it emphasizes well-validated methods that have been externally benchmarked or recognized as community standards. When such external validation is unavailable, our recommendations are based solely on our extensive practical experience.

## Structure of the book

Each chapter in this book corresponds to a different stage of a typical single-cell data analysis project. Generally, an analysis workflow would follow the order of the chapters with some flexibility concerning downstream analysis objectives. All of our chapters feature extensive lists of references, and we encourage readers to consult the primary sources for our statements. Our summaries cannot always capture the full reasoning for our recommendations, although we try to provide the required background whenever possible.
Each chapter corresponds to a distinct phase of a typical single-cell data analysis project. While an analysis workflow should generally follow the order of the chapters, flexibility is encouraged depending on specific downstream analysis objectives. Each chapter is supplemented with extensive references, and readers are encouraged to consult these primary sources for a deeper understanding. Although we strive to provide comprehensive background information, our summaries may not capture the full scope of reasoning behind every recommendation.

## Prerequisites

Bioinformatics is a challenging research field for newcomers as it requires knowledge in both biology and computer science. Single-cell is even more demanding as it combines many subfields and datasets are often large. This book cannot cover all prerequisites for computational single-cell analysis, we therefore recommend a coarse overview of various topics below. The following links might increase your learning experience throughout the book:
Bioinformatics is inherently multidisciplinary, requiring knowledge of both biology and computer science. Single-cell analysis is particularly demanding, as it integrates multiple subfields and often involves large datasets. While this book cannot cover all the necessary foundational knowledge for computational single-cell analysis, we recommend the following resources to enhance your learning experience:

- Basic Python programming. You should be familiar with control flow (loops, conditional statements, ...), basic data structures (lists, dictionaries, sets) and core functionality of the most used libraries such as Pandas and Numpy. If you are new to programming and Python we can highly recommend the free [Automate the boring stuff with Python](https://automatetheboringstuff.com/) book.
- **Basic Python programming**: Familiarity with control flow (e.g., loops, conditional statements), basic data structures (e.g., lists, dictionaries, sets), and key libraries such as Pandas and Numpy is essential. Newcomers can benefit from the free book [Automate the boring stuff with Python](https://automatetheboringstuff.com/).

- Basics of the AnnData and scanpy packages are beneficial, but not absolutely required. This book covers AnnData in sufficient detail to follow along and introduces the workflow of working with scanpy. However, we are not able to introduce all of scanpy's functionalities in the course of this book. If you are new to scanpy we strongly suggest to work through the [scanpy tutorials](https://scanpy.readthedocs.io/en/stable/tutorials.html) with the occasional glance to the [scanpy API reference](https://scanpy.readthedocs.io/en/stable/api.html).
- **AnnData and Scanpy**: While prior experience with these tools is helpful, it is not strictly required. This book provides a detailed introduction to AnnData and outlines the workflow for working with Scanpy. However, it does not cover the full range of Scanpy’s functionalities. To deepen your understanding, we recommend exploring the [scanpy tutorials](https://scanpy.readthedocs.io/en/stable/tutorials.html) and referring to the [scanpy API reference](https://scanpy.readthedocs.io/en/stable/api.html) as needed.

- If you are interested in multimodal data analysis, the basics of muon and MuData are recommended. This book covers MuData in greater detail, but only briefly introduces muon analogously to AnnData and scanpy. The excellent [muon tutorials](https://muon-tutorials.readthedocs.io/en/latest/) serve as a great introduction to multimodal data analysis with muon.
- **Multimodal Data Analysis**: For readers interested in multimodal data analysis, familiarity with tools like muon and MuData is beneficial. The [muon tutorials](https://muon-tutorials.readthedocs.io/en/latest/) provide a solid introduction to this field.

- Basic R programming. Familiarity with control flow and basic data structures suffices. If you are new to programming and R we recommend the free [R for data science](https://r4ds.had.co.nz/) book.
- **Basic R Programming**: Knowledge of control flow and basic data structures suffices. New learners can refer to [R for data science](https://r4ds.had.co.nz/) for a comprehensive introduction.

- Basics of biology. While we roughly introduce the generation of the data, we will not cover the fundamentals of {term}`DNA`, RNA and proteins. If you are completely new to molecular biology in general, it might be advisable to work through _Molecular Biology of the Cell_ by Bruce Alberts et al.
- **Basic Biology**: While this book offers a rough overview of data generation, it does not cover fundamental topics like {term}`DNA`, RNA, and proteins. _Molecular Biology of the Cell_ by Bruce Alberts et al. is a recommended resource for those new to molecular biology.

## Peer-review

Although most of our chapters have been reviewed by multiple authors and the editors of this book and external experts, this book has not been officially peer-reviewed. We therefore strongly encourage readers to review our chapters and to kindly provide constructive feedback. We are more than happy to adapt, improve or overhaul the content if necessary.
Although the content has been reviewed by multiple authors, editors, and external experts, this book has not undergone formal peer review. We encourage readers to provide constructive feedback to help refine and improve the material.

## Citation

If you found our content helpful for your research article please cite it as:
If you found our content helpful for your research, please cite it as:

> Heumos, L., Schaar, A.C., Lance, C. et al. Best practices for single-cell analysis across modalities. Nat Rev Genet (2023). https://doi.org/10.1038/s41576-023-00586-w
## Contributing

We would like to invite the community to further improve the tutorial and the teaching material.
Please read [contributing](https://github.com/theislab/single-cell-best-practices/blob/development/CONTRIBUTING.md) for further instructions.
We invite the community to contribute to the ongoing improvement of this tutorial and teaching material. Please read [contributing](https://github.com/theislab/single-cell-best-practices/blob/development/CONTRIBUTING.md) for further instructions.

In case of questions or problems, please get in touch by posting an issue in this repository.

@@ -68,7 +68,12 @@ PDF versions of this book are available on our [releases page](https://github.co

## Contact us

You can report issues and requests in our [issue tracker](https://github.com/theislab/single-cell-best-practices/issues). For speaking engagements or collaborations, send an email at anna DOT schaar AT helmholtz-munich DOT de or lukas DOT heumos AT helmholtz-munich DOT de.
You can report issues and requests in our [issue tracker](https://github.com/theislab/single-cell-best-practices/issues).

For inquiries, speaking engagements, or collaboration opportunities, please email:

- Anna Schaar: anna DOT schaar AT helmholtz-munich DOT de
- Lukas Heumos: lukas DOT heumos AT helmholtz-munich DOT de

## License
