Minor LaTeX fixes (#311)
cicilapetitesorciere authored Jan 10, 2025
1 parent d133d58 commit 430059b
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions jupyter-book/preprocessing_visualization/normalization.ipynb
@@ -26,7 +26,7 @@
"Up to this point, we removed low-quality cells, ambient RNA contamination and doublets from the dataset and the data is available as a count matrix in the form of a numeric matrix of shape cells x genes. These counts represent the capture, reverse transcription and sequencing of a molecule in the scRNA-seq experiment. Each of these steps adds a degree of variability to the measured count depth for identical cells, so the difference in gene expression between cells in the count data might simply be due to sampling effects. This means that the dataset and therefore the count matrix still contains widely varying variance terms. Analyzing the dataset is often challenging as many statistical methods assume data with uniform variance structure. \n",
"\n",
"```{admonition} Gamma-Poisson distribution\n",
-"A theoretically and empirically established model for UMI data is the Gamma-Poisson distribution which implies a quadratic mean-variance relation with $Var[Y] = \\mu + \\alpha \\mu^2$ with mean $\\mu$ and overdispersion $\\alpha$. For $\\alpha=0$ this is the Poisson distribution and $\\alpha$ describes the additional variance on top of the Poisson. \n",
+"A theoretically and empirically established model for UMI data is the Gamma-Poisson distribution which implies a quadratic mean-variance relation with $\\operatorname{Var}[Y] = \\mu + \\alpha \\mu^2$ with mean $\\mu$ and overdispersion $\\alpha$. For $\\alpha=0$ this is the Poisson distribution and $\\alpha$ describes the additional variance on top of the Poisson. \n",
"```\n",
"\n",
"The preprocessing step of \"normalization\" aims to adjust the raw counts in the dataset for variable sampling effects by scaling the observable variance to a specified range. Several normalization techniques are used in practice varying in complexity. They are mostly designed in such a way that subsequent analysis tasks and their underlying statistical methods are applicable. \n",
@@ -129,7 +129,7 @@
"\n",
"The shifted logarithm tackles this by \n",
"\n",
-"$$f(y) = \\log(\\frac{y}{s}+y_0)$$ \n",
+"$$f(y) = \\log\\left(\\frac{y}{s}+y_0\\right)$$ \n",
"\n",
"with $y$ being the raw counts, $s$ being a so-called size factor and $y_0$ describing a pseudo-count. The size factors are determined for each cell to account for variations in sampling effects and different cell sizes. The size factor for a cell $c$ can be calculated by \n",
"\n",
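Note on the second hunk: the formula touched there, $f(y) = \log\left(\frac{y}{s}+y_0\right)$, can be illustrated with a minimal NumPy sketch. This is only an illustration under stated assumptions, not the notebook's own code: the function name `shifted_log`, the default `pseudo_count=1.0`, and the example size factors are hypothetical, and the size-factor formula itself is cut off in the diff, so the size factors are taken as given input here.

```python
import numpy as np


def shifted_log(counts, size_factors, pseudo_count=1.0):
    """Apply f(y) = log(y / s + y_0) to a cells x genes count matrix.

    counts       : array of shape (n_cells, n_genes), raw counts y
    size_factors : array of shape (n_cells,), one size factor s per cell
    pseudo_count : y_0 (the default of 1.0 is an assumption for this sketch)
    """
    counts = np.asarray(counts, dtype=float)
    # Reshape to a column so each row (cell) is divided by its own size factor.
    s = np.asarray(size_factors, dtype=float).reshape(-1, 1)
    return np.log(counts / s + pseudo_count)


# Two cells, three genes; the size factors are made-up values for illustration.
counts = np.array([[0, 2, 4],
                   [0, 10, 20]])
size_factors = np.array([0.5, 2.0])

transformed = shifted_log(counts, size_factors)
print(transformed)
```

With a pseudo-count of 1, zero counts map to exactly zero after the transformation, which keeps the sparsity pattern of the matrix interpretable.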
