TY - JOUR
TI - Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible
AU - McMurdie, Paul J.
AU - Holmes, Susan
T2 - PLOS Computational Biology
AB - Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
DA - 2014/04/03/
PY - 2014
DO - 10.1371/journal.pcbi.1003531
DP - PLoS Journals
VL - 10
IS - 4
SP - e1003531
J2 - PLOS Computational Biology
LA - en
SN - 1553-7358
ST - Waste Not, Want Not
UR - https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531
Y2 - 2023/09/12/00:16:04
L1 - https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003531&type=printable
KW - Binomials
KW - DNA sequencing
KW - Experimental design
KW - Microbiome
KW - RNA sequencing
KW - Simulation and modeling
KW - Source code
KW - Statistical data
ER -
TY - JOUR
TI - Microbiome Datasets Are Compositional: And This Is Not Optional
AU - Gloor, Gregory B.
AU - Macklaim, Jean M.
AU - Pawlowsky-Glahn, Vera
AU - Egozcue, Juan J.
T2 - Frontiers in Microbiology
AB - Datasets collected by high-throughput sequencing (HTS) of 16S rRNA gene amplimers, metagenomes or metatranscriptomes are commonplace and being used to study human disease states, ecological differences between sites, and the built environment. There is increasing awareness that microbiome datasets generated by HTS are compositional because they have an arbitrary total imposed by the instrument. However, many investigators are either unaware of this or assume specific properties of the compositional data. The purpose of this review is to alert investigators to the dangers inherent in ignoring the compositional nature of the data, and point out that HTS datasets derived from microbiome studies can and should be treated as compositions at all stages of analysis. We briefly introduce compositional data, illustrate the pathologies that occur when compositional data are analyzed inappropriately, and finally give guidance and point to resources and examples for the analysis of microbiome datasets using compositional data analysis.
DA - 2017///
PY - 2017
DO - 10.3389/fmicb.2017.02224
DP - Frontiers
VL - 8
SN - 1664-302X
ST - Microbiome Datasets Are Compositional
UR - https://www.frontiersin.org/articles/10.3389/fmicb.2017.02224
Y2 - 2023/09/14/15:25:43
L1 - https://www.frontiersin.org/articles/10.3389/fmicb.2017.02224/pdf?isPublishedV2=False
ER -
TY - JOUR
TI - Consistent and correctable bias in metagenomic sequencing experiments
AU - McLaren, Michael R
AU - Willis, Amy D
AU - Callahan, Benjamin J
T2 - eLife
A2 - Turnbaugh, Peter
A2 - Garrett, Wendy S
A2 - Quince, Christopher
A2 - Gibbons, Sean
AB - Marker-gene and metagenomic sequencing have profoundly expanded our ability to measure biological communities. But the measurements they provide differ from the truth, often dramatically, because these experiments are biased toward detecting some taxa over others. This experimental bias makes the taxon or gene abundances measured by different protocols quantitatively incomparable and can lead to spurious biological conclusions. We propose a mathematical model for how bias distorts community measurements based on the properties of real experiments. We validate this model with 16S rRNA gene and shotgun metagenomics data from defined bacterial communities. Our model better fits the experimental data despite being simpler than previous models. We illustrate how our model can be used to evaluate protocols, to understand the effect of bias on downstream statistical analyses, and to measure and correct bias given suitable calibration controls. These results illuminate new avenues toward truly quantitative and reproducible metagenomics measurements.
DA - 2019/09/10/
PY - 2019
DO - 10.7554/eLife.46923
DP - eLife
VL - 8
SP - e46923
SN - 2050-084X
UR - https://doi.org/10.7554/eLife.46923
Y2 - 2023/09/14/15:28:15
L1 - https://europepmc.org/articles/pmc6739870?pdf=render
KW - 16S rRNA gene
KW - bias
KW - calibration
KW - metagenomics
KW - microbiome
KW - reproducibility
ER -
TY - JOUR
TI - Compositional Data Analysis of Microbiome and Any-Omics Datasets: A Validation of the Additive Logratio Transformation
AU - Greenacre, Michael
AU - Martínez-Álvaro, Marina
AU - Blasco, Agustín
T2 - Frontiers in Microbiology
AB - Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of counts identified within a sample is irrelevant. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratios with respect to a fixed reference component. A full set of additive logratios is not isometric, that is they do not reproduce the geometry of all pairwise logratios exactly, but their lack of isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and for high-dimensional data there are many potential references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundance values makes the subsequent interpretation of the logratios even easier. On each of three high-dimensional omics datasets the additive logratio transformation was performed, using references that were identified according to the abovementioned criteria. For each dataset the compositional data structure was successfully reproduced, that is the additive logratios were very close to being isometric. The Procrustes correlations achieved for these datasets were 0.9991, 0.9974, and 0.9902, respectively. We thus demonstrate, for high-dimensional compositional data, that additive logratios can provide a valid choice as transformed variables, which (a) are subcompositionally coherent, (b) explain 100% of the total logratio variance and (c) come measurably very close to being isometric. The interpretation of additive logratios is much simpler than the complex isometric alternatives and, when the variance of the log-transformed reference is very low, it is even simpler since each additive logratio can be identified with a corresponding compositional component.
DA - 2021///
PY - 2021
DO - 10.3389/fmicb.2021.727398
DP - Frontiers
VL - 12
SN - 1664-302X
ST - Compositional Data Analysis of Microbiome and Any-Omics Datasets
UR - https://www.frontiersin.org/articles/10.3389/fmicb.2021.727398
Y2 - 2023/09/14/15:28:55
L1 - https://www.frontiersin.org/articles/10.3389/fmicb.2021.727398/pdf?isPublishedV2=False
ER -
TY - JOUR
TI - Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics
AU - Holmes, Ian
AU - Harris, Keith
AU - Quince, Christopher
T2 - PLOS ONE
AB - We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. This data can be represented as a frequency matrix giving the number of times each taxa is observed in each sample. The samples have different size, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods used previously to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct ‘metacommunities’, and, hence, determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for the fitting of DMM models using the ‘evidence framework’ (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. From the model evidence four clusters fit this data best. Two clusters were dominated by Bacteroides and were homogenous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the ‘Anna Karenina principle (AKP)’ applied to microbial communities: disturbed states having many more configurations than undisturbed. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable community.
DA - 2012/02/03/
PY - 2012
DO - 10.1371/journal.pone.0030126
DP - PLoS Journals
VL - 7
IS - 2
SP - e30126
J2 - PLOS ONE
LA - en
SN - 1932-6203
ST - Dirichlet Multinomial Mixtures
UR - https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030126
Y2 - 2023/09/14/15:29:28
L1 - https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0030126&type=printable
KW - Community structure
KW - Crohn's disease
KW - Inflammatory bowel disease
KW - Machine learning algorithms
KW - Metagenomics
KW - Obesity
KW - Probability distribution
KW - Ulcerative colitis
ER -