MetaboFlow - the development of standardised workflows for processing metabolomics data to aid reproducible data sharing and big data initiatives

This pipeline describes the processing and the analysis of Ionomics data. This paper describes a possible application.

Section 0: Import the R package

The workflow is wrapped within the IonFlow R package, and it consists of four sections, respectively

Pre-processing
Exploratory analysis
Clustering which also includes the GO Slim annotation and the GO terms enrichment
Network analysis

devtools::install_github("AlinaPeluso/MetaboFlow", subdir="IonFlow")
library(IonFlow)

Section 1: Pre-processing

The pre-processing section is required first as it produces in output the cleaned dataset to be used in the other sections. There is no specific order on how to run the other sections.

The pre-processing section aims to free the data from unreliable samples which will probably lead to wrong outputs. In such way, effective data pre-processing methods are applied to avoid the effects of noisy and unreliable data.

This section requires as input the raw data frame, e.g. ion's concentrations. It is also possible to define a set of ion's standard deviation, as these are possibly computed accounting for some control genes. Note that the latter is an optional input i.e if not provided the standard deviations from the data would be computed to perform the data standardisation (see Section Standardisation).

Inspect the raw data

data(IonData)

We illustrate the Ionomics workflow with ICP-MS data of yeast intracellular ion concentrations measured for 1454 single-gene haploid knockouts (data from Danku, 2009). Ions measured include Ca44, Cd111, Co59, Cu65, Fe56, K39, Mg25, Mn55, Mo95, Na23, Ni60, P31, S34, Zn66. Values of concentration are in ppm and have been adjusted using optical density measurements. Intracellular concentrations are measured for two, four or eight replicas of each mutant depending on the knock-out. Knock-out YDL227C mutant is measured multiple times in every batch as control strain.

Run the pre-processing function

pre_proc <- PreProcessing(data=IonData,stdev=pre_defined_sd)

The concentration values for the raw data ion can be summarised as follow.

pre_proc$stats.raw_data

	Ion	Min	1st Quartile	Median	Mean	3rd Quartile	Max	Variance
1	Ca	0.449	31.73	40.44	45.071	51.015	902.568	829.525
2	Cd	0.174	0.866	0.988	1.002	1.121	2.512	0.051
3	Co	0.007	0.142	0.16	0.162	0.184	0.702	0.001
4	Cu	0.587	1.344	1.586	1.717	1.831	327.79	16.91
5	Fe	0.002	5.527	7.295	9.469	9.332	6624.526	5154.611
6	K	284.273	2060.619	2495.265	2492.765	2879.551	17777.452	534784.375
7	Mg	115.63	546.325	679.275	642.578	753.678	3838.479	31947.598
8	Mn	0.02	0.982	1.206	1.197	1.38	7.339	0.106
9	Mo	0.158	0.656	0.934	1.109	1.327	60.879	1.855
10	Na	0.184	128.831	185.25	196.545	247.747	892.968	9027.944
11	Ni	0.074	0.982	1.258	1.693	1.543	2323.058	565.618
12	P	1194.953	3833.9	4514.476	4289.109	4952.98	21695.748	1151197.05
13	S	20.592	434.61	512.845	529.493	605.436	5484.638	37212.137
14	Zn	7.659	14.785	16.549	17.114	18.334	2221.586	511.804

There is a very high variability of the knockouts across ions and within the batches. There are no missing values in the data.

Outlier detection

We define a lower outer fence: Q1 - 3*IQ and a upper outer fence: Q3 + 3*IQ where Q1 and Q3 are the first and the third quantile of the distribution, respectively. A point beyond the outer fence is considered an extreme outlier.

The outliers are split across ions as follows.

pre_proc$stats.outliers

	Ion	not_outlier	outlier	outlier(%)
1	Ca	9694	305	0.22
2	Cd	9950	49	0.04
3	Co	9966	33	0.02
4	Cu	9870	129	0.09
5	Fe	9833	166	0.12
6	K	9980	19	0.01
7	Mg	9991	8	0.01
8	Mn	9984	15	0.01
9	Mo	9909	90	0.06
10	Na	9965	34	0.02
11	Ni	9849	150	0.11
12	P	9991	8	0.01
13	S	9923	76	0.05
14	Zn	9888	111	0.08

Median batch correction

First we take the logarithmm of the concentration value. Then, the data are scaled to the median taken for each ion within each batch.

pre_proc$stats.median_batch_corrected_data

	Ion	Min	1st Quartile	Mean	3rd Quartile	Max	Variance
1	Ca	-4.311	-0.124	0.022	0.148	1.671	0.084
2	Cd	-1.749	-0.068	0.001	0.068	0.735	0.025
3	Co	-2.178	-0.059	-0.02	0.054	0.629	0.033
4	Cu	-0.681	-0.055	0.009	0.06	0.876	0.013
5	Fe	-7.332	-0.1	0.017	0.115	1.869	0.062
6	K	-1.96	-0.124	-0.044	0.1	0.938	0.068
7	Mg	-1.735	-0.061	-0.016	0.059	0.688	0.03
8	Mn	-3.71	-0.125	-0.023	0.101	0.856	0.052
9	Mo	-1.662	-0.163	-0.003	0.169	1.623	0.103
10	Na	-6.992	-0.26	-0.058	0.189	1.403	0.216
11	Ni	-2.422	-0.094	0.006	0.104	1.443	0.057
12	P	-1.059	-0.054	-0.009	0.052	0.571	0.017
13	S	-2.384	-0.086	-0.002	0.091	1.411	0.034
14	Zn	-0.416	-0.046	0.011	0.054	0.661	0.009

After outlier removal and the median batch correction of the logged concentrations (logConcentration_corr), the data looks as

pre_proc$plot.logConcentration_by_batch

Standardisation

After outlier removal and median batch correction we now standardise the ions' logged concentrations. For each set of knockouts across the ions's type, we normalise the concentrations by dividing for the ions' standard deviation. The ions' standard deviations can be estimated from the data, or a set of pre-defined ions' standard deviations can be used. The latter has been computed on the complete dataset (which includes also some gene controls). At the moment we do not use the pre-defined ion's concenrations to normalise our data.

The concentration values for each ion can be summarised as follow.

pre_proc$stats.standardised_data

	Ion	Min	1st Quartile	Median	Mean	3rd Quartile	Max	Variance
1	Ca	-4.135	-1.937	0.057	0.945	3.187	10.698	15.266
2	Cd	-2.894	-0.341	0.596	0.558	1.257	4.214	2.111
3	Co	-1.863	-0.383	0.167	0.568	1.371	4.942	2.629
4	Cu	-1.126	-0.643	-0.363	0.318	0.944	4.662	2.012
5	Fe	-1.789	-0.733	-0.48	-0.263	-0.006	2.043	0.724
6	K	-2.099	-0.734	0.005	0.285	0.818	5.242	2.307
7	Mg	-12.918	-1.082	0.608	0.236	2.425	9.631	12.37
8	Mn	-1.85	-0.377	0.93	0.917	2.06	4.133	2.39
9	Mo	-5.016	-0.672	1.354	1.522	3.409	11.099	10.04
10	Na	-4.029	-1.121	-0.156	-0.242	0.458	3.238	1.791
11	Ni	-1.476	-0.414	0.454	0.549	1.009	6.043	1.918
12	P	-3.697	-1.269	-0.115	-0.179	0.724	3.667	2.441
13	S	-1.345	-0.396	0.283	0.309	0.767	4.961	1.103
14	Zn	-1.969	-0.773	-0.476	-0.096	0.44	4.263	1.304

Symbolization

As we are working with the logConcentration_corr_norm we can consider a thresold based on a certain number of sigma (e.g. number-of-sigma thresold=3) to symbolizise the concentrations' profile of the knockouts as follow:

Symb=0 if -3<logConcentration_corr_norm<3
Symb=1 if logConcentration_corr_norm>=3
Symb=-1 otherwise

The choice of the thresold is arbitrary i.e. thresold greater than 3 can be chosen. The highest is the thresold, the highest is the concentration value taken as significant.

Aggregation of the knockout replicates

For each ion measure, we proceed to aggregate the data by taking the median value of the knockout. For each ions we consider around 1,450 genes. And of which we can plot the z-score with the associated sigma as follow.

pre_proc$plot.logConcentration_z_scores

Final datasets

Three dataset are obtained as output. The first in the long format (genes as rows and ions as columns), and two in wide format and respectively one with the standardised ion's concentraction, and the other with the symbolised profiles of the knockouts.

Long format (aggregated knockout replicates):

head(pre_proc$dataR.long)

row_id	Knockout	Batch_ID	id	Ion	Concentration	logConcentration	logConcentration_corr	logConcentration_corr_norm	Symb
3851	YAL002W	19	3851	Ca	93.4	4.54	0.96	6.3	1
3852	YAL002W	19	3852	Ca	98.2	4.59	1.01	6.7	1
3853	YAL002W	19	3853	Ca	101.6	4.62	1.04	6.9	1
13850	YAL002W	19	3851	Cd	1.5	0.42	0.5	8.8	1
13851	YAL002W	19	3852	Cd	1.6	0.45	0.53	9.2	1
13852	YAL002W	19	3853	Cd	1.6	0.5	0.58	10.1	1

Long format (not aggregated knockout replicates):

head(pre_proc$data.long)

Knockout	Ion	logConcentration_corr_norm	Symb
YAL002W	Ca	6.68	1
YAL004W	Ca	-0.88	0
YAL005C	Ca	-0.15	0
YAL007C	Ca	-0.54	0
YAL008W	Ca	-0.88	0
YAL009W	Ca	-0.95	0

Wide format, standardised ion's concentraction:

head(pre_proc$data.wide)

Knockout	Ca	Cd	Co	Cu	Fe	K	Mg	Mn	Mo	Na	Ni	P	S	Zn
YAL002W	6.68	9.64	0.296	0.65	-0.056	-3.56	-1.3357	5.37	-2.7	3.94	-3.73	0.94	-1.24	0.472
YAL004W	-0.88	0.9	0.882	-0.59	1.017	0.76	1.5433	-2.45	0.81	-0.13	0.94	0.83	0.52	0.073
YAL005C	-0.15	-0.13	0.278	0.94	-0.638	1.3	0.809	-1.33	-0.4	1.79	0.88	0.27	-0.13	-0.081
YAL007C	-0.54	-0.2	-0.649	-0.59	0.803	1.41	-0.6495	1.42	3.24	-0.35	-0.68	0.29	-0.58	-0.424
YAL008W	-0.88	-0.56	-0.416	-1.3	-0.545	0.32	-0.5299	0.34	1.9	0.13	-0.65	-0.34	-0.83	-0.666
YAL009W	-0.95	0.91	-0.016	-0.84	-1.134	0.18	0.0087	1.06	4.91	0.74	-0.84	0.11	0.17	-0.884

Wide format, symbolised profiles:

head(pre_proc$data.wide_Symb)

Knockout	Ca	Cd	K	Mn	Mo	Na	Ni
YAL002W	1	1	-1	1	0	1	-1
YAL004W	0	0	0	0	0	0	0
YAL005C	0	0	0	0	0	0	0
YAL007C	0	0	0	0	1	0	0
YAL008W	0	0	0	0	0	0	0
YAL009W	0	0	0	0	1	0	0

Section 2: Exploratory analysis

This section provide a way to summarize the main characteristics of the data with visual methods. No input needed as this section is built on the output of the previous section.

exp_anal <- ExploratoryAnalysis(data=pre_proc$data.wide)

Pearson correlation

exp_anal$plot.Pearson_correlation

PCA

The aim of PCA is to reduce the dimensionality of the data while retaining as much information as possible. This is achieved by projecting the data into a new lower-dimensional space defined by the principal components (PC) that combine in a linear way the original (possibly correlated) variables (e.g. ions) in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, it means that each gene is assigned a score on each new PC dimension, and this score is calculated by appling weight to a a linear combination of the original variables. In our case, the data are centred but not further scaled as were normalised in the pre-processing stage thus the variance is homogeneous across variables. The algorithm is able to handle missing values.

exp_anal$plot.PCA_Individual

The weights of each of the original variables are stored in the so-called loading vectors associated to each PC.

Loadings (first 10) for PC1:

head(exp_anal$stat.loadings_PC1,10)

Knockout	value.var
YAL019W	0.27
YAL026C	0.27
YAL031C	0.27
YAL034C	0.26
YAL037W	0.24
YAL040C	0.23
YAL042W	0.21
YAL045C	0.19
YAL046C	0.15
YAL048C	0.13

Loadings (first 10) for PC2:

head(exp_anal$stat.loadings_PC2,10)

Knockout	value.var
YAL002W	-0.21
YAL004W	-0.18
YAL005C	-0.16
YAL007C	-0.15
YAL008W	-0.15
YAL009W	-0.13
YAL010C	-0.12
YAL013W	0.12
YAL014C	-0.12
YAL015C	-0.12

Heatmap

We employ an heatmap as a graphical representation of data where the knockout values contained are represented as colors. A dendrogram is also added to the left side (clustering of genes knockout) and to the top (clustering of ions).

exp_anal$plot.heatmap

Pairwise correlation map

We employ a correlation map to visualise the paiwise correlation coefficients across ions.

exp_anal$plot.pairwise_correlation_map

Regularized partial correlation network

We now inspect the (statistical) relationships between ions in the form of graphical models. The graph is made of n nodes (ions) connected by m edges (knockout) and the relationships between ions is visualised as weighted edges.

We compute a network of partial correlation coefficients. Such networks can also be termed concentration graphs (Cox & Wermuth, 1994) or Gaussian graphical models (Lauritzen, 1996). Each link in the network represents a partial correlation coefficient between two variables after conditioning on all other variables in the dataset. These coefficients range from -1 to 1 and encode the remaining association between two nodes after controlling for all other information possible, also known as conditional independence associations.

The connections are visualized using red lines indicating negative partial correlations, green lines indicating positive partial correlations, and wider and more saturated connections indicate partial correlations that are far from zero.

Whenever the partial correlation is exactly zero, no connection is drawn between two nodes, indicating that two variables are independent after controlling for all other variables in the network. This is of particular interest since such a missing connection indicates one of the two variables could not have caused the other. As such, whenever there is a connection present, it highlights a potential causal pathway between two variables.

exp_anal$plot.regularized_partial_correlation_network

Section 3: Clustering

# Inspect data for GO Slim annotation 
data(data_GOslim)
head(data_GOslim)

# Inspect data for GO Terms for enrichment
data(ORF2KEGG)
head(ORF2KEGG)

gene_clust <- GeneClustering(data=pre_proc$data.wide, data_Symb=pre_proc$data.wide_Symb)

Clustering

We compute the manhattan distances between the knockouts' symbolised profile to cluster genes having relative distances equal to 0. We proceed to investigate the clusters which have at least 10 genes.

gene_clust$stats.clusters

cluster_id	nGenes
2	747
3	30
7	28
15	29
16	37
18	46
25	19
30	27
39	13
55	20
90	20

We then compute a profile plot for each cluster.

gene_clust$plot.profiles

Go Slim annotation

We now highlight the the biological process, the cellular component, and the molecular function of the genes within each cluster. We retain the annotations that map at least 5% of the genes in the cluster. We do not include in the results the first cluster (Cluster 1) as this the "null" cluster which contains a mix of knockouts having no impact on the ions. The first five entries of each cluster can be access as follows.

lapply(gene_clust$stats.Kegg_Goslim_annotation, function(x) head(x,5))

Term	Ontology	Count	Percent
## Cluster 2 (747 genes)
cellular bud	Cellular component	39	5.2
cytoplasm	Cellular component	474	63.5
DNA binding	Molecular function	40	5.4
endomembrane system	Cellular component	102	13.7
endoplasmic reticulum	Cellular component	67	9
## Cluster 3 (30 genes)
carbohydrate metabolic process	Biological process	3	10
chromatin organization	Biological process	4	13.3
cofactor metabolic process	Biological process	2	6.7
cytoplasm	Cellular component	21	70
cytoplasmic vesicle	Cellular component	3	10
## Cluster 7 (28 genes)
ATPase activity	Molecular function	2	7.1
cell wall organization or biogenesis	Biological process	2	7.1
cellular amino acid metabolic process	Biological process	2	7.1
cellular bud	Cellular component	2	7.1
cellular response to DNA damage stimulus	Biological process	3	10.7
## Cluster 15 (29 genes)
cellular respiration	Biological process	5	17.2
cofactor metabolic process	Biological process	2	6.9
cytoplasm	Cellular component	26	89.7
generation of precursor metabolites and energy	Biological process	5	17.2
hydrolase activity	Molecular function	5	17.2
## Cluster 16 (37 genes)
carbohydrate metabolic process	Biological process	2	5.4
cell cortex	Cellular component	2	5.4
cellular response to DNA damage stimulus	Biological process	5	13.5
chromatin organization	Biological process	2	5.4
chromosome	Cellular component	4	10.8
## Cluster 18 (46 genes)
cell cortex	Cellular component	4	8.7
cellular amino acid metabolic process	Biological process	4	8.7
cellular bud	Cellular component	4	8.7
cytoplasm	Cellular component	36	78.3
endomembrane system	Cellular component	9	19.6
## Cluster 25 (19 genes)
carbohydrate metabolic process	Biological process	2	10.5
carbohydrate transport	Biological process	1	5.3
cell budding	Biological process	1	5.3
cell cortex	Cellular component	1	5.3
cell wall	Cellular component	1	5.3
## Cluster 30 (27 genes)
chromosome	Cellular component	3	11.1
cytoplasm	Cellular component	14	51.9
DNA binding	Molecular function	2	7.4
endomembrane system	Cellular component	3	11.1
endoplasmic reticulum	Cellular component	2	7.4
## Cluster 39 (13 genes)
cellular amino acid metabolic process	Biological process	1	7.7
cytoplasm	Cellular component	9	69.2
endomembrane system	Cellular component	2	15.4
endoplasmic reticulum	Cellular component	1	7.7
Golgi apparatus	Cellular component	1	7.7
## Cluster 55 (20 genes)
cellular amino acid metabolic process	Biological process	3	15
cellular response to DNA damage stimulus	Biological process	2	10
chromosome	Cellular component	2	10
cytoplasm	Cellular component	17	85
DNA recombination	Biological process	2	10
## Cluster 90 (20 genes)
cytoplasm	Cellular component	16	80
endomembrane system	Cellular component	3	15
Golgi apparatus	Cellular component	2	10
hydrolase activity	Molecular function	2	10
ion transport	Biological process	2	10

Go terms enrichment

We perform the enrichment of the genes in the clusters by employing the all GO terms annotation in the SGD online database. The first five entries of each cluster can be access as follows.

lapply(gene_clust$stats.Goterms_enrichment, function(x) head(x,5))

GO_ID	Description	Pvalue	Count	CountUniverse
## Cluster 2 (747 genes)
GO:0030427	site of polarized growth	0.0066	49	75
GO:0005886	plasma membrane	0.0177	84	141
GO:0009277	fungal-type cell wall	0.021	23	33
GO:0030312	external encapsulating structure	0.0245	24	35
GO:0005933	cellular bud	0.0287	41	65
## Cluster 3 (30 genes)
GO:1903293	phosphatase complex	0.0084	2	7
GO:0031984	organelle subcompartment	0.0084	7	117
GO:0005789	endoplasmic reticulum membrane	0.0176	5	75
GO:0031965	nuclear membrane	0.027	2	13
GO:0005798	Golgi-associated vesicle	0.0378	2	15
## Cluster 15 (29 genes)
GO:0031975	envelope	3.6E-15	20	140
GO:0005740	mitochondrial envelope	2.25E-14	16	94
GO:0005743	mitochondrial inner membrane	8.44E-14	14	66
GO:0005739	mitochondrion	1.287E-13	24	294
GO:0031090	organelle membrane	1.595E-08	19	268
## Cluster 16 (37 genes)
GO:0031083	BLOC-1 complex	0.000066	3	4
GO:0005657	replication fork	0.009434	2	6
GO:0044445	cytosolic part	0.046995	4	53
## Cluster 18 (46 genes)
GO:0005789	endoplasmic reticulum membrane	0.0097	7	75
GO:0005934	cellular bud tip	0.0135	4	29
GO:0031984	organelle subcompartment	0.0335	8	117
## Cluster 30 (27 genes)
GO:0005887	integral component of plasma membrane	0.011	3	25
GO:0044459	plasma membrane part	0.041	3	40
## Cluster 39 (13 genes)
GO:0098588	bounding membrane of organelle	0.033	4	177
GO:0044446	intracellular organelle part	0.035	8	623
## Cluster 55 (20 genes)
GO:0005759	mitochondrial matrix	3E-10	10	53
GO:0005739	mitochondrion	5.4E-10	17	294
GO:0000313	organellar ribosome	4.1057E-07	6	24
GO:0043233	organelle lumen	2.64737E-05	11	209
GO:0005763	mitochondrial small ribosomal subunit	0.000196584	3	9
## Cluster 90 (20 genes)
GO:0019005	SCF ubiquitin ligase complex	0.0025	2	6
GO:0000151	ubiquitin ligase complex	0.0208	2	17
GO:0005739	mitochondrion	0.028	8	294
GO:0031967	organelle envelope	0.043	4	113
GO:0031966	mitochondrial membrane	0.0473	4	105

Section 4: Network analysis

gene_net <- GeneNetwork(data=pre_proc$data.wide, data_Symb=pre_proc$data.wide_Symb)

The aim of the following network analysis is to group genes with same symbolic profile.

First we compute the Manhattan distance between al gene pairs. This measure is then used in the linkage algorithm (method=single) and then the unique partition is found by cutting the hierarchical tree at zero-distances.

For this analysis we filter the clusters as we do not consider the largest cluster as it contains genes with no phenotype as well as few smaller clusters of less than 10 genes.

We compute the empirical correlation matrix between genes (method = "pearson", use = "pairwise.complete.obs"), then we subset the correlation matrix based on the cluster filtering. Moreover, as we are interested only in positive correlations among genes we filter the correlation matrix based on an arbitrary correlation's thresold of 0.6 such that the clustering can be interpreted in terms of posivite correlation between gene profiles.

Network plot

We then generate the network from the described correlation matrix. And finally we can visualise the corresponding network plot.

gene_net$plot.pnet

Impact and betweeness scores

We are now interested to highlight the most central genes. To do so we can consider two metrics i.e. the impact and the betweeness. From the empirical correlation matrix between genes we can compute the betweenness measure as the fraction of shortest paths that pass through each gene (node). Next a measure of impact can be computed as the L2 norm (Euclidean distance) of each gene. These two centrality measures can be then used togheter to cluster the genes as follow.

gene_net$plot.impact_betweenees

We can also access to the impact and betweeness value as follow (only first 10 value shown).

head(gene_net$stats.impact_betweeness,10)

Knockout	Impact	Betweenness	Position	Cluster
YAL007C	4.2	67	Low impact, low betweenness	Cluster 3 (30 genes): Mo(+)
YAL009W	5.6	575	Low impact, high betweenness	Cluster 3 (30 genes): Mo(+)
YAL015C	4.1	1263	Low impact, high betweenness	Cluster 3 (30 genes): Mo(+)
YAL017W	4.3	108	Low impact, low betweenness	Cluster 3 (30 genes): Mo(+)
YAL020C	9	326	Low impact, low betweenness	Cluster 7 (28 genes): Na(-)
YAL039C	12.7	789	High impact, high betweenness	Cluster 15 (29 genes): K(-), Mn(-), Mo(-)
YAL042W	5.3	88	Low impact, low betweenness	Cluster 16 (37 genes): Cd(+)
YAL044C	7.3	253	Low impact, low betweenness	Cluster 18 (46 genes): Na(+)
YAL048C	13.3	76	High impact, low betweenness	Cluster 15 (29 genes): K(-), Mn(-), Mo(-)
YAL062W	3.6	378	Low impact, low betweenness	Cluster 18 (46 genes): Na(+)

We can also associate each cluster to low or high values of impact and betwenees based on the highest number of genes in that cathegory.

gene_net$stats.impact_betweeness_by_cluster

Cluster	Position	nGenes
Cluster 15 (29 genes): K(-), Mn(-), Mo(-)	High impact, low betweenness	18
Cluster 16 (37 genes): Cd(+)	Low impact, low betweenness	29
Cluster 18 (46 genes): Na(+)	Low impact, low betweenness	34
Cluster 25 (19 genes): Cd(+), Na(-)	Low impact, low betweenness	15
Cluster 3 (30 genes): Mo(+)	Low impact, high betweenness	18
Cluster 30 (27 genes): Mn(+), Ni(+), Na(-)	High impact, low betweenness	20
Cluster 39 (13 genes): S(-)	Low impact, low betweenness	10
Cluster 55 (20 genes): K(-), Mn(-)	High impact, low betweenness	15
Cluster 7 (28 genes): Na(-)	Low impact, low betweenness	26
Cluster 90 (20 genes): Mo(-)	Low impact, high betweenness	10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

MetaboFlow - the development of standardised workflows for processing metabolomics data to aid reproducible data sharing and big data initiatives

Section 0: Import the R package

Section 1: Pre-processing

Inspect the raw data

Run the pre-processing function

Outlier detection

Median batch correction

Standardisation

Symbolization

Aggregation of the knockout replicates

Final datasets

Section 2: Exploratory analysis

Pearson correlation

PCA

Heatmap

Pairwise correlation map

Regularized partial correlation network

Section 3: Clustering

Clustering

Go Slim annotation

Go terms enrichment

Section 4: Network analysis

Network plot

Impact and betweeness scores

Files

README.md

Latest commit

History

README.md

File metadata and controls

MetaboFlow - the development of standardised workflows for processing metabolomics data to aid reproducible data sharing and big data initiatives

Section 0: Import the R package

Section 1: Pre-processing

Inspect the raw data

Run the pre-processing function

Outlier detection

Median batch correction

Standardisation

Symbolization

Aggregation of the knockout replicates

Final datasets

Section 2: Exploratory analysis

Pearson correlation

PCA

Heatmap

Pairwise correlation map

Regularized partial correlation network

Section 3: Clustering

Clustering

Go Slim annotation

Go terms enrichment

Section 4: Network analysis

Network plot

Impact and betweeness scores