infercnv tumor subclusters
By default, inferCNV operates at the level of whole samples, such as all cells defined as a certain cell type derived from a single patient. This is the fastest way to run inferCNV, but often not the optimal way, as a given tumor sample may have subpopulations with varied patterns of CNV.
When infercnv::run(analysis_mode='subclusters') is set, inferCNV attempts to partition cells into groups having consistent patterns of CNV. CNV prediction (via HMM) is then performed at the level of the subclusters rather than whole samples.
The view below shows differences obtained when performing HMM predictions at the level of whole samples as compared to subclusters.
TODO: show version w/ more cells, gives better resolution for subclusters.
The methods available for defining tumor subclusters will continue to be expanded. So far, we have had the best success with hierarchical clustering based methods.
The parameters that impact the hierarchical clustering based tree partitioning include:
- 'infercnv::run(hclust_method='ward.D2')' : the clustering method to use. All built-in R hclust methods are supported: ("ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"). We find 'ward.D2' (default) to work best.
- 'infercnv::run(tumor_subcluster_partition_method='random_trees')' : method used for partitioning the hierarchical clustering tree. Options include ('random_trees', 'qnorm'). These are described further below. Both methods rely on the 'infercnv::run(tumor_subcluster_pval=0.05)' setting for determining cut-points in the hierarchical tree.
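The settings above can be combined in a single call. The following is a minimal sketch rather than a recommended configuration: the subclustering arguments are the ones named on this page, while the input files (the example data shipped with the infercnv package), the reference group names, and the 'cutoff', 'out_dir', 'cluster_by_groups' and 'HMM' values are assumptions taken from the package's general usage example and should be replaced with your own.

```r
library(infercnv)

# Assumption: example inputs bundled with the infercnv package.
# Substitute your own counts matrix, annotations file, and gene order file.
infercnv_obj <- CreateInfercnvObject(
  raw_counts_matrix = system.file("extdata", "oligodendroglioma_expression_downsampled.counts.matrix.gz", package = "infercnv"),
  annotations_file  = system.file("extdata", "oligodendroglioma_annotations_downsampled.txt", package = "infercnv"),
  delim             = "\t",
  gene_order_file   = system.file("extdata", "gencode_downsampled.EXAMPLE_ONLY_DONT_REUSE.txt", package = "infercnv"),
  ref_group_names   = c("Microglia/Macrophage", "Oligodendrocytes (non-malignant)")
)

infercnv_obj <- infercnv::run(
  infercnv_obj,
  cutoff            = 1,          # assumption: depends on the sequencing technology
  out_dir           = tempfile(), # assumption: throwaway output directory
  cluster_by_groups = TRUE,
  HMM               = TRUE,
  # Tumor subclustering settings described on this page:
  analysis_mode                     = "subclusters",
  hclust_method                     = "ward.D2",
  tumor_subcluster_partition_method = "random_trees",
  tumor_subcluster_pval             = 0.05,
  num_threads                       = 4
)
```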
The 'random_trees' method was inspired by the SHC method. We utilize a non-parametric method that compares the hierarchical tree height to a null distribution of tree heights derived from trees built with randomly permuted genes. If the observed tree height is found to be statistically significant according to the 'infercnv::run(tumor_subcluster_pval=0.05)' setting, the tree is bifurcated. This procedure is then applied recursively to the split trees, and splitting continues until a maximum recursion depth is reached, the clade under study has too few members, or the subtree height is found to not be significant under the corresponding null distribution.
An advantage of this method is that it will not partition a sample of cells if there is insufficient evidence for tumor heterogeneity. A disadvantage is that the method is relatively slow, given that it needs to perform 100 separate tree constructions at each tested bifurcation in order to generate a null distribution. However, parallelization is enabled, and the 'infercnv::run(num_threads=4)' setting can be increased to speed up the process.
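The permutation-based procedure described above for 'random_trees' can be illustrated with base R alone. This is a simplified sketch, not inferCNV's implementation: the toy matrix, the helper names ('root_height', 'shuffle_genes', 'partition_cells'), the 'min_cells' and 'max_depth' limits, and the choice to model "randomly permuted genes" by shuffling each gene's values across cells are all assumptions made for illustration.

```r
set.seed(42)

# Toy matrix: rows = genes, columns = cells. Cells 21-40 carry a planted
# shift over the first 50 genes, standing in for a subclone with its own CNV.
expr <- matrix(rnorm(100 * 40), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("cell", 1:40)))
expr[1:50, 21:40] <- expr[1:50, 21:40] + 3

# Height of the root of the hierarchical tree built over cells.
root_height <- function(m, method) {
  max(hclust(dist(t(m)), method = method)$height)
}

# Assumption: each gene's values are shuffled across cells independently,
# which removes any structure shared by groups of cells.
shuffle_genes <- function(m) {
  for (g in seq_len(nrow(m))) m[g, ] <- sample(m[g, ])
  m
}

partition_cells <- function(m, pval = 0.05, n_perm = 100, min_cells = 4,
                            max_depth = 5, depth = 1, method = "ward.D2") {
  # Stop when the clade is too small or the recursion is too deep.
  if (ncol(m) < min_cells || depth > max_depth) return(list(colnames(m)))

  hc <- hclust(dist(t(m)), method = method)
  observed <- max(hc$height)
  null_heights <- replicate(n_perm, root_height(shuffle_genes(m), method))

  # Empirical p-value: share of null trees at least as tall as the observed one.
  p <- mean(null_heights >= observed)
  if (p >= pval) return(list(colnames(m)))

  # Significant: bifurcate at the root and recurse into both halves.
  halves <- cutree(hc, k = 2)
  c(partition_cells(m[, halves == 1, drop = FALSE], pval, n_perm, min_cells,
                    max_depth, depth + 1, method),
    partition_cells(m[, halves == 2, drop = FALSE], pval, n_perm, min_cells,
                    max_depth, depth + 1, method))
}

groups <- partition_cells(expr)
print(lengths(groups))  # number of cells in each resulting subcluster
```

Each element of 'groups' is a vector of cell names. Because every tested bifurcation builds 'n_perm' null trees, the cost grows with the number of splits attempted, which is the slowness noted above.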
The 'qnorm' method is a parametric approach: it cuts the hierarchical tree at the height corresponding to a quantile of a normal distribution of the tree heights, with the percentile set by 'infercnv::run(tumor_subcluster_pval=0.05)'.
The advantage of this approach is that it is fast, which makes it useful for exploring groups of cells that may represent tumor heterogeneity instead of being restricted to running all cells through as a single sample. The disadvantage is that it will split the hierarchical tree even when there is no true statistical evidence for heterogeneity. It is really only a simple, dynamic way of exploring potential heterogeneity during an inferCNV run.
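The quantile-based cut described above for 'qnorm' can be sketched in a few lines of base R. This is an illustration under stated assumptions, not inferCNV's code: the toy matrix is made up, and the sketch assumes the cut is placed in the upper tail (probability 1 - pval) of a normal distribution parameterized by the mean and standard deviation of the observed merge heights.

```r
set.seed(1)

# Toy matrix: rows = genes, columns = cells, with a planted shift in cells 21-40.
expr <- matrix(rnorm(100 * 40), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("cell", 1:40)))
expr[1:50, 21:40] <- expr[1:50, 21:40] + 3

hc <- hclust(dist(t(expr)), method = "ward.D2")

# Assumption: the upper-tail quantile (1 - pval) of a normal distribution
# fitted to the merge heights defines the cut height.
pval <- 0.05
cut_height <- qnorm(1 - pval, mean = mean(hc$height), sd = sd(hc$height))

# Cut the tree at that height; cells joined below the cut share a group.
groups <- cutree(hc, h = cut_height)
print(table(groups))
```

No null trees are built here, so the cost is a single clustering plus a cut, which is why this approach is fast.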