GeorgescuC edited this page Jul 9, 2018 · 68 revisions

InferCNV: Inferring copy number variation from tumor single cell RNA-Seq data

InferCNV is used to explore tumor single-cell RNA-Seq data for evidence of large-scale chromosomal copy number variations, such as gains or deletions of entire chromosomes or large chromosomal segments. It does this by examining the expression intensity of genes across positions of the genome, compared either to a set of reference 'normal' cells or, if no reference cells are provided, to the average. A heatmap is generated illustrating the relative expression intensities across each chromosome, making it readily apparent which regions of the genome are over-abundant or under-abundant relative to normal cells (or to the average).
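The comparison described above can be illustrated with a toy sketch. This is not InferCNV's actual implementation: the input values are invented, the function name relative_expression is hypothetical, and the moving-average smoothing step is an assumption of this sketch rather than something stated on this page. It only shows the general idea of subtracting a reference level gene by gene, in genomic order, and then looking for sustained shifts.

```shell
#!/bin/sh
# Toy input (invented values): genes listed in genomic order with the mean
# expression in reference cells and the expression in one tumor cell.
expr=$(mktemp)
printf '%s\t%s\t%s\n' \
  gene ref_mean tumor_cell \
  g1 5.0 5.25 \
  g2 4.0 3.75 \
  g3 6.0 5.0 \
  g4 5.0 4.0 \
  g5 4.0 3.0 \
  g6 5.0 5.0 > "$expr"

# Step 1: relative expression = tumor value minus reference mean, per gene.
# Step 2 (assumption of this sketch): centered moving average over `win`
# neighbouring genes, so that single-gene noise is averaged out.
# Output columns: gene, relative expression, smoothed relative expression.
relative_expression() {
  awk -F'\t' -v win="${2:-3}" '
    NR > 1 { gene[++n] = $1; rel[n] = $3 - $2 }
    END {
      half = int(win / 2)
      for (i = 1; i <= n; i++) {
        sum = 0; cnt = 0
        for (j = i - half; j <= i + half; j++)
          if (j >= 1 && j <= n) { sum += rel[j]; cnt++ }
        printf "%s\t%.2f\t%.2f\n", gene[i], rel[i], sum / cnt
      }
    }' "$1"
}

relative_expression "$expr" 3
```

In this toy data the smoothed column stays clearly negative across g3 to g5, which is the kind of sustained shift a deleted region would be expected to produce, while the isolated fluctuations at g1 and g2 largely cancel out.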

InferCNV is one component of the TrinityCTAT toolkit, which focuses on leveraging RNA-Seq to better understand cancer transcriptomes. To find out more, please visit TrinityCTAT.

Quick Start

The GMD package has been archived on CRAN. If you do not already have it installed, you can install it from the archive by running the following in R:

packageurl <- "https://cran.r-project.org/src/contrib/Archive/GMD/GMD_0.3.3.tar.gz"
install.packages(packageurl, repos=NULL, type="source")

To install from the command line, download the latest release of InferCNV and then run the following command:

R CMD INSTALL infercnv.tar.gz

To install directly from within R instead, run the following:

library("devtools")
install_github("broadinstitute/inferCNV")

Then move into the inferCNV folder and run the following command from the command line:

cd inferCNV   

./scripts/inferCNV.R \
  --ref_groups "1:50,51:95" \
  --cutoff 1 \
  --noise_filter 0.2 \
  --output_dir quickstart \
  --ref example/normal_cells.csv \
  --vis_bound_threshold " -1,1" \
  example/oligodendroglioma.tp100k.expr.matrix \
  example/gencode_v19_gene_pos.txt 

This will run the test example data and should produce the figure at the bottom of this page.

Requirements

  • Python (2.X or 3.X)
  • R (tested in R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut")
  • R libraries required include: GMD, ape, RColorBrewer, optparse, logging
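As a convenience, the requirements above can be checked before running the Quick Start. The following is a minimal sketch, not part of InferCNV: the helper name check_cmd is hypothetical, and it assumes the interpreters are expected on your PATH as python (or python3) and Rscript. The R library check relies only on base R's requireNamespace.

```shell
#!/bin/sh
# Report whether a command is available on PATH (hypothetical helper).
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Python may be 2.X or 3.X, so accept either executable name.
check_cmd python || check_cmd python3 || true
check_cmd Rscript || true

# If R is present, ask it which of the listed libraries are not installed.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e 'pkgs <- c("GMD", "ape", "RColorBrewer", "optparse", "logging")
missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing) > 0) {
  cat("missing R libraries:", missing, "\n")
} else {
  cat("all listed R libraries found\n")
}'
fi
```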

Expectations

Upstream

This tool takes as input a matrix of single-cell RNA-Seq expression values.

Given FASTQ files, you will first need to align your sequences to your reference of choice. If your sequences do NOT contain special barcodes (such as molecular tags or cell barcodes), a standard splice-aware aligner may be appropriate. If special barcodes do exist, you will need to use a pipeline that is aware of your library construction. Currently, there is no recommended tool for generating expression values from your aligned BAM files; traditional population-based (bulk) RNA-Seq tools are the current option.

The input matrix should contain normalized abundance levels. For smart-seq protocols, this involves normalizing for transcript length and sequencing depth, as in transcripts-per-million (TPM). For 3'-end tag counting protocols, the rough equivalent is counts-per-million. Since per-cell measurements tend to have thousands of reads rather than millions, it is often useful to express values as transcripts-per-100k (TP100k) instead of TPM.
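The depth-scaling part of this normalization can be sketched as follows. This is an illustration under stated assumptions, not a script shipped with InferCNV: the toy matrix, the function name to_tp100k, and the tab-delimited genes-by-cells layout are all hypothetical. It only rescales each cell so its values sum to 100,000, which fits tag-count data; for smart-seq data, transcript-length normalization would need to be applied first and is not shown here.

```shell
#!/bin/sh
# Hypothetical toy count matrix: rows = genes, columns = cells (tab-delimited).
counts=$(mktemp)
printf '%s\t%s\t%s\n' \
  gene cell1 cell2 \
  GENE_A 30 100 \
  GENE_B 50 300 \
  GENE_C 20 600 > "$counts"

# Scale every cell (column) so that its values sum to 100,000.
# The file is read twice: pass 1 collects per-cell totals,
# pass 2 prints the header unchanged and rescales each value.
to_tp100k() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) total[i] += $i; next }
    FNR == 1  { print; next }
    {
      for (i = 2; i <= NF; i++)
        $i = (total[i] > 0) ? sprintf("%.1f", $i / total[i] * 100000) : "0.0"
      print
    }
  ' "$1" "$1"
}

to_tp100k "$counts"
```

Scaling to 100,000 rather than 1,000,000 keeps the values closer to the number of reads actually observed per cell, which is the motivation given above for preferring TP100k over TPM.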

Citation

Please use the following citation:

Patel AP, Tirosh I, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014 Jun 20;344(6190):1396-1401.

This methodology was also used in:

Tirosh I et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016 Apr 8;352(6282):189-96

Tirosh I et al. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma. Nature. 2016 Nov 10;539(7628):309-313. PubMed PMID: 27806376; PubMed Central PMCID: PMC5465819.

Venteicher AS, Tirosh I, et al. Decoupling genetics, lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq. Science. 2017 Mar 31;355(6332). PubMed PMID: 28360267; PubMed Central PMCID: PMC5519096.

Puram SV, Tirosh I, et al. Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer. Cell. 2017 Dec 14;171(7):1611-1624.e24. PubMed PMID: 29198524; PubMed Central PMCID: PMC5878932.

Demo Example Figure

The following figure should be produced by the Quick Start instructions. This figure shows scRNA-Seq expression of oligodendroglioma with hallmark chr 1p and 19q deletions.

Next steps

Now that you've gotten the example to work, use the menu in the upper right to navigate to the more detailed descriptions and instructions for exploring your own data.
