-
Notifications
You must be signed in to change notification settings - Fork 0
/
S1_Text.Rmd
113 lines (62 loc) · 7.84 KB
/
S1_Text.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
---
title: "Supplementary Text 1"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
library(knitr)
library(kableExtra)
library(tidyverse)
```
## Introduction
RNA-seq constitutes a relatively new approach to quantify HLA expression. In order to place our novel findings in the context of well established methods, and because many researchers consider qPCR as a gold standard for HLA expression, we provide an in-depth comparison of RNA-seq with previous qPCR studies.
Of course, our analyses have limitations: when comparing across studies we are often dealing with different individuals, cell types, and techniques. Thus, our efforts to compare our RNA-seq findings to those of published papers that used qPCR are at best a first approximation.
Although some efforts have been made to develop allele-specific primers (Pan et al, 2018), to our knowledge, these have not yet been used to obtain expression levels for population datasets. Thus, allele-based expression levels from qPCR (e.g., Kulkarni et al, 2013; Ramsuram et al, 2015; Ramsuram et al, 2017) are usually imputed from the locus-level expression levels (in the graphs included in those papers, locus-level expression is plotted twice, one point for each allele). Attempts have been made to improve the imputation with the use of a linear model of expression~genotype (e.g., Ramsuram et al, 2015). Thus, besides the technical and biological differences between studies, the imputed nature of allele-level estimates from qPCR provide another source of differences between our RNA-seq estimates with qPCR data at the HLA allele-level. Nevertheless, we designed a survey to compare HLA allele-level expression between RNA-seq and qPCR as rigorously as possible (see below), and to obtain a quantitative summary of the degree of agreement across these methods.
## Methods
### Data
Expression estimates for *HLA-A*, *HLA-B* and *HLA-C* reported by [Ramsuram et al (2015)](https://doi.org/10.1093/hmg/ddv158), [Ramsuram et al (2017)](http://www.jimmunol.org/content/198/6/2320), and [Kulkarni et al (2013)](https://doi.org/10.1073/pnas.1312237110), were kindly provided by corresponding authors.
### Comparison of expression levels between qPCR and RNA-seq
Because our study and the previously published papers with qPCR expression data involve different samples, they cannot be directly compared (e.g., using a correlation analysis). Instead, we asked whether the ordering of HLA lineages by their expression levels differs among studies. To do this, we applied a pairwise Mann-Whitney U test (with FDR correction for multiple testing) to pairwise comparissons between all pairs of lineages with $N \ge 10$ in both the qPCR and RNA-seq data. We restricted subsequent analyses to lineage pairs that showed significantly different expression for both qPCR and RNA-seq quantification.
## Results
### Comparison of lineage ordering across studies
A first impression of the relative ordering of lineages according to the mean expression level suggests substantial differences between RNA-seq and qPCR quantifications (Figure 1).
\hfill\break
```{r, fig.align='center'}
include_graphics("./compare_qpcr/FigSA.png")
```
**Figure 1.** Lineage-level expression in previous qPCR studies and RNA-seq (HLApers) for *HLA-A*, *HLA-B* and *HLA-C*. We included only lineages present in at least 10 individuals in both GEUVADIS RNA-seq data and qPCR data. Y-axis: Transcript per Million for RNA-seq, and $2^{-\Delta \Delta Ct}$ for qPCR. The alleles are ordered from left to right in increasing median expression values.
\hfill\break
The apparent lack of agreement among the methods in the ordering of alleles can be misleading. This is because the variation among individuals within the same lineage is often very high, so differences between methods in the ordering of lineages may be within the expected range of sampling variation. We thus restricted our analyses to pairs of lineages with $N \ge 10$, and with statistically different distributions of expression values according to a pairwise Mann-Whitney U test with FDR correction for multiple testing (in both qPCR and RNA-seq).
For *HLA-A*, *HLA-B* and *HLA-C* we found 33 instances where the expression was significantly different for a pair of lineages in both qPCR and RNA-seq datasets (Table 1).
\hfill\break
\hfill\break
\hfill\break
**Table 1.** Lineage pairs with significantly different expression for both qPCR and RNA-seq (pairwise Mann-Whitney test.
\begin{center}
```{r}
read_tsv("./compare_qpcr/results_significant.tsv") %>%
select(Lineage1 = a1, Lineage2 = a2, `pvalue (pcr)` = 3, `pvalue (hlapers)` = 4) %>%
kable(digits = 5)
```
\end{center}
\hfill\break
Of the significant pairs, for 21/33 cases the results were concordant (i.e. we found the same ordering of expression in our study and in qPCR-based studies). However, the results were markedly different among loci: for *HLA-C*, 13 out of 13 significant pairs were concordant, for *HLA-B*, 4 out 6 had the same ordering. On the other hand, for *HLA-A*, qPCR and RNA-seq results were concordant in only 4 out 14 pairs (Table 2).
\hfill\break
**Table 2.** Number of lineage pairs concordant between qPCR and RNA-seq, total of significant pairs tested, and percentage of concordance for *HLA-A*, *HLA-B* and *HLA-C*.
\begin{center}
```{r}
read_tsv("./compare_qpcr/summary_results.tsv") %>%
mutate(locus = paste0("HLA-", locus),
`Concordance (%)` = same_dir/n_significant * 100) %>%
rename(Locus = 1, `Concordant between qPCR and RNA-seq` = 2, `Total signif. pairs` = 3) %>%
kable(digits = 1)
```
\end{center}
\hfill\break
The high concordance between qPCR and RNA-seq seen for *HLA-C* and *HLA-B* can be exemplified referring to specific lineages: B\*35 is consistently more highly expressed than B\*07, B\*15 and B\*40 in both qPCR and RNA-seq, as are C\*04 and C\*06 in comparison with C\*03, C\*07 and C\*12. The high rate of discordance at *HLA-A*, on the other hand, is driven mainly by the top 2 alleles in qPCR (A\*24 and A\*01), which have low/moderate expression in RNA-seq, and the top 2 alleles in RNA-seq (A\*02 and A\*25), which have low/moderate expression in qPCR, and also by A\*03, whose homozygotes have expression levels close to zero in qPCR.
The low number of lineages with significantly different expression for *HLA-B* reflects the low variation of expression levels among *HLA-B* lineages in the qPCR data (7 pairs out of 55 were significant, whereas RNA-seq had 31), as well as the low sample sizes of some lineages in the qPCR data. For example, B\*13 and B\*51 are the most and the least expressed lineages in both qPCR and RNA-seq, but in the qPCR data there are only 11 individuals with B\*13 and 14 individuals with B\*51, lowering the power of the statistical test. Furthermore, the variability is so high that some individuals with the least expressed lineage (B\*51) have higher expression than some individuals with B\*13. As a consequence, whereas for RNA-seq these lineages have a significant difference in expression ($p < 4 \times 10^{-6}$), the difference is non-significant for qPCR ($p > 0.07$), excluding the lineage from the subsequent test of concordance.
## References
[Kulkarni et al. Genetic interplay between HLA-C and MIR148A in HIV control and Crohn disease. PNAS. 2013;110(51).](https://doi.org/10.1073/pnas.1312237110)
[Pan N et al. Quantification of classical HLA class I mRNA by allele-specific,real-time polymerase chain reaction for most Han individuals. HLA. 2019;91.](https://doi.org/10.1111/tan.13186)
[Ramsuram V et al. Epigenetic regulation of differential HLA-A allelic expression levels. Human Molecular Genetics. 2015;24(15).](https://doi.org/10.1093/hmg/ddv158)
[Ramsuram V et al. Sequence and Phylogenetic Analysis of the Untranslated Promoter Regions for HLA Class I Genes. The Journal of Immunology. 2017;198.](https://doi.org/10.4049/jimmunol.1601679)