4. scRNAseq.Rmd

---
title: "Mapping targets to single cells in plaques."
author: "[Sander W. van der Laan, PhD](https://swvanderlaan.github.io) | @swvanderlaan | s.w.vanderlaan@gmail.com"
date: "`r Sys.Date()`"
output:
  html_notebook:
    cache: yes
    code_folding: hide
    collapse: yes
    df_print: paged
    fig.align: center
    fig_caption: yes
    fig_height: 6
    fig_retina: 2
    fig_width: 7
    highlight: tango
    theme: lumen
    toc: yes
    toc_float:
      collapsed: no
      smooth_scroll: yes
mainfont: Arial
subtitle: Accompanying 'Plaque expression levels of HDAC9 in association with plaque vulnerability traits and secondary vascular events in patients undergoing carotid endarterectomy, an analysis in the Athero-EXPRESS Biobank.'
editor_options:
  chunk_output_type: inline
# bibliography: references.bib
# knit: worcs::cite_all
---

# General Setup

```{r setup, include=FALSE}
# We recommend that you prepare your raw data for analysis in 'prepare_data.R',
# and end that file with either open_data(yourdata), or closed_data(yourdata).
# Then, uncomment the line below to load the original or synthetic data
# (whichever is available), to allow anyone to reproduce your code:
# load_data()

# further define some knitr-options.
knitr::opts_chunk$set(fig.width = 12, fig.height = 8, fig.path = 'Figures/', 
                      warning = TRUE, # show warnings during codebook generation
                      message = TRUE, # show messages during codebook generation
                      error = TRUE, # do not interrupt codebook generation in case of errors, 
                                    # usually better for debugging
                      echo = TRUE,  # show R code
                      eval = TRUE)

ggplot2::theme_set(ggplot2::theme_minimal())
# pander::panderOptions("table.split.table", Inf)
library("worcs")
library("rmarkdown")

```

```{r echo = FALSE}
rm(list = ls())
```

```{r LocalSystem, echo = FALSE}
### Operating System Version
### MacBook Pro
ROOT_loc = "/Users/swvanderlaan"

### MacBook Air 
# ROOT_loc = "/Users/slaan3"

### General
GENOMIC_loc = paste0(ROOT_loc, "/OneDrive - UMC Utrecht/Genomics")
AEDB_loc = paste0(GENOMIC_loc, "/Athero-Express/AE-AAA_GS_DBs")
LAB_loc = paste0(GENOMIC_loc, "/LabBusiness")

PROJECT_loc = paste0(ROOT_loc, "/git/CirculatoryHealth/AE_20211201_YAW_SWVANDERLAAN_HDAC9")

# Genetic and genomic data
STORAGE_loc = paste0(ROOT_loc, "/PLINK")
AERNA_loc = paste0(STORAGE_loc, "/_AE_ORIGINALS/AERNA")
AESCRNA_loc = paste0(STORAGE_loc, "/_AE_ORIGINALS/AESCRNA/prepped_data")
AEGSQC_loc = paste0(STORAGE_loc, "/_AE_ORIGINALS/AEGS_COMBINED_QC2018")

### SOME VARIABLES WE NEED DOWN THE LINE
TRAIT_OF_INTEREST = "HDAC9" # Phenotype
PROJECTNAME = "HDAC9"

cat("\nCreate a new analysis directory...\n")
# Define all output locations first, then create any that do not exist yet.
ANALYSIS_loc = paste0(PROJECT_loc, "/", PROJECTNAME)
PLOT_loc = paste0(ANALYSIS_loc, "/PLOTS")
QC_loc = paste0(PLOT_loc, "/QC")
OUT_loc = paste0(ANALYSIS_loc, "/OUTPUT")
BASELINE_loc = paste0(ANALYSIS_loc, "/BASELINE")

# Parents are listed before their children, so a plain dir.create() suffices.
for (dir_loc in c(ANALYSIS_loc, PLOT_loc, QC_loc, OUT_loc, BASELINE_loc)) {
  if (!dir.exists(dir_loc)) dir.create(dir_loc)
}
rm(dir_loc)


setwd(paste0(PROJECT_loc))
getwd()
list.files()

```

```{r Source functions}
source(paste0(PROJECT_loc, "/scripts/functions.R"))
```

```{r}
ggplot2::theme_set(ggplot2::theme_minimal())
pander::panderOptions("table.split.table", Inf)
```

```{r loading_packages, message=FALSE, warning=FALSE}
install.packages.auto("pander")
install.packages.auto("readr")
install.packages.auto("optparse")
install.packages.auto("tools")
install.packages.auto("dplyr")
install.packages.auto("tidyr")
install.packages.auto("naniar")

# To get 'data.table' with 'fwrite' to be able to directly write gzipped-files
# Ref: https://stackoverflow.com/questions/42788401/is-possible-to-use-fwrite-from-data-table-with-gzfile
# install.packages("data.table", repos = "https://Rdatatable.gitlab.io/data.table")
library(data.table)

install.packages.auto("tidyverse")
install.packages.auto("knitr")
install.packages.auto("DT")
install.packages.auto("eeptools")

install.packages.auto("openxlsx")

install.packages.auto("haven")
install.packages.auto("tableone")
install.packages.auto("sjPlot")

install.packages.auto("BlandAltmanLeh")

# Install the devtools package from Hadley Wickham
install.packages.auto('devtools')

# for plotting
install.packages.auto("pheatmap")
install.packages.auto("forestplot")
install.packages.auto("ggplot2")

install.packages.auto("ggpubr")

install.packages.auto("UpSetR")

devtools::install_github("thomasp85/patchwork")

# for Seurat etc
install.packages.auto("org.Hs.eg.db")
install.packages.auto("mygene")
install.packages.auto("EnhancedVolcano")
```

```{r}

# Install the devtools package from Hadley Wickham
install.packages.auto('devtools')
# Replace '2.3.4' with your desired version
# devtools::install_version(package = 'Seurat', version = package_version('2.3.4'))
# install.packages("Seurat")
install.packages.auto("Seurat") # latest version
library("Seurat")

```

```{r Setting: Colors}

Today = format(as.Date(as.POSIXlt(Sys.time())), "%Y%m%d")
Today.Report = format(as.Date(as.POSIXlt(Sys.time())), "%A, %B %d, %Y")

### Utrecht Science Park Colours Scheme
###
### Website to convert HEX to RGB: http://hex.colorrrs.com.
### For some functions you should divide these numbers by 255.
###
###	No.	Color			      HEX	(RGB)						              CHR		  MAF/INFO
###---------------------------------------------------------------------------------------
###	1	  yellow			    #FBB820 (251,184,32)				      =>	1		or 1.0>INFO
###	2	  gold			      #F59D10 (245,157,16)				      =>	2		
###	3	  salmon			    #E55738 (229,87,56)				      =>	3		or 0.05<MAF<0.2 or 0.4<INFO<0.6
###	4	  darkpink		    #DB003F (219,0,63)				      =>	4		
###	5	  lightpink		    #E35493 (227,84,147)				      =>	5		or 0.8<INFO<1.0
###	6	  pink			      #D5267B (213,38,123)				      =>	6		
###	7	  hardpink		    #CC0071 (204,0,113)				      =>	7		
###	8	  lightpurple	    #A8448A (168,68,138)				      =>	8		
###	9	  purple			    #9A3480 (154,52,128)				      =>	9		
###	10	lavendel		    #8D5B9A (141,91,154)				      =>	10		
###	11	bluepurple		  #705296 (112,82,150)				      =>	11		
###	12	purpleblue		  #686AA9 (104,106,169)			      =>	12		
###	13	lightpurpleblue	#6173AD (97,115,173/101,120,180)	=>	13		
###	14	seablue			    #4C81BF (76,129,191)				      =>	14		
###	15	skyblue			    #2F8BC9 (47,139,201)				      =>	15		
###	16	azurblue		    #1290D9 (18,144,217)				      =>	16		or 0.01<MAF<0.05 or 0.2<INFO<0.4
###	17	lightazurblue	  #1396D8 (19,150,216)				      =>	17		
###	18	greenblue		    #15A6C1 (21,166,193)				      =>	18		
###	19	seaweedgreen	  #5EB17F (94,177,127)				      =>	19		
###	20	yellowgreen		  #86B833 (134,184,51)				      =>	20		
###	21	lightmossgreen	#C5D220 (197,210,32)				      =>	21		
###	22	mossgreen		    #9FC228 (159,194,40)				      =>	22		or MAF>0.20 or 0.6<INFO<0.8
###	23	lightgreen	  	#78B113 (120,177,19)				      =>	23/X
###	24	green			      #49A01D (73,160,29)				      =>	24/Y
###	25	grey			      #595A5C (89,90,92)				        =>	25/XY	or MAF<0.01 or 0.0<INFO<0.2
###	26	lightgrey		    #A2A3A4	(162,163,164)			      =>	26/MT
###
###	ADDITIONAL COLORS
###	27	midgrey			#D7D8D7
###	28	verylightgrey	#ECECEC
###	29	white			#FFFFFF
###	30	black			#000000
###----------------------------------------------------------------------------------------------

uithof_color = c("#FBB820","#F59D10","#E55738","#DB003F","#E35493","#D5267B",
                 "#CC0071","#A8448A","#9A3480","#8D5B9A","#705296","#686AA9",
                 "#6173AD","#4C81BF","#2F8BC9","#1290D9","#1396D8","#15A6C1",
                 "#5EB17F","#86B833","#C5D220","#9FC228","#78B113","#49A01D",
                 "#595A5C","#A2A3A4", "#D7D8D7", "#ECECEC", "#FFFFFF", "#000000")

uithof_color_legend = c("#FBB820", "#F59D10", "#E55738", "#DB003F", "#E35493",
                        "#D5267B", "#CC0071", "#A8448A", "#9A3480", "#8D5B9A",
                        "#705296", "#686AA9", "#6173AD", "#4C81BF", "#2F8BC9",
                        "#1290D9", "#1396D8", "#15A6C1", "#5EB17F", "#86B833",
                        "#C5D220", "#9FC228", "#78B113", "#49A01D", "#595A5C",
                        "#A2A3A4", "#D7D8D7", "#ECECEC", "#FFFFFF", "#000000")
### ----------------------------------------------------------------------------
```

# ERA-CVD 'druggable-MI-targets'

<!-- ![ERA-CVD logo]("Users/swvanderlaan/iCloud/Genomics/Projects/#Druggable-MI-Genes/Administration/ERA-CVD\ Logo_CMYK.jpg") -->

For the ERA-CVD 'druggable-MI-targets' project (grant number: 01KL1802) we performed two related RNA sequencing (RNAseq) experiments:

1)  conventional ('bulk') RNAseq using RNA extracted from carotid plaque samples, n ≈ 700. As of `r Today.Report` all samples have been selected and RNA has been extracted; quality control (QC) was performed and we have a dataset of 635 samples.

2)  single-cell RNAseq (scRNAseq) of at least n = 40 samples (20 females, 20 males). As of `r Today.Report` data are available for 40 samples (3 females, 15 males); we are extending sampling to include more female samples.

Plaque samples are derived from carotid endarterectomies as part of the [Athero-Express Biobank Study](http://www.atheroexpress.nl), an ongoing study at the UMC Utrecht.

# Background

Here we map the expression of `r TRAIT_OF_INTEREST` to single cells from the plaques.

```{r targets for mapping}

library(openxlsx)

gene_list_df <- read.xlsx(paste0(PROJECT_loc, "/targets/Genes.xlsx"), sheet = "Genes")

target_genes <- unlist(gene_list_df$Gene)
target_genes

```

# Load data

First we will load the data:

-   scRNAseq experimental data and rename the cell types.
-   Athero-Express clinical data.

Here we load the latest dataset from our Athero-Express single-cell RNA experiment.

```{r LoadData}

# load(paste0(AESCRNA_loc, "/20210811.46.patients.KP.RData"))
# scRNAseqData <- seuset
# rm(seuset)
# 
# saveRDS(scRNAseqData, paste0(AESCRNA_loc, "/20210811.46.patients.KP.RDS"))

scRNAseqData <- readRDS(paste0(AESCRNA_loc, "/20210811.46.patients.KP.RDS"))

scRNAseqData

```

The naming/classification is based on a combination of conventional markers. We do not claim to know the exact identity of each cell; rather, we refer to cells as, for example, 'KIT+ Mast cells'-like cells. Likewise, we refer to the cell clusters as 'communities' of cells that exhibit similar properties, *i.e.* similar defining markers (*e.g.* *KIT*).

We will rename the cell types to shorter, human-readable names.

```{r Change cell community names}
### change names for clarity
backup.scRNAseqData = scRNAseqData
# get the old names to change to new names
UMAPPlot(scRNAseqData, label = FALSE, pt.size = 1.25, label.size = 4, group.by = "ident")

```

```{r}
unique(scRNAseqData@active.ident)
```

```{r}
celltypes <- c("CD68+CD4+ Monocytes" = "CD68+CD4+ Mono", 
               "CD68+IL18+TLR4+TREM2+ Resident macrophages" = "CD68+IL18+TLR4+TREM2+ MRes", 
               "CD68+CD1C+ Dendritic Cells" = "CD68+CD1C+ DC",
               "CD68+CASP1+IL1B+SELL+ Inflammatory macrophages" = "CD68+CASP1+IL1B+SELL MInf",
               "CD68+ABCA1+OLR1+TREM2+ Foam Cells" = "CD68+ABCA1+OLR1+TREM2+ FC",
               
               # T-cells
               "CD3+ T Cells I" = "CD3+ TC I",
               "CD3+ T Cells II" = "CD3+ TC II", 
               "CD3+ T Cells III" = "CD3+ TC III", 
               "CD3+ T Cells IV" = "CD3+ TC IV", 
               "CD3+ T Cells V" = "CD3+ TC V", 
               "CD3+ T Cells VI" = "CD3+ TC VI", 
               "FOXP3+ T Cells" = "FOXP3+ TC",
               
               # Endothelial cells
               "CD34+ Endothelial Cells I" = "CD34+ EC I", 
               "CD34+ Endothelial Cells II" = "CD34+ EC II", 
               
               # SMC
               "ACTA2+ Smooth Muscle Cells" = "ACTA2+ SMC", 
               
               # NK Cells
               "CD3+CD56+ NK Cells I" = "CD3+CD56+ NK I",
               "CD3+CD56+ NK Cells II" = "CD3+CD56+ NK II",
               # Mast
               "CD68+KIT+ Mast Cells" = "CD68+KIT+ MC",
               
               "CD79A+ Class-switched Memory B Cells" = "CD79A+ BCmem", 
               "CD79+ Plasma B Cells" = "CD79+ BCplasma")

scRNAseqData <- Seurat::RenameIdents(object = scRNAseqData, 
                                       celltypes)
```
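
The renaming above relies on a named character vector in which each name is an old identity and each value is its new label. The following is a minimal base-R sketch of that lookup on a toy factor rather than on the Seurat object; `old_idents`, `short_names`, and `rename_levels()` are hypothetical names used only for illustration.

```r
# Toy identities standing in for the cluster labels (hypothetical data).
old_idents <- factor(c("CD3+ T Cells I", "ACTA2+ Smooth Muscle Cells",
                       "CD3+ T Cells I", "CD68+KIT+ Mast Cells"))

# Named vector: names are the old labels, values are the new short labels.
short_names <- c("CD3+ T Cells I" = "CD3+ TC I",
                 "ACTA2+ Smooth Muscle Cells" = "ACTA2+ SMC")

# Replace only the levels that occur in the mapping; keep all others as-is.
rename_levels <- function(x, mapping) {
  lv <- levels(x)
  hit <- lv %in% names(mapping)
  lv[hit] <- unname(mapping[lv[hit]])
  levels(x) <- lv
  x
}

new_idents <- rename_levels(old_idents, short_names)

# Identities that have no entry in the mapping and therefore keep their old name.
unmapped <- setdiff(levels(old_idents), names(short_names))
```

Listing `unmapped` is a quick way to spot identities that are missing from the mapping before the labelled UMAP is drawn.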

```{r Change cell community names - new plot}
UMAPPlot(scRNAseqData, label = TRUE, pt.size = 1.25, label.size = 4, group.by = "ident",
         repel = TRUE)

```

## Clinical data

Loading the Athero-Express clinical data.

```{r LoadAEDB}

AEDB.CEA <- readRDS(file = paste0(OUT_loc, "/20220319.",TRAIT_OF_INTEREST,".AEDB.CEA.RDS"))

```


```{r }

# Baseline table variables
basetable_vars = c("Hospital", "ORyear", "Artery_summary",
                   "Age", "Gender", 
                   # "TC_finalCU", "LDL_finalCU", "HDL_finalCU", "TG_finalCU", 
                   "TC_final", "LDL_final", "HDL_final", "TG_final", 
                   # "hsCRP_plasma",
                   "systolic", "diastoli", "GFR_MDRD", "BMI", 
                   "KDOQI", "BMI_WHO",
                   "SmokerStatus", "AlcoholUse",
                   "DiabetesStatus", 
                   "Hypertension.selfreport", "Hypertension.selfreportdrug", "Hypertension.composite", "Hypertension.drugs", 
                   "Med.anticoagulants", "Med.all.antiplatelet", "Med.Statin.LLD", 
                   "Stroke_Dx", "sympt", "Symptoms.5G", "AsymptSympt", "AsymptSympt2G",
                   "Symptoms.Update2G", "Symptoms.Update3G", "indexsymptoms_latest_4g",
                   "restenos", "stenose",
                   "CAD_history", "PAOD", "Peripheral.interv", 
                   "EP_composite", "EP_composite_time", "EP_major", "EP_major_time",
                   "MAC_rankNorm", "SMC_rankNorm", "Macrophages.bin", "SMC.bin",
                   "Neutrophils_rankNorm", "MastCells_rankNorm",
                   "IPH.bin", "VesselDensity_rankNorm",
                   "Calc.bin", "Collagen.bin", 
                   "Fat.bin_10", "Fat.bin_40", "OverallPlaquePhenotype", "Plaque_Vulnerability_Index")

basetable_bin = c("Gender",  "Artery_summary",
                  "KDOQI", "BMI_WHO",
                  "SmokerStatus", "AlcoholUse",
                  "DiabetesStatus", 
                  "Hypertension.selfreport", "Hypertension.selfreportdrug", "Hypertension.composite", "Hypertension.drugs", 
                  "Med.anticoagulants", "Med.all.antiplatelet", "Med.Statin.LLD", 
                  "Stroke_Dx", "sympt", "Symptoms.5G", "AsymptSympt", "AsymptSympt2G",
                  "Symptoms.Update2G", "Symptoms.Update3G", "indexsymptoms_latest_4g",
                  "restenos", "stenose",
                  "CAD_history", "PAOD", "Peripheral.interv", 
                  "EP_major", "EP_composite", "Macrophages.bin", "SMC.bin",
                  "IPH.bin", 
                  "Calc.bin", "Collagen.bin", 
                  "Fat.bin_10", "Fat.bin_40", "OverallPlaquePhenotype", "Plaque_Vulnerability_Index")
# basetable_bin

basetable_con = basetable_vars[!basetable_vars %in% basetable_bin]
# basetable_con
```

## AESCRNA: baseline characteristics

### Preparation

```{r Baseline: creation}
metadata <- scRNAseqData@meta.data %>% as_tibble() %>% separate(orig.ident, c("Patient", NA))
scRNAseqDataMeta <- metadata %>% distinct(Patient, .keep_all = TRUE)

scRNAseqDataMetaAE <- merge(scRNAseqDataMeta, AEDB.CEA, by.x = "Patient", by.y = "STUDY_NUMBER", sort = FALSE, all.x = TRUE)
dim(scRNAseqDataMetaAE)

# Replace missing data 
# Ref: https://cran.r-project.org/web/packages/naniar/vignettes/replace-with-na.html
library(naniar)

na_strings <- c("NA", "N A", "N / A", "N/A", "N/ A", 
                "Not Available", "Not available", 
                "missing", 
                "-999", "-99", 
                "No data available/missing", "No data available/Missing")
# The condition ~.x %in% na_strings reads as "does this value occur in the list of NA strings?".

# Assign the result; otherwise the replacement is only printed and then discarded.
scRNAseqDataMetaAE <- scRNAseqDataMetaAE %>%
  replace_with_na_all(condition = ~.x %in% na_strings)
```
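
To make the effect of the missing-value recoding concrete, here is a small base-R sketch of the same idea on a toy table. It is an illustrative equivalent, not the `naniar` implementation; `toy`, `na_strings_toy`, and `replace_na_strings()` are hypothetical names and the values are made up.

```r
# Toy clinical table (hypothetical values, not Athero-Express data).
toy <- data.frame(
  Patient = c("P1", "P2", "P3"),
  Gender = c("male", "N/A", "female"),
  BMI = c("24.1", "-999", "27.3"),
  stringsAsFactors = FALSE
)

na_strings_toy <- c("NA", "N/A", "missing", "-999", "-99")

# Column by column, set every value that matches one of the NA strings to NA.
replace_na_strings <- function(df, na_values) {
  df[] <- lapply(df, function(col) {
    col[as.character(col) %in% na_values] <- NA
    col
  })
  df
}

# The function returns a new data frame, so the result has to be assigned.
toy_clean <- replace_na_strings(toy, na_strings_toy)
```

A quick `colSums(is.na(toy_clean))` afterwards shows how many values per column were recoded.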

```{r }
cat("====================================================================================================\n")
cat("SELECTING THE SAMPLES\n")

cat("- sanity checking PRIOR to selection\n")
library(data.table)
require(labelled)
ae.gender <- to_factor(scRNAseqDataMetaAE$Gender)
ae.hospital <- to_factor(scRNAseqDataMetaAE$Hospital)
table(ae.gender, ae.hospital, dnn = c("Sex", "Hospital"), useNA = "ifany")

ae.artery <- to_factor(scRNAseqDataMetaAE$Artery_summary)
table(ae.artery, ae.gender, dnn = c("Artery", "Sex"), useNA = "ifany")

ae.ic <- to_factor(scRNAseqDataMetaAE$informedconsent)
table(ae.ic, ae.gender, useNA = "ifany")

rm(ae.gender, ae.hospital, ae.artery, ae.ic)


# Informed-consent categories to exclude; the strings must match the database labels exactly.
# We are really strict in selecting based on 'informed consent'!
excluded_ic <- c("missing",
                 "no, died",
                 "yes, no tissue, no commerical business",
                 "yes, no tissue, no questionnaires, no medical info, no commercial business",
                 "yes, no tissue, no questionnaires, no health treatment, no commerical business",
                 "yes, no tissue, no questionnaires, no health treatment, no medical info, no commercial business",
                 "yes, no tissue, no health treatment",
                 "yes, no tissue, no questionnaires",
                 "yes, no tissue, health treatment when possible",
                 "yes, no tissue",
                 "yes, no tissue, no questionnaires, no health treatment, no medical info",
                 "yes, no tissue, no questionnaires, no health treatment, no commercial business",
                 "no, doesn't want to",
                 "no, unable to sign",
                 "no, no reaction",
                 "no, lost",
                 "no, too old",
                 "yes, no medical info, health treatment when possible",
                 "no (never asked for IC because there was no tissue)",
                 "no, endpoint",
                 "nooit geincludeerd",
                 # IMPORTANT: since we are sharing with a commercial party
                 "yes, no health treatment, no commercial business",
                 "yes, no questionnaires, no health treatment, no commercial business",
                 "yes, no health treatment, no medical info, no commercial business",
                 "yes, no commerical business",
                 "yes, health treatment when possible, no commercial business",
                 "yes, no medical info, no commercial business",
                 "yes, no questionnaires, no commercial business",
                 "yes, no questionnaires, health treatment when possible, no commercial business",
                 "second informed concents: yes, no commercial business")

scRNAseqDataMetaAE.all <- subset(scRNAseqDataMetaAE,
                                 (Artery_summary == "carotid (left & right)" | Artery_summary == "other carotid arteries (common, external)") & # we only want carotids
                                   !is.na(informedconsent) & # chained '!=' comparisons dropped missing values; keep that behaviour
                                   !(informedconsent %in% excluded_ic))
# scRNAseqDataMetaAE.all[1:10, 1:10]
dim(scRNAseqDataMetaAE.all)
# DT::datatable(scRNAseqDataMetaAE.all)

```
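
The selection above excludes patients by consent category, and how missing values are treated depends on the comparison operator. The sketch below illustrates this on toy data, assuming a plain character column; `toy_ic`, `excluded_toy`, and the consent labels in it are hypothetical.

```r
# Toy consent data (hypothetical labels and patients).
toy_ic <- data.frame(
  Patient = c("P1", "P2", "P3", "P4"),
  informedconsent = c("yes", "no, died", NA, "yes, no commercial business"),
  stringsAsFactors = FALSE
)

excluded_toy <- c("no, died", "yes, no commercial business")

# `!=` yields NA for a missing value, and subset() drops rows whose condition is NA.
kept_neq <- subset(toy_ic, informedconsent != "no, died" &
                     informedconsent != "yes, no commercial business")

# `%in%` yields FALSE for NA, so missing values must be excluded explicitly
# to reproduce the behaviour of the chained `!=` comparisons.
kept_in <- subset(toy_ic, !is.na(informedconsent) & !(informedconsent %in% excluded_toy))
```

Both approaches keep only patient `P1` here; without the `!is.na()` term the `%in%` version would also keep `P3`, whose consent status is unknown.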

### Baseline

Showing the baseline table for the scRNAseq data in 39 CEA patients with
informed consent.

```{r Baseline: Visualize}
cat("===========================================================================================")
cat("CREATE BASELINE TABLE")

# Create baseline tables
# http://rstudio-pubs-static.s3.amazonaws.com/13321_da314633db924dc78986a850813a50d5.html
scRNAseqDataMetaAE.all.tableOne = print(CreateTableOne(vars = basetable_vars, 
                                                  # factorVars = basetable_bin,
                                                  # strata = "Gender",
                                                  data = scRNAseqDataMetaAE.all, includeNA = TRUE), 
                                   nonnormal = c(), 
                                   quote = FALSE, showAllLevels = TRUE,
                                   format = "p", 
                                   contDigits = 3)[,1:2]

```

Writing the baseline table to Excel format.

```{r }
# Write basetable
require(openxlsx)
# write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AESCRNA.CEA.39pts.after_qc.IC_commercial.BaselineTable.xlsx"), 
#            format(scRNAseqDataMetaAE.all.tableOne, digits = 5, scientific = FALSE) , 
#            rowNames = TRUE, colNames = TRUE, overwrite = TRUE)

write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AESCRNA.CEA.32pts.after_qc.IC_academic.BaselineTable.xlsx"), 
           format(scRNAseqDataMetaAE.all.tableOne, digits = 5, scientific = FALSE) , 
           rowNames = TRUE, colNames = TRUE, overwrite = TRUE)

```

# AESCRNA

## Quality control

Here we review the number of cells per sample, plate, and patient, and plot the
ratios per sample and study number.

```{r QualityControl}
## check stuff
cat("\nHow many cells per type ...?")
sort(table(scRNAseqData@meta.data$SCT_snn_res.0.8))

# cat("\n\nHow many cells per plate ...?")
# sort(table(scRNAseqData@meta.data$ID))

# cat("\n\nHow many cells per type per plate ...?")
# table(scRNAseqData@meta.data$SCT_snn_res.0.8, scRNAseqData@meta.data$ID)

cat("\n\nHow many cells per patient ...?")
sort(table(scRNAseqData@meta.data$Patient))

cat("\n\nVisualizing the cell communities per study number and sample ...")
UMAPPlot(scRNAseqData, label = TRUE, pt.size = 1.25, label.size = 4, group.by = "ident",
         repel = TRUE)
ggsave(paste0(PLOT_loc, "/", Today, ".UMAP.png"), plot = last_plot())
ggsave(paste0(PLOT_loc, "/", Today, ".UMAP.ps"), plot = last_plot())


# barplot(prop.table(x = table(scRNAseqData@active.ident, scRNAseqData@meta.data$Patient)), 
#         cex.axis = 1.0, cex.names = 0.5, las = 1,
#         col = uithof_color, xlab = "study number", legend.text = FALSE, args.legend = list(x = "bottom"))
# dev.copy(pdf, paste0(QC_loc, "/", Today, ".cell_ratios_per_sample.pdf"))
# dev.off()

# barplot(prop.table(x = table(scRNAseqData@active.ident, scRNAseqData@meta.data$ID)), 
#         cex.axis = 1.0, cex.names = 0.5, las = 2,
#         col = uithof_color, xlab = "sample ID", legend.text = FALSE, args.legend = list(x = "bottom"))
# dev.copy(pdf, paste0(QC_loc, "/", Today, ".cell_ratios_per_sample_per_plate.pdf"))
# dev.off()


```
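
As a sketch of how the per-patient cell ratios mentioned above can be computed, the example below tabulates toy per-cell metadata and normalises within each patient. The data frame `toy_cells` and its values are hypothetical; with the real object, the identities and the patient column of the metadata would take their place.

```r
# Toy per-cell metadata (hypothetical).
toy_cells <- data.frame(
  Patient = c("P1", "P1", "P1", "P2", "P2"),
  celltype = c("SMC", "SMC", "EC", "EC", "Mono")
)

# Cells per patient, sorted in ascending order as in the QC chunk.
cells_per_patient <- sort(table(toy_cells$Patient))

# Rows are cell types, columns are patients.
counts <- table(toy_cells$celltype, toy_cells$Patient)

# margin = 2 normalises within each patient, so every column sums to 1.
ratios <- prop.table(counts, margin = 2)
```

Without `margin`, `prop.table()` divides by the grand total, so the values would reflect each patient's share of all cells rather than the composition within a patient.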

## Visualisations

Let's project known cellular markers.

```{r Visualisation: UMAP Exploration}

UMAPPlot(scRNAseqData, label = FALSE, pt.size = 1.25, label.size = 4, group.by = "ident",
         repel = TRUE)

# endothelial cells
FeaturePlot(scRNAseqData, features = c("CD34"), cols =  c("#ECECEC", "#DB003F"))
FeaturePlot(scRNAseqData, features = c("EDN1"), cols =  c("#ECECEC", "#DB003F"))
FeaturePlot(scRNAseqData, features = c("EDNRA", "EDNRB"), cols =  c("#ECECEC", "#DB003F"))
FeaturePlot(scRNAseqData, features = c("CDH5", "PECAM1"), cols =  c("#ECECEC", "#DB003F"))
FeaturePlot(scRNAseqData, features = c("ACKR1"), cols =  c("#ECECEC", "#DB003F"))

# SMC
FeaturePlot(scRNAseqData, features = c("MYH11"), cols =  c("#ECECEC", "#DB003F"))
FeaturePlot(scRNAseqData, features = c("LGALS3", "ACTA2"), cols =  c("#ECECEC", "#DB003F"))

# macrophages
FeaturePlot(scRNAseqData, features = c("CD14", "CD68"), cols =  c("#ECECEC", "#DB003F"))
FeaturePlot(scRNAseqData, features = c("CD36"), cols =  c("#ECECEC", "#DB003F"))

# t-cells
FeaturePlot(scRNAseqData, features = c("CD3E"), cols =  c("#ECECEC", "#DB003F"))
FeaturePlot(scRNAseqData, features = c("CD4"), cols =  c("#ECECEC", "#DB003F"))
# FeaturePlot(scRNAseqData, features = c("CD8"), cols =  c("#ECECEC", "#DB003F"))

# b-cells
FeaturePlot(scRNAseqData, features = c("CD79A"), cols =  c("#ECECEC", "#DB003F"))

# mast cells
FeaturePlot(scRNAseqData, features = c("KIT"), cols =  c("#ECECEC", "#DB003F"))

# NK cells
FeaturePlot(scRNAseqData, features = c("NCAM1"), cols =  c("#ECECEC", "#DB003F"))

```

## Targets of interest:

We check whether the target genes were sequenced using our method.

```{r list target genes}
length(target_genes)
target_genes

```
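
Printing the list does not by itself show whether each target is present in the data. A minimal sketch of such a check compares the target list against the feature names; `check_targets()`, `toy_features`, and `toy_targets` are hypothetical names, and the gene lists are toy examples.

```r
# Split a target list into genes present in, and absent from, a feature list.
check_targets <- function(targets, features) {
  targets <- unique(targets)
  list(present = intersect(targets, features),
       missing = setdiff(targets, features))
}

# Toy inputs (hypothetical feature names standing in for the dataset's genes).
toy_features <- c("HDAC9", "ACTA2", "CD68", "KIT")
toy_targets <- c("HDAC9", "TWIST1", "HDAC9")

target_check <- check_targets(toy_targets, toy_features)
```

With the real data, something like `check_targets(target_genes, rownames(scRNAseqData))` should work, assuming the row names of the Seurat object are the gene symbols of the default assay.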

### Expression in cell communities

```{r Visualisation: preparation}

# target_genes_rm <- c("AC011294.3", "C6orf195", "C9orf53", "AL137026.1", "RP11-145E5.5",
#                      "ZNF32", "BCAM", "DUPD1", "PVRL2")
# 
# temp = target_genes[!target_genes %in% target_genes_rm]
# 
# target_genes_qc <- c(temp, "DUSP27", "NECTIN2")

target_genes_qc <- target_genes
target_genes_qc
```

```{r Visualisation: Targets Feature and Dot Plots, message=FALSE, warning=FALSE}
library(RColorBrewer)

p1 <- DotPlot(scRNAseqData, features = target_genes_qc,
        cols = "RdBu")

p1 + theme(axis.text.x = element_text(angle = 45, hjust=1, size = 5))

ggsave(paste0(PLOT_loc, "/", Today, ".DotPlot.Targets.png"), plot = last_plot())
ggsave(paste0(PLOT_loc, "/", Today, ".DotPlot.Targets.ps"), plot = last_plot())
ggsave(paste0(PLOT_loc, "/", Today, ".DotPlot.Targets.pdf"), plot = last_plot())

rm(p1)

# FeaturePlot(scRNAseqData, features = c(target_genes_qc),
#             cols =  c("#ECECEC", "#DB003F", "#9A3480","#1290D9"),
#             combine = TRUE)
# 
# ggsave(paste0(PLOT_loc, "/", Today, ".FeaturePlot.Targets.png"), plot = last_plot())
# ggsave(paste0(PLOT_loc, "/", Today, ".FeaturePlot.Targets.ps"), plot = last_plot())


```

```{r Visualisation: Targets}
# VlnPlot(scRNAseqData, features = "DUSP27")

# VlnPlot files
VlnPlot_loc = paste0(PLOT_loc, "/VlnPlot")
if (!dir.exists(VlnPlot_loc)) dir.create(VlnPlot_loc)


for (GENE in target_genes_qc){
  print(paste0("Projecting the expression of ", GENE, "."))

  vp1 <- VlnPlot(scRNAseqData, features = GENE) + 
    xlab("cell communities") + 
    ylab("normalized expression") +
    theme(axis.title.x = element_text(color = "#000000", size = 14, face = "bold"), 
          axis.title.y = element_text(color = "#000000", size = 14, face = "bold"), 
          legend.position = "none")
  # Save the plot object explicitly rather than relying on last_plot().
  ggsave(paste0(VlnPlot_loc, "/", Today, ".VlnPlot.", GENE, ".png"), plot = vp1)
  ggsave(paste0(VlnPlot_loc, "/", Today, ".VlnPlot.", GENE, ".ps"), plot = vp1)
  ggsave(paste0(VlnPlot_loc, "/", Today, ".VlnPlot.", GENE, ".pdf"), plot = vp1)
  
  # print(vp1)
  
}

```

### Differential expression between cell communities

Here we project genes to only the broad cell communities:

-   macrophages
-   endothelial cells
-   smooth muscle cells
-   T-cells
-   B-cells
-   Mast cells
-   NK-cells
-   Mixed cells

#### Macrophages

```{r}
unique(scRNAseqData@active.ident)
```

Comparison between the macrophage cell communities (*CD14/CD68*<sup>+</sup>)
and all other communities.

```{r Visualisation: Volcano MAC calculate}

MAC.markers <- FindMarkers(object = scRNAseqData, 
                          ident.1 = c("CD68+CASP1+IL1B+SELL MInf", 
                                      "CD68+CD1C+ DC", 
                                      "CD68+CD4+ Mono",
                                      "CD68+IL18+TLR4+TREM2+ MRes",
                                      "CD68+ABCA1+OLR1+TREM2+ FC"), 
                          ident.2 = c(#"CD68+CASP1+IL1B+SELL MInf", 
                                      #"CD68+CD1C+ DC", 
                                      #"CD68+CD4+ Mono",
                                      #"CD68+IL18+TLR4+TREM2+ MRes",
                                      #"CD68+ABCA1+OLR1+TREM2+ FC",
                                      "CD3+ TC I",
                                      "CD3+ TC II", 
                                      "CD3+ TC III", 
                                      "CD3+ TC IV", 
                                      "CD3+ TC V", 
                                      "CD3+ TC VI", 
                                      "FOXP3+ TC", 
                                      "CD34+ EC I", 
                                      "CD34+ EC II",
                                      "ACTA2+ SMC", 
                                      "CD3+CD56+ NK I",
                                      "CD3+CD56+ NK II", 
                                      "CD68+KIT+ MC",
                                      "CD79+ BCplasma", 
                                      "CD79A+ BCmem"))

DT::datatable(MAC.markers)
```
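
The reference group in the comparison above is listed by hand. A sketch of deriving it as the complement of the communities of interest is shown below, using a toy subset of the identity names from this notebook; `all_idents`, `mac_idents`, and `rest_idents` are hypothetical variable names.

```r
# Toy identity levels (a subset of the names used in this notebook).
all_idents <- c("CD68+CD4+ Mono", "CD68+CD1C+ DC", "CD3+ TC I",
                "CD34+ EC I", "ACTA2+ SMC")

# Communities of interest.
mac_idents <- c("CD68+CD4+ Mono", "CD68+CD1C+ DC")

# Fail early if a name is misspelled, then take the complement as the reference group.
stopifnot(all(mac_idents %in% all_idents))
rest_idents <- setdiff(all_idents, mac_idents)
```

With the Seurat object, `levels(scRNAseqData)` would be expected to supply `all_idents`. The `FindMarkers()` documentation also states that leaving `ident.2 = NULL` compares `ident.1` against all other cells, which avoids the explicit list altogether.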

```{r Visualisation: Volcano MAC, message=FALSE, warning=FALSE}
MAC_Volcano_TargetsA = EnhancedVolcano(MAC.markers,
    lab = rownames(MAC.markers),
    x = "avg_log2FC",
    y = "p_val_adj",
    selectLab = target_genes_qc,
    axisLabSize = 12,
    xlab = "average log2 fold-change",
    title = "Macrophage markers\n(Macrophage communities vs the rest)",
    titleLabSize = 14,
    pCutoff = 0.05/(nrow(MAC.markers)), # Bonferroni over the genes returned by FindMarkers (may be fewer than all 20552 genes)
    FCcutoff = 1.25,
    pointSize = 1.5,
    labSize = 3.0,
    legendLabels =c('NS','avg. fold-change','P',
      'P & avg. fold-change'),
    legendPosition = "right",
    legendLabSize = 10,
    legendIconSize = 3.0,
    drawConnectors = TRUE,
    widthConnectors = 0.2,
    colConnectors = "#595A5C",
    gridlines.major = FALSE,
    gridlines.minor = FALSE)
MAC_Volcano_TargetsA
ggsave(paste0(PLOT_loc, "/", Today, ".Volcano.MAC.DEG.Targets.pdf"), 
       plot = MAC_Volcano_TargetsA)
```
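
To make the two thresholds in the volcano plot explicit, the sketch below classifies a toy marker table into the four legend categories. It assumes points are flagged when the p-value is below `pCutoff` and the absolute log2 fold-change exceeds `FCcutoff`; `toy_markers` and its values are hypothetical. Note that, per the Seurat documentation, `p_val_adj` is already Bonferroni-adjusted, so dividing 0.05 by the number of genes again makes the threshold more conservative.

```r
# Toy marker table (hypothetical values) with the column names used above.
toy_markers <- data.frame(
  avg_log2FC = c(2.0, 0.3, -1.6, 1.5),
  p_val_adj = c(1e-10, 1e-12, 1e-8, 0.2),
  row.names = c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
)

p_cutoff <- 0.05 / nrow(toy_markers)  # Bonferroni-style threshold over the tested genes
fc_cutoff <- 1.25                     # on the log2 scale, i.e. 2^1.25 (about 2.4-fold)

pass_p <- toy_markers$p_val_adj < p_cutoff
pass_fc <- abs(toy_markers$avg_log2FC) > fc_cutoff

# Four categories matching the legend labels of the volcano plot.
toy_markers$category <- ifelse(pass_p & pass_fc, "P & avg. fold-change",
                        ifelse(pass_p, "P",
                        ifelse(pass_fc, "avg. fold-change", "NS")))
```

Tabulating `toy_markers$category` gives a quick count of how many genes fall into each quadrant of the plot.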

The target results are given below and written to a file.

```{r Results MAC}
library(tibble)
MAC.markers <- add_column(MAC.markers, Gene = row.names(MAC.markers), .before = 1)

temp <- MAC.markers[MAC.markers$Gene %in% target_genes_qc,]

DT::datatable(temp)
```

```{r Results MAC: writing}
fwrite(temp, file = paste0(OUT_loc, "/", Today, ".MAC.DEG.Targets.txt"),
       quote = FALSE,
       sep = "\t", 
       showProgress = FALSE, verbose = FALSE)
```

#### Smooth muscle cells

Comparison between the smooth muscle cell community (*ACTA2*<sup>+</sup>) and
all other communities.

```{r Visualisation: Volcano SMC calculate}

SMC.markers <- FindMarkers(object = scRNAseqData, 
                          ident.1 = c("ACTA2+ SMC"), 
                          ident.2 = c("CD68+CASP1+IL1B+SELL MInf", 
                                      "CD68+CD1C+ DC", 
                                      "CD68+CD4+ Mono",
                                      "CD68+IL18+TLR4+TREM2+ MRes",
                                      "CD68+ABCA1+OLR1+TREM2+ FC",
                                      "CD3+ TC I",
                                      "CD3+ TC II", 
                                      "CD3+ TC III", 
                                      "CD3+ TC IV", 
                                      "CD3+ TC V", 
                                      "CD3+ TC VI", 
                                      "FOXP3+ TC", 
                                      "CD34+ EC I", 
                                      "CD34+ EC II",
                                      #"ACTA2+ SMC", 
                                      "CD3+CD56+ NK I",
                                      "CD3+CD56+ NK II", 
                                      "CD68+KIT+ MC",
                                      "CD79+ BCplasma", 
                                      "CD79A+ BCmem"))

DT::datatable(SMC.markers)
```
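
The reference group in `ident.2` above is the full list of communities with the group of interest commented out by hand, a pattern that repeats for every cell type. As a minimal sketch, assuming the community names are exactly those listed in the chunk above, the reference set could instead be derived with `setdiff()`. The names `all_communities` and `rest_of()` are hypothetical and introduced here only for illustration.

```r
# All community labels as they appear in the ident.2 lists of this notebook.
all_communities <- c("CD68+CASP1+IL1B+SELL MInf", "CD68+CD1C+ DC", "CD68+CD4+ Mono",
                     "CD68+IL18+TLR4+TREM2+ MRes", "CD68+ABCA1+OLR1+TREM2+ FC",
                     "CD3+ TC I", "CD3+ TC II", "CD3+ TC III",
                     "CD3+ TC IV", "CD3+ TC V", "CD3+ TC VI", "FOXP3+ TC",
                     "CD34+ EC I", "CD34+ EC II", "ACTA2+ SMC",
                     "CD3+CD56+ NK I", "CD3+CD56+ NK II", "CD68+KIT+ MC",
                     "CD79+ BCplasma", "CD79A+ BCmem")

# Hypothetical helper: every community that is not in the group of interest.
rest_of <- function(group, universe = all_communities) {
  stopifnot(all(group %in% universe))  # catch typos in community names early
  setdiff(universe, group)
}

smc_group <- c("ACTA2+ SMC")
smc_rest  <- rest_of(smc_group)
length(smc_rest)
```

The result could then be passed as `ident.2 = smc_rest`. Seurat's `FindMarkers()` also compares against all remaining cells when `ident.2` is left at its default of `NULL`, which gives the same comparison only if every cell belongs to one of the listed communities.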

```{r Visualisation: Volcano SMC, message=FALSE, warning=FALSE}
SMC_Volcano_TargetsA = EnhancedVolcano(SMC.markers,
    lab = rownames(SMC.markers),
    x = "avg_log2FC",
    y = "p_val_adj",
    selectLab = target_genes_qc,
    axisLabSize = 12,
    xlab = "average log2 fold-change",
    title = "SMC markers\n(SMC communities vs the rest)",
    titleLabSize = 14,
    pCutoff = 0.05/(nrow(SMC.markers)), # 20552 genes
    FCcutoff = 1.25,
    pointSize = 1.5,
    labSize = 3.0,
    legendLabels =c('NS','avg. fold-change','P',
      'P & avg. fold-change'),
    legendPosition = "right",
    legendLabSize = 10,
    legendIconSize = 3.0,
    drawConnectors = TRUE,
    widthConnectors = 0.2,
    colConnectors = "#595A5C",
    gridlines.major = FALSE,
    gridlines.minor = FALSE)
SMC_Volcano_TargetsA
ggsave(paste0(PLOT_loc, "/", Today, ".Volcano.SMC.DEG.Targets.pdf"), 
       plot = SMC_Volcano_TargetsA)
```

The target results are given below and written to a file.

```{r Results SMC}
library(tibble)
SMC.markers <- add_column(SMC.markers, Gene = row.names(SMC.markers), .before = 1)

temp <- SMC.markers[SMC.markers$Gene %in% target_genes_qc,]

DT::datatable(temp)
```

```{r Results SMC: writing}
fwrite(temp, file = paste0(OUT_loc, "/", Today, ".SMC.DEG.Targets.txt"),
       quote = FALSE,
       sep = "\t", 
       showProgress = FALSE, verbose = FALSE)
```

#### Endothelial cells

Comparison between the endothelial cell communities (*CD34*<sup>+</sup>), and
all other communities.

```{r Visualisation: Volcano EC calculate}

EC.markers <- FindMarkers(object = scRNAseqData, 
                          ident.1 = c("CD34+ EC I", 
                                      "CD34+ EC II"), 
                          ident.2 = c("CD68+CASP1+IL1B+SELL MInf", 
                                      "CD68+CD1C+ DC", 
                                      "CD68+CD4+ Mono",
                                      "CD68+IL18+TLR4+TREM2+ MRes",
                                      "CD68+ABCA1+OLR1+TREM2+ FC",
                                      "CD3+ TC I",
                                      "CD3+ TC II", 
                                      "CD3+ TC III", 
                                      "CD3+ TC IV", 
                                      "CD3+ TC V", 
                                      "CD3+ TC VI", 
                                      "FOXP3+ TC", 
                                      # "CD34+ EC I", 
                                      # "CD34+ EC II",
                                      "ACTA2+ SMC", 
                                      "CD3+CD56+ NK I",
                                      "CD3+CD56+ NK II", 
                                      "CD68+KIT+ MC",
                                      "CD79+ BCplasma", 
                                      "CD79A+ BCmem"))

DT::datatable(EC.markers)
```

```{r Visualisation: Volcano EC, message=FALSE, warning=FALSE}
EC_Volcano_TargetsA = EnhancedVolcano(EC.markers,
    lab = rownames(EC.markers),
    x = "avg_log2FC",
    y = "p_val_adj",
    selectLab = target_genes_qc,
    axisLabSize = 12,
    xlab = "average log2 fold-change",
    title = "Endothelial cell markers\n(EC communities vs the rest)",
    titleLabSize = 14,
    pCutoff = 0.05/(nrow(EC.markers)), # 20552 genes
    FCcutoff = 1.25,
    pointSize = 1.5,
    labSize = 3.0,
    legendLabels =c('NS','avg. fold-change','P',
      'P & avg. fold-change'),
    legendPosition = "right",
    legendLabSize = 10,
    legendIconSize = 3.0,
    drawConnectors = TRUE,
    widthConnectors = 0.2,
    colConnectors = "#595A5C",
    gridlines.major = FALSE,
    gridlines.minor = FALSE)
EC_Volcano_TargetsA
ggsave(paste0(PLOT_loc, "/", Today, ".Volcano.EC.DEG.Targets.pdf"), 
       plot = EC_Volcano_TargetsA)
```
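
The volcano plot colours genes by two cut-offs: `pCutoff` on the y-variable and `FCcutoff` on the absolute value of the x-variable. A small base-R sketch of that four-way classification on made-up numbers may help when reading the legend; the data frame `toy` and the helper `classify_gene()` are illustrative assumptions and not part of EnhancedVolcano. Note that Seurat's `p_val_adj` is, by default, already Bonferroni-adjusted, so dividing 0.05 by the number of rows once more gives a stricter threshold than a single correction would.

```r
# Made-up marker table with the two columns used by the volcano plot.
toy <- data.frame(
  avg_log2FC = c(0.2, 2.0, -0.5, -1.8),
  p_val_adj  = c(0.5, 0.5, 1e-10, 1e-10),
  row.names  = c("geneA", "geneB", "geneC", "geneD")
)

p_cutoff  <- 0.05 / nrow(toy)  # same construction as pCutoff in the chunk above
fc_cutoff <- 1.25              # same value as FCcutoff, on the log2 scale

# Illustrative helper mirroring the four legend categories.
classify_gene <- function(log2fc, p, p_cut, fc_cut) {
  pass_fc <- abs(log2fc) > fc_cut
  pass_p  <- p < p_cut
  ifelse(pass_fc & pass_p, "P & avg. fold-change",
         ifelse(pass_p, "P",
                ifelse(pass_fc, "avg. fold-change", "NS")))
}

toy$category <- classify_gene(toy$avg_log2FC, toy$p_val_adj, p_cutoff, fc_cutoff)
toy
```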

The target results are given below and written to a file.

```{r Results EC}
library(tibble)
EC.markers <- add_column(EC.markers, Gene = row.names(EC.markers), .before = 1)

temp <- EC.markers[EC.markers$Gene %in% target_genes_qc,]

DT::datatable(temp)
```

```{r Results EC: writing}
fwrite(temp, file = paste0(OUT_loc, "/", Today, ".EC.DEG.Targets.txt"),
       quote = FALSE,
       sep = "\t", 
       showProgress = FALSE, verbose = FALSE)
```

#### T-cells

Comparison between the T-cell communities (*CD3/CD4/CD8*<sup>+</sup>), and all
other communities.

```{r Visualisation: Volcano Tcell calculate}

TC.markers <- FindMarkers(object = scRNAseqData, 
                          ident.1 = c("CD3+ TC I",
                                      "CD3+ TC II", 
                                      "CD3+ TC III", 
                                      "CD3+ TC IV", 
                                      "CD3+ TC V", 
                                      "CD3+ TC VI", 
                                      "FOXP3+ TC"), 
                          ident.2 = c("CD68+CASP1+IL1B+SELL MInf", 
                                      "CD68+CD1C+ DC", 
                                      "CD68+CD4+ Mono",
                                      "CD68+IL18+TLR4+TREM2+ MRes",
                                      "CD68+ABCA1+OLR1+TREM2+ FC",
                                      # "CD3+ TC I",
                                      # "CD3+ TC II", 
                                      # "CD3+ TC III", 
                                      # "CD3+ TC IV", 
                                      # "CD3+ TC V", 
                                      # "CD3+ TC VI", 
                                      # "FOXP3+ TC", 
                                      "CD34+ EC I", 
                                      "CD34+ EC II",
                                      "ACTA2+ SMC", 
                                      "CD3+CD56+ NK I",
                                      "CD3+CD56+ NK II", 
                                      "CD68+KIT+ MC",
                                      "CD79+ BCplasma", 
                                      "CD79A+ BCmem"))

DT::datatable(TC.markers)
```

```{r Visualisation: Volcano Tcell, message=FALSE, warning=FALSE}
TC_Volcano_TargetsA = EnhancedVolcano(TC.markers,
    lab = rownames(TC.markers),
    x = "avg_log2FC",
    y = "p_val_adj",
    selectLab = target_genes_qc,
    axisLabSize = 12,
    xlab = "average log2 fold-change",
    title = "T-cell markers\n(T-cell communities vs the rest)",
    titleLabSize = 14,
    pCutoff = 0.05/nrow(TC.markers), # 20552 genes
    FCcutoff = 1.25,
    pointSize = 1.5,
    labSize = 3.0,
    legendLabels =c('NS','avg. fold-change','P',
      'P & avg. fold-change'),
    legendPosition = "right",
    legendLabSize = 10,
    legendIconSize = 3.0,
    drawConnectors = TRUE,
    widthConnectors = 0.2,
    colConnectors = "#595A5C",
    gridlines.major = FALSE,
    gridlines.minor = FALSE)
TC_Volcano_TargetsA
ggsave(paste0(PLOT_loc, "/", Today, ".Volcano.TC.DEG.Targets.pdf"), 
       plot = TC_Volcano_TargetsA)
```

The target results are given below and written to a file.

```{r Results TC}
library(tibble)
TC.markers <- add_column(TC.markers, Gene = row.names(TC.markers), .before = 1)

temp <- TC.markers[TC.markers$Gene %in% target_genes_qc,]

DT::datatable(temp)
```

```{r Results TC: writing}
fwrite(temp, file = paste0(OUT_loc, "/", Today, ".TC.DEG.Targets.txt"),
       quote = FALSE,
       sep = "\t", 
       showProgress = FALSE, verbose = FALSE)
```
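
The two result chunks above follow one pattern: copy the row names into a `Gene` column, keep the rows whose gene is in `target_genes_qc`, and write the subset as tab-delimited text. The sketch below shows that pattern in base R on a made-up table. `toy_markers` and `toy_targets` are placeholders, and `write.table()` stands in for `fwrite()` so that no extra packages are needed.

```r
# Made-up marker table; gene symbols live in the row names, as in FindMarkers output.
toy_markers <- data.frame(
  avg_log2FC = c(1.4, -0.3, 0.9),
  p_val_adj  = c(1e-8, 0.7, 0.02),
  row.names  = c("geneA", "geneB", "geneC")
)
toy_targets <- c("geneA", "geneC", "geneX")  # geneX is absent from the table

# Put the gene symbol in the first column, as add_column(..., .before = 1) does.
toy_markers <- cbind(Gene = rownames(toy_markers), toy_markers)

# Keep only the targets; targets that were not tested simply drop out.
toy_hits <- toy_markers[toy_markers$Gene %in% toy_targets, ]

# Write and re-read a tab-delimited file to confirm the layout.
out_file <- tempfile(fileext = ".txt")
write.table(toy_hits, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
read_back <- read.delim(out_file, stringsAsFactors = FALSE)

# Targets missing from the results are worth reporting explicitly.
missing_targets <- setdiff(toy_targets, toy_markers$Gene)
missing_targets
```

The explicit `Gene` column matters because the file is written without row names; without it the gene symbols would not appear in the output.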

#### B-cells

Comparison between the B-cell communities (*CD79A*<sup>+</sup>), and all other
communities.

```{r Visualisation: Volcano Bcell calculate}

BC.markers <- FindMarkers(object = scRNAseqData, 
                          ident.1 = c("CD79+ BCplasma", 
                                      "CD79A+ BCmem"), 
                          ident.2 = c("CD68+CASP1+IL1B+SELL MInf", 
                                      "CD68+CD1C+ DC", 
                                      "CD68+CD4+ Mono",
                                      "CD68+IL18+TLR4+TREM2+ MRes",
                                      "CD68+ABCA1+OLR1+TREM2+ FC",
                                      "CD3+ TC I",
                                      "CD3+ TC II", 
                                      "CD3+ TC III", 
                                      "CD3+ TC IV", 
                                      "CD3+ TC V", 
                                      "CD3+ TC VI", 
                                      "FOXP3+ TC", 
                                      "CD34+ EC I", 
                                      "CD34+ EC II",
                                      "ACTA2+ SMC", 
                                      "CD3+CD56+ NK I",
                                      "CD3+CD56+ NK II", 
                                      "CD68+KIT+ MC"
                                      # "CD79+ BCplasma", 
                                      # "CD79A+ BCmem"
                                      ))

DT::datatable(BC.markers)
```

```{r Visualisation: Volcano Bcell, message=FALSE, warning=FALSE}
BC_Volcano_TargetsA = EnhancedVolcano(BC.markers,
    lab = rownames(BC.markers),
    x = "avg_log2FC",
    y = "p_val_adj",
    selectLab = target_genes_qc,
    axisLabSize = 12,
    xlab = "average log2 fold-change",
    title = "B-cell markers\n(B-cell communities vs the rest)",
    titleLabSize = 14,
    pCutoff = 0.05/nrow(BC.markers), # 20552 genes
    FCcutoff = 1.25,
    pointSize = 1.5,
    labSize = 3.0,
    legendLabels =c('NS','avg. fold-change','P',
      'P & avg. fold-change'),
    legendPosition = "right",
    legendLabSize = 10,
    legendIconSize = 3.0,
    drawConnectors = TRUE,
    widthConnectors = 0.2,
    colConnectors = "#595A5C",
    gridlines.major = FALSE,
    gridlines.minor = FALSE)
BC_Volcano_TargetsA
ggsave(paste0(PLOT_loc, "/", Today, ".Volcano.BC.DEG.Targets.pdf"), 
       plot = BC_Volcano_TargetsA)
```

The target results are given below and written to a file.

```{r Results BC}
library(tibble)
BC.markers <- add_column(BC.markers, Gene = row.names(BC.markers), .before = 1)

temp <- BC.markers[BC.markers$Gene %in% target_genes_qc,]

DT::datatable(temp)
```

```{r Results BC: writing}
fwrite(temp, file = paste0(OUT_loc, "/", Today, ".BC.DEG.Targets.txt"),
       quote = FALSE,
       sep = "\t", 
       showProgress = FALSE, verbose = FALSE)
```

#### Mast cells

Comparison between the mast cell community (*KIT*<sup>+</sup>) and all other
communities.

```{r Visualisation: Volcano Mast calculate}

MC.markers <- FindMarkers(object = scRNAseqData, 
                          ident.1 = c("CD68+KIT+ MC"), 
                          ident.2 = c("CD68+CASP1+IL1B+SELL MInf", 
                                      "CD68+CD1C+ DC", 
                                      "CD68+CD4+ Mono",
                                      "CD68+IL18+TLR4+TREM2+ MRes",
                                      "CD68+ABCA1+OLR1+TREM2+ FC",
                                      "CD3+ TC I",
                                      "CD3+ TC II", 
                                      "CD3+ TC III", 
                                      "CD3+ TC IV", 
                                      "CD3+ TC V", 
                                      "CD3+ TC VI", 
                                      "FOXP3+ TC", 
                                      "CD34+ EC I", 
                                      "CD34+ EC II",
                                      "ACTA2+ SMC", 
                                      "CD3+CD56+ NK I",
                                      "CD3+CD56+ NK II", 
                                      # "CD68+KIT+ MC",
                                      "CD79+ BCplasma", 
                                      "CD79A+ BCmem"))

DT::datatable(MC.markers)
```

```{r Visualisation: Volcano Mast, message=FALSE, warning=FALSE}
MC_Volcano_TargetsA = EnhancedVolcano(MC.markers,
    lab = rownames(MC.markers),
    x = "avg_log2FC",
    y = "p_val_adj",
    selectLab = target_genes_qc,
    axisLabSize = 12,
    xlab = "average log2 fold-change",
    title = "Mast cell markers\n(Mast cell communities vs the rest)",
    titleLabSize = 14,
    pCutoff = 0.05/nrow(MC.markers), # 20552 genes
    FCcutoff = 1.25,
    pointSize = 1.5,
    labSize = 3.0,
    legendLabels =c('NS','avg. fold-change','P',
      'P & avg. fold-change'),
    legendPosition = "right",
    legendLabSize = 10,
    legendIconSize = 3.0,
    drawConnectors = TRUE,
    widthConnectors = 0.2,
    colConnectors = "#595A5C",
    gridlines.major = FALSE,
    gridlines.minor = FALSE)
MC_Volcano_TargetsA
ggsave(paste0(PLOT_loc, "/", Today, ".Volcano.MC.DEG.Targets.pdf"), 
       plot = MC_Volcano_TargetsA)
```

The target results are given below and written to a file.

```{r Results MC}
library(tibble)
MC.markers <- add_column(MC.markers, Gene = row.names(MC.markers), .before = 1)

temp <- MC.markers[MC.markers$Gene %in% target_genes_qc,]

DT::datatable(temp)
```

```{r Results MC: writing}
fwrite(temp, file = paste0(OUT_loc, "/", Today, ".MC.DEG.Targets.txt"),
       quote = FALSE,
       sep = "\t", 
       showProgress = FALSE, verbose = FALSE)
```

#### NK-cells

Comparison between the natural killer cell communities (*NCAM1*<sup>+</sup>),
and all other communities.

```{r Visualisation: Volcano NK calculate}

NK.markers <- FindMarkers(object = scRNAseqData, 
                          ident.1 = c("CD3+CD56+ NK I",
                                      "CD3+CD56+ NK II"), 
                          ident.2 = c("CD68+CASP1+IL1B+SELL MInf", 
                                      "CD68+CD1C+ DC", 
                                      "CD68+CD4+ Mono",
                                      "CD68+IL18+TLR4+TREM2+ MRes",
                                      "CD68+ABCA1+OLR1+TREM2+ FC",
                                      "CD3+ TC I",
                                      "CD3+ TC II", 
                                      "CD3+ TC III", 
                                      "CD3+ TC IV", 
                                      "CD3+ TC V", 
                                      "CD3+ TC VI", 
                                      "FOXP3+ TC", 
                                      "CD34+ EC I", 
                                      "CD34+ EC II",
                                      "ACTA2+ SMC", 
                                      # "CD3+CD56+ NK I",
                                      # "CD3+CD56+ NK II", 
                                      "CD68+KIT+ MC",
                                      "CD79+ BCplasma", 
                                      "CD79A+ BCmem"))

DT::datatable(NK.markers)
```

```{r Visualisation: Volcano NK, message=FALSE, warning=FALSE}
NK_Volcano_TargetsA = EnhancedVolcano(NK.markers,
    lab = rownames(NK.markers),
    x = "avg_log2FC",
    y = "p_val_adj",
    selectLab = target_genes_qc,
    axisLabSize = 12,
    xlab = "average log2 fold-change",
    title = "NK markers\n(NK-cell communities vs the rest)",
    titleLabSize = 14,
    pCutoff = 0.05/nrow(NK.markers), # 20552 genes
    FCcutoff = 1.25,
    pointSize = 1.5,
    labSize = 3.0,
    legendLabels =c('NS','avg. fold-change','P',
      'P & avg. fold-change'),
    legendPosition = "right",
    legendLabSize = 10,
    legendIconSize = 3.0,
    drawConnectors = TRUE,
    widthConnectors = 0.2,
    colConnectors = "#595A5C",
    gridlines.major = FALSE,
    gridlines.minor = FALSE)
NK_Volcano_TargetsA
ggsave(paste0(PLOT_loc, "/", Today, ".Volcano.NK.DEG.Targets.pdf"), 
       plot = NK_Volcano_TargetsA)
```

The target results are given below and written to a file.

```{r Results NK}
library(tibble)
NK.markers <- add_column(NK.markers, Gene = row.names(NK.markers), .before = 1)

temp <- NK.markers[NK.markers$Gene %in% target_genes_qc,]

DT::datatable(temp)
```

```{r Results NK: writing}
fwrite(temp, file = paste0(OUT_loc, "/", Today, ".NK.DEG.Targets.txt"),
       quote = FALSE,
       sep = "\t", 
       showProgress = FALSE, verbose = FALSE)
```

# Subset scRNAseq data

List of samples to be included based on informed consent (see above).

```{r}
samples_of_interest <- unlist(scRNAseqDataMetaAE.all$Patient)

```

```{r}
scRNAseqDataCEA39 <- subset(scRNAseqData, subset = Patient %in% samples_of_interest)
```

```{r}
variables_of_interest <- c("Hospital", "ORyear", "Artery_summary",
                           "Age", "Gender",
                           "TC_final", "LDL_final", "HDL_final", "TG_final",
                           "systolic", "diastoli", "GFR_MDRD", "BMI",
                           "KDOQI", "BMI_WHO",
                           "SmokerStatus", "AlcoholUse",
                           "DiabetesStatus",
                           "Hypertension.selfreport", "Hypertension.selfreportdrug", "Hypertension.composite", "Hypertension.drugs",
                           "Med.anticoagulants", "Med.all.antiplatelet", "Med.Statin.LLD",
                           "Stroke_Dx",
                           "sympt", "Symptoms.5G", "AsymptSympt", "AsymptSympt2G",
                           "Symptoms.Update2G", "Symptoms.Update3G", "indexsymptoms_latest_4g",
                           "restenos", "stenose",
                           "CAD_history", "PAOD", "Peripheral.interv",
                           "EP_composite", "EP_composite_time", "EP_major", "EP_major_time")

temp <- subset(scRNAseqDataMetaAE.all, select = c("Patient", variables_of_interest))
# str(temp)

```

```{r}
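# merge() resets row names and re-sorts rows; keep the cell barcodes so the
# metadata stays aligned with the cells (assumes one row per Patient in temp).
meta <- scRNAseqDataCEA39@meta.data
meta$cell_barcode <- rownames(meta)
meta <- merge(meta, temp, by = "Patient")
rownames(meta) <- meta$cell_barcode
meta <- meta[colnames(scRNAseqDataCEA39), setdiff(names(meta), "cell_barcode")]
scRNAseqDataCEA39@meta.data <- dplyr::rename(meta, "STUDY_NUMBER" = "Patient")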

# str(scRNAseqDataCEA39@meta.data)

```
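
Per-cell metadata in a Seurat object is keyed by its row names (the cell barcodes), whereas base `merge()` returns a data frame with fresh row names that is sorted on the key column by default. The sketch below uses made-up data to show how the barcodes and the original cell order can be carried through such a join. `cells`, `clinical`, and the `barcode` column are illustrative assumptions.

```r
# Made-up per-cell metadata: row names play the role of cell barcodes.
cells <- data.frame(
  Patient   = c("P2", "P1", "P2"),
  nCount    = c(900, 1200, 700),
  row.names = c("cell_A", "cell_B", "cell_C")
)
# Made-up per-patient data: one row per patient.
clinical <- data.frame(Patient = c("P1", "P2"), Age = c(61, 70))

# A plain merge loses the barcodes and reorders the rows.
plain <- merge(cells, clinical, by = "Patient")
rownames(plain)

# Carry the barcode as a column, then restore row names and the original order.
cells$barcode <- rownames(cells)
joined <- merge(cells, clinical, by = "Patient")
rownames(joined) <- joined$barcode
joined <- joined[rownames(cells), setdiff(names(joined), "barcode")]
joined
```

Assigning the row names fails loudly if the per-patient table contains duplicate patients, which is a useful guard because such duplicates would otherwise silently duplicate cells.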

## Saving new dataset

```{r}
temp2 <- as_tibble(subset(scRNAseqDataCEA39@meta.data, select = c("STUDY_NUMBER", "orig.ident", "nCount_RNA", "nFeature_RNA",
                                                                 "Plate", "Batch", "C.H", "Type", "percent.mt",
                                                                 "nCount_SCT", "nFeature_SCT", "seurat_clusters")))

# fwrite(temp2,
#        file = paste0(OUT_loc, "/", Today, ".AESCRNA.CEA.39pts.samplelist.after_qc.IC_commercial.csv"),
#        sep = ",", row.names = FALSE, col.names = TRUE,
#        showProgress = TRUE)
# rm(temp2)
# 
# temp <- dplyr::rename(temp, "STUDY_NUMBER" = "Patient")
# fwrite(temp,
#        file = paste0(OUT_loc, "/", Today, ".AESCRNA.CEA.39pts.clinicaldata.after_qc.IC_commercial.csv"),
#        sep = ",", row.names = FALSE, col.names = TRUE,
#        showProgress = TRUE)
# rm(temp)
# 
# saveRDS(scRNAseqDataCEA39, file = paste0(OUT_loc, "/", Today, ".AESCRNA.CEA.39pts.Seurat.after_qc.IC_commercial.RDS"))

fwrite(temp2,
       file = paste0(OUT_loc, "/", Today, ".AESCRNA.CEA.39pts.samplelist.after_qc.IC_academic.csv"),
       sep = ",", row.names = FALSE, col.names = TRUE,
       showProgress = TRUE)
rm(temp2)

temp <- dplyr::rename(temp, "STUDY_NUMBER" = "Patient")
fwrite(temp,
       file = paste0(OUT_loc, "/", Today, ".AESCRNA.CEA.39pts.clinicaldata.after_qc.IC_academic.csv"),
       sep = ",", row.names = FALSE, col.names = TRUE,
       showProgress = TRUE)
rm(temp)

saveRDS(scRNAseqDataCEA39, file = paste0(OUT_loc, "/", Today, ".AESCRNA.CEA.39pts.Seurat.after_qc.IC_academic.RDS"))

```
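
After writing an object with `saveRDS()`, reading it back once is a cheap check that the file is complete. The sketch below uses a toy list in place of the Seurat object; `out_dir` and `today` are stand-ins for the notebook's `OUT_loc` and `Today`, and the date format shown is an assumption.

```r
# Stand-ins for the notebook's OUT_loc and Today variables.
out_dir <- tempdir()
today   <- format(Sys.Date(), "%Y%m%d")

# Toy object in place of the Seurat object.
toy_object <- list(counts = matrix(1:4, nrow = 2),
                   meta   = data.frame(id = c("a", "b")))

rds_file <- file.path(out_dir, paste0(today, ".toy.after_qc.RDS"))
saveRDS(toy_object, file = rds_file)

# Reading the file back and comparing guards against truncated or stale output.
restored <- readRDS(rds_file)
identical(restored, toy_object)
```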


# Session information

--------------------------------------------------------------------------------

    Version:      v1.0.1
    Last update:  2022-03-19
    Written by:   Sander W. van der Laan (s.w.vanderlaan-2[at]umcutrecht.nl).
    Description:  Script to load single-cell RNA sequencing (scRNAseq) data, perform quality control (QC), and do an initial mapping to cells.
    Minimum requirements: R version 3.5.2 (2018-12-20) -- 'Eggshell Igloo', macOS Mojave (10.14.2).

    **MoSCoW To-Do List**
    The things we Must, Should, Could, and Would have given the time we have.
    _M_

    _S_

    _C_

    _W_

    **Changes log**
    * v1.0.1 Update to main AEDB (there is an error in the Age-variable in the new version). Fewer patients in scRNAseq (32 vs 39 with the newer dataset).
    * v1.0.0 Initial version.

--------------------------------------------------------------------------------

```{r eval = TRUE}
sessionInfo()
```

# Saving environment

```{r Saving}
rm(backup.scRNAseqData)
rm(scRNAseqData, scRNAseqDataCEA39)

save.image(paste0(PROJECT_loc, "/",Today,".",PROJECTNAME,".AESCRNA.results.RData"))

```

+---------------------------------------------------------------------------------------------------------------------------------------+
| <sup>© 1979-2022 Sander W. van der Laan | s.w.vanderlaan[at]gmail.com [swvanderlaan.github.io](https://swvanderlaan.github.io).</sup> |
+---------------------------------------------------------------------------------------------------------------------------------------+