Predicting regional mutation burden in cancer genomes using chromatin accessibility (CA) and replication timing (RT)
This repository includes source code, tutorials, and processed datasets for the study:
Chromatin accessibility of primary human cancers ties regional mutational processes and signatures with tissues of origin .
Oliver Ocsenas and Jüri Reimand (2022) in revision.
Tutorials - Jupyter notebooks
- 1_BigWigtoWindow.ipynb - mapping chromatin signals to megabase-scale windows
- 2_MAFtoWindow.ipynb - mapping cancer mutations to megabase-scale windows
- 3_CA2M_RF.ipynb - random forest models of megabase-scale mutation burden, chromatin accessibility and replication timing
- 4_CA2M_RF_FeatureSelection_Tutorial.ipynb - selecting significant features predicting mutation rates
- 5_CA2M_RF_SHAPscores.ipynb - computing feature importance scores (SHAP)
- 6_CA2M_RF_EnrichedMutations_Tutorial.ipynb - detecting genomic regions with enriched mutations that are not explained by chromatin and replication timing alone
Tutorials/data - files needed for tutorials
- All_CA_RT_100KB_scale.csv.gz - CA and RT tracks for cancer and normal samples, 100-kbps resolution
- All_CA_RT_1MB_scale.csv.gz - CA and RT tracks for cancer and normal samples, 1-Mbps resolution
- NormalCA_RT_MBscale.csv.gz - CA and RT tracks for normal tissues and cell lines, 1-Mbps resolution
- PCAWG_SNVbinned_100KB_scale.csv.gz
- PCAWG_SNVbinned_MBscale.csv.gz - mutation burden in whole cancer genomes, 1-Mbps resolution
- PCAWG_breastcancer_SNV.MAF.gz - example file of somatic mutations in breast cancer for creating files above
- SHAP_plot.pdf - example plot of feature importance scores (SHAP)
- TCGA_BRCA_ATACSeq_chr1_2.bw - example file of chromatin accessibility in breast cancer for creating files above (chrs 1-2 only)
- TumorCA_RT_MBscale.csv.gz - CA and RT tracks for cancer samples, 1-Mbps resolution
All_code - entire code repository for the project; use on your own responsibility
Contact: oocsenas [@] oicr.on.ca ; juri.reimand [@] utoronto.ca