Integrating multi-omics data coming from the same physical source (e.g. patient) by taking into account the chromatin configuration of the genome, i.e. the topologically associating domains (TADs)

npechl/InterTADs

InterTADs

InterTADs is an open-source tool written in R for integrating multi-omics data (e.g. DNA methylation, expression, mutation) from the same physical source (e.g. patient), taking into account the chromatin configuration of the genome, i.e. the topologically associating domains (TADs).

Installation

Clone the repository with git:

git clone https://github.com/nikopech/InterTADs

Before running any scripts, make sure the following packages are installed on your machine (the commands below assume that devtools and BiocManager are already available):

install.packages(c("data.table", "tidyverse", "gplots", "png", "gghalves"))
devtools::install_github("stephenturner/annotables")

...and from Bioconductor:

BiocManager::install(c("TxDb.Hsapiens.UCSC.hg19.knownGene", "TxDb.Hsapiens.UCSC.hg38.knownGene", "GenomicRanges", "org.Hs.eg.db", "systemPipeR", "karyoploteR"))
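Before sourcing the scripts, it can help to confirm that everything listed above is actually installed. The following is a minimal sketch and not part of InterTADs itself: the helper name `missing_packages` is hypothetical, and the package names are copied from the install commands above.

```r
# Packages named in the install commands above.
cran_pkgs <- c("data.table", "tidyverse", "gplots", "png", "gghalves")
github_pkgs <- c("annotables")
bioc_pkgs <- c("TxDb.Hsapiens.UCSC.hg19.knownGene",
               "TxDb.Hsapiens.UCSC.hg38.knownGene",
               "GenomicRanges", "org.Hs.eg.db",
               "systemPipeR", "karyoploteR")

# Hypothetical helper: return the subset of `pkgs` whose namespace
# cannot be loaded in the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

not_installed <- missing_packages(c(cran_pkgs, github_pkgs, bioc_pkgs))
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported as missing can then be installed with the matching command above.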

Usage

There are three main scripts for integrating your multi-omics data:

  • Data_Integration.R
  • TADiff.R
  • Visualization.R

Data Integration

For the Data Integration part, all datasets are split into two folders, freq and counts, according to the type of values they carry (frequency or score count values).

The two folders are placed into a directory, along with a meta-data file which provides information about the mapping between the columns for each dataset. For more details regarding the structure of this file please see here.

The script allows the user to define different folder (or file) names. The user can also choose a folder name for the output table and set an option for the human genome build being used (accepted values are hg19 or hg38).
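A quick check of the input layout before running the script might look like the sketch below. It is illustrative only: the helper `check_input_layout`, the meta-data file name `meta-data.csv` and the variable `genome_build` are assumptions made for this example, not names taken from Data_Integration.R. Only the folder names freq and counts and the accepted builds hg19 and hg38 come from the description above.

```r
# Hypothetical helper: check that an input directory contains the two data
# folders and a meta-data file. All default names are assumptions; adjust
# them to match the names you set in the script.
check_input_layout <- function(dir_name,
                               freq_folder = "freq",
                               counts_folder = "counts",
                               meta_file = "meta-data.csv") {
  c(
    freq   = dir.exists(file.path(dir_name, freq_folder)),
    counts = dir.exists(file.path(dir_name, counts_folder)),
    meta   = file.exists(file.path(dir_name, meta_file))
  )
}

# Demonstration on a throwaway directory built under tempdir().
demo_dir <- file.path(tempdir(), "intertads_demo")
dir.create(file.path(demo_dir, "freq"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_dir, "counts"), showWarnings = FALSE)
file.create(file.path(demo_dir, "meta-data.csv"))

layout_ok <- check_input_layout(demo_dir)
print(layout_ok)

# Only hg19 and hg38 are accepted; match.arg() stops with an error
# for any other value.
genome_build <- match.arg("hg38", c("hg19", "hg38"))
```

Running the check on your own input directory shows at a glance which of the three pieces is missing.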

Once every input is provided, the script can be run by:

source("Data_Integration.R")

TADiff

For the TADiff part, the paths to the input and output folders must be provided, along with a BED file containing the TAD coordinates. To run the script:

source("TADiff.R")
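To make the role of the BED file concrete, the sketch below reads a small TAD table and assigns genomic events to the TAD that contains them. It is a toy illustration in base R under stated assumptions, not the implementation used in TADiff.R: the function `assign_tad`, the column names, the `tad_id` labels and all coordinates are invented for the example, and event positions are assumed to be 0-based already.

```r
# Write a toy BED file with three TADs (chrom, start, end).
# BED coordinates are 0-based and half-open: start <= pos < end.
bed_path <- tempfile(fileext = ".bed")
writeLines(c("chr1\t0\t1000000",
             "chr1\t1000000\t2500000",
             "chr2\t0\t800000"), bed_path)

tads <- read.table(bed_path, sep = "\t", header = FALSE,
                   col.names = c("chrom", "start", "end"),
                   stringsAsFactors = FALSE)
tads$tad_id <- paste0("TAD", seq_len(nrow(tads)))

# Toy events, e.g. methylation sites or genes reduced to one position.
events <- data.frame(
  id    = c("cpg_1", "gene_A", "cpg_2", "gene_B"),
  chrom = c("chr1", "chr1", "chr2", "chr2"),
  pos   = c(500000, 1000000, 799999, 900000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each event, return the id of the first TAD on the
# same chromosome whose half-open interval contains the position, or NA.
assign_tad <- function(events, tads) {
  vapply(seq_len(nrow(events)), function(i) {
    hit <- tads$chrom == events$chrom[i] &
      tads$start <= events$pos[i] &
      events$pos[i] < tads$end
    if (any(hit)) tads$tad_id[which(hit)[1]] else NA_character_
  }, character(1))
}

events$tad_id <- assign_tad(events, tads)
print(events)
```

Events that fall outside every TAD get `NA`. For genome-scale data, an interval-overlap package such as GenomicRanges would be a more efficient choice than this row-by-row loop.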

Visualization

For the visualization of the results, the paths to input and output data need to be provided:

source("Visualization.R")

Data

The proposed method was evaluated on data from chronic lymphocytic leukemia (DNA methylation and expression values). The datasets have been deposited in the ArrayExpress database at EMBL-EBI under the accession numbers E-MTAB-6955 and E-MTAB-6962, respectively.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.
