YangLabHKUST/MRbenchmarking

Benchmarking Mendelian Randomization methods for causal inference using genome‐wide association study summary statistics

The experimental design for benchmarking MR methods

We present a benchmarking analysis of MR methods for causal inference with real-world genetic datasets. We focus on MR methods that take GWAS summary statistics as input, because they do not require access to individual-level GWAS data and are widely applicable. Specifically, we consider 16 MR methods: the standard IVW (fixed) and IVW (random), plus 14 more advanced methods, namely dIVW, Egger, RAPS, Weighted-median, Weighted-mode, MR-PRESSO, MRMix, cML-MA, MR-Robust, MR-Lasso, MR-CUE, CAUSE, MR-APSS and MR-ConMix (Figure A). The procedure for running the MR methods is outlined in Figure B. To assess the performance of these methods, we used real-world datasets and focused on four key aspects: type I error control, the accuracy of causal effect estimates, replicability, and power (Figure C).
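As a point of reference for the two baseline methods, the following is a minimal sketch of the textbook IVW estimator with fixed-effect and multiplicative random-effects standard errors. It is not the code used in this benchmark: the data are simulated, and the function name `ivw` and all variable names are illustrative assumptions.

```r
# Minimal IVW sketch on simulated summary statistics (not the benchmark code).
set.seed(1)
n_iv   <- 50                                      # number of instruments (hypothetical)
b_exp  <- rnorm(n_iv, mean = 0.1, sd = 0.03)      # SNP-exposure effect estimates
se_out <- rep(0.01, n_iv)                         # standard errors of SNP-outcome effects
b_out  <- 0.2 * b_exp + rnorm(n_iv, sd = se_out)  # simulated with causal effect 0.2

ivw <- function(b_exp, b_out, se_out) {
  w        <- 1 / se_out^2                        # inverse-variance weights
  beta     <- sum(w * b_exp * b_out) / sum(w * b_exp^2)
  se_fixed <- sqrt(1 / sum(w * b_exp^2))
  # Cochran's Q measures heterogeneity among the per-SNP ratio estimates.
  q        <- sum(w * (b_out - beta * b_exp)^2)
  # Multiplicative random effects: inflate the variance only if Q exceeds its df.
  phi      <- max(1, q / (length(b_exp) - 1))
  se       <- c(se_fixed, se_fixed * sqrt(phi))
  data.frame(model = c("fixed", "random"),
             beta  = beta,
             se    = se,
             pval  = 2 * pnorm(-abs(beta / se)))
}

res <- ivw(b_exp, b_out, se_out)
print(res)
```

Both models share the same point estimate; they differ only in the standard error, which is why the random-effects version is never less conservative than the fixed-effect one.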

Datasets

GWAS sources

The original GWAS datasets used in this study are summarized in Table GWASs.xlsx. You can access the original GWAS datasets directly through the download links provided in the table. The formatted datasets used in this study are provided below.

Dataset 1: GWASATLAS Dataset for evaluation of type I error control in confounding scenario (a): Population stratification

Formatted GWASs for exposures; Formatted GWASs for outcomes; Formatted IV data for MR analysis.

Dataset 2: the Neal Lab Dataset for evaluation of type I error control in confounding scenario (a): Population stratification

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 3: the Pan UKBB Dataset for evaluation of type I error control in confounding scenario (a): Population stratification

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 4: the dataset for evaluation of type I error control in confounding scenario (b): Pleiotropy

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 5: the dataset for evaluation of type I error control in confounding scenario (c): Family-level confounders

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 6: the dataset for evaluation of the accuracy of causal effect estimates

Formatted GWASs; Formatted IV data for MR analysis.

Dataset 7: the dataset for evaluation of replicability

Formatted GWASs; Formatted IV data for MR analysis.

Notes:

(1) "Formatted GWASs" refers to the formatted summary-level data files generated from the original GWAS datasets after quality control.
(2) "Formatted IV data for MR analysis" contains the following three types of files:
"Tested Trait pairs": the exposure-outcome trait pairs to be analyzed;
"MRdat": the summary statistics of the LD-clumped IV sets for each tested trait pair, which can be used directly for MR analysis;
"bg_paras": the estimated background parameters "Omega" and "C", which are used for MR estimation in MR-APSS.
(3) Details on data preprocessing, including quality control of GWAS summary statistics, formatting of GWASs, and LD clumping for IV selection, can be found in the supplementary note of our paper [1]. Implementation details on data preprocessing can be found in the MR-APSS software tutorial on the MR-APSS GitHub website.

R code

Install required packages

#install.packages("devtools")
#install.packages("remotes")

devtools::install_github("gqi/MRMix")

devtools::install_github("xue-hr/MRcML")

devtools::install_github("jean997/[email protected]")

devtools::install_github("rondolab/MR-PRESSO")

install.packages("MendelianRandomization")

devtools::install_github("YangLabHKUST/MR-APSS")

devtools::install_github("QingCheng0218/MR.CUE@main")

remotes::install_github("MRCIEU/TwoSampleMR")

devtools::install_github("qingyuanzhao/mr.raps")

install.packages("robustbase")
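After installation, it can be useful to confirm which packages are actually available before running the analysis. The sketch below is one way to do that. The package names passed to `requireNamespace()` are assumptions inferred from the repositories listed above (for example, `MRPRESSO` for MR-PRESSO and `MRAPSS` for MR-APSS); adjust them if a package loads under a different name.

```r
# Package names as they would be passed to library(); assumed, not confirmed
# by this README.
pkgs <- c("MRMix", "MRcML", "cause", "MRPRESSO", "MendelianRandomization",
          "MRAPSS", "MR.CUE", "TwoSampleMR", "mr.raps", "robustbase")

# requireNamespace() returns TRUE/FALSE instead of stopping with an error.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```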

Run MR Methods

We performed IV selection for each trait pair in each dataset. The R code for IV selection is available in IV_selection.R.

We then applied each compared method to the datasets after IV selection. The R code for running the 15 MR methods on each dataset is available in main_run_MR_methods.R. To run main_run_MR_methods.R, you must first load the required packages and the R functions in the folder Rfuncs.
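A minimal sketch of loading the helper functions is shown below. It assumes the working directory is the root of this repository, so that the relative path "Rfuncs" resolves, and that the helpers are plain .R scripts. The variable names are illustrative.

```r
# Assumes the working directory is the repository root.
rfuncs_dir <- "Rfuncs"

# Collect every .R (or .r) script in the folder.
r_files <- list.files(rfuncs_dir, pattern = "\\.[Rr]$", full.names = TRUE)

if (length(r_files) == 0) {
  message("No R files found in '", rfuncs_dir, "'; check the working directory.")
}

# Source each helper script into the current session.
for (f in r_files) {
  source(f)
}
```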

Results of MR methods

Results for dataset 1;
Results for dataset 2;
Results for dataset 3;
Results for dataset 4;
Results for dataset 5;
Results for dataset 6;
Results for dataset 7.
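For the datasets used to evaluate type I error control (datasets 1 to 5), a natural summary is the proportion of tested trait pairs with a p-value below the nominal level. The sketch below illustrates that calculation on simulated p-values. The column names `method` and `pval` and the method labels are hypothetical and do not describe the layout of the result files.

```r
# Toy results table; column names and method labels are hypothetical.
set.seed(42)
results <- data.frame(
  method = rep(c("method_A", "method_B"), each = 1000),
  # method_A: uniform p-values (well calibrated);
  # method_B: p-values skewed toward zero (inflated type I error).
  pval   = c(runif(1000), rbeta(1000, 0.6, 1))
)

alpha <- 0.05  # nominal significance level

# Empirical type I error: share of null trait pairs with p < alpha, per method.
type1 <- aggregate(pval ~ method, data = results,
                   FUN = function(p) mean(p < alpha))
names(type1)[2] <- "type1_error"
print(type1)
```

A well-calibrated method should give a value close to `alpha`; values well above it indicate inflated false positives.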

Updates

The datasets were reorganized on September 24, 2024.

Reference

[1] Xianghong Hu, Mingxuan Cai, Jiashun Xiao, Xiaomeng Wan, Zhiwei Wang, Hongyu Zhao, Can Yang. Benchmarking Mendelian randomization methods for causal inference using genome-wide association study summary statistics. The American Journal of Human Genetics, 2024. [link]; [medrxiv version].

Contact information

Please feel free to contact Xianghong Hu ([email protected]) or Prof. Can Yang ([email protected]) if you have any questions.