This is a class project that analyzes cardiovascular data using empirical statistics.

farhanarrafi/empirical-analysis-cardiovascular-data

Cardiovascular Data Analysis: An Empirical Study Using Statistical Methods

Introduction

This project analyzes patient data and its relation to different stages of hypertension. Specifically, we examine how a subject's weight, systolic blood pressure, and diastolic blood pressure relate to the subject's hypertension stage. Only Hypertension Stage 1 and Hypertension Stage 2 are used as target categories.

Important: To run the notebook, you need an API token from Kaggle. Follow these steps:

Instructions

  1. Create a new API token on Kaggle.
  2. Download the API token and open the downloaded JSON file as a text file.
  3. Copy the key from the JSON file and use it to replace the placeholder key in the following line:
{"username":"farhanarrafi","key":"get_a_key_from_kaggle_to_run_the_notebook"}
  4. Run the notebook.
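A minimal sketch for storing the token, assuming the notebook authenticates through the official `kaggle` Python package (the README does not say how the token is loaded). That package reads `kaggle.json` from `~/.kaggle`, or from the directory named by the `KAGGLE_CONFIG_DIR` environment variable. The helper name `write_kaggle_token` is hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Optional


def write_kaggle_token(username: str, key: str, config_dir: Optional[Path] = None) -> Path:
    """Write a Kaggle API token file and return its path (hypothetical helper).

    By default it targets KAGGLE_CONFIG_DIR if set, else ~/.kaggle, which is
    where the official `kaggle` package looks for credentials.
    """
    if config_dir is None:
        config_dir = Path(os.environ.get("KAGGLE_CONFIG_DIR", Path.home() / ".kaggle"))
    config_dir.mkdir(parents=True, exist_ok=True)
    token_path = config_dir / "kaggle.json"
    # Same shape as the JSON file downloaded from Kaggle.
    token_path.write_text(json.dumps({"username": username, "key": key}))
    # Owner read/write only, so other users on the machine cannot read the key.
    token_path.chmod(0o600)
    return token_path
```

For example, call `write_kaggle_token("your_username", "your_key")` with the values from your downloaded file before running the notebook.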

Variables Used in this Analysis

  1. Weight - The weight of the subject in kilograms.
  2. Systolic Pressure - The maximum blood pressure during contraction of the ventricles.
  3. Diastolic Pressure - The minimum blood pressure recorded just before the next contraction.
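Since the targets are Hypertension Stage 1 and Stage 2, it can help to see how the two pressure variables map onto those stages. The sketch below assumes the 2017 ACC/AHA adult blood-pressure categories (values in mmHg); the README does not say how the Kaggle dataset assigned its stage labels, so treat these cutoffs as illustrative rather than as the dataset's definition.

```python
def hypertension_category(systolic: float, diastolic: float) -> str:
    """Classify one reading with the 2017 ACC/AHA adult thresholds (mmHg).

    Assumption: the dataset's own labels may follow a different guideline.
    """
    # Either pressure reaching the Stage 2 cutoff is enough.
    if systolic >= 140 or diastolic >= 90:
        return "Hypertension Stage 2"
    if systolic >= 130 or diastolic >= 80:
        return "Hypertension Stage 1"
    # Here diastolic < 80, so only systolic 120-129 counts as elevated.
    if systolic >= 120:
        return "Elevated"
    return "Normal"
```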

Exploratory Data Analysis

Weight Density

Systolic Blood Pressure Density

Figure: Systolic blood pressure density
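The density plots in this section can be produced with a kernel density estimate. The README does not name the plotting tool or bandwidth, so the following is a minimal standard-library sketch of a Gaussian KDE using the normal-reference (Silverman) rule of thumb, h = 1.06 * sigma * n^(-1/5). The function name and sample values are hypothetical.

```python
import math
import statistics
from typing import Callable, Sequence


def gaussian_kde(samples: Sequence[float]) -> Callable[[float], float]:
    """Return a Gaussian kernel density estimate f(x) for 1-D samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sigma = statistics.stdev(samples)
    if sigma == 0:
        raise ValueError("samples have zero spread")
    # Normal-reference (Silverman) rule of thumb for the bandwidth.
    h = 1.06 * sigma * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        # Average of Gaussian bumps of width h centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density
```

Evaluating the returned function on a grid of weights (for example 40 to 150 kg) gives the curve to plot.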

Analysis

Regression Line of Weight

Figure: Weight regression line
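A univariate regression line is the ordinary least squares fit y = b0 + b1 * x, with slope b1 = Sxy / Sxx and intercept b0 = mean(y) - b1 * mean(x). Here x would be weight; the README does not state the response variable or its encoding, so using the hypertension stage coded as 1 or 2 is an assumption. A minimal sketch:

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centred cross-products and sum of squares.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has zero variance")
    b1 = s_xy / s_xx         # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1
```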

Regression Using Two Variables: Systolic and Diastolic Blood Pressure

Figure: Systolic and diastolic blood pressure regression

Regression Using Three Variables: Weight, Systolic, and Diastolic Blood Pressure

Figure: Weight, systolic, and diastolic blood pressure regression
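Both the two-variable and the three-variable models are multiple linear regressions. The README does not show its fitting code; the following minimal standard-library sketch solves the normal equations directly (in practice a routine such as numpy.linalg.lstsq is numerically safer). Each row of X holds one subject's predictors, for example [systolic, diastolic] or [weight, systolic, diastolic], and y is the response, assumed here to be the stage coded numerically. The name `fit_ols` is hypothetical.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares; returns [b0, ..., bk].

    Solves the normal equations (A^T A) b = A^T y, where A is X with a
    leading column of ones for the intercept.
    """
    A = [[1.0, *map(float, row)] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are (nearly) collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # M is now diagonal; divide out to get the coefficients.
    return [M[i][p] / M[i][i] for i in range(p)]
```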

Results

Using only systolic BP and diastolic BP provides better predictions than using all three variables (weight, systolic BP, and diastolic BP).
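On the data used to fit them, ordinary least squares R^2 cannot decrease when a predictor is added, so a finding that the two-variable model predicts better than the three-variable model usually rests on held-out data or on a penalized measure such as adjusted R^2. The README does not say which comparison was used. A minimal sketch of both quantities, where k is the number of predictors:

```python
from typing import Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y is constant")
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Penalize R^2 for using k predictors on n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```

At equal R^2, the model with more predictors receives the lower adjusted value.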

For more details, you can check the final presentation.

Figure: Results

Dataset Source

The data for this project comes from the Kaggle dataset “Cardiovascular Disease by Aidan”. According to its description, the dataset consolidates data from two sources:

  1. UCI Machine Learning Repository - Heart Disease Dataset
  2. Kaggle - Heart Disease Dataset by YasserH

The original dataset contains about 68,000 rows. To keep the analysis simple, we randomly selected 2,000 rows and discarded the rest.
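The README does not give the sampling code or seed. One reproducible way to draw such a subset is simple random sampling without replacement from a fixed-seed generator; the sketch below works on a plain list of rows (with pandas, `DataFrame.sample(n=2000, random_state=...)` does the same job). The helper name and the seed value are arbitrary assumptions.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(rows: Sequence[T], n: int = 2000, seed: int = 42) -> List[T]:
    """Draw n rows uniformly at random without replacement (hypothetical helper).

    A fixed seed returns the same subset on every run.
    """
    if n > len(rows):
        raise ValueError("sample size exceeds number of rows")
    rng = random.Random(seed)  # local generator; leaves the global RNG untouched
    return rng.sample(list(rows), n)
```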

Statistical Methods Used

  1. Hypothesis Testing
  2. Proportion, Mean, Standard Deviation, and Variance Analysis
  3. Correlation between variables
  4. Univariate and Multivariate Regression
  5. Coefficient of Determination ($R^2$) Analysis
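The README lists hypothesis testing without naming a specific test. As one illustration only, a large-sample two-sample z-test for a difference in means between Stage 1 and Stage 2 subjects (for example, in mean weight) can be written with the standard library's `statistics.NormalDist`; with small groups a t-test would be more appropriate. The function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def two_sample_z_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Large-sample test of H0: mean(a) == mean(b); returns (z, two-sided p).

    Uses unpooled sample variances, which is reasonable when both groups are large.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        raise ValueError("both groups have zero variance")
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    return z, p
```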

Contribution

Three other people also worked on this project, each with a different combination of variables.

  1. Sushant Thapa - You can check their work on other variables here.
  2. Harika Prathipati - You can check their work on other variables here.
  3. Lokesh Mylavarpu
