
A repo that takes you through some principles about data privacy based on the Kenya Data Protection Act and General Data Protection Regulation. Useful for a data person.


Shuyib/data-privacy-pres


This is a presentation about data privacy and anonymization that was held at Africa's Talking Ltd. It is pitched at a data person's level, meaning those who work with data and those who work with data people. You can simulate the insurance dataset yourself; see the folder layout to learn how to do it.

Folders:
.
├── codebook - this folder has a description of the simulated dataset, particularly what the columns of the dataframe mean.
│   ├── Insurance_data_ke.txt - created with the csvstat command from csvkit.
│   └── insurance_report.html - generated by the pandas profiling library. A shortcut for doing exploratory data analysis fast.
├── data - directory where the simulated data should be placed. Run utils/dataloader.py to generate it.
│   ├── feature_engineered_insurance2.csv - data which has undergone feature engineering used in the demo.
│   ├── feature_engineered_insurance.csv - data which was created for the same problem but has issues. Create a new one.
│   ├── Insurance_data_ke.csv - The insurance dataset created by running python utils/dataloader.py
│   ├── Insurance_data_ke_featureeng.csv - Insurance dataset created as an intermediate step for feature engineering.
│   └── Organs.csv - data for a single patient who was recovering from surgery for heart disease. Contains only their vitals from a thermometer and a pulse oximeter.
├── Dockerfile - a blueprint to run the project in a reproducible way. See "How to run the docker image".
├── environment.yml - a conda virtual environment file.
├── Kenya Data Protection Act - Quick Guide 2021.pdf - a demo for privacy engineering strategy at Deloitte.
├── Makefile - workflow orchestrator. Helps automate code formatting and run repetitive tasks.
├── presentation - this directory has the presentations that were used live.
│   ├── presentation.pdf - HTML to PDF using LaTeX.
│   └── presentation.slides.html - reveal.js presentation. Open with your browser.
├── presentation.ipynb - jupyter notebook with jupyter notebook extensions and reveal.js extension.
├── README.md - the file you are reading.
├── requirements.txt - what packages were used.
├── Screenshot from 2022-09-10 07-03-38.png - demo of PCA using the iris dataset.
└── utils - scripts used to generate the simulated data.
    ├── codebook.sh - a bash script used to create the codebook Insurance_data_ke.txt
    ├── dataloader.py - data generator that uses methods from the faker library and numpy.
    └── Feature_engineering.ipynb - a feature engineering workflow that I use for making the insurance dataset ready for statistical modeling, aka machine learning.
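
To give a feel for what simulating a dataset involves, here is a minimal sketch. It is not the code in utils/dataloader.py: that script uses faker and numpy, while this version uses only the Python standard library so it runs without extra packages. The column names and value ranges are hypothetical and are not taken from the codebook.

```python
import csv
import io
import random

# Hypothetical column names; the real ones are described in the codebook folder.
COLUMNS = ["customer_id", "age", "county", "annual_premium_kes"]
COUNTIES = ["Nairobi", "Mombasa", "Kisumu", "Nakuru"]


def simulate_rows(n, seed=0):
    """Return n fake insurance records as a list of dicts."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = []
    for i in range(n):
        rows.append({
            "customer_id": f"C{i:05d}",
            "age": rng.randint(18, 80),
            "county": rng.choice(COUNTIES),
            "annual_premium_kes": round(rng.uniform(5_000, 150_000), 2),
        })
    return rows


def to_csv(rows):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = simulate_rows(5)
print(to_csv(rows))
```

Because no real person is behind any row, a file like this can be shared freely, which is the point of using simulated data in a privacy demo.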

Warning: diffprivlib doesn't seem to work with the standard Dockerfile. You may want to edit the Dockerfile or create the conda environment from environment.yml instead.

How to make the conda environment locally

If you have Anaconda or Miniconda installed, complete the following steps in the data-privacy-pres directory.

  1. Create the virtual environment
conda env create -f environment.yml 
  2. This will create an environment called data-privacy-env. You can activate it like this.
source activate data-privacy-env
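
If you want to confirm from Python that the environment is active, a small check like the one below may help. It is only a sketch: it relies on conda setting the CONDA_DEFAULT_ENV environment variable on activation, and the helper names are made up for this example.

```python
import os

EXPECTED_ENV = "data-privacy-env"  # environment name from the step above


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


def is_expected_env(environ=os.environ):
    """True when the active conda environment is the one this repo expects."""
    return active_conda_env(environ) == EXPECTED_ENV


print("active environment:", active_conda_env())
print("is data-privacy-env:", is_expected_env())
```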

How to run the docker image

Build docker image

sudo docker build -t data-privacy-env:v1 .

Run the docker image

sudo docker run -p 9999:9999 data-privacy-env:v1
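
Once the container is running, you can check that something is listening on the published port. The sketch below assumes the service is reachable on localhost at port 9999, as the -p 9999:9999 mapping suggests; the function name is made up for this example.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


print("port 9999 reachable:", is_port_open("127.0.0.1", 9999))
```

If this prints False while the container is up, check that the container really serves on port 9999 and that nothing else is blocking it.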

