Merge pull request #340 from OpenMined/feature/machine-learning

Added Naive Bayes Support
OpenMined · Apr 4, 2021 · b1d6ba6 · b1d6ba6
2 parents de7094f + 30a7897
commit b1d6ba6
Showing 21 changed files with 3,222 additions and 19 deletions.
diff --git a/examples/Naive_Bayes_Iris_demo/PyDP Naive Bayes.ipynb b/examples/Naive_Bayes_Iris_demo/PyDP Naive Bayes.ipynb
@@ -0,0 +1,219 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Implementation of Naive Bayes in PyDP\n",
+ "\n",
+ "*Source*: [Differentially Private Naïve Bayes Classification](https://www.researchgate.net/profile/Anirban-Basu/publication/262254729_Differentially_Private_Naive_Bayes_Classification/links/55dfa68208ae2fac4718fdfd/Differentially-Private-Naive-Bayes-Classification.pdf)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Cite the paper here\n",
+ "# Mention any part of the paper related to the implementation\n",
+ "# Some analysis of the data that is being used\n",
+ "from sklearn import datasets\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "dataset = datasets.load_iris()\n",
+ "X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To use the Naive Bayes algorithm with PyDP, we only need to import the `GaussianNB` class from the PyDP package:\n",
+ "\n",
+ "`from pydp.ml.naive_bayes import GaussianNB`\n",
+ "\n",
+ "The implementation inherits from scikit-learn's Naive Bayes class. Some attributes and methods have been modified to support privacy guarantees.\n",
+ "\n",
+ "The following parameters can be adjusted to suit the use case:\n",
+ "\n",
+ "- `epsilon`: Privacy parameter for the model. (float, default: 1.0)\n",
+ " \n",
+ "- `bounds`: Bounds of the data, provided as a tuple of the form (min, max). `min` and `max` can either be scalars, covering the min/max of the entire data, or vectors with one entry per feature. If not provided, the bounds are computed from the data when `.fit()` is first called, which raises a `PrivacyLeakWarning`. (tuple, optional)\n",
+ " \n",
+ "- `priors`: Prior probabilities of the classes. If specified, the priors are not adjusted according to the data. (array-like, shape (n_classes,))\n",
+ "\n",
+ "- `var_smoothing`: Portion of the largest variance of all features that is added to variances for calculation stability. (float, default: 1e-9)\n",
+ "\n",
+ "- `probability`: Probability for a geometric distribution from which a sample will be drawn as noise for categorical features. (float, default: 1e-2)\n",
+ "\n",
+ "Source code:\n",
+ "- [PyDP's Naive Bayes](https://github.com/OpenMined/PyDP/blob/feature/machine-learning/src/pydp/ml/naive_bayes.py)\n",
+ "- [Geometric Mechanism in PyDP Naive Bayes' implementation](https://github.com/OpenMined/PyDP/blob/feature/machine-learning/src/pydp/ml/mechanisms/geometric.py)\n",
+ "- [Laplace Mechanism in PyDP Naive Bayes' implementation](https://github.com/OpenMined/PyDP/blob/feature/machine-learning/src/pydp/ml/mechanisms/laplace.py)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "GaussianNB(accountant=BudgetAccountant(spent_budget=[(1.0, 0)]),\n",
+ " bounds=(array([4.3, 2. , 1. , 0.1]), array([7.5, 4. , 6. , 2. ])),\n",
+ " probability=0.002, var_smoothing=0.0001)"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "epsilon = 1.0 # privacy budget\n",
+ "\n",
+ "lower = np.array([4.3, 2. , 1. , 0.1]) # lower bound of each feature's values\n",
+ "upper = np.array([7.5, 4. , 6. , 2.]) # upper bound of each feature's values\n",
+ "\n",
+ "priors = np.array([1/3, 1/3, 1/3]) # class priors (must sum to 1); shown for illustration, not passed to GaussianNB below\n",
+ "\n",
+ "probability = 0.002 # probability for geometric distribution\n",
+ "\n",
+ "var_smoothing = 1e-4 # variance smoothing\n",
+ "\n",
+ "from pydp.ml.naive_bayes import GaussianNB\n",
+ "\n",
+ "clf = GaussianNB(epsilon=epsilon, bounds=(lower, upper), probability=probability, var_smoothing=var_smoothing)\n",
+ "clf.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([2, 0, 0, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 0, 0, 2, 2, 1, 0,\n",
+ " 2, 2, 0, 2, 0, 2, 2, 2])"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y_pred = clf.predict(X_test)\n",
+ "y_pred"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Accuracy : 0.8\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "array([[ 9, 0, 0],\n",
+ " [ 0, 1, 6],\n",
+ " [ 0, 0, 14]])"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import accuracy_score, confusion_matrix\n",
+ "\n",
+ "cm = confusion_matrix(y_test, y_pred)\n",
+ "print(\"Accuracy :\", accuracy_score(y_test, y_pred))\n",
+ "cm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Accuracy : 0.9666666666666667\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "array([[ 9, 0, 0],\n",
+ " [ 0, 7, 0],\n",
+ " [ 0, 1, 13]])"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Result from scikit-learn's non-private GaussianNB, for comparison\n",
+ "\n",
+ "from sklearn.naive_bayes import GaussianNB as SklearnGaussianNB  # alias avoids shadowing PyDP's GaussianNB\n",
+ "\n",
+ "classifier = SklearnGaussianNB()\n",
+ "classifier.fit(X_train, y_train)\n",
+ "\n",
+ "y_pred = classifier.predict(X_test)\n",
+ "\n",
+ "cm = confusion_matrix(y_test, y_pred)\n",
+ "print(\"Accuracy :\", accuracy_score(y_test, y_pred))\n",
+ "cm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
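
The accuracies printed in the notebook can be cross-checked directly from the two confusion matrices it reports. The following is a minimal sketch, assuming scikit-learn's convention that rows are true classes and columns are predicted classes; the helper names `accuracy_from_confusion` and `recall_per_class` are hypothetical and are not part of PyDP or scikit-learn.

```python
import numpy as np

# Confusion matrices copied from the notebook outputs
# (rows: true class, columns: predicted class).
cm_private = np.array([[9, 0, 0],
                       [0, 1, 6],
                       [0, 0, 14]])
cm_sklearn = np.array([[9, 0, 0],
                       [0, 7, 0],
                       [0, 1, 13]])


def accuracy_from_confusion(cm):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    return np.trace(cm) / cm.sum()


def recall_per_class(cm):
    """Per-class recall: diagonal entry divided by the row total (true count)."""
    return np.diag(cm) / cm.sum(axis=1)


for name, cm in [("PyDP", cm_private), ("scikit-learn", cm_sklearn)]:
    print(name, "accuracy:", accuracy_from_confusion(cm))
    print(name, "recall per class:", recall_per_class(cm))
```

In this particular run, all of the private model's errors fall in class 1, where 6 of the 7 test samples were predicted as class 2. Because `train_test_split` is called without a `random_state` and the private model adds random noise, these figures will differ from run to run.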