Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decision Trees with Differential Privacy #355

Open
4 tasks
stoic-signs opened this issue Apr 13, 2021 · 0 comments
Open
4 tasks

Decision Trees with Differential Privacy #355

stoic-signs opened this issue Apr 13, 2021 · 0 comments
Labels
gsoc Google Summer of Code Project Type: Epic 🤙 Describes a large amount of functionality that will likely be broken down into smaller issues Type: Research 🔬 When further investigation into a subject is required

Comments

@stoic-signs
Copy link

stoic-signs commented Apr 13, 2021

Feature Description

This feature will eventually add Decision Trees into the PyDP library, along with the necessary control mechanism needed to use it as part of a pipeline.

Is your feature request related to a problem?

No, it is due to a lack of support.

What alternatives have you considered?

No alternatives currently exist.

Are you interested in working on this yourself?

Yes.

Additional Context

Given that scope of this issue is widespread, it will eventually be broken down into smaller issues. Here's an outline of the functionalities to be implemented:

  • Base Decision Tree Model

  • The base Decision Tree model: a vanilla ID3 based decision tree.

  • ID3 construction algorithm

  • Differentially Private algorithm: to construct and update the tree in a differentially private way.

  • References:

    • Semi-Supervised Learning Approach to Differential Privacy by G. Jagannathan, C. Monteleoni, and K. Pillaipakkamnatt
    • A Practical Differentially Private Random Decision Tree Classifier by Geetha Jagannathan, Krishnan Pillaipakkamnatt, and Rebecca N. Wright.
    • Differentially Private Random Decision Forests using Smooth Sensitivity by Sam Fletcher, Md Zahidul Islam
  • Differentially Private Bagging

  • An algorithm to partition data multiple times in order to achieve differentially private subsample-and-aggregate.

  • References:

    • Differentially Private Bagging: Improved utility and cheaper privacy than subsample-and-aggregate by James Jordon, Jinsung Yoon, Mihaela van der Schaar
  • Support for Horizontally and Vertically Partitioned Data

  • This will extend the functionality of the Private Decision Tree to be able to work with horizontally distributed data (by means of incremental learning) and vertically distributed data.

  • Vertically distributed data requires each stakeholder to construct a differentially private decision tree (assuming no overlap in the attributes) and makes a union.

  • References:

    • A Practical Differentially Private Random Decision Tree Classifier by Geetha Jagannathan, Krishnan Pillaipakkamnatt, and Rebecca N. Wright.
  • Boosted Differential Private ensembles

  • Add boosting to differentially private decisions trees to enable ensemble formation.

  • References:

    • Boosted and Differentially Private Ensembles of Decision Trees by Richard Nock, Wilko Henecka

More details can be found here as well as in the reference papers.

@stoic-signs stoic-signs added the Type: New Feature ➕ Introduction of a completely new addition to the codebase label Apr 13, 2021
@chinmayshah99 chinmayshah99 added gsoc Google Summer of Code Project Type: Epic 🤙 Describes a large amount of functionality that will likely be broken down into smaller issues Type: Research 🔬 When further investigation into a subject is required and removed Type: New Feature ➕ Introduction of a completely new addition to the codebase labels Apr 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gsoc Google Summer of Code Project Type: Epic 🤙 Describes a large amount of functionality that will likely be broken down into smaller issues Type: Research 🔬 When further investigation into a subject is required
Projects
None yet
Development

No branches or pull requests

2 participants