SKLearn Pipeline calculate sensitivity of categorical features for Local DP #389

Open
grilhami opened this issue Aug 31, 2021 · 2 comments
Assignees
Labels
Type: New Feature ➕ Introduction of a completely new addition to the codebase

Comments

@grilhami
Contributor

grilhami commented Aug 31, 2021

Feature Description

The current SKLearn Pipeline noise mechanism "operator" for Local DP assumes that the given dataset contains only numerical features. As a result, the noise is calibrated from a sensitivity calculation that is defined only for numerical features.

However, datasets often also contain categorical features, which require a different method to calculate the sensitivity. The "operator" should support categorical features as well.

This applies to all the noise mechanisms: LaplaceMechanism, GaussianMechanism, and GeometricMechanism.

Note: at the time this issue was created, only LaplaceMechanism had been implemented, so it is a good starting point. Once GeometricMechanism and GaussianMechanism have been implemented, the same specifications for categorical feature support apply to them.

Additional Context

Preferably, support for categorical features would take the form of parameters on the "operator" class.

For example, in the case of LaplaceMechanism, it would look something like this:

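# scikit-learn imports used below; BudgetAccountant and LaplaceMechanism come from this project
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Set a privacy budget accountant
accountant = BudgetAccountant(10000)

# Set sensitivity function for numerical data
sensitivity = lambda x: (max(x) - min(x)) / (len(x) + 1)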

# Set sensitivity function for categorical data
sensitivity_cat = lambda x: ...

# Indices of the categorical features in the dataset
cat_features = [0, 1, ...]

# Set laplace mechanism with epsilon, sensitivity, and accountant
laplace = LaplaceMechanism(
    epsilon=0.1, 
    sensitivity=sensitivity, 
    accountant=accountant,
    sensitivity_cat=sensitivity_cat,
    cat_features=cat_features
)

# Initialize scaler and naive Bayes estimator
scaler = StandardScaler()
nb = GaussianNB()

# Create the pipeline
pipe = Pipeline([('scaler', scaler), ('laplace', laplace), ('nb', nb)])

For more examples, please have a look at the example notebook for the Laplace mechanism's implementation.

As starting guidance, please refer to the source code for LaplaceMechanism here.
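
To make the requested behaviour concrete, here is a minimal standard-library sketch of how a mechanism could choose between the two sensitivity functions per column. It is not the project's implementation: `add_laplace_noise`, `laplace_sample`, and the categorical sensitivity used in the usage lines are hypothetical names and placeholders, and categorical columns are assumed to be integer-encoded already. The only established part is that the Laplace mechanism draws noise with scale `sensitivity / epsilon`.

```python
import random
from typing import Callable, List, Optional, Sequence

SensitivityFn = Callable[[Sequence[float]], float]


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    if scale <= 0:
        return 0.0  # a zero-sensitivity column receives no noise
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_laplace_noise(
    rows: Sequence[Sequence[float]],
    epsilon: float,
    sensitivity: SensitivityFn,
    sensitivity_cat: SensitivityFn,
    cat_features: Sequence[int],
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Hypothetical helper: perturb each column with Laplace noise.

    Columns listed in cat_features use sensitivity_cat; all others use
    sensitivity. The noise scale for a column is its sensitivity / epsilon.
    """
    rng = rng or random.Random()
    cat = set(cat_features)
    n_cols = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]
    scales = [
        (sensitivity_cat if j in cat else sensitivity)(col) / epsilon
        for j, col in enumerate(columns)
    ]
    return [
        [row[j] + laplace_sample(scales[j], rng) for j in range(n_cols)]
        for row in rows
    ]


# Usage: column 0 holds integer-encoded categories, column 1 is numerical.
sensitivity = lambda x: (max(x) - min(x)) / (len(x) + 1)
# Placeholder only (an assumption, not a recommendation from this issue).
sensitivity_cat = lambda x: (len(set(x)) - 1) / (len(x) + 1)

data = [[0, 1.0], [1, 3.0], [2, 5.0]]
noisy = add_laplace_noise(
    data, epsilon=0.1, sensitivity=sensitivity,
    sensitivity_cat=sensitivity_cat, cat_features=[0],
    rng=random.Random(0),
)
```

Noisy category codes are no longer valid categories, so a real implementation would likely need a post-processing step or a different mechanism for categorical columns; that design choice is left open here.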

@grilhami grilhami added the Type: New Feature ➕ Introduction of a completely new addition to the codebase label Aug 31, 2021
@grilhami
Contributor Author

I'm handling this one

@grilhami
Contributor Author

Support for the Laplace mechanism has been added in PR #403

For the Gaussian and Geometric mechanisms, we are still waiting for #387 and #388 to be addressed.
