Replace pickle with json serialization #10

Open
kostaleonard opened this issue Nov 28, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@kostaleonard
Owner

Pickle is vulnerable to arbitrary code execution: unpickling data can run any code that the data's author chose to embed in it. The risk is reduced because we only unpickle objects from local files and S3 buckets that the user trusts. However, the user could still inadvertently unpickle a malicious object that they did not produce themselves. Switching the serialization format from pickle to JSON would give users a higher level of security, because decoding JSON only produces plain data (objects, arrays, strings, numbers, booleans, and null) and never executes code.
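The following is a minimal, standard-library-only illustration of the risk. Any object can define `__reduce__` to tell the unpickler which callable to invoke at load time. The `Malicious` class is hypothetical and only evaluates a harmless expression, but the payload could name any importable callable. The JSON decode at the end shows the contrast: it can only yield plain data.

```python
import json
import pickle


class Malicious:
    """Hypothetical payload: tells the unpickler to call a function of our choosing."""

    def __reduce__(self):
        # pickle records (callable, args); pickle.loads() will call eval("6 * 7").
        # A real attacker could name os.system or any other importable callable.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Malicious())

# Loading the bytes runs the embedded call; the result is eval's return value,
# not a Malicious instance.
unpickled = pickle.loads(payload)
print(unpickled)  # 42

# JSON decoding only ever yields dict, list, str, int, float, bool, or None,
# so a JSON file cannot instruct the loader to call anything.
decoded = json.loads('{"__reduce__": "eval", "args": ["6 * 7"]}')
print(decoded)  # {'__reduce__': 'eval', 'args': ['6 * 7']}
```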

@kostaleonard kostaleonard added the enhancement New feature or request label Nov 28, 2021
@kostaleonard kostaleonard added this to the Second release milestone Nov 28, 2021
@kostaleonard kostaleonard self-assigned this Nov 28, 2021
@kostaleonard
Owner Author

This is a tough issue. The whole purpose of pickling the data processor object is to capture arbitrary code (its preprocessing logic) so that it can be executed later. A potential workaround would be to preserve the data processor's state in a way that guarantees future examples are preprocessed exactly according to the data processor's behavior at the time the versioned dataset was published (which does not necessarily correspond to any git commit).
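Here is one way such a workaround might look, sketched under assumptions. The processor's learned parameters are stored as JSON together with its class name and a version number, and loading is restricted to an allowlist of classes installed locally. `ScalingProcessor`, `REGISTRY`, `dumps_processor`, and `loads_processor` are hypothetical names, not part of this project.

```python
import json
from typing import Any, Dict, List


class ScalingProcessor:
    """Hypothetical processor that standardizes each feature with stored statistics."""

    # Assumed convention: bump whenever transform() logic changes.
    VERSION = 1

    def __init__(self, means: List[float], stds: List[float]) -> None:
        self.means = means
        self.stds = stds

    def transform(self, row: List[float]) -> List[float]:
        """Standardize one example using the stored means and standard deviations."""
        return [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]

    def get_state(self) -> Dict[str, Any]:
        """Return only JSON-compatible data: no code, no object references."""
        return {"means": self.means, "stds": self.stds}


# Allowlist of classes that may be rebuilt; a JSON file cannot name anything else.
REGISTRY: Dict[str, type] = {"ScalingProcessor": ScalingProcessor}


def dumps_processor(processor: Any) -> str:
    """Serialize a processor's class name, version, and state to a JSON string."""
    cls = type(processor)
    return json.dumps(
        {"class": cls.__name__, "version": cls.VERSION, "state": processor.get_state()}
    )


def loads_processor(text: str) -> Any:
    """Rebuild a processor from JSON, refusing unknown classes or stale versions."""
    record = json.loads(text)
    cls = REGISTRY.get(record["class"])
    if cls is None:
        raise ValueError(f"unknown processor class: {record['class']!r}")
    if record["version"] != cls.VERSION:
        raise ValueError(
            f"{record['class']} was saved with version {record['version']}, "
            f"but version {cls.VERSION} is installed"
        )
    return cls(**record["state"])


processor = ScalingProcessor(means=[1.0, 10.0], stds=[2.0, 5.0])
restored = loads_processor(dumps_processor(processor))
print(restored.transform([3.0, 20.0]))  # [1.0, 2.0]
```

The version check is the key design choice. JSON stores only state, so the preprocessing code comes from whatever is installed at load time. A mismatch is therefore rejected rather than allowed to preprocess new examples differently from the published dataset.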
