
[RFC] Handling predictions too big for the database #147

Open
aboucaud opened this issue Sep 24, 2018 · 0 comments

aboucaud commented Sep 24, 2018

This is a summary of a discussion we just had with @kegl, on which we'd like comments, opinions, and ideas from @jorisvandenbossche @glemaitre @agramfort.

In the near future, we might be faced with RAMP problems whose target dimension is too big to be handled by the existing workflow without making the database explode. A simple example is an image-to-image workflow. These problems need a huge training / testing sample, making each prediction equally large (on the order of a few GB), while the current total database size is 100 GB.

This brings us down to two options:

  1. modify the database model and migrate it,
  2. find a smart way of storing and scoring the predictions for these specific problems.

For now we would like to avoid option 1 if possible, so here is our take on option 2.

Since the target is a pixel-by-pixel prediction, we would sample the prediction, e.g. take a random sub-grid of pixels, and compute the score on that sub-grid only. To avoid cheating, we would use a different random sub-grid for the public and the backend datasets.
Practically, this would mean creating a specific SamplingScore class that uses a hash of the input dataset as a seed to generate the scoring grid, and then passes the grid to the scoring method along with y_pred.
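To make the idea concrete, here is a minimal sketch of what such a class could look like. Everything here is hypothetical (the class name `SamplingScore`, the `n_pixels` parameter, and the RMSE placeholder metric are illustrative, not an actual RAMP API); the point is only the mechanism: derive the seed from a hash of the dataset bytes, so the public and backend datasets automatically produce different sub-grids.

```python
import hashlib

import numpy as np


class SamplingScore:
    """Hypothetical scorer evaluating predictions on a random pixel
    sub-grid whose location is seeded by a hash of the input dataset."""

    def __init__(self, n_pixels=1000):
        self.n_pixels = n_pixels

    def _grid_from_hash(self, X, image_shape):
        # Hash the dataset bytes to get a reproducible seed: the same
        # dataset always yields the same grid, while the public and
        # backend datasets (different bytes) yield different grids.
        digest = hashlib.sha256(np.ascontiguousarray(X).tobytes()).digest()
        seed = int.from_bytes(digest[:4], "big")
        rng = np.random.RandomState(seed)
        n_total = image_shape[0] * image_shape[1]
        flat = rng.choice(n_total, size=self.n_pixels, replace=False)
        # Convert flat indices to (row, col) coordinates of the sub-grid.
        return np.unravel_index(flat, image_shape)

    def __call__(self, X, y_true, y_pred):
        # Score only the sampled pixels; RMSE is a placeholder metric.
        rows, cols = self._grid_from_hash(X, y_true.shape[1:3])
        diff = y_true[:, rows, cols] - y_pred[:, rows, cols]
        return float(np.sqrt(np.mean(diff ** 2)))
```

Because the grid is a deterministic function of the data, no grid coordinates need to be stored in the database, and only `n_pixels` values per image ever enter the scoring path instead of the full prediction.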
