[ENH] distribution transformations, calibration #321

Open
fkiraly opened this issue May 13, 2024 · 3 comments
Labels
API design (API design & software architecture), enhancement, module:probability&simulation (probability distributions and simulators), module:transformations (transformations module: feature extraction, pre-/post-processing)

Comments

@fkiraly
Collaborator

fkiraly commented May 13, 2024

Discussion with @benHeid on probability calibration suggests that we may want another special category of transformations: distribution-to-distribution, possibly with samples as a secondary input.

Examples:

  • quantile calibration - fit(X=y_proba, y_true), where y_proba are proba predictions, and y_true is a calibration sample. Both y_proba and y_true have 2D shape (N, d).
  • model estimation, distribution fitting - fit(X).transform(X) produces a distribution of the same shape as X. If X is assumed to be an i.i.d. sample, the estimated distribution is scalar, or has the same shape as a row of X. The question is what the output should be, even if the "genuine" estimate is a scalar or row distribution. Perhaps a hybrid interface with estimate - can be row, scalar - and transform - always array - can be helpful here.
  • distribution smoothing or simplification - e.g., fit a nearby semi-parametric or parametric distribution to another distribution. For instance, replace an Empirical by a QPD.
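
To make the quantile calibration example concrete, here is a minimal sketch, not a proposed API. It assumes a PIT-based recalibration (one possible method among several), uses `statistics.NormalDist` from the Python standard library as a stand-in for a distribution object, and represents the (N, d) inputs as nested lists. The names `QuantileCalibrator` and `CalibratedDist` are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist


class CalibratedDist:
    """Hypothetical wrapper: a base distribution plus a recalibration map on [0, 1]."""

    def __init__(self, base, pit_sorted):
        self.base = base
        self.pit_sorted = pit_sorted

    def cdf(self, x):
        # calibrated cdf = empirical cdf of the fitted PIT values,
        # evaluated at the base distribution's cdf value
        u = self.base.cdf(x)
        return bisect_right(self.pit_sorted, u) / len(self.pit_sorted)


class QuantileCalibrator:
    """Hypothetical distribution-to-distribution transformer."""

    def fit(self, X, y):
        # X: (N, d) nested list of distributions, y: (N, d) nested list of floats
        self.pit_sorted_ = sorted(
            dist.cdf(obs)
            for row_X, row_y in zip(X, y)
            for dist, obs in zip(row_X, row_y)
        )
        return self

    def transform(self, X):
        # output has the same (N, d) shape as X
        return [[CalibratedDist(d, self.pit_sorted_) for d in row] for row in X]


# toy data: predictions are too narrow (sigma=1), observations follow sigma=2
truth = NormalDist(0, 2)
levels = [(i + 0.5) / 20 for i in range(20)]
obs = [truth.inv_cdf(p) for p in levels]
y_true = [[obs[2 * i], obs[2 * i + 1]] for i in range(10)]    # shape (10, 2)
y_proba = [[NormalDist(0, 1), NormalDist(0, 1)] for _ in range(10)]  # shape (10, 2)

y_cal = QuantileCalibrator().fit(y_proba, y_true).transform(y_proba)
x90 = truth.inv_cdf(0.9)
raw_level = y_proba[0][0].cdf(x90)  # overconfident raw prediction
cal_level = y_cal[0][0].cdf(x90)    # recalibrated level
```

In this toy setup the raw prediction assigns a level of about 0.99 to the true 90% point, while the recalibrated distribution assigns a level close to 0.9. Both `y_proba` and `y_true` have shape (N, d), and so does the output.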
fkiraly added the labels enhancement, module:probability&simulation, module:transformations, and API design on May 13, 2024.
@benHeid

benHeid commented May 13, 2024

I am unsure if I understand it correctly. Should this be a transformer that calibrates the quantiles? E.g. like: https://scikit-learn.org/stable/modules/calibration.html

@benHeid

benHeid commented May 13, 2024

  • quantile calibration - fit(X=y_proba, y_true), where y_proba are proba predictions, and y_true is a calibration sample. Both y_proba and y_true have 2D shape (N, d).

Why should y_true have the same shape as y_proba? I would assume that y_proba needs an additional dimension, since it contains predicted quantiles while y_true are the actual values.

  • model estimation, distribution fitting - fit(X).transform(X) produces a distribution of the same shape as X. If X is assumed to be an i.i.d. sample, the estimated distribution is scalar, or has the same shape as a row of X. The question is what the output should be, even if the "genuine" estimate is a scalar or row distribution. Perhaps a hybrid interface with estimate - can be row, scalar - and transform - always array - can be helpful here.
  • distribution smoothing or simplification - e.g., fit a nearby semi-parametric or parametric distribution to another distribution. For instance, replace an Empirical by a QPD.

Regarding both of these bullet points, I assume that we need to discuss this in a meeting. I am not sure I understand them correctly.

In general, I think such transformers would be very useful.

@fkiraly
Collaborator Author

fkiraly commented May 14, 2024

I am unsure if I understand it correctly. Should this be a transformer that calibrates the quantiles?

Exactly!

I am unsure if I understand it correctly. Should this be a transformer that calibrates the quantiles?

I am considering distribution objects that inherit from BaseDistribution. The parameterization by quantiles is handled, e.g., via the iterable alpha argument of the quantile method, so at the object level there are only two relevant dimensions.

The quantile returns will indeed have three dimensions, one more than y_true.
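
A small sketch of the shape argument, using nested lists and `statistics.NormalDist` as stand-ins. This is an illustration only: it does not reproduce the actual return format of the `quantile` method, and the helpers `quantiles` and `shape` are hypothetical.

```python
from statistics import NormalDist

# (N, d) = (3, 2) grid of predictive distributions and matching observations
y_proba = [[NormalDist(mu=i + j, sigma=1.0) for j in range(2)] for i in range(3)]
y_true = [[0.1 * (i + j) for j in range(2)] for i in range(3)]
alpha = [0.1, 0.5, 0.9]


def quantiles(dists, alpha):
    """Evaluate every distribution at every level; adds one trailing dimension."""
    return [[[d.inv_cdf(a) for a in alpha] for d in row] for row in dists]


def shape(nested):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


q = quantiles(y_proba, alpha)
shapes = (shape(y_proba), shape(y_true), shape(q))
```

The distribution grid and `y_true` share the two dimensions (N, d); only the evaluated quantiles carry the extra dimension of length `len(alpha)`.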

Regarding both of these bullet points, I assume that we need to discuss this in a meeting. I am not sure I understand them correctly.

Sure - one of the dev meetings? It is probably not clear when stated this briefly; I may need to write an API design proposal.


2 participants