[ENH] distribution transformations, calibration #321

fkiraly · 2024-05-13T14:06:21Z

A discussion with @benHeid on probability calibration suggests that we may want another special category of transformations: distribution-to-distribution, possibly with samples as a secondary input.

Examples:

* quantile calibration - fit(X=y_proba, y=y_true), where y_proba are proba predictions, and y_true is a calibration sample. Both y_proba and y_true have 2D shape (N, d).
* model estimation, distribution fitting - fit(X).transform(X) produces a distribution of the same shape as X. If X is assumed to be an i.i.d. sample, the estimated distribution is scalar, or has the same shape as a row of X. The question is what the output should be, even if the "genuine" estimate is a scalar or row distribution. Perhaps a hybrid interface would help here, with estimate (can be row or scalar) and transform (always array).
* distribution smoothing or simplification - e.g., fit a nearby semi-parametric or parametric distribution to another distribution. For instance, replace an Empirical by a QPD.
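To make the quantile calibration example more concrete, here is a minimal sketch of what a distribution-to-distribution transformer could look like. Everything in it is an assumption for illustration: `NormalArray`, `CalibratedDistribution`, and `PITCalibrator` are hypothetical names, not existing classes, and recalibration via the empirical CDF of the probability integral transform (PIT) values is just one possible calibration method, swapped in here because it is short.

```python
import numpy as np
from math import erf

# vectorized error function, standard library only
_erf = np.vectorize(erf, otypes=[float])


class NormalArray:
    """Hypothetical stand-in for an array-like distribution of shape (N, d)."""

    def __init__(self, mu, sigma):
        self.mu = np.asarray(mu, dtype=float)
        self.sigma = np.asarray(sigma, dtype=float)
        self.shape = self.mu.shape

    def cdf(self, x):
        z = (np.asarray(x, dtype=float) - self.mu) / (self.sigma * np.sqrt(2.0))
        return 0.5 * (1.0 + _erf(z))


class CalibratedDistribution:
    """Hypothetical output distribution: base cdf composed with a calibration map."""

    def __init__(self, base, pit_sorted):
        self.base = base
        self.pit_sorted = pit_sorted
        self.shape = base.shape  # same (N, d) shape as the input distribution

    def cdf(self, x):
        p = self.base.cdf(x)
        # empirical CDF of the calibration PIT values, evaluated at p
        return np.searchsorted(self.pit_sorted, p, side="right") / len(self.pit_sorted)


class PITCalibrator:
    """Hypothetical distribution-to-distribution transformer (illustration only)."""

    def fit(self, X, y):
        # X: predictive distribution, shape (N, d); y: calibration sample, shape (N, d)
        self.pit_sorted_ = np.sort(X.cdf(y).ravel())
        return self

    def transform(self, X):
        return CalibratedDistribution(X, self.pit_sorted_)


rng = np.random.default_rng(0)
N, d = 500, 2
y_true = rng.normal(0.0, 2.0, size=(N, d))                # true spread is 2
y_proba = NormalArray(np.zeros((N, d)), np.ones((N, d)))  # predicted spread is 1

y_cal = PITCalibrator().fit(X=y_proba, y=y_true).transform(y_proba)
pit_before = y_proba.cdf(y_true)  # far from uniform: too much mass in the tails
pit_after = y_cal.cdf(y_true)     # close to uniform on the calibration sample
```

Note that the secondary input `y` is a plain sample, while both the input and the output of `transform` are distribution-like objects of shape (N, d), which is the pattern the first bullet describes.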


benHeid · 2024-05-13T19:47:26Z

I am unsure if I understand it correctly. Should this be a transformer that calibrates the quantiles? E.g. like: https://scikit-learn.org/stable/modules/calibration.html

benHeid · 2024-05-13T20:08:33Z

quantile calibration - fit(X=y_proba, y=y_true), where y_proba are proba predictions, and y_true is a calibration sample. Both y_proba and y_true have 2D shape (N, d).

Why should y_true have the same shape as y_proba? I would assume that y_proba needs an additional dimension, since it contains predicted quantiles, while y_true contains the actual values.

model estimation, distribution fitting - fit(X).transform(X) produces a distribution of the same shape as X. If X is assumed to be an i.i.d. sample, the estimated distribution is scalar, or has the same shape as a row of X. The question is what the output should be, even if the "genuine" estimate is a scalar or row distribution. Perhaps a hybrid interface would help here, with estimate (can be row or scalar) and transform (always array).

distribution smoothing or simplification - e.g., fit a nearby semi-parametric or parametric distribution to another distribution. For instance, replace an Empirical by a QPD.

Regarding these two bullet points, I assume that we need to discuss them in a meeting. I am not sure I understand them correctly.

In general, I think such transformers would be very useful.

fkiraly · 2024-05-14T00:04:20Z

I am unsure if I understand it correctly. Should this be a transformer that calibrates the quantiles?

Exactly!

Why should y_true have the same shape as y_proba? I would assume that y_proba needs an additional dimension, since it contains predicted quantiles, while y_true contains the actual values.

I am thinking of distribution objects, i.e., those inheriting from BaseDistribution. The parameterization by quantiles is handled, e.g., via the iterable alpha argument of the quantile method, so at the object level there are only two relevant dimensions.

The quantile returns will indeed have three dimensions, one more than y_true.
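The dimension count can be illustrated with a small sketch. It uses plain NumPy arrays rather than any actual distribution class, and the normal distribution with parameters `mu` and `sigma` is an assumption made only for the example; the return container of a real `quantile` method may lay out the same values differently.

```python
import numpy as np
from statistics import NormalDist

N, d = 4, 2
mu = np.zeros((N, d))        # parameters of an (N, d) array of normal distributions
sigma = np.ones((N, d))
y_true = np.zeros((N, d))    # observations have the same 2D shape as the distribution

alpha = [0.1, 0.5, 0.9]      # iterable of quantile levels, length k = 3
z = np.array([NormalDist().inv_cdf(a) for a in alpha])  # standard normal quantiles, shape (k,)

# one quantile per entry and per level: broadcasting adds a third axis for alpha
quantiles = mu[..., None] + sigma[..., None] * z

print(y_true.shape, quantiles.shape)
```

The distribution and `y_true` both live on the (N, d) grid; only the evaluated quantiles pick up the extra alpha axis.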

Regarding these two bullet points, I assume that we need to discuss them in a meeting. I am not sure I understand them correctly.

Sure, one of the dev meetings? It is probably not clear when stated this briefly; I may need to write an API design proposal.

fkiraly added the labels enhancement, module:probability&simulation (probability distributions and simulators), module:transformations (transformations module: feature extraction, pre-/post-processing), and API design (API design & software architecture) on May 13, 2024
