Scale, Normalize or Standardize ? #14

pasquierjb · 2018-06-05T13:54:53Z

At the moment, the features are standardized before the evaluation loops (mean removal and division by the standard deviation) in master.py:
data_features = (data_features - data_features.mean()) / data_features.std()

They are also normalized (mean removal and division by the l2-norm) in each cross-validation fold in modeller.py:
model = Ridge(normalize=True)

This is not optimal because:

Normalization cancels out the standardization in the Ridge regression
All data transformations should be done independently within each cross-validation fold
Normalization is applied for Ridge but not for the other models (this is not necessarily an issue)
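To illustrate the first point, here is a minimal NumPy sketch. It assumes that normalize=True amounts to centering each feature and dividing it by its l2-norm, as described above; the helper center_and_l2_scale and the random data are hypothetical and only serve the illustration. Because that operation is invariant to a per-column shift and positive rescaling, it gives the same matrix whether or not the features were standardized first.

```python
import numpy as np

def center_and_l2_scale(X):
    """Center each column, then divide it by its l2-norm."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

# Hypothetical features on very different scales.
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(50, 3))

# Column-wise standardization, in the spirit of the line in master.py.
# NumPy's std uses ddof=0 while pandas uses ddof=1; the difference is a
# constant factor per column, which the l2 scaling removes anyway.
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# The normalized matrices are identical, so the earlier standardization
# has no effect on what Ridge sees.
same = np.allclose(center_and_l2_scale(X_raw), center_and_l2_scale(X_std))
print(same)
```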

Strangely, for some configs (2000 for example), removing the normalization in the Ridge regression strongly affects the results (R2 goes from 20% to 0%)!

One way to implement more complex transformations inside each cross-validation fold is to use the Pipeline class of sklearn. For example, to perform scaling (between 0 and 1) followed by Ridge, we would do:

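from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

model = Ridge()
minmax_scaler = MinMaxScaler()
# The scaler is re-fitted on the training part of every fold
pipeline = make_pipeline(minmax_scaler, model)
scores = cross_val_score(pipeline, X, y)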

However, my attempts to combine Normalization and Ridge in a pipeline have led to very different results compared to using the normalize=True argument of the Ridge regression...
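One possible explanation, offered as a guess since the pipeline that was tried is not shown: sklearn's Normalizer rescales each sample (row) to unit norm, whereas normalize=True centered and rescaled each feature (column). The sketch below contrasts the two. ColumnL2Scaler is a hypothetical helper written for this illustration, not an sklearn class, and the data is synthetic. Note also that the normalize argument was deprecated in scikit-learn 1.0 and removed in 1.2, so a pipeline step is the way to keep this behaviour.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

class ColumnL2Scaler(TransformerMixin, BaseEstimator):
    """Hypothetical helper: center each feature and divide it by its
    l2-norm, using statistics learned on the training fold only."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        norms = np.linalg.norm(X - self.mean_, axis=0)
        norms[norms == 0.0] = 1.0  # leave constant columns unscaled
        self.norm_ = norms
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.norm_

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(scale=[1.0, 10.0, 100.0], size=(60, 3))
y = X @ np.array([1.0, 0.1, 0.01]) + rng.normal(scale=0.1, size=60)

# Normalizer works row-wise: every sample ends up with unit l2-norm.
row_norms = np.linalg.norm(Normalizer().fit_transform(X), axis=1)
# ColumnL2Scaler works column-wise: every feature ends up with unit l2-norm.
col_norms = np.linalg.norm(ColumnL2Scaler().fit_transform(X), axis=0)

# Column-wise normalization fitted inside each cross-validation fold.
pipeline = make_pipeline(ColumnL2Scaler(), Ridge(alpha=1.0))
scores = cross_val_score(pipeline, X, y, cv=5)
```

If Normalizer was the step used in the pipeline, the two setups transform the data along different axes, so differing scores would be expected.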


pasquierjb · 2018-08-17T14:54:12Z

@lorenzori I changed the standardization of the features (dividing by the std) to a max normalization (dividing by the max) to fix the problem of outliers in the features. The impact on R2 in Mali was minimal, but this does not solve the problem of applying a different re-scaling to the evaluation set and the scoring set.
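A minimal sketch of one way to keep the re-scaling consistent: learn the per-feature maxima on the evaluation set once, then reuse them unchanged on the scoring set. The function names and the toy arrays are hypothetical, and the sketch assumes non-negative features so that dividing by the max is well behaved.

```python
import numpy as np

def fit_max_scaler(X_train):
    """Learn the per-feature maximum from the training data only."""
    maxima = X_train.max(axis=0)
    maxima[maxima == 0.0] = 1.0  # avoid dividing by zero for all-zero columns
    return maxima

def apply_max_scaler(X, maxima):
    """Divide each feature by the maximum learned during fitting."""
    return X / maxima

# Toy evaluation and scoring sets, for illustration only.
X_eval = np.array([[1.0, 200.0], [2.0, 400.0], [4.0, 100.0]])
X_score = np.array([[8.0, 100.0], [2.0, 800.0]])

maxima = fit_max_scaler(X_eval)
X_eval_scaled = apply_max_scaler(X_eval, maxima)
# The scoring set reuses the evaluation maxima, so values above 1 can occur.
X_score_scaled = apply_max_scaler(X_score, maxima)
```

With this approach a model trained on the evaluation set sees scoring features on the same scale it was fitted on, at the cost of scoring values that may fall outside [0, 1].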

pasquierjb assigned pasquierjb and lorenzori Jun 5, 2018

lorenzori added the enhancement label Oct 31, 2018
