ptavaressilva/data-preparation

Data preparation and transformation exercise

The objective of this exercise is to practice various steps of data preprocessing and feature engineering.

The scenario is preparing data for an ML multiple linear regression model.

The dataset used is the "Climate Weather Surface of Brazil - Hourly", which is available on Kaggle.

It contains hourly climate data recorded at weather stations in Brazil between 2000 and 2021.

This exercise is broken down as follows:

Part I

  1. Load data
  2. Inspect data
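
As a rough illustration of Part I, the sketch below loads a small CSV with pandas and prints a few common inspection summaries. The column names and the three sample rows are made up for the example and do not reflect the actual Kaggle files.

```python
import io

import pandas as pd

# Hypothetical sample standing in for one of the dataset's CSV files.
csv_text = """station,timestamp,temperature_c,humidity_pct
A001,2020-01-01 00:00,24.5,80
A001,2020-01-01 01:00,24.1,82
A002,2020-01-01 00:00,,75
"""

# Load: with a real file, pass its path instead of the StringIO buffer.
df = pd.read_csv(io.StringIO(csv_text))

# Inspect: size, inferred types, missing values and basic statistics.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
print(df.describe())
```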

Part II

  1. Format features
  2. Clean messy data
  3. Remove duplicate values
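
The three steps of Part II might look something like the following sketch. The data is invented, and treating -9999 as a "no reading" marker is an assumption made for the example; check the dataset documentation for the actual convention.

```python
import pandas as pd

# Hypothetical raw rows: untidy station codes, text-typed columns,
# one exact duplicate and one sentinel value.
raw = pd.DataFrame({
    "station": [" a001", "A001 ", "A001 ", "A002"],
    "timestamp": ["2020-01-01 00:00", "2020-01-01 01:00",
                  "2020-01-01 01:00", "2020-01-01 00:00"],
    "temperature_c": ["24.5", "24.1", "24.1", "-9999"],
})

clean = raw.copy()

# 1. Format features: give each column a proper type.
clean["timestamp"] = pd.to_datetime(clean["timestamp"], format="%Y-%m-%d %H:%M")
clean["temperature_c"] = pd.to_numeric(clean["temperature_c"], errors="coerce")

# 2. Clean messy data: normalise codes, turn the assumed sentinel into NaN.
clean["station"] = clean["station"].str.strip().str.upper()
clean["temperature_c"] = clean["temperature_c"].replace(-9999, float("nan"))

# 3. Remove duplicates: rows identical in every column are dropped.
clean = clean.drop_duplicates().reset_index(drop=True)
```

Cleaning the station codes before calling `drop_duplicates` matters here: the second and third rows only become identical once whitespace and case are normalised.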

Part III

  1. Treat missing values
  2. Imputation
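
A minimal sketch of Part III, assuming pandas and invented values: columns that are mostly empty are dropped, and the remaining gaps are filled either by linear interpolation (often reasonable for hourly series) or by the column median. The 50% threshold is an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps.
df = pd.DataFrame({
    "temperature_c": [24.0, np.nan, 26.0, 27.0],
    "humidity_pct": [80.0, 82.0, np.nan, 78.0],
    "wind_gust": [np.nan, np.nan, np.nan, 3.0],
})

# Treat missing values: drop columns with more than 50% missing.
missing_share = df.isna().mean()
kept = df.loc[:, missing_share <= 0.5]

# Imputation, option A: linear interpolation between neighbouring rows.
interpolated = kept.interpolate(method="linear")

# Imputation, option B: replace each gap with the column median.
median_filled = kept.fillna(kept.median())
```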

Part IV

  1. Remove strongly correlated features
  2. Remove outliers
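
One possible way to approach Part IV is sketched below on invented data. The 0.95 correlation cut-off and the 1.5 x IQR rule are common conventions chosen for illustration, not values taken from this exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical data: temp_f is a linear function of temp_c,
# and the last temperature is an obvious outlier.
df = pd.DataFrame({
    "temp_c": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 60.0],
    "humidity": [80.0, 60.0, 75.0, 65.0, 70.0, 62.0, 78.0, 68.0],
})
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Remove strongly correlated features: look only at the upper triangle
# of the absolute correlation matrix so each pair is checked once.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
reduced = df.drop(columns=to_drop)

# Remove outliers: keep rows within 1.5 * IQR of the quartiles.
q1 = reduced["temp_c"].quantile(0.25)
q3 = reduced["temp_c"].quantile(0.75)
iqr = q3 - q1
in_range = reduced["temp_c"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
no_outliers = reduced[in_range]
```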

Part V

  1. Aggregate features
  2. Encode categorical features
  3. Feature scaling
  4. Dimensionality reduction and feature decomposition
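
The four steps of Part V can be sketched as follows. The data and column names are hypothetical, and the PCA is computed directly with NumPy's SVD rather than with a dedicated library, purely to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings for two regions over two days.
hourly = pd.DataFrame({
    "region": ["N", "N", "S", "S", "N", "N", "S", "S"],
    "timestamp": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 01:00",
        "2020-01-01 00:00", "2020-01-01 01:00",
        "2020-01-02 00:00", "2020-01-02 01:00",
        "2020-01-02 00:00", "2020-01-02 01:00",
    ]),
    "temp_c": [24.0, 26.0, 18.0, 20.0, 25.0, 27.0, 17.0, 19.0],
    "humidity": [80.0, 78.0, 60.0, 62.0, 82.0, 84.0, 58.0, 64.0],
})

# 1. Aggregate features: hourly rows become daily means per region.
daily = (
    hourly.assign(date=hourly["timestamp"].dt.date)
    .groupby(["region", "date"], as_index=False)[["temp_c", "humidity"]]
    .mean()
)

# 2. Encode categorical features: one-hot encode the region.
encoded = pd.get_dummies(daily, columns=["region"], dtype=float)

# 3. Feature scaling: standardise to zero mean and unit variance.
numeric = encoded[["temp_c", "humidity"]]
scaled = (numeric - numeric.mean()) / numeric.std(ddof=0)

# 4. Dimensionality reduction: PCA via SVD of the centred, scaled matrix.
x = scaled.to_numpy()
_, singular_values, components = np.linalg.svd(x, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
pc1 = x @ components[0]  # projection onto the first principal component
```

Scaling before the decomposition keeps a feature with a large numeric range from dominating the principal components.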

Part VI

  1. Sample and balance
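
As a simple illustration of Part VI, the sketch below draws a random sample and then balances the data by downsampling every group to the size of the smallest one. The `region` grouping column is a hypothetical choice; which variable to balance on depends on the analysis.

```python
import pandas as pd

# Hypothetical data where region "N" is over-represented.
df = pd.DataFrame({
    "region": ["N"] * 6 + ["S"] * 2,
    "temp_c": [24.0, 25.0, 26.0, 27.0, 25.5, 24.5, 18.0, 19.0],
})

# Sample: a reproducible random half of the rows.
sample = df.sample(frac=0.5, random_state=42)

# Balance: downsample each region to the size of the smallest group.
smallest = int(df["region"].value_counts().min())
balanced = df.groupby("region").sample(n=smallest, random_state=42)
```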
