ricber/data-science-course

Practical Data Science Course Lessons

This repository contains the teaching material I use for laboratory lessons on practical data science in the following courses:

  • Data Analytics for Smart Agriculture (I semester, Politecnico di Milano, Milan campus)
  • Data Harvesting and Data Analysis for Agriculture (II semester, Politecnico di Milano, Cremona campus)

Each directory contains a theory notebook with extensive commentary on the lesson topic, along with a homework directory containing exercises and their solutions. The full index of course topics is provided below.

01 - Python

  • What is programming?
  • Python
  • Variables and Types
  • Lists
  • Tuples
  • Basic Operators
  • Conditions
  • Loops
  • Functions
  • Dictionaries
  • Classes and Objects
  • Modules and Packages
  • Basic String Operations
  • String Formatting

02 - NumPy

  • What is NumPy?
  • NumPy Arrays
  • Array Operations
  • Array Slicing and Indexing
  • Array Reshaping
  • Array Stacking and Concatenation
  • Random Numbers
  • Unique Items and Counts
  • Adding and Removing Dimensions

03 - Pandas

  • What is Pandas?
  • Pandas Data Structures
  • Data Import and Export
  • Data Exploration
  • Indexing and Selecting Data
  • Assigning Data
  • Adding and Deleting Columns
  • Grouping
  • Merging

04 - Exploratory Data Analysis (EDA)

  • What is Exploratory Data Analysis (EDA)?
  • Preliminary Exploration
  • Descriptive Statistics
  • Data Visualization
  • Pandas, Seaborn or Matplotlib?
  • Summary of functions

05 - Data Preparation

  • What is Data Preparation?
  • Missing values
  • Figure out why the data is missing
  • Dealing with missing values
  • Drop missing values
  • Imputation
  • Imputation with scikit-learn
  • Missing indicators
  • Feature scaling
  • Parsing dates
  • Inconsistent data entry

06 - Feature Engineering

  • What is Feature Engineering?
  • Handling categorical variables
  • Creating features
  • Principal Component Analysis
  • Feature selection
  • Mutual information

07 - Regression

  • What is Supervised Learning and Regression?
  • What is Linear Regression?
  • Why Use Linear Regression?
  • How to Use Linear Regression?
  • Linear Regression Equations
  • Linear Regression with Scikit-Learn
  • Least Squares Method
  • Model Building
    • Train-Validation-Test split
  • Model Evaluation
  • Linear Regression Assumptions
  • Considerations of Multiple Linear Regression
    • Overfitting
    • Multicollinearity
  • Polynomial Regression
  • Regularization Techniques
  • Model Selection
    • Cross-validation
  • Hypothesis Testing
  • k-Nearest Neighbors Regression

08 - Classification

  • What is Classification?
  • What is Logistic Regression?
  • Linear Regression for Classification
  • Simple Logistic Regression
  • Multinomial Logistic Regression
  • Model Evaluation
  • Visualize Predictions and Decision Boundaries
  • Polynomial Logistic Regression
  • Regularization
  • k-Nearest Neighbors Classification

09 - Decision Trees

  • What are Decision Trees?
  • How to build Decision Trees?
  • Comparison with other models
  • How do decision trees work?
  • Class-imbalanced datasets
  • Ensemble methods
    • Bagging
    • Random Forest
    • Boosting

10 - Clustering

  • What is Clustering?
  • Distance Metrics
  • Standardization for Clustering
  • Agglomerative (or Hierarchical) Clustering
    • Linkage Matrix
    • The Dendrogram
    • Linkage Methods
  • K-Means Clustering
  • DBSCAN Clustering
  • Evaluation Metrics for Clustering
  • Deciding the Number of Clusters
  • Comparing Clustering Algorithms on Synthetic Data