Module 2 course notes #6

Merged
merged 22 commits into from
Oct 12, 2023
Commits
22 commits
6c1904d
Update ml-monitoring-metrics.md
dmaliugina Oct 10, 2023
1a1cc3f
Update ml-monitoring-setup.md
dmaliugina Oct 10, 2023
80091d7
Update ml-monitoring-architectures.md
dmaliugina Oct 10, 2023
b8369fc
Create readme.md
dmaliugina Oct 12, 2023
75068b4
Added images for Module 2
dmaliugina Oct 12, 2023
4339cae
Create evaluate-ml-model-quality.md
dmaliugina Oct 12, 2023
b59d99a
Create ml-quality-metrics-classification-regression-ranking.md
dmaliugina Oct 12, 2023
b67550f
Update ml-quality-metrics-classification-regression-ranking.md
dmaliugina Oct 12, 2023
f7e182f
Create ml-model-quality-code-practice.md
dmaliugina Oct 12, 2023
e3355a3
Update ml-model-quality-code-practice.md
dmaliugina Oct 12, 2023
6df945a
Create data-quality-in-ml.md
dmaliugina Oct 12, 2023
13ad922
Create data-quality-code-practice.md
dmaliugina Oct 12, 2023
d00d6ca
Create data-prediction-drift-in-ml.md
dmaliugina Oct 12, 2023
151e6f6
Update data-prediction-drift-in-ml.md
dmaliugina Oct 12, 2023
c439695
Create data-prediction-drift-code-practice.md
dmaliugina Oct 12, 2023
e9bb925
Update ml-model-quality-code-practice.md
dmaliugina Oct 12, 2023
3a3370b
Update data-quality-code-practice.md
dmaliugina Oct 12, 2023
3046ba1
Update ml-monitoring-metrics.md
dmaliugina Oct 12, 2023
38b651b
Delete docs/book/ml-observability-course/module-2-ml-monitoring-metri…
dmaliugina Oct 12, 2023
b8f66c0
Update ml-monitoring-architectures.md
dmaliugina Oct 12, 2023
849029c
Update README.md
dmaliugina Oct 12, 2023
0d4b896
Update SUMMARY.md
dmaliugina Oct 12, 2023
4 changes: 3 additions & 1 deletion docs/book/README.md
@@ -16,7 +16,9 @@ The course starts on **October 16, 2023**. \
* **Newsletter**. [Sign up](https://www.evidentlyai.com/ml-observability-course) to receive weekly updates with the course materials.
* **Discord community**. Join the [community](https://discord.gg/PyAJuUD5mB) to ask questions and chat with others.
* **Course platform**. [Register](https://evidentlyai.thinkific.com/courses/ml-observability-course) if you want to submit assignments and receive the certificate. This is optional.
* **Code examples**. Will be published in this GitHub [repository](https://github.com/evidentlyai/ml_observability_course) throughout the course.
* **Enjoying the course?** [Star](https://github.com/evidentlyai/evidently) Evidently on GitHub to contribute back! This helps us create free, open-source tools and content for the community.


The course starts on **October 16, 2023**. The videos and course notes for the new modules will be released during the course cohort.

9 changes: 8 additions & 1 deletion docs/book/SUMMARY.md
@@ -10,7 +10,14 @@
* [1.3. ML monitoring metrics. What exactly can you monitor?](ml-observability-course/module-1-introduction/ml-monitoring-metrics.md)
* [1.4. Key considerations for ML monitoring setup](ml-observability-course/module-1-introduction/ml-monitoring-setup.md)
* [1.5. ML monitoring architectures](ml-observability-course/module-1-introduction/ml-monitoring-architectures.md)
* [Module 2: ML monitoring metrics](ml-observability-course/module-2-ml-monitoring-metrics.md)
* [Module 2: ML monitoring metrics](ml-observability-course/module-2-ml-monitoring-metrics/readme.md)
* [2.1. How to evaluate ML model quality](ml-observability-course/module-2-ml-monitoring-metrics/evaluate-ml-model-quality.md)
* [2.2. Overview of ML quality metrics. Classification, regression, ranking](ml-observability-course/module-2-ml-monitoring-metrics/ml-quality-metrics-classification-regression-ranking.md)
* [2.3. Evaluating ML model quality CODE PRACTICE](ml-observability-course/module-2-ml-monitoring-metrics/ml-model-quality-code-practice.md)
* [2.4. Data quality in machine learning](ml-observability-course/module-2-ml-monitoring-metrics/data-quality-in-ml.md)
* [2.5. Data quality in ML CODE PRACTICE](ml-observability-course/module-2-ml-monitoring-metrics/data-quality-code-practice.md)
* [2.6. Data and prediction drift in ML](ml-observability-course/module-2-ml-monitoring-metrics/data-prediction-drift-in-ml.md)
* [2.8. Data and prediction drift in ML CODE PRACTICE](ml-observability-course/module-2-ml-monitoring-metrics/data-prediction-drift-code-practice.md)
* [Module 3: ML monitoring for unstructured data](ml-observability-course/module-3-ml-monitoring-for-unstructured-data.md)
* [Module 4: Designing effective ML monitoring](ml-observability-course/module-4-designing-effective-ml-monitoring.md)
* [Module 5: ML pipelines validation and testing](ml-observability-course/module-5-ml-pipelines-validation-and-testing.md)
docs/book/ml-observability-course/module-1-introduction/ml-monitoring-architectures.md
@@ -36,4 +36,10 @@ When it comes to visualizing the results of monitoring, you also have options.

Each ML monitoring architecture has its pros and cons. When choosing between them, consider existing tools, the scale of ML deployments, and available team resources for systems support. Be pragmatic: you can start with a simpler architecture and expand later.

For a deeper dive into the ML monitoring architectures with specific code examples, head to [Module 5](ml-observability-course/module-5-ml-pipelines-validation-and-testing.md) and [Module 6](ml-observability-course/module-6-deploying-an-ml-monitoring-dashboard.md).
For a deeper dive into the ML monitoring architectures with specific code examples, head to [Module 5](../module-5-ml-pipelines-validation-and-testing.md) and [Module 6](../module-6-deploying-an-ml-monitoring-dashboard.md).

## Enjoyed the content?

Star Evidently on GitHub to contribute back! This helps us create free, open-source tools and content for the community.

⭐️ [Star](https://github.com/evidentlyai/evidently) on GitHub!
docs/book/ml-observability-course/module-1-introduction/ml-monitoring-metrics.md
@@ -27,4 +27,4 @@ The ultimate measure of the model quality is its impact on the business. Dependi

![](<../../../images/2023109\_course\_module1\_fin\_images.034.png>)

For a deeper dive into **ML model quality and relevance** and **data quality and integrity** metrics, head to [Module 2](ml-observability-course/module-2-ml-monitoring-metrics.md).
For a deeper dive into **ML model quality and relevance** and **data quality and integrity** metrics, head to [Module 2](../module-2-ml-monitoring-metrics/readme.md).
docs/book/ml-observability-course/module-1-introduction/ml-monitoring-setup.md
@@ -58,4 +58,4 @@ While designing an ML monitoring system, tailor your approach to fit your specif
* Use reference datasets to simplify the monitoring process but make sure they are carefully curated.
* Define custom metrics that fit your problem statement and data properties.

For a deeper dive into the ML monitoring setup, head to [Module 4](ml-observability-course/module-4-designing-effective-ml-monitoring.md).
For a deeper dive into the ML monitoring setup, head to [Module 4](../module-4-designing-effective-ml-monitoring.md).

This file was deleted.

docs/book/ml-observability-course/module-2-ml-monitoring-metrics/data-prediction-drift-code-practice.md
@@ -0,0 +1,24 @@
# 2.8. Data and prediction drift in ML [CODE PRACTICE]

{% embed url="https://www.youtube.com/watch?v=oO1K4CaWxt0&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=14" %}

**Video 8**. [Data and prediction drift in ML [CODE PRACTICE]](https://www.youtube.com/watch?v=oO1K4CaWxt0&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=14), by Emeli Dral

In this video, we walk you through the code example of detecting data drift and creating a custom method for drift detection using the open-source [Evidently](https://github.com/evidentlyai/evidently) Python library.

**Want to go straight to code?** Here is the [example notebook](https://github.com/evidentlyai/ml_observability_course/blob/main/module2/data_drift_deep_dive.ipynb) to follow along.
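
If you want a minimal starting point before opening the notebook, here is a rough sketch of running a drift report. It assumes the `Report` and `DataDriftPreset` interfaces from the Evidently releases used at the time of the course (0.4.x) and placeholder file paths; the notebook above is the authoritative, fully worked example.

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Hypothetical reference (e.g., training data) and current (production batch) datasets
reference = pd.read_csv("reference.csv")    # placeholder path
current = pd.read_csv("current_batch.csv")  # placeholder path

# Compute per-column drift checks and a dataset-level drift decision with default settings
drift_report = Report(metrics=[DataDriftPreset()])
drift_report.run(reference_data=reference, current_data=current)

# Render inline in a notebook or save the HTML to share
drift_report.save_html("data_drift_report.html")
```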

**Outline**:\
[00:00](https://www.youtube.com/watch?v=oO1K4CaWxt0&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=14&t=0s) Create a working environment and import libraries\
[01:33](https://www.youtube.com/watch?v=oO1K4CaWxt0&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=14&t=93s) Overview of the data drift options\
[04:25](https://www.youtube.com/watch?v=oO1K4CaWxt0&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=14&t=265s) Evaluating share of drifted features\
[06:40](https://www.youtube.com/watch?v=oO1K4CaWxt0&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=14&t=400s) Detecting column drift\
[11:47](https://www.youtube.com/watch?v=oO1K4CaWxt0&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=14&t=707s) Set a different drift detection method per feature type\
[12:57](https://www.youtube.com/watch?v=oO1K4CaWxt0&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=14&t=777s) Set individual drift detection methods per feature\
[15:34](https://www.youtube.com/watch?v=oO1K4CaWxt0&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=14&t=934s) Custom drift detection method

## Enjoyed the content?

Star Evidently on GitHub to contribute back! This helps us create free, open-source tools and content for the community.

⭐️ [Star](https://github.com/evidentlyai/evidently) on GitHub!
docs/book/ml-observability-course/module-2-ml-monitoring-metrics/data-prediction-drift-in-ml.md
@@ -0,0 +1,84 @@
# 2.6. Data and prediction drift in ML

{% embed url="https://www.youtube.com/watch?v=bMYcB_5gP4I&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=12" %}

**Video 6**. [Data and prediction drift in ML](https://www.youtube.com/watch?v=bMYcB_5gP4I&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=12), by Emeli Dral

## What is data drift, and why evaluate it?

When ground truth is unavailable or delayed, we cannot calculate ML model quality metrics directly. Instead, we can use proxy metrics like feature and prediction drift.

**Prediction drift** shows changes in the distribution of **model outputs** over time. Without target values, this is the best proxy of the model behavior. Detected changes in the model outputs may be an early signal of changes in the model environment, data quality bugs, pipeline errors, etc.

![](<../../../images/2023109\_course\_module2.058.png>)

**Feature drift** demonstrates changes in the distribution of **input features** over time. When we train the model, we assume that if the input data remains reasonably similar, we can expect similar model quality. Thus, data distribution drift can be an early warning about model quality decay, important changes in the model environment or user behavior, unannounced changes to the modeled process, etc.

![](<../../../images/2023109\_course\_module2.060.png>)

Prediction and feature drift can serve as early warning signs for model quality issues. They can also help pinpoint a root cause when the model decay is already observed.

![](<../../../images/2023109\_course\_module2.065.png>)

Some key considerations about data drift to keep in mind:
* **Prediction drift is usually more important than feature drift**. If you monitor one thing, look at the outputs.
* **Data drift in ML is a heuristic**. There is no “objective” drift; it varies based on the specific use case and data.
* **Not all distribution drift leads to model performance decay**. Consider the use case, the meaning of specific features, their importance, etc.
* **You don’t always need to monitor data drift**. It is useful for business-critical models with delayed feedback. But often you can wait.
* **Data drift helps with debugging**. Even if you do not alert on feature drift, it might help troubleshoot the decay.
* **Drift detection might be valuable even if you have the labels**. Feature drift might appear before you observe the model quality drop.

{% hint style="info" %}
**Further reading:** [How to break a model in 20 days. A tutorial on production model analytics](https://www.evidentlyai.com/blog/tutorial-1-model-analytics-in-production).
{% endhint %}

## How to detect data drift?

To detect distribution drift, you need to pick:
* **Drift detection method**: statistical tests, distance metrics, rules, etc.
* **Drift detection threshold**: e.g., confidence levels for statistical tests or numeric thresholds for distance metrics.
* **Reference dataset**: the dataset that defines the exemplary distribution to compare against.
* **Alert conditions**: e.g., based on feature importance and the share of the drifting features.

## Data drift detection methods

There are three commonly used approaches to drift detection:
* **Statistical tests**, e.g., Kolmogorov-Smirnov or Chi-squared test. You can use parametric or non-parametric tests to compare distributions. Generally, parametric tests are more sensitive. Using statistical tests for drift detection is best for smaller datasets and samples. The resulting drift “score” is measured by p-value (a “confidence” of drift detection).
* **Distance-based metrics**, e.g., Wasserstein distance or Jensen-Shannon divergence. This group of metrics works well for larger datasets. The drift “score” is measured as distance, divergence, or level of similarity.
* **Rule-based checks** are custom rules for detecting drift based on heuristics and domain knowledge. These are great when you expect specific changes, e.g., new categories added to the dataset.

Here is how the defaults are implemented in the Evidently open-source library.

**For small datasets (<=1000)**, you can use the Kolmogorov-Smirnov test for numerical features, the Chi-squared test for categorical features, and the proportion difference test for independent samples based on Z-score for binary categorical features.

![](<../../../images/2023109\_course\_module2.070.png>)

**For large datasets (>1000)**, you can use the Wasserstein distance for numerical features and Jensen-Shannon divergence for categorical features.

![](<../../../images/2023109\_course\_module2.071.png>)
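
As a rough illustration of these default choices, here is a sketch of per-column checks written with scipy (not the actual Evidently implementation): a statistical test with a p-value threshold for smaller samples, and a distance metric with a numeric threshold for larger ones. The 0.05 significance level follows common practice; the distance thresholds are placeholder assumptions you would tune for your data.

```python
import pandas as pd
from scipy import stats
from scipy.spatial.distance import jensenshannon

def numerical_drift(ref: pd.Series, cur: pd.Series) -> bool:
    """Small samples: Kolmogorov-Smirnov test; larger samples: Wasserstein distance."""
    ref, cur = ref.dropna(), cur.dropna()
    if len(cur) <= 1000:
        p_value = stats.ks_2samp(ref, cur).pvalue
        return p_value < 0.05          # drift if the distributions differ significantly
    distance = stats.wasserstein_distance(ref, cur)
    return distance > 0.1              # placeholder threshold; scale-dependent, tune per feature

def categorical_drift(ref: pd.Series, cur: pd.Series) -> bool:
    """Small samples: Chi-squared test; larger samples: Jensen-Shannon distance."""
    categories = sorted(set(ref.dropna()) | set(cur.dropna()))
    ref_counts = ref.value_counts().reindex(categories, fill_value=0)
    cur_counts = cur.value_counts().reindex(categories, fill_value=0)
    if len(cur) <= 1000:
        _, p_value, _, _ = stats.chi2_contingency([ref_counts, cur_counts])
        return p_value < 0.05
    js = jensenshannon(ref_counts / ref_counts.sum(), cur_counts / cur_counts.sum())
    return js > 0.1                    # placeholder threshold for the JS distance
```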

## Univariate vs. multivariate drift

The **univariate drift** detection approach looks at drift in each feature individually. It returns a drift/no drift result for each feature and is easy to interpret.

The **multivariate drift** detection approach looks at the complete dataset (e.g., using PCA or methods like a domain classifier). It returns a drift/no drift result for the whole dataset and may be useful for systems with many features.

You can still use the univariate approach to detect drift in a dataset (see the sketch after this list) by:
* Tracking the share (%) of drifting features to get a dataset drift decision.
* Tracking distribution drift only in the top model features.
* Combining both solutions.
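
A minimal sketch of the first option, reusing the per-column helpers from the scipy sketch above; the 50% share threshold and the split into numerical and categorical feature lists are assumptions for the example.

```python
def dataset_drift(reference: pd.DataFrame, current: pd.DataFrame,
                  numerical: list, categorical: list,
                  share_threshold: float = 0.5) -> bool:
    """Flag dataset drift when the share of drifting columns exceeds the threshold."""
    checks = [numerical_drift(reference[col], current[col]) for col in numerical]
    checks += [categorical_drift(reference[col], current[col]) for col in categorical]
    drift_share = sum(checks) / len(checks)
    return drift_share >= share_threshold

# Example (hypothetical column names):
# dataset_drift(reference_df, current_df, ["age", "amount"], ["country", "channel"])
```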

## Tips for calculating drift

Here are some tips to keep in mind when calculating data drift:
* **Data quality is a must**. Calculate data quality metrics first and then monitor for drift. Otherwise, you might detect “data drift” that is caused by data quality issues.
* **Mind the feature set**. The approach to drift analysis varies based on the type and importance of features.
* **Mind the segments**. Consider segment-based drift monitoring when you have clearly defined segments in your data. For example, in manufacturing, you might have different suppliers of raw materials and need to monitor distribution drift separately for each of them.

## Summing up

We discussed the key concepts of data drift and how to measure it. When calculating data drift, consider drift detection method and thresholds, properties of reference data, and alert conditions.

Further reading: [How to break a model in 20 days. A tutorial on production model analytics](https://www.evidentlyai.com/blog/tutorial-1-model-analytics-in-production).

Up next: a deep dive into data drift detection [OPTIONAL] and practice on how to detect data drift using Python and the open-source [Evidently](https://github.com/evidentlyai/evidently) library.
docs/book/ml-observability-course/module-2-ml-monitoring-metrics/data-quality-code-practice.md
@@ -0,0 +1,22 @@
# 2.5. Data quality in ML [CODE PRACTICE]

{% embed url="https://www.youtube.com/watch?v=_HKGrW2mVdo&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=11" %}

**Video 5**. [Data quality in ML [CODE PRACTICE]](https://www.youtube.com/watch?v=_HKGrW2mVdo&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=11), by Emeli Dral

In this video, we walk you through the code example of data quality evaluation using [Evidently](https://github.com/evidentlyai/evidently) Reports and Test Suites.

**Want to go straight to code?** Here is the [example notebook](https://github.com/evidentlyai/ml_observability_course/blob/main/module2/data_quality.ipynb) to follow along.

Here is a quick refresher on the Evidently components we will use (a minimal usage sketch follows the list):
* **Reports** compute and visualize 100+ metrics across data quality, drift, and model performance. You can use built-in report presets to make visuals appear with just a couple of lines of code.
* **Test Suites** perform structured data and ML model quality checks. They verify conditions and show which of them pass or fail. You can start with default test conditions or design your own testing framework.
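
To make the refresher concrete, here is a minimal sketch of both components. It assumes the `Report`/`TestSuite` interfaces and the `DataQualityPreset`/`DataStabilityTestPreset` presets from the Evidently releases used at the time of the course (0.4.x), with placeholder file paths; the notebook above shows the full, customized workflow.

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataQualityPreset
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

reference = pd.read_csv("reference.csv")    # placeholder paths
current = pd.read_csv("current_batch.csv")

# Report: computes and visualizes data quality metrics for both datasets
report = Report(metrics=[DataQualityPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_quality_report.html")

# Test Suite: runs pass/fail checks, with conditions inferred from the reference by default
tests = TestSuite(tests=[DataStabilityTestPreset()])
tests.run(reference_data=reference, current_data=current)
results = tests.as_dict()                   # per-test statuses for programmatic use
```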

**Outline**:\
[00:00](https://www.youtube.com/watch?v=_HKGrW2mVdo&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=11&t=0s) Create a working environment and import libraries\
[01:30](https://www.youtube.com/watch?v=_HKGrW2mVdo&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=11&t=90s) Prepare reference and current dataset\
[05:20](https://www.youtube.com/watch?v=_HKGrW2mVdo&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=11&t=320s) Run data quality Test Suite and visualize the results\
[09:30](https://www.youtube.com/watch?v=_HKGrW2mVdo&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=11&t=570s) Customize the Test Suite by specifying individual tests and test conditions\
[13:20](https://www.youtube.com/watch?v=_HKGrW2mVdo&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=11&t=800s) Build and customize data quality Report

That’s it! We evaluated data quality using Evidently Reports and Test Suites and demonstrated how to add custom metrics, tests, and test conditions to the analysis.
docs/book/ml-observability-course/module-2-ml-monitoring-metrics/data-quality-in-ml.md
@@ -0,0 +1,61 @@
# 2.4. Data quality in machine learning

{% embed url="https://www.youtube.com/watch?v=IRbmQGqzVZo&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=10" %}

**Video 4**. [Data quality in machine learning](https://www.youtube.com/watch?v=IRbmQGqzVZo&list=PL9omX6impEuOpTezeRF-M04BW3VfnPBRF&index=10), by Emeli Dral

## What can go wrong with the input data?

If you have a complex ML system, there are many things that can go wrong with the data. The golden rule is: garbage in, garbage out. We need to make sure that the data we feed into the model is sound.

Some common data processing issues are:
* **Wrong source**. E.g., a pipeline points to an older version of the table.
* **Lost access**. E.g., permissions are not updated.
* **Bad SQL. Or not SQL**. E.g., a query breaks when a user comes from a different time zone and performs an action “tomorrow.”
* **Infrastructure update**. E.g., change in computation based on a dependent library.
* **Broken feature code**. E.g., feature computation breaks at a corner case like a 100% discount.

Issues can also arise if the data schema changes or data is lost at the source (e.g., broken in-app logging or frozen sensor values). If you have several models interacting with each other, broken upstream models can affect downstream models.

![](<../../../images/2023109\_course\_module2.041.png>)

## Data quality metrics and analysis

**Data profiling** is a good starting point for monitoring data quality metrics. Based on the data type, you can come up with basic descriptive statistics for your dataset. For example, for numerical features, you can calculate:
* Min and Max values
* Quantiles
* Unique values
* Most common values
* Share of missing values, etc.

Then, you can visualize and compare statistics and data distributions of the current data batch and reference data to ensure data stability.
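
For example, a basic profile of a single numerical column with pandas might look like the sketch below (the file path and column name are hypothetical):

```python
import pandas as pd

df = pd.read_csv("current_batch.csv")       # placeholder path
col = df["order_amount"]                    # hypothetical numerical feature

profile = {
    "min": col.min(),
    "max": col.max(),
    "quantiles": col.quantile([0.25, 0.5, 0.75]).to_dict(),
    "unique_values": col.nunique(),
    "most_common": col.value_counts().head(5).to_dict(),
    "missing_share": col.isna().mean(),
}
print(profile)
```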

![](<../../../images/2023109\_course\_module2.047.png>)

When it comes to monitoring data quality, you must define the conditions for alerting.

**If you do not have reference data, you can set up thresholds manually based on domain knowledge**. “General ML data quality” can include such characteristics as:
* no/low share of missing values
* no duplicate columns/rows
* no constant (or almost constant!) features
* no highly correlated features
* no target leaks (high correlation between feature and target)
* no range violations (based on the feature context, e.g., negative age or sales).

Since setting up these conditions manually can be tedious, it often helps to have a reference dataset.
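
Without a reference, checks like those above can be coded directly against domain-driven thresholds. Here is a sketch with pandas; the thresholds, file path, and the `age` range rule are assumptions for the example.

```python
import pandas as pd

df = pd.read_csv("current_batch.csv")                      # placeholder path
issues = {}

missing = df.isna().mean()
issues["high_missing"] = missing[missing > 0.1].index.tolist()   # >10% missing values

issues["duplicate_rows"] = int(df.duplicated().sum())
dup_cols = df.T.duplicated()
issues["duplicate_columns"] = dup_cols[dup_cols].index.tolist()

nunique = df.nunique(dropna=False)
issues["constant_features"] = nunique[nunique <= 1].index.tolist()

corr = df.corr(numeric_only=True).abs()
issues["highly_correlated"] = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > 0.95
]

issues["negative_age_rows"] = int((df["age"] < 0).sum())   # hypothetical range rule
print(issues)
```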

**If you have reference data, you can compare it with the current data and autogenerate test conditions based on the reference**. For example, based on the training or past batch, you can monitor for (see the sketch after this list):
* expected data schema and column types
* expected data completeness (e.g., 90% non-empty)
* expected batch size (e.g., number of rows)
* expected patterns for specific columns, such as:
* non-unique (features) or unique (IDs)
* specific data distribution types (e.g., normality)
* expected ranges based on observed values
* descriptive statistics: averages, median, quantiles, min-max (point estimation or statistical tests with a confidence interval).
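
Here is a sketch of auto-deriving a few such conditions from a reference DataFrame and checking a new batch against them; the completeness and batch-size tolerances are assumptions for the example.

```python
import pandas as pd

def build_expectations(reference: pd.DataFrame) -> dict:
    """Derive simple test conditions from a reference batch."""
    return {
        "schema": reference.dtypes.astype(str).to_dict(),
        "min_completeness": 0.9,                     # 90% non-empty cells
        "row_count_range": (0.5 * len(reference), 2.0 * len(reference)),
        "numeric_ranges": {
            col: (reference[col].min(), reference[col].max())
            for col in reference.select_dtypes("number").columns
        },
    }

def check_batch(current: pd.DataFrame, exp: dict) -> list:
    """Return a list of failed conditions for the current batch."""
    failures = []
    if current.dtypes.astype(str).to_dict() != exp["schema"]:
        failures.append("schema mismatch")
    if (1 - current.isna().mean().mean()) < exp["min_completeness"]:
        failures.append("completeness below threshold")
    low, high = exp["row_count_range"]
    if not (low <= len(current) <= high):
        failures.append("unexpected batch size")
    for col, (lo, hi) in exp["numeric_ranges"].items():
        if current[col].min() < lo or current[col].max() > hi:
            failures.append(f"{col}: values outside the observed reference range")
    return failures

# Example: failures = check_batch(current_df, build_expectations(reference_df))
```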

## Summing up

Monitoring data quality is critical to ensuring that ML models function reliably in production. Depending on the availability of reference data, you can manually set up thresholds based on domain knowledge or automatically generate test conditions based on the reference.

Up next: hands-on practice on how to evaluate and test data quality using Python and the open-source [Evidently](https://github.com/evidentlyai/evidently) library.