Added course notes for Module 4 #13

Merged: 72 commits, Nov 3, 2023

Commits
bbc362d
Added compressed images for module 4
dmaliugina Nov 3, 2023
f371164
Added compressed images for module 4
dmaliugina Nov 3, 2023
25e5202
Create readme.md
dmaliugina Nov 3, 2023
9d991bf
Delete docs/book/ml-observability-course/docs/book/ml-observability-c…
dmaliugina Nov 3, 2023
14a2018
Create readme.md
dmaliugina Nov 3, 2023
9f34930
Create logging-ml-monitoring.md
dmaliugina Nov 3, 2023
6fa742c
Create how-to-prioritize-monitoring-metrics.md
dmaliugina Nov 3, 2023
85960b5
Create when-to-retrain-ml-models.md
dmaliugina Nov 3, 2023
1289eb7
Create how-to-choose-reference-dataset-ml-monitoring.md
dmaliugina Nov 3, 2023
6baadf5
Create custom-metrics-ml-monitoring.md
dmaliugina Nov 3, 2023
9b074de
Create custom-metrics-evidently-code-practice.md
dmaliugina Nov 3, 2023
fbabe70
Create choosing-ml-monitoring-deployment-architecture.md
dmaliugina Nov 3, 2023
f2c25e8
Update SUMMARY.md
dmaliugina Nov 3, 2023
77740cd
Delete docs/book/ml-observability-course/module-4-designing-effective…
dmaliugina Nov 3, 2023
d03d548
Delete docs/images/2023110_course_module4_fin.001-min.png
dmaliugina Nov 3, 2023
ecae49c
Delete docs/images/2023110_course_module4_fin.002-min.png
dmaliugina Nov 3, 2023
e15fc93
Delete docs/images/2023110_course_module4_fin.003-min.png
dmaliugina Nov 3, 2023
a5e0f07
Delete docs/images/2023110_course_module4_fin.041-min.png
dmaliugina Nov 3, 2023
484baa7
Delete docs/images/2023110_course_module4_fin.040-min.png
dmaliugina Nov 3, 2023
b80b077
Delete docs/images/2023110_course_module4_fin.039-min.png
dmaliugina Nov 3, 2023
8858c24
Delete docs/images/2023110_course_module4_fin.035-min.png
dmaliugina Nov 3, 2023
a65c8ac
Delete docs/images/2023110_course_module4_fin.032-min.png
dmaliugina Nov 3, 2023
d7b8c17
Delete docs/images/2023110_course_module4_fin.031-min.png
dmaliugina Nov 3, 2023
571052a
Delete docs/images/2023110_course_module4_fin.030-min.png
dmaliugina Nov 3, 2023
3a3eb3c
Delete docs/images/2023110_course_module4_fin.027-min.png
dmaliugina Nov 3, 2023
8776a1e
Delete docs/images/2023110_course_module4_fin.026-min.png
dmaliugina Nov 3, 2023
0650041
Delete docs/images/2023110_course_module4_fin.024-min.png
dmaliugina Nov 3, 2023
18d9a1a
Delete docs/images/2023110_course_module4_fin.022-min.png
dmaliugina Nov 3, 2023
f0a1c00
Delete docs/images/2023110_course_module4_fin.017-min.png
dmaliugina Nov 3, 2023
ba9c770
Delete docs/images/2023110_course_module4_fin.016-min.png
dmaliugina Nov 3, 2023
fb3d430
Delete docs/images/2023110_course_module4_fin.015-min.png
dmaliugina Nov 3, 2023
667fb37
Delete docs/images/2023110_course_module4_fin.014-min.png
dmaliugina Nov 3, 2023
902e36e
Delete docs/images/2023110_course_module4_fin.013-min.png
dmaliugina Nov 3, 2023
cc15dbe
Delete docs/images/2023110_course_module4_fin.011-min.png
dmaliugina Nov 3, 2023
f2864bb
Delete docs/images/2023110_course_module4_fin.010-min.png
dmaliugina Nov 3, 2023
e362f2c
Delete docs/images/2023110_course_module4_fin.007-min.png
dmaliugina Nov 3, 2023
3d6672a
Delete docs/images/2023110_course_module4_fin.006-min.png
dmaliugina Nov 3, 2023
2224549
Delete docs/images/2023110_course_module4_fin.085-min.png
dmaliugina Nov 3, 2023
839a3ba
Delete docs/images/2023110_course_module4_fin.084-min.png
dmaliugina Nov 3, 2023
5144154
Delete docs/images/2023110_course_module4_fin.083-min.png
dmaliugina Nov 3, 2023
6960f03
Delete docs/images/2023110_course_module4_fin.082-min.png
dmaliugina Nov 3, 2023
6a1b181
Delete docs/images/2023110_course_module4_fin.081-min.png
dmaliugina Nov 3, 2023
97bf683
Delete docs/images/2023110_course_module4_fin.077-min.png
dmaliugina Nov 3, 2023
5e5f013
Delete docs/images/2023110_course_module4_fin.076-min.png
dmaliugina Nov 3, 2023
2c9950c
Delete docs/images/2023110_course_module4_fin.075-min.png
dmaliugina Nov 3, 2023
a826dee
Delete docs/images/2023110_course_module4_fin.072-min.png
dmaliugina Nov 3, 2023
718b8c8
Delete docs/images/2023110_course_module4_fin.070-min.png
dmaliugina Nov 3, 2023
47b30f7
Delete docs/images/2023110_course_module4_fin.067-min.png
dmaliugina Nov 3, 2023
ff09390
Delete docs/images/2023110_course_module4_fin.066-min.png
dmaliugina Nov 3, 2023
f89620c
Delete docs/images/2023110_course_module4_fin.063-min.png
dmaliugina Nov 3, 2023
98afb95
Delete docs/images/2023110_course_module4_fin.062-min.png
dmaliugina Nov 3, 2023
4fc054d
Delete docs/images/2023110_course_module4_fin.061-min.png
dmaliugina Nov 3, 2023
7796d58
Delete docs/images/2023110_course_module4_fin.053-min.png
dmaliugina Nov 3, 2023
6328754
Delete docs/images/2023110_course_module4_fin.052-min.png
dmaliugina Nov 3, 2023
c5708e0
Delete docs/images/2023110_course_module4_fin.051-min.png
dmaliugina Nov 3, 2023
7233c56
Delete docs/images/2023110_course_module4_fin.049-min.png
dmaliugina Nov 3, 2023
ae07303
Delete docs/images/2023110_course_module4_fin.047-min.png
dmaliugina Nov 3, 2023
8b3e41d
Delete docs/images/2023110_course_module4_fin.044-min.png
dmaliugina Nov 3, 2023
be5f68c
Delete docs/images/2023110_course_module4_fin.043-min.png
dmaliugina Nov 3, 2023
da4990c
Delete docs/images/2023110_course_module4_fin.104-min.png
dmaliugina Nov 3, 2023
426d030
Delete docs/images/2023110_course_module4_fin.101-min.png
dmaliugina Nov 3, 2023
78015c0
Delete docs/images/2023110_course_module4_fin.100-min.png
dmaliugina Nov 3, 2023
db967de
Delete docs/images/2023110_course_module4_fin.099-min.png
dmaliugina Nov 3, 2023
d1dce7e
Delete docs/images/2023110_course_module4_fin.098-min.png
dmaliugina Nov 3, 2023
624a957
Delete docs/images/2023110_course_module4_fin.097-min.png
dmaliugina Nov 3, 2023
dab6ae1
Delete docs/images/2023110_course_module4_fin.096-min.png
dmaliugina Nov 3, 2023
ea0dbba
Delete docs/images/2023110_course_module4_fin.095-min.png
dmaliugina Nov 3, 2023
a742a60
Delete docs/images/2023110_course_module4_fin.094-min.png
dmaliugina Nov 3, 2023
77ec17d
Delete docs/images/2023110_course_module4_fin.093-min.png
dmaliugina Nov 3, 2023
89ad077
Delete docs/images/2023110_course_module4_fin.091-min.png
dmaliugina Nov 3, 2023
40c1cf5
Update README.md
dmaliugina Nov 3, 2023
3d7ce44
Update README.md
dmaliugina Nov 3, 2023
docs/book/README.md (5 changes: 1 addition & 4 deletions)
@@ -9,9 +9,6 @@ description: Open-source ML observabilty course.

Welcome to the Open-source ML observability course!

The course starts on **October 16, 2023**. \
[Sign up](https://www.evidentlyai.com/ml-observability-course) to save your seat and receive weekly course updates.

# How to participate?
* **Join the course**. [Sign up](https://www.evidentlyai.com/ml-observability-course) to receive weekly updates with course materials and information about office hours.
* **Course platform [OPTIONAL]**. If you want to receive a course certificate, you should **also** [register](https://evidentlyai.thinkific.com/courses/ml-observability-course) on the platform and complete all the assignments before **December 1, 2023**.
@@ -47,7 +44,7 @@ ML observability course is organized into six modules. You can follow the comple
{% endcontent-ref %}

{% content-ref url="ml-observability-course/module-4-designing-effective-ml-monitoring.md" %}
[Module 4. Designing effective ML monitoring](ml-observability-course/module-4-designing-effective-ml-monitoring.md).
[Module 4. Designing effective ML monitoring](ml-observability-course/module-4-designing-effective-ml-monitoring/readme.md).
{% endcontent-ref %}

{% content-ref url="ml-observability-course/module-5-ml-pipelines-validation-and-testing.md" %}
docs/book/SUMMARY.md (9 changes: 8 additions & 1 deletion)
@@ -26,6 +26,13 @@
* [3.4. Monitoring embeddings drift](ml-observability-course/module-3-ml-monitoring-for-unstructured-data/monitoring-embeddings-drift.md)
* [3.5. Monitoring text data [CODE PRACTICE]](ml-observability-course/module-3-ml-monitoring-for-unstructured-data/monitoring-text-data-code-practice.md)
* [3.6. Monitoring multimodal datasets](ml-observability-course/module-3-ml-monitoring-for-unstructured-data/monitoring-multimodal-datasets.md)
* [Module 4: Designing effective ML monitoring](ml-observability-course/module-4-designing-effective-ml-monitoring.md)
* [Module 4: Designing effective ML monitoring](ml-observability-course/module-4-designing-effective-ml-monitoring/readme.md)
* [4.1. Logging for ML monitoring](ml-observability-course/module-4-designing-effective-ml-monitoring/logging-ml-monitoring.md)
* [4.2. How to prioritize ML monitoring metrics](ml-observability-course/module-4-designing-effective-ml-monitoring/how-to-prioritize-monitoring-metrics.md)
* [4.3. When to retrain machine learning models](ml-observability-course/module-4-designing-effective-ml-monitoring/when-to-retrain-ml-models.md)
* [4.4. How to choose a reference dataset in ML monitoring](ml-observability-course/module-4-designing-effective-ml-monitoring/how-to-choose-reference-dataset-ml-monitoring.md)
* [4.5. Custom metrics in ML monitoring](ml-observability-course/module-4-designing-effective-ml-monitoring/custom-metrics-ml-monitoring.md)
* [4.6. Implementing custom metrics in Evidently [OPTIONAL]](ml-observability-course/module-4-designing-effective-ml-monitoring/custom-metrics-evidently-code-practice.md)
* [4.7. How to choose the ML monitoring deployment architecture](ml-observability-course/module-4-designing-effective-ml-monitoring/choosing-ml-monitoring-deployment-architecture.md)
* [Module 5: ML pipelines validation and testing](ml-observability-course/module-5-ml-pipelines-validation-and-testing.md)
* [Module 6: Deploying an ML monitoring dashboard](ml-observability-course/module-6-deploying-an-ml-monitoring-dashboard.md)

This file was deleted.

New file (99 lines added): ml-observability-course/module-4-designing-effective-ml-monitoring/choosing-ml-monitoring-deployment-architecture.md
# 4.7. How to choose the ML monitoring deployment architecture

{% embed url="https://youtu.be/Q1NUCDZFRbU?si=26GhKBdhFAIzxBgi" %}

**Video 7**. [How to choose the ML monitoring deployment architecture](https://youtu.be/Q1NUCDZFRbU?si=26GhKBdhFAIzxBgi), by Emeli Dral

There are several possible backends for an ML monitoring architecture. Below, we compare ad-hoc reporting, batch monitoring, and near real-time (streaming) monitoring, as well as a combined setup.

![](<../../../images/2023110\_course\_module4\_fin.086-min.png>)

## Ad-hoc reporting

**Ad-hoc reporting** is a viable option when you have recently deployed a machine learning system and do not yet have other monitoring systems in place.
* It has **low engineering overhead**: you can use familiar tools like Jupyter notebooks, Python scripts, or R scripts.
* It is **suitable for initial exploration** of data and model quality and for shaping expectations about model performance, but it is not a long-term monitoring solution (see the sketch below).
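
As a minimal illustration of the ad-hoc flow, here is a notebook-style sketch using the Evidently Python library (assuming the `Report` API used in this course; the file paths and data layout are hypothetical):

```python
import pandas as pd

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset

# Reference data (e.g., a validation set) and a recent production batch.
# File paths and column layout are placeholders for illustration.
reference = pd.read_csv("data/reference.csv")
current = pd.read_csv("data/production_last_week.csv")

# Build an ad-hoc report with standard presets.
report = Report(metrics=[DataQualityPreset(), DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

# In a notebook, you can display the report inline; here we save it to share.
report.save_html("adhoc_model_check.html")
```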

![](<../../../images/2023110\_course\_module4\_fin.087-min.png>)

## Batch monitoring

**Batch ML monitoring** is a reliable and stable approach. It is suitable for both machine learning pipelines and services.

To implement batch monitoring, you need a workflow orchestration tool like Airflow or Kubeflow, and tools for calculating metrics and tests, such as Evidently.
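
For illustration, here is a minimal sketch of a daily batch monitoring job defined as an Airflow DAG (Airflow 2.x API). The task, file paths, and storage layout are hypothetical; the point is that the monitoring logic runs as a regular scheduled job next to your other pipelines:

```python
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset


def run_monitoring_checks():
    # Hypothetical storage layout: reference data plus the latest scored batch.
    reference = pd.read_parquet("/data/reference.parquet")
    current = pd.read_parquet("/data/predictions/latest.parquet")

    report = Report(metrics=[DataQualityPreset(), DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)

    # Persist results so a dashboard or alerting job can pick them up later.
    report.save_html("/monitoring/reports/daily_report.html")


with DAG(
    dag_id="ml_monitoring_daily",
    start_date=datetime(2023, 11, 1),
    schedule_interval="@daily",  # runs once a day, after the scoring pipeline
    catchup=False,
) as dag:
    PythonOperator(
        task_id="run_monitoring_checks",
        python_callable=run_monitoring_checks,
    )
```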

**Pros**:
* Works well both for ML models implemented as batch pipelines and for ML services.
* It is fairly simple to run monitoring jobs, especially if you already have a workflow orchestrator in place.
* You can use the same tools you use to run model training jobs during the experimental and validation phases of a machine learning lifecycle.
* You can combine immediate monitoring (e.g., data quality checks) and metrics dependent on ground truth (trigger-based calculations).

**Cons**:
* It is not real-time. Metrics are computed with a delay because monitoring runs as scheduled or triggered jobs rather than at serving time.
* It might be complex if you don't have an existing orchestrator; setting one up can be resource-intensive.

![](<../../../images/2023110\_course\_module4\_fin.088-min.png>)

## Near real-time (streaming) monitoring

**Near real-time ML monitoring** architecture is suitable when you serve models as APIs and want to detect issues close to real-time. In this case, you push data from the machine learning service to the monitoring system.

You will need a storage solution suited to time series data, such as Prometheus or ClickHouse, and tools like Grafana or Evidently for dashboarding and alerting.
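
As a minimal sketch of the serving-time part, the ML service can update lightweight counters and gauges on every prediction request and expose them for the monitoring system to collect (shown here with the `prometheus_client` library and a scrape-based setup; a push gateway works similarly, and the metric names and checks are illustrative assumptions):

```python
import pandas as pd
from prometheus_client import Counter, Gauge, start_http_server

# Metrics exposed by the ML service; names are illustrative.
PREDICTIONS_TOTAL = Counter("ml_predictions_total", "Number of predictions served")
MISSING_INPUTS_TOTAL = Counter("ml_missing_inputs_total", "Requests with missing feature values")
LAST_PREDICTION = Gauge("ml_last_prediction_value", "Most recent model output")


def predict(model, features: pd.DataFrame) -> float:
    """Wrap the model call with simple serving-time checks."""
    if features.isna().any().any():
        MISSING_INPUTS_TOTAL.inc()

    prediction = float(model.predict(features)[0])

    PREDICTIONS_TOTAL.inc()
    LAST_PREDICTION.set(prediction)
    return prediction


# Expose metrics on port 8000 so Prometheus (or another collector) can scrape them.
start_http_server(8000)
```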

**Pros**:
* Works for models deployed as an ML service as opposed to batch jobs.
* Suitable for scenarios when you need an immediate reaction to issues like missing data or outliers.

**Cons**:
* High operational costs. Make sure you have the resources to maintain an additional monitoring service.
* Potentially double effort. You will often still need to deal with delayed ground truth feedback and run batch monitoring jobs to calculate these metrics.

![](<../../../images/2023110\_course\_module4\_fin.089-min.png>)

**Custom monitoring backend**. You can also combine near real-time and batch monitoring.

For example, you can combine:
* **Real-time checks**. You can send the data available at serving time directly from the ML service to an ML monitoring system to run input and model output checks and to generate alerts.
* **Monitoring jobs**. For delayed ground truth or more complex checks, you can run monitoring jobs over prediction logs on a trigger or a schedule.
* **Dashboarding tool**. You can log all results to the same metric storage and get a single dashboard with panels for batch and real-time checks (see the sketch after this list).
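
Here is a minimal sketch of the "single metric storage" idea: both the real-time checks and the batch monitoring jobs append rows to one table, which the dashboarding tool then reads as a single data source. The table schema and SQLite storage are illustrative assumptions:

```python
import sqlite3
from datetime import datetime, timezone

# One shared table for all monitoring results; the schema is illustrative.
conn = sqlite3.connect("monitoring_metrics.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS ml_metrics (
           ts TEXT, source TEXT, metric TEXT, value REAL
       )"""
)


def log_metric(source: str, metric: str, value: float) -> None:
    """Append one metric value; `source` tells real-time checks and batch jobs apart."""
    conn.execute(
        "INSERT INTO ml_metrics (ts, source, metric, value) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), source, metric, value),
    )
    conn.commit()


# Called from the ML service at serving time ...
log_metric("realtime", "share_missing_inputs", 0.02)
# ... and from a scheduled monitoring job once labels arrive.
log_metric("batch_job", "precision_7d", 0.81)
```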

![](<../../../images/2023110\_course\_module4\_fin.090-min.png>)

## A case for batch ML monitoring

Let’s walk through the possible logic of choosing an ML monitoring architecture.

First, let’s contrast it with **traditional software health monitoring**. You can typically implement additional service endpoints for metrics. Then, you can use tools like Prometheus to pull the metrics from these endpoints and store high-frequency time series data. You can add alerting and dashboarding tools that rely on these metrics as a data source.

![](<../../../images/2023110\_course\_module4\_fin.092-min.png>)

However, integrating ML metrics into this same setup isn't as simple. Here is why:
* **Complex metrics**. Software metrics are usually more straightforward in terms of computation. You can run simple aggregations over data points like response times and memory usage. Some ML-related metrics (like the number of rows or missing values) are similar. But others, like model quality or statistical tests, involve more complex calculations.
* **Delayed feedback**. Model quality metrics like precision, recall, or accuracy typically depend on delayed data. You cannot compute them at serving time and must wait for the labels. Once you calculate them, you must “backfill” the time series for the past period, since the moment you compute a metric is not the moment it refers to (see the sketch after this list).
* **Reference dataset**. For checks like data and prediction drift, you must also pass a batch of data you are comparing against. This does not easily fit into traditional software architecture.
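
To make the delayed-feedback point concrete, here is a minimal sketch of a "backfill" job: once labels arrive, it computes daily precision and stamps the values with the prediction dates, not the date the job runs. The column names and prediction log layout are assumptions for illustration:

```python
import pandas as pd
from sklearn.metrics import precision_score

# Prediction log written at serving time; labels are joined in later when they arrive.
log = pd.read_parquet("/monitoring/prediction_log.parquet")      # request_id, timestamp, prediction
labels = pd.read_parquet("/monitoring/ground_truth.parquet")     # request_id, label

labeled = log.merge(labels, on="request_id", how="inner")
labeled["prediction_date"] = pd.to_datetime(labeled["timestamp"]).dt.date

# Compute precision per prediction day and backfill the metric time series.
daily_precision = (
    labeled.groupby("prediction_date")
    .apply(lambda day: precision_score(day["label"], day["prediction"]))
    .rename("precision")
    .reset_index()
)

# Each row is stamped with the day the predictions were made,
# even though the metric is only computed now.
daily_precision.to_parquet("/monitoring/metrics/precision_backfill.parquet")
```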

ML model monitoring may require additional components:
* **Metric calculation pipelines**. If you run metric computation as jobs, you can use an appropriate backend that is not limited to SQL-like queries and can handle complex evaluations such as statistical drift detection and behavioral tests.
* **Several different pipelines**. You can split metrics into separate pipelines: some run on a schedule (for metrics you can compute immediately), while others are triggered by events like receiving new labeled data.
* **Passing the reference data**. You can implement pipelines that query the reference data, load it, and compare it against the current data batch.

**Example**: you can cover the whole model lifecycle with batch checks and monitoring jobs.

![](<../../../images/2023110\_course\_module4\_fin.102-min.png>)

You can still combine this approach with a traditional software monitoring architecture. Once you implement a separate metric computation backend for ML metrics, you can store the results in the metric storage and use it as a data source for your dashboarding system to visualize ML-related metrics.

![](<../../../images/2023110\_course\_module4\_fin.103-min.png>)

You can add a few ML-related metrics to an existing dashboard or create a separate ML monitoring dashboard.

## Summing up

We discussed the differences between ML monitoring architectures. Here are some takeaways:
* Choose the ML monitoring architecture that matches your available resources, risk mitigation needs, and the complexity of your machine learning model.
* Even if you deploy a model as a service, consider batch ML monitoring. It is a more lightweight option, especially if you already have a workflow orchestrator in place, and it can handle complex evaluation scenarios.

## Enjoyed the content?

Star Evidently on GitHub to contribute back! This helps us create free, open-source tools and content for the community.
⭐️ [Star](https://github.com/evidentlyai/evidently) on GitHub!
New file (18 lines added): ml-observability-course/module-4-designing-effective-ml-monitoring/custom-metrics-evidently-code-practice.md
# 4.6. Implementing custom metrics in Evidently [OPTIONAL]

{% embed url="https://youtu.be/uEyoP-sPhyc?si=7hwr4LaJIeBZ-YLD" %}

**Video 6**. [Implementing custom metrics in Evidently [OPTIONAL, CODE PRACTICE]](https://youtu.be/uEyoP-sPhyc?si=7hwr4LaJIeBZ-YLD), by Emeli Dral

This is an optional code practice video. It is useful if you already have experience with the Evidently Python library and are familiar with the existing Metrics and Tests. If you are new, check out the next module for an end-to-end example!

**Want to go straight to code?** Here is the [example notebook](https://github.com/evidentlyai/ml_observability_course/blob/main/module4/custom_metric_practice.ipynb) to follow along.

**Outline:**\
[00:00](https://www.youtube.com/watch?v=uEyoP-sPhyc&t=0s) Introduction \
[00:37](https://www.youtube.com/watch?v=uEyoP-sPhyc&t=37s) Imports \
[01:54](https://www.youtube.com/watch?v=uEyoP-sPhyc&t=114s) Understanding the structure of Metrics and Tests \
[05:11](https://www.youtube.com/watch?v=uEyoP-sPhyc&t=311s) Create a dummy custom metric \
[12:17](https://www.youtube.com/watch?v=uEyoP-sPhyc&t=737s) Apply a dummy metric on toy data \
[14:00](https://www.youtube.com/watch?v=uEyoP-sPhyc&t=840s) Create a more complicated metric: Mean by Category \
[26:25](https://www.youtube.com/watch?v=uEyoP-sPhyc&t=1585s) Apply a new metric on toy data
New file (56 lines added): ml-observability-course/module-4-designing-effective-ml-monitoring/custom-metrics-ml-monitoring.md
# 4.5. Custom metrics in ML monitoring

{% embed url="https://youtu.be/PrFuzKLM66I?si=68EF7tepIyXxyMig" %}

**Video 5**. [Custom metrics in ML monitoring](https://youtu.be/PrFuzKLM66I?si=68EF7tepIyXxyMig), by Emeli Dral

## Types of custom metrics

While there is no strict division between “standard” and “custom” metrics, there is broad consensus on some of them: for example, classification model quality is commonly evaluated with metrics like precision and recall, which are fairly “standard.”

However, you often need to implement “custom” metrics to reflect specific aspects of model performance. They typically refer to business objectives or domain requirements and help capture the impact of an ML model within its operational context.

Here are some examples.

**Business and product KPIs (or proxies)**. These metrics are aligned with key performance indicators that reflect the business goals and product performance.

**Examples include**:
* Manufacturing optimization: raw materials saved.
* Chatbots: number of successful chat completions.
* Fraud detection: number of detected fraud cases over $50,000.
* Recommender systems: share of recommendation blocks without clicks.

We recommend **consulting with business stakeholders** even before building the model. They may suggest valuable KPIs, heuristics, and metrics that can be monitored already during the experimentation phase.

When direct measurement of a KPI is not possible, consider **approximating the model impact**. For example, you can assign an average “cost” to specific types of model errors based on domain knowledge.
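
For example, here is a minimal sketch of such a proxy metric: a "cost of errors" that weights false negatives and false positives differently. The cost values are made-up placeholders that would come from domain knowledge:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Assumed costs: a missed fraud case is far more expensive than a false alarm.
COST_FALSE_NEGATIVE = 500.0   # e.g., average loss per missed case, in dollars
COST_FALSE_POSITIVE = 10.0    # e.g., cost of a manual review


def estimated_error_cost(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Approximate business impact of model errors on a labeled batch."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fn * COST_FALSE_NEGATIVE + fp * COST_FALSE_POSITIVE


# Toy example: one missed case and one false alarm.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 1, 1])
print(estimated_error_cost(y_true, y_pred))  # 510.0
```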

![](<../../../images/2023110\_course\_module4\_fin.078-min.png>)

**Domain-specific ML metrics**. These are metrics that are commonly used in specific domains and industries.

**Examples include**:
* Churn prediction in telecommunications: lift metrics.
* Recommender systems: serendipity or novelty metrics.
* Healthcare: fairness metrics.
* Speech recognition: word error rate.
* Medical imaging: Jaccard index (see the sketch after this list).
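
Many of these domain metrics are simple to compute once predictions and ground truth are available. As an illustration, here is a sketch of the Jaccard index (intersection over union) for binary segmentation masks, using plain NumPy:

```python
import numpy as np


def jaccard_index(mask_true: np.ndarray, mask_pred: np.ndarray) -> float:
    """Intersection over union for two binary masks of the same shape."""
    mask_true = mask_true.astype(bool)
    mask_pred = mask_pred.astype(bool)
    intersection = np.logical_and(mask_true, mask_pred).sum()
    union = np.logical_or(mask_true, mask_pred).sum()
    return float(intersection / union) if union else 1.0  # both masks empty -> perfect match


# Toy example: two 2x3 masks that overlap on two pixels out of four marked.
truth = np.array([[1, 1, 0], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 1, 1]])
print(jaccard_index(truth, pred))  # 2 / 4 = 0.5
```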

![](<../../../images/2023110\_course\_module4\_fin.079-min.png>)

**Weighted or aggregated metrics**. Sometimes, you can design custom metrics as a “weighted” variation of other metrics. For example, you can adjust them to account for the importance of certain features or classes in your data.

**Examples include**:
* Data drift weighted by feature importance (see the sketch after this list).
* Measuring specific recommender system biases, for example, based on product popularity, price, or product group.
* In unbalanced classification problems, you can weight precision and recall by class or by specific important user groups, for example, based on the estimated user lifetime value (LTV).
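
As a sketch of the first example, here is one way to weight per-feature drift by feature importance: run a two-sample Kolmogorov–Smirnov test per numerical feature and average the drift flags using the model's feature importances as weights. The choice of drift test and the 0.05 threshold are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def weighted_drift_share(
    reference: pd.DataFrame,
    current: pd.DataFrame,
    importances: dict[str, float],  # e.g., taken from model.feature_importances_
    p_value_threshold: float = 0.05,
) -> float:
    """Share of drifting features weighted by importance (0 = no drift, 1 = all drifted)."""
    weights = np.array([importances[col] for col in importances])
    drift_flags = np.array([
        float(ks_2samp(reference[col], current[col]).pvalue < p_value_threshold)
        for col in importances
    ])
    return float(np.average(drift_flags, weights=weights))


# Toy example with two features: the important one drifts, the minor one does not.
rng = np.random.default_rng(42)
ref = pd.DataFrame({"income": rng.normal(0, 1, 1000), "age": rng.normal(0, 1, 1000)})
cur = pd.DataFrame({"income": rng.normal(1, 1, 1000), "age": rng.normal(0, 1, 1000)})
print(weighted_drift_share(ref, cur, {"income": 0.8, "age": 0.2}))  # typically ~0.8
```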

![](<../../../images/2023110\_course\_module4\_fin.080-min.png>)

## Summing up

There is no need to invent “custom” metrics just for the sake of it. However, you might want to implement them to:
* better reflect important model qualities,
* estimate the business impact of the model,
* add metrics useful for product and business stakeholders and accepted within the domain.

Up next: optional code practice to create and implement a custom quality metric in the Evidently Python library.