
Commit

Merge branch 'main' into ui/feature/deterministic-uuid-demo-projects
DimaAmega committed Jul 5, 2024
2 parents 521fac6 + 9f99354 commit ad4c909
Showing 54 changed files with 396 additions and 150 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ui.yml
@@ -61,7 +61,7 @@ jobs:
    name: UI type-check
    runs-on: ubuntu-22.04
    needs: changed_files
    if: ${{ github.event.pull_request.draft == false && needs.changed_files.outputs.ui_any_modified == 'true' && needs.changed_files.outputs.evidently_python_any_modified == 'true' }}
    if: ${{ github.event.pull_request.draft == false && (needs.changed_files.outputs.ui_any_modified == 'true' || needs.changed_files.outputs.evidently_python_any_modified == 'true') }}

    steps:
      - name: ⬇️ Checkout repo
71 changes: 36 additions & 35 deletions docs/book/get-started/quickstart-llm.md
@@ -8,7 +8,7 @@ You can run this example in Colab or any Python environment.

Install the Evidently Python library.

```
```python
!pip install evidently[llm]
```

@@ -19,93 +19,94 @@ import pandas as pd
from sklearn import datasets
from evidently.report import Report
from evidently.metric_preset import TextEvals

import nltk
nltk.download('words')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('vader_lexicon')
from evidently.descriptors import *
```

**Optional**. Import components to send evaluation results to Evidently Cloud:
**Optional**. Import the components to send evaluation results to Evidently Cloud:

```python
from evidently.ui.workspace.cloud import CloudWorkspace
```

# 2. Import the toy dataset

Import a toy dataset with e-commerce reviews. It contains a column with "Review_Text" that you'll analyze.
Import a toy dataset with e-commerce reviews. It contains a "Review_Text" column. You will take the first 100 rows to analyze.

```python
reviews_data = datasets.fetch_openml(name='Womens-E-Commerce-Clothing-Reviews', version=2, as_frame='auto')
reviews_data = datasets.fetch_openml(
    name='Womens-E-Commerce-Clothing-Reviews',
    version=2, as_frame='auto')
reviews = reviews_data.frame[:100]
```

# 3. Run the evals
# 3. Run your first eval

Run an evaluation Preset to check basic text descriptive text properties:
* text sentiment (scale -1 to 1)
* text length (number of symbols)
* number of sentences in a text
* percentage of out-of-vocabulary words (scale 0 to 100)
* percentage of non-letter characters (scale 0 to 100)
Run a few basic evaluations for all texts in the "Review_Text" column:
* text sentiment (measured on a scale from -1 for negative to 1 for positive)
* text length (returns an absolute number of symbols)

```python
text_evals_report = Report(metrics=[
    TextEvals(column_name="Review_Text")
    ]
)
    TextEvals(column_name="Review_Text", descriptors=[
        Sentiment(),
        TextLength(),
        ]
    ),
])

text_evals_report.run(reference_data=None, current_data=reviews)
```

There are more evals to choose from. You can also create custom ones, including LLM-as-a-judge.
There are 20+ built-in evals to choose from. You can also create custom ones, including LLM-as-a-judge. We call the result of each such evaluation a `descriptor`.
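
For instance, here is a sketch of the same Report with a few extra built-in descriptors. The exact names below (`OOV()`, `NonLetterCharacterPercentage()`, `IncludesWords()`) are assumptions about this release's descriptor set — check the descriptor reference for what is available.

```python
# A sketch only: extends the quickstart Report with extra built-in descriptors.
# OOV, NonLetterCharacterPercentage and IncludesWords are assumed names from
# evidently.descriptors — verify them against the descriptor reference.
extended_report = Report(metrics=[
    TextEvals(column_name="Review_Text", descriptors=[
        Sentiment(),
        TextLength(),
        OOV(),  # share of out-of-vocabulary words
        NonLetterCharacterPercentage(),  # share of non-letter characters
        IncludesWords(words_list=["return", "refund"]),  # flag reviews that mention returns
    ]),
])

extended_report.run(reference_data=None, current_data=reviews)
```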

View a Report in Python:

```python
text_evals_report
```

You will see a summary distribution of results for each evaluation.
You will see the summary results: the distribution of length and sentiment for all evaluated texts.
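
If you want to keep or post-process the results, the same Report object can also be exported. A minimal sketch using the standard export methods (assuming they are unchanged in this release):

```python
# Optional ways to work with the same Report outside the notebook output.
text_evals_report.save_html("text_evals_report.html")  # standalone HTML file
report_dict = text_evals_report.as_dict()  # results as a Python dictionary
report_json = text_evals_report.json()  # results as a JSON string
```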

# 4. Send results to Evidently Cloud

To record and monitor evaluations over time, send them to Evidently Cloud. You'll need an API key.
* Sign up for an [Evidently Cloud account](https://app.evidently.cloud/signup), and create your Organization.
* Click on the **Teams** icon on the left menu. Create a Team - for example, "Personal". Copy and save the team ID. ([Team page](https://app.evidently.cloud/teams)).
* Click the **Key** icon in the left menu to go. Generate and save the token. ([Token page](https://app.evidently.cloud/token)).

Connect to Evidently Cloud using your token.
To record and monitor evaluations over time, send them to Evidently Cloud.
* **Sign up**. Create an [Evidently Cloud account](https://app.evidently.cloud/signup) and your Organization.
* **Add a Team**. Click **Teams** in the left menu. Create a Team, copy and save the Team ID. ([Team page](https://app.evidently.cloud/teams)).
* **Get your API token**. Click the **Key** icon in the left menu. Generate and save the token. ([Token page](https://app.evidently.cloud/token)).
* **Connect to Evidently Cloud**. Pass your API key to connect from your Python environment.

```python
ws = CloudWorkspace(token="YOUR_TOKEN_HERE", url="https://app.evidently.cloud")
ws = CloudWorkspace(token="YOUR_API_TOKEN", url="https://app.evidently.cloud")
```

Create a Project inside your Team. Pass the `team_id`:
* **Create a Project**. Create a new Project inside your Team, adding your title and description:

```python
project = ws.create_project("My test project", team_id="YOUR_TEAM_ID")
project.description = "My project description"
project.save()
```

Send the Report to the Cloud:
* **Upload the Report to the Project**. Send the evaluation results:

```python
ws.add_report(project.id, text_evals_report)
```

Go to the Evidently Cloud. Open your Project and head to the "Reports" in the left menu. ([Cloud home](https://app.evidently.cloud/)).
* **View the Report**. Go to Evidently Cloud. Open your Project and head to "Reports" in the left menu. ([Cloud home](https://app.evidently.cloud/)).

![](../.gitbook/assets/cloud/toy_text_report_preview.gif)

In the future, you can log ongoing evaluation results to build monitoring panels and send alerts.
# 5. Get a dashboard

Go to the "Dashboard" tab and enter the "Edit" mode. Add a new tab, and select the "Descriptors" template.

You'll see a set of panels that show Sentiment and Text Length with a single data point. As you log ongoing evaluation results, you can track trends and set up alerts.

![](../.gitbook/assets/cloud/add_descriptor_tab.gif)
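
You can also define panels in code instead of the UI, using the dashboard API documented in `design_dashboard_api.md` (diffed further below). A rough sketch — the `metric_id` and `field_path` values here are illustrative assumptions, not the exact keys for descriptor panels:

```python
# A sketch of adding a dashboard panel from code. Adjust metric_id and
# field_path (both assumed values) to the descriptor or metric you want to plot.
from evidently.ui.dashboards import DashboardPanelPlot, PanelValue, PlotType, ReportFilter

project.dashboard.add_panel(
    DashboardPanelPlot(
        title="Review sentiment over time",
        filter=ReportFilter(metadata_values={}, tag_values=[]),
        values=[
            PanelValue(
                metric_id="ColumnSummaryMetric",  # assumed metric id
                field_path="current_characteristics.mean",  # assumed field path
                legend="mean sentiment",
            ),
        ],
        plot_type=PlotType.LINE,
    )
)
project.save()
```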

# Want to see more?

Check out a more in-depth tutorial to learn key workflows. It covers using LLM-as-a-judge, running conditional test suites, monitoring results over time and more.
Check out a more in-depth tutorial to learn key workflows. It covers using LLM-as-a-judge, running conditional test suites, monitoring results over time, and more.

{% content-ref url="tutorial-llm.md" %}
[Evidently LLM Tutorial](tutorial-llm.md).
4 changes: 2 additions & 2 deletions docs/book/monitoring/design_dashboard_api.md
@@ -434,7 +434,7 @@ project.dashboard.add_panel(

**Aggregated by Status**. To show the total number of failed Tests (status filter), with daily level aggregation.

```
```python
project.dashboard.add_panel(
    DashboardPanelTestSuite(
        title="All tests: aggregated",
@@ -452,7 +452,7 @@ project.dashboard.add_panel(

**Filtered by Test ID**. To show all results for a specified list of Tests (on constant columns, missing values, empty rows) with daily-level aggregation.

```
```python
project.dashboard.add_panel(
    DashboardPanelTestSuite(
        title="Data quality tests",
2 changes: 1 addition & 1 deletion src/evidently/_version.py
@@ -1,5 +1,5 @@
#!/usr/bin/env python
# coding: utf-8

version_info = (0, 4, 29)
version_info = (0, 4, 30)
__version__ = ".".join(map(str, version_info))
2 changes: 1 addition & 1 deletion src/evidently/base_metric.py
@@ -283,7 +283,7 @@ def required_features(self, data_definition: DataDefinition) -> List["GeneratedF
        for field, value in sorted(self.__dict__.items(), key=lambda x: x[0]):
            if field in ["context"]:
                continue
            if issubclass(type(value), ColumnName) and value.feature_class is not None:
            if isinstance(value, ColumnName) and value.feature_class is not None:
                required_features.append(value.feature_class)
        return required_features

12 changes: 8 additions & 4 deletions src/evidently/calculation_engine/engine.py
@@ -1,6 +1,7 @@
import abc
import functools
import logging
from typing import TYPE_CHECKING
from typing import Dict
from typing import Generic
from typing import List
@@ -19,8 +20,11 @@
from evidently.features.generated_features import GeneratedFeature
from evidently.utils.data_preprocessing import DataDefinition

if TYPE_CHECKING:
    from evidently.suite.base_suite import Context

TMetricImplementation = TypeVar("TMetricImplementation", bound=MetricImplementation)
TInputData = TypeVar("TInputData")
TInputData = TypeVar("TInputData", bound=GenericInputData)


class Engine(Generic[TMetricImplementation, TInputData]):
@@ -34,10 +38,10 @@ def set_metrics(self, metrics):
    def set_tests(self, tests):
        self.tests = tests

    def execute_metrics(self, context, data: GenericInputData):
    def execute_metrics(self, context: "Context", data: GenericInputData):
        calculations: Dict[Metric, Union[ErrorResult, MetricResult]] = {}
        converted_data = self.convert_input_data(data)
        context.features = self.generate_additional_features(converted_data)
        context.set_features(self.generate_additional_features(converted_data))
        context.data = converted_data
        for metric, calculation in self.get_metric_execution_iterator():
            if calculation not in calculations:
@@ -65,7 +69,7 @@ def get_data_definition(
        raise NotImplementedError()

    @abc.abstractmethod
    def generate_additional_features(self, data: TInputData):
    def generate_additional_features(self, data: TInputData) -> Optional[Dict[tuple, GeneratedFeature]]:
        raise NotImplementedError

def get_metric_implementation(self, metric):
4 changes: 2 additions & 2 deletions src/evidently/calculation_engine/python_engine.py
@@ -54,10 +54,10 @@ def get_data_definition(
            raise ValueError("PandasEngine works only with pd.DataFrame input data")
        return create_data_definition(reference_data, current_data, column_mapping, categorical_features_cardinality)

    def generate_additional_features(self, data: PythonInputData):
    def generate_additional_features(self, data: PythonInputData) -> Dict[tuple, GeneratedFeature]:
        curr_additional_data = None
        ref_additional_data = None
        features = {}
        features: Dict[tuple, GeneratedFeature] = {}
        for metric, calculation in self.get_metric_execution_iterator():
            try:
                required_features = metric.required_features(data.data_definition)
8 changes: 6 additions & 2 deletions src/evidently/metrics/data_quality/column_category_metric.py
@@ -43,6 +43,7 @@ class Config:
"column_name": {IncludeTags.Parameter},
"counts": {IncludeTags.Extra},
}
smart_union = True

def __init__(self, **data):
"""for backward compatibility"""
@@ -59,7 +60,7 @@ def __init__(self, **data):
        super().__init__(**data)

    column_name: str
    category: Union[int, float, str]
    category: Union[bool, int, float, str]
    current: CategoryStat
    reference: Optional[CategoryStat] = None
    counts: CountOfValues
@@ -76,8 +77,11 @@ def counts_of_values(self) -> Dict[str, pd.DataFrame]:
class ColumnCategoryMetric(Metric[ColumnCategoryMetricResult]):
    """Calculates count and shares of values in the predefined values list"""

    class Config:
        smart_union = True

    column_name: ColumnName
    category: Union[int, float, str]
    category: Union[bool, int, float, str]

    def __init__(
        self, column_name: Union[str, ColumnName], category: Union[int, float, str], options: AnyOptions = None
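
The `smart_union = True` lines added above enable pydantic's smarter Union matching, which is what lets the new `bool` member of `category: Union[bool, int, float, str]` survive validation. A standalone sketch of the coercion this avoids, assuming the pydantic v1 `Config.smart_union` behaviour (illustrative code, not part of the diff):

```python
# Illustration of pydantic v1 Union coercion, not evidently code.
# Without smart_union, a bool input is coerced by the first matching member (int);
# with smart_union (and bool in the Union), the exact type is preserved.
from typing import Union

from pydantic import BaseModel  # pydantic v1 API assumed


class Loose(BaseModel):
    category: Union[int, float, str]  # old annotation, default Union handling


class Smart(BaseModel):
    class Config:
        smart_union = True

    category: Union[bool, int, float, str]  # new annotation


print(Loose(category=True).category)  # 1 -> True silently coerced to int
print(Smart(category=True).category)  # True -> exact type preserved
```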
