Are your data meeting your expectations?
Data Expectations is a Python library which takes a delarative approach to asserting qualities of your datasets. Instead of tests like is_sorted
to determine if a column is ordered, the expectation is column_values_are_increasing
. Most of the time you don't need to know how it got like that, you are only interested what the data looks like now.
Expectations can be used alongside, or in place of a schema validator, however Expectations is intended to perform validation of the data in a dataset, not the structure of a table. Records should be a Python dictionary (or dictionary-like object) and can be processed one-by-one, or against an entire list of dictionaries.
Data Expectations was inspired by the great Great Expectations library, but we wanted something lighter and easier to quickly set up and run. Data Expectations can do less, but it does it with a fraction of the effort and has zero dependencies.
- Use Data Expectations was as a step in data processing pipelines, testing the data conforms to expectations before it is committed to the warehouse.
- Use Data Expectations to simplify validating user supplied values.
- expect_column_to_exist (column)
- expect_column_values_to_not_be_null (column)
- expect_column_values_to_be_of_type (column, expected_type, ignore_nulls:true)
- expect_column_values_to_be_in_type_list (column, type_list, ignore_nulls:true)
- expect_column_values_to_be_more_than (column, threshold, ignore_nulls:true)
- expect_column_values_to_be_less_than (column, threshold, ignore_nulls:true)
- expect_column_values_to_be_between (column, maximum, minimum, ignore_nulls:true)
- expect_column_values_to_be_increasing (column, ignore_nulls:true)
- expect_column_values_to_be_decreasing (column, ignore_nulls:true)
- expect_column_values_to_be_in_set (column, symbols, ignore_nulls:true)
- expect_column_values_to_match_regex (column, regex, ignore_nulls:true)
- expect_column_values_to_match_like (column, like, ignore_nulls:true)
- expect_column_values_length_to_be (column, length, ignore_nulls:true)
- expect_column_values_length_to_be_between (column, maximum, minimum, ignore_nulls:true)
pip install data_expectations
Data Expectations has no external dependencies, can be used ad hoc and in-the-moment without complex set up.
Testing Python Dictionaries
import data_expectations as de
from data_expectations import Expectation
from data_expectations import Behaviors
TEST_DATA = {"name": "charles", "age": 12}
set_of_expectations = [
Expectation(Behaviors.EXPECT_COLUMN_TO_EXIST, column="name"),
Expectation(Behaviors.EXPECT_COLUMN_TO_EXIST, column="age"),
Expectation(Behaviors.EXPECT_COLUMN_VALUES_TO_BE_BETWEEN, column="age", config={"minimum": 0, "maximum": 120}),
]
expectations = de.Expectations(set_of_expectations)
try:
de.evaluate_record(expectations, TEST_DATA)
except de.errors.ExpectationNotMetError: # pragma: no cover
print("Data Didn't Meet Expectations")
Testing individual Values:
import data_expectations as de
from data_expectations import Expectation
from data_expectations import Behaviors
expectation = Expectation(Behaviors.EXPECT_COLUMN_VALUES_TO_BE_BETWEEN, column="age", config={"minimum": 0, "maximum": 120})
try:
expectation.test_value(55)
except de.errors.ExpectationNotMetError: # pragma: no cover
print("Data Didn't Meet Expectations")