Monitoring system for data teams. It computes health checks on data (via Airflow jobs), visualizes them over time, and alerts on them in Grafana.
Redata computes health metrics for your data, including:
- time since last record was added
- number of records added in last (hour/day/week/month)
- schema changes that recently happened
- number of nulls in columns over time
- other checks specific to columns in data and their types
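As a rough illustration of what such metrics look like, here is a minimal sketch that computes three of them against an in-memory SQLite table. This is not redata's actual implementation: the table name `events`, its columns, and the function `compute_health_metrics` are hypothetical and exist only for this example.

```python
import sqlite3
from datetime import datetime, timedelta

# Fixed "current time" so the example is reproducible.
NOW = datetime(2021, 1, 10, 12, 0, 0)


def build_demo_db(now):
    """Create an in-memory table with three rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (created_at TEXT, user_email TEXT)")
    rows = [
        ((now - timedelta(days=3)).isoformat(), "a@example.com"),
        ((now - timedelta(hours=5)).isoformat(), None),
        ((now - timedelta(minutes=30)).isoformat(), "b@example.com"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def compute_health_metrics(conn, table, time_column, checked_column, now):
    """Compute a few illustrative health metrics for one table."""
    cur = conn.cursor()
    # Timestamps are stored as ISO-8601 strings of equal length,
    # so MAX() and >= behave like chronological comparisons.
    (last_ts,) = cur.execute(
        f"SELECT MAX({time_column}) FROM {table}"
    ).fetchone()
    seconds_since_last = (
        (now - datetime.fromisoformat(last_ts)).total_seconds()
        if last_ts
        else None
    )
    day_ago = (now - timedelta(days=1)).isoformat()
    (rows_last_day,) = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {time_column} >= ?", (day_ago,)
    ).fetchone()
    (null_count,) = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {checked_column} IS NULL"
    ).fetchone()
    return {
        "seconds_since_last_record": seconds_since_last,
        "rows_added_last_day": rows_last_day,
        "null_count": null_count,
    }


conn = build_demo_db(NOW)
metrics = compute_health_metrics(conn, "events", "created_at", "user_email", NOW)
print(metrics)
```

Storing results like these at regular intervals is what makes it possible to plot them over time.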
If you have DevOps experience, you can think of it as Prometheus and Telegraf for data teams.
Having metrics in one common format makes it possible to create dashboards automatically for all (or chosen) tables in your data. Currently redata creates 2 types of dashboards:
- home dashboard, containing the most important information about all tables
- table dashboard, containing information specific to a given table and the columns in it
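To make the idea of automatic dashboard creation concrete, here is a hedged sketch that lays out one tile per table in a dictionary shaped like Grafana's dashboard JSON (`title`, `panels`, `gridPos`). The function `build_home_dashboard` and the table names are hypothetical; how redata actually generates its dashboards is not shown here.

```python
import json


def build_home_dashboard(table_names, columns_per_row=4):
    """Build a minimal dashboard dict with one stat tile per table."""
    panel_width = 24 // columns_per_row  # Grafana's layout grid is 24 units wide
    panel_height = 4
    panels = []
    for index, table in enumerate(table_names):
        panels.append(
            {
                "id": index + 1,
                "type": "stat",
                "title": table,
                # Place tiles left to right, wrapping to a new row when full.
                "gridPos": {
                    "x": (index % columns_per_row) * panel_width,
                    "y": (index // columns_per_row) * panel_height,
                    "w": panel_width,
                    "h": panel_height,
                },
            }
        )
    return {"title": "Home (generated)", "panels": panels}


# Hypothetical table names, for illustration only.
dashboard = build_home_dashboard(
    ["orders", "users", "events", "payments", "refunds"]
)
print(json.dumps(dashboard, indent=2))
```

Because every table yields metrics in the same shape, adding a table only means adding one more entry to the list. A real panel would also need a data source and a query, which are omitted here.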
Here are some examples of what the generated Grafana dashboards look like:
Get a glimpse of what's happening in all your tables on one screen. If you see any suspicious numbers, click on the tile for more details on that specific table.
Get an in-depth view of your table, learn about any schema changes, volume fluctuations, nulls in columns, and other useful metrics.
No need to set up Airflow, Grafana, or a DB for storing metrics. Redata will set up all of those via Docker images, so you need to deploy only one thing. Check out the deploying on production section for info on how to easily deploy redata on AWS or GCP.
Grafana supports PostgreSQL and a lot of other DBs, so what are the benefits of using redata over setting up monitoring yourself with a couple of SQL queries? Here is our list :)
- Visualizing all tables together in one dashboard - Computing a metrics layer makes it really easy to build visualizations for many/all tables at once and show them under one dashboard.
- Visualizing new, previously impossible things - Things like schema changes cannot be queried from the DB, but computing metrics over time makes showing them possible.
- Visualizing how things change over time - If you are doing any updates to the DB, like updating row status etc., it's impossible to visualize how things looked in the past and compare them to now (for alerting purposes etc.). Adding a metrics layer makes it easy.
- Automatic and up-to-date dashboards - It's normally quite cumbersome to set up proper monitoring for all tables, and keeping it up to date is hard. Redata can do that for you, detecting new tables and columns and automatically creating dashboards/panels for them.
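The schema-change point can be sketched as follows: a database only reports its current schema, so detecting a change means keeping an earlier snapshot and comparing it with a fresh one. The function `diff_schema` and the column names below are hypothetical and only illustrate the idea, not redata's own code.

```python
def diff_schema(previous, current):
    """Compare two {column_name: column_type} snapshots and list the changes."""
    changes = []
    # Columns present now but missing from the earlier snapshot.
    for column in sorted(current.keys() - previous.keys()):
        changes.append(("added", column, current[column]))
    # Columns that existed earlier but are gone now.
    for column in sorted(previous.keys() - current.keys()):
        changes.append(("removed", column, previous[column]))
    # Columns present in both snapshots whose type differs.
    for column in sorted(previous.keys() & current.keys()):
        if previous[column] != current[column]:
            changes.append(
                ("type_changed", column, f"{previous[column]} -> {current[column]}")
            )
    return changes


# Two hypothetical snapshots of the same table, taken at different times.
snapshot_yesterday = {"id": "integer", "status": "varchar", "amount": "integer"}
snapshot_today = {"id": "integer", "amount": "numeric", "created_at": "timestamp"}

schema_changes = diff_schema(snapshot_yesterday, snapshot_today)
for change in schema_changes:
    print(change)
```

Recording each detected change with a timestamp turns it into a metric that can be displayed on a dashboard like any other.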
git clone https://github.com/redata-team/redata.git
cp env_template .env
# create REDATA_SOURCE_DB_URL_YOUR_DB_NAME variables (at the end of the .env file)
# you can add multiple variables, one for each DB you want to observe
# if you just want to test redata without your own data yet, paste
# REDATA_SOURCE_DB_URL_REDATA=${REDATA_METRICS_DB_URL}
# as the URL, and you will start by monitoring redata itself :)
docker-compose up
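The naming convention for the .env variables can be illustrated with a short sketch that collects every variable starting with REDATA_SOURCE_DB_URL_ and uses the remainder of the name as the DB name. The function `collect_source_dbs` and the example URLs are hypothetical; how redata itself reads these variables is an assumption here.

```python
import os

PREFIX = "REDATA_SOURCE_DB_URL_"


def collect_source_dbs(environ=None):
    """Map DB names to URLs for every variable that follows the naming scheme."""
    environ = os.environ if environ is None else environ
    return {
        name[len(PREFIX):]: url
        for name, url in environ.items()
        # Skip the bare prefix and variables with an empty value.
        if name.startswith(PREFIX) and name != PREFIX and url
    }


# Hypothetical environment with two source DBs and one unrelated variable.
example_env = {
    "REDATA_SOURCE_DB_URL_SALES": "postgresql://user:secret@sales-host:5432/sales",
    "REDATA_SOURCE_DB_URL_CRM": "mysql://user:secret@crm-host:3306/crm",
    "GRAFANA_PORT": "3000",
}

source_dbs = collect_source_dbs(example_env)
print(sorted(source_dbs))
```

Each additional variable with this prefix adds one more DB to observe.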
Once docker-compose up has started the containers, Grafana should be running on http://localhost:3000 (or your Docker IP if you are running Docker via VirtualBox).
The first screen you will see there is the login screen. The default login is admin/admin, but you can change that in the .env file (this needs to be done before starting Docker).
From the main dashboard, named Home (generated), you can go to any table-specific dashboard just by clicking the tile that shows stats for the given table.
Airflow should be running and available at http://localhost:8080/ (or your Docker IP). The default login is also admin/admin, unless it was changed in .env.
You should see the validation dag there. Turn it on and it will start running (every 10 minutes, or at another frequency if specified in the settings.py file).
You can also trigger a dag run manually (by clicking the first icon on the Link tab).
Redata uses docker and docker-compose for deployment. This makes it easy to deploy in the cloud or in your on-premise environment.
Look at the sample setup instructions for specific cloud providers:
Join Slack for general questions about using redata, problems, and discussions with people making it :)
Here are the integrations we support or are working on now. Let us know if you'd really like us to prioritize something, or if your DB is not included on the list.
Integration | Status
---|---
PostgreSQL | Supported
MySQL | Supported
Exasol | Supported
BigQuery | Supported
Apache Airflow | Supported, view all your checks in Airflow
Grafana | Supported, view metrics here
Other SQL DBs | Experimental support via SQLAlchemy
AWS Redshift | In development
AWS S3 | In development
Excel | Planned
Snowflake | Planned
Redata is licensed under the MIT license. See the LICENSE file for licensing information.
We love all contributions, big and small.
Check out our list of good first issues and see if you like anything from there. Also feel welcome to join our Slack and suggest ideas, or set up a no-pressure session with Redata here.
More details on how to test your changes are in: CONTRIBUTING