forked from re-data/re-data

Monitoring system for data teams. Computing health checks on data, visualizing and alerting on them in Grafana.


Redata

Monitoring system for data teams. Computing health checks on data (via Airflow jobs), visualizing them over time, and alerting on them in Grafana.

Key features

Metrics layer

Redata computes health metrics for your data, containing information like this:

  • time since last record was added
  • number of records added in last (hour/day/week/month)
  • schema changes that recently happened
  • number of nulls in columns over time
  • other checks specific to columns in data and their types
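To make the list above more concrete, here is a minimal sketch of how a few of these metrics could be computed with plain SQL. It is not redata's actual implementation: the `compute_metrics` function, the `orders` table, and its columns are hypothetical, and SQLite is used only so the example runs without any setup.

```python
import sqlite3
from datetime import datetime, timedelta


def compute_metrics(conn, table, time_column, checked_column, now):
    """Compute a few health metrics for one table (illustrative only)."""
    cur = conn.cursor()

    # Time since the last record was added.
    (last_ts,) = cur.execute(
        f"SELECT MAX({time_column}) FROM {table}"
    ).fetchone()
    last = datetime.fromisoformat(last_ts)

    # Number of records added in the last day.
    day_ago = (now - timedelta(days=1)).isoformat(sep=" ")
    (added,) = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {time_column} > ?", (day_ago,)
    ).fetchone()

    # Number of nulls in one column.
    (nulls,) = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {checked_column} IS NULL"
    ).fetchone()

    return {
        "seconds_since_last_record": (now - last).total_seconds(),
        "records_added_last_day": added,
        "null_count": nulls,
    }


def demo():
    # Hypothetical sample data, timestamps stored as ISO-formatted text.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, created_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "paid", "2021-01-01 10:00:00"),
            (2, None, "2021-01-02 09:00:00"),
            (3, "paid", "2021-01-02 11:00:00"),
        ],
    )
    now = datetime(2021, 1, 2, 12, 0, 0)
    return compute_metrics(conn, "orders", "created_at", "status", now)
```

Storing the results of such queries on every run is what turns one-off checks into metrics you can follow over time.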

If you have DevOps experience, you can think of it as Prometheus or Telegraf for data teams.

Automatic dashboards

Having metrics in one common format makes it possible to create dashboards automatically for all (or chosen) tables in your data. Redata currently creates two types of dashboards:

  • home dashboard, containing the most important information about all tables
  • table dashboard, containing information specific to a given table and its columns

Here are some examples of what the generated Grafana dashboards look like:


Get a glimpse of what's happening in all your tables on one screen. If you see any suspicious numbers, click on the tile for more details on that specific table.


Get an in-depth view of your table, learn about any schema changes, volume fluctuations, nulls in columns, and other useful metrics.
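As a rough illustration of why a common metrics format helps, the sketch below builds a home dashboard definition with one tile per table. It is an assumption-heavy sketch, not redata's generator: the `home_dashboard` function and its parameters are hypothetical, and only a handful of fields from Grafana's dashboard JSON model (`title`, `panels`, `type`, `gridPos`) are filled in. The `stat` panel type is also an assumption; redata may use a different one.

```python
import json


def home_dashboard(table_names, columns=4, width=6, height=4):
    """Build a minimal dashboard dict with one tile per table (illustrative)."""
    panels = []
    for i, name in enumerate(table_names):
        panels.append(
            {
                "id": i + 1,
                "type": "stat",  # assumed panel type
                "title": name,
                # Grafana's grid is 24 units wide, so 4 tiles of width 6 fill a row.
                "gridPos": {
                    "h": height,
                    "w": width,
                    "x": (i % columns) * width,
                    "y": (i // columns) * height,
                },
            }
        )
    return {"title": "Home (generated)", "panels": panels}


dashboard = home_dashboard(["orders", "users", "payments", "events", "sessions"])
dashboard_json = json.dumps(dashboard, indent=2)
```

Because every table is described by the same metrics, the loop does not need to know anything table-specific; adding a table just adds one more tile.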

Batteries included

No need to set up Airflow, Grafana, or a DB for storing metrics. Redata sets up all of those via Docker images, so you only need to deploy one thing. See the Deploying on production section for how to deploy redata on AWS or GCP.

Benefits over doing monitoring yourself

Grafana supports PostgreSQL and a lot of other DBs, so what are the benefits of using redata over setting up monitoring yourself with a couple of SQL queries? Here is our list :)

  • Visualizing all tables together in one dashboard - Computing a metrics layer makes it really easy to build visualizations for many or all tables at once and show them under one dashboard.

  • Visualizing new, previously impossible things - Things like schema changes cannot be queried from the DB, but computing metrics over time makes showing them possible.

  • Visualizing how things change over time - If you are doing any updates to the DB, like updating row status, it's impossible to visualize how things looked in the past and compare that to now (for alerting purposes etc.). Adding a metrics layer makes it easy.

  • Automatic and up-to-date dashboards - It's normally quite cumbersome to set up proper monitoring for all tables, and keeping it up to date is hard. Redata can do that for you, detecting new tables and columns and automatically creating dashboards/panels for them.
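The schema-change point is easiest to see with an example. Below is a minimal sketch, assuming that each run stores a snapshot of a table's columns and types and compares it with the previous one. The `diff_schemas` function and the snapshot format are hypothetical and are not taken from redata's code.

```python
def diff_schemas(previous, current):
    """Compare two {column_name: column_type} snapshots of one table."""
    changes = []
    # Columns present now but not in the previous snapshot.
    for name in sorted(current.keys() - previous.keys()):
        changes.append(("column_added", name, current[name]))
    # Columns that disappeared since the previous snapshot.
    for name in sorted(previous.keys() - current.keys()):
        changes.append(("column_removed", name, previous[name]))
    # Columns present in both snapshots whose type changed.
    for name in sorted(previous.keys() & current.keys()):
        if previous[name] != current[name]:
            changes.append(
                ("type_changed", name, f"{previous[name]} -> {current[name]}")
            )
    return changes


# Hypothetical snapshots taken on two consecutive runs.
yesterday = {"id": "integer", "status": "text", "amount": "integer"}
today = {"id": "integer", "amount": "numeric", "currency": "text"}
changes = diff_schemas(yesterday, today)
```

The database itself only knows the current schema; it is the stored history of snapshots that makes a change visible.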

Getting started (local machine setup)

git clone https://github.com/redata-team/redata.git
cd redata
cp env_template .env

# create REDATA_SOURCE_DB_URL_YOUR_DB_NAME variables (at the end of the .env file)
# you can add multiple variables, one for each DB you want to observe

# if you just want to test redata without your own data yet, paste
# REDATA_SOURCE_DB_URL_REDATA=${REDATA_METRICS_DB_URL}
# as the URL, and you will start by monitoring redata itself :)

docker-compose up
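The naming convention for source DBs can be sketched in a few lines of Python. This only illustrates the convention and is an assumption about how such variables might be read, not a copy of redata's code; the `source_dbs` function and the example URLs are hypothetical.

```python
import os

PREFIX = "REDATA_SOURCE_DB_URL_"


def source_dbs(environ=None):
    """Return {db_name: url} for every REDATA_SOURCE_DB_URL_* variable."""
    environ = os.environ if environ is None else environ
    return {
        key[len(PREFIX):]: url
        for key, url in environ.items()
        if key.startswith(PREFIX) and url
    }


# Hypothetical environment with two source DBs and one unrelated variable.
example_env = {
    "REDATA_SOURCE_DB_URL_SHOP": "postgresql://user:pass@host:5432/shop",
    "REDATA_SOURCE_DB_URL_REDATA": "postgresql://user:pass@host:5432/redata",
    "REDATA_METRICS_DB_URL": "postgresql://user:pass@host:5432/redata",
}
dbs = source_dbs(example_env)
```

Here `dbs` contains the `SHOP` and `REDATA` entries, while `REDATA_METRICS_DB_URL` is ignored because it does not match the prefix.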

Grafana

At this point Grafana should be running on http://localhost:3000 (or on your Docker IP if you run Docker via VirtualBox).

The first screen you will see is the login screen. The default login is admin/admin, but you can change that in the .env file (this needs to be done when starting Docker).

From the main dashboard, named Home (generated), you can go to any table-specific dashboard just by clicking the tile that shows stats for that table.

Airflow

Airflow should be running and available at http://localhost:8080/ (or on your Docker IP; the default login is also admin/admin if it wasn't changed in .env).

You should see the validation DAG there. Turn it on and it will start running (every 10 minutes, or at another frequency if specified in the settings.py file).

You can also trigger the DAG manually (by clicking the first icon on the Link tab).

Deploying on production

Redata uses Docker and docker-compose for deployment, which makes it easy to deploy in the cloud or in your on-premise environment.

Look at the sample setup instructions for specific cloud providers:

Community

Join Slack for general questions about using redata, help with problems, and discussions with the people building it :)

Integrations

Here are the integrations we support or are working on now. Let us know if you'd like us to prioritize something or if your DB is not on the list.

| Integration | Status |
| --- | --- |
| PostgreSQL | Supported |
| MySQL | Supported |
| Exasol | Supported |
| BigQuery | Supported |
| Apache Airflow | Supported, view all your checks in Airflow |
| Grafana | Supported, view metrics here |
| Other SQL DBs | Experimental support via SQLAlchemy |
| AWS Redshift | In development |
| AWS S3 | In development |
| Excel | Planned |
| Snowflake | Planned |

License

Redata is licensed under the MIT license. See the LICENSE file for licensing information.

Contributing

We love all contributions, big and small.

Check out our list of good first issues and see if you like anything from there. Also feel welcome to join our Slack and suggest ideas, or set up a no-pressure session with Redata here.

More details on how to test your changes can be found in CONTRIBUTING.
