This datasource visualises time-series data stored in Cassandra/DSE. If you are looking for Cassandra metrics, you may need datastax/metric-collector-for-apache-cassandra instead.
To see the datasource in action, please follow the Quick Demo steps. Documentation is available here
Supports:
- Grafana
  - 7.4+, 8.x, 9.x, 10.x are fully supported (plugin version 3.x)
  - 5.x, 6.x, 7.0-7.3 are deprecated (they work with plugin versions 1.x/2.x, but we recommend upgrading)
- Cassandra 3.x, 4.x, 5.x
- DataStax Enterprise 6.x
- DataStax Astra (docs)
- AWS Keyspaces (limited support) (docs)
- Linux, OSX (incl. M1), Windows
Features:
- Connect to Cassandra using auth credentials and TLS
- Query configurator
- Raw CQL query editor
- Table mode
- Variables
- Annotations
- Alerting
You can find more detailed instructions in the datasource wiki.
- Install the plugin using the Grafana console tool:
grafana-cli plugins install hadesarchitect-cassandra-datasource
The plugin will be installed into your Grafana plugins directory; the default is /var/lib/grafana/plugins. Alternatively, download cassandra-datasource-VERSION.zip from the latest release and uncompress it into the Grafana plugins directory (grafana/plugins).
- Add the Apache Cassandra Data Source as a data source at the datasource configuration page.
- Configure the datasource by specifying the contact point and port, like "10.11.12.13:9042", plus the username and password. It's strongly recommended to use a dedicated user with read-only permissions limited to the table you need to access.
- Push the "Save and Test" button. If there is an error message, check the credentials and connection.
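The recommendation to use a dedicated read-only user can be sketched in CQL as follows. This is a minimal sketch, not part of the plugin: the role name grafana_reader and the password are hypothetical, the smarthome.temperature table is the one used in the examples in this document, and the statements assume a cluster with password authentication and role-based authorization enabled. Managed services such as Astra or AWS Keyspaces handle credentials differently.

```sql
-- Hypothetical role name and password; replace both with your own values.
CREATE ROLE IF NOT EXISTS grafana_reader
  WITH PASSWORD = 'change-me' AND LOGIN = true;

-- Read-only access to the single table the dashboard needs.
GRANT SELECT ON TABLE smarthome.temperature TO grafana_reader;
```

The username and password of this role are then the ones to enter on the datasource configuration page.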
There are two ways to query data from Cassandra/DSE: Query Configurator and Query Editor. The Configurator is easier to use but has limited capabilities; the Editor is more powerful but requires an understanding of CQL.
Query Configurator is the easiest way to query data. First, enter the keyspace and table name, then pick the proper columns. If the keyspace and table names are given correctly, the datasource will suggest the column names automatically.
- Time Column - the column storing the timestamp value; it answers the "when" question.
- Value Column - the column storing the value you'd like to show. It can be value, temperature or whatever property you need.
- ID Column - the column that uniquely identifies the source of the data, e.g. sensor_id, shop_id or whatever allows you to identify the origin of the data.
After that, you have to specify the ID Value, the particular ID of the data origin you want to show. You may need to enable "ALLOW FILTERING", although we recommend avoiding it.
Example: imagine you want to visualise the reports of a temperature sensor installed in your smart home. Given that the sensor reports its ID, time, location and temperature every minute, we create a table to store the data and put some values there:
CREATE TABLE IF NOT EXISTS temperature (
    sensor_id uuid,
    registered_at timestamp,
    temperature int,
    location text,
    PRIMARY KEY ((sensor_id), registered_at)
);
-- CQL string and timestamp literals use single quotes; double quotes denote identifiers.
insert into temperature (sensor_id, registered_at, temperature, location) values (99051fe9-6a9c-46c2-b949-38ef78858dd0, '2020-04-01T11:21:59.001+0000', 18, 'kitchen');
insert into temperature (sensor_id, registered_at, temperature, location) values (99051fe9-6a9c-46c2-b949-38ef78858dd0, '2020-04-01T11:22:59.001+0000', 19, 'kitchen');
insert into temperature (sensor_id, registered_at, temperature, location) values (99051fe9-6a9c-46c2-b949-38ef78858dd0, '2020-04-01T11:23:59.001+0000', 20, 'kitchen');
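The configurator values listed next refer to a keyspace named smarthome, which the statements above do not create. A minimal sketch of creating and selecting it before running them is shown here; the SimpleStrategy replication with a factor of 1 is an assumption that only suits a single-node test cluster, so choose settings appropriate for your deployment.

```sql
-- Assumed replication settings for a single-node test cluster only.
CREATE KEYSPACE IF NOT EXISTS smarthome
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Make smarthome the default keyspace for the CREATE TABLE and INSERT statements.
USE smarthome;
```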
In this case, we have to fill the configurator fields the following way to get the results:
- Keyspace - smarthome (keyspace name)
- Table - temperature (table name)
- Time Column - registered_at (occurrence)
- Value Column - temperature (value to show)
- ID Column - sensor_id (ID of the data origin)
- ID Value - 99051fe9-6a9c-46c2-b949-38ef78858dd0 (ID of the sensor)
- ALLOW FILTERING - FALSE (not required, so we are happy to avoid it)
If there are several origins (multiple sensors), you will need to add more rows. If your case is as simple as that, the Query Configurator will be a good choice; otherwise, please proceed to the Query Editor.
Query Editor is a more powerful way to query data. To enable the Query Editor, press the "toggle text edit mode" button.
Query Editor unlocks all possibilities of CQL, including User-Defined Functions, aggregations, etc.
Example (using the sample table from the Query Configurator case):
SELECT sensor_id, temperature, registered_at, location FROM test.test WHERE sensor_id IN (99051fe9-6a9c-46c2-b949-38ef78858dd1, 99051fe9-6a9c-46c2-b949-38ef78858dd0) AND registered_at > $__timeFrom and registered_at < $__timeTo
- The order of fields in the SELECT expression doesn't matter, except for the ID field. This field is used to distinguish different time series, so it is important to keep it, or any other column with low cardinality, in the first position.
- Identifier - the first property in the SELECT expression should be the ID, something that uniquely identifies the data (e.g. sensor_id).
- Value - there should be at least one numeric value among the returned fields if the query result will be used to draw a graph.
- Timestamp - there should be one timestamp value if the query result will be used to draw a graph.
- There can be any number of additional fields; however, be cautious when using multiple numeric fields, as Grafana interprets them as values and draws them on the TimeSeries graph.
- Any field returned by the query is available to use in the Alias template, e.g. {{ location }}. The datasource interpolates such strings and updates the graph legend.
- The datasource will try to keep all the fields; however, it is not always possible since Cassandra and Grafana use different sets of supported types. Unsupported fields will be removed from the response.
- To filter data by time, use the $__timeFrom and $__timeTo placeholders as in the example. The datasource will replace them with time values from the panel.
Notice: it's important to add the placeholders, otherwise the query will try to fetch data for the whole period of time. Don't try to specify the timeframe on your own, just put the placeholders. It's Grafana's job to specify time limits.
In addition to TimeSeries mode, the datasource supports Table mode to draw tables using Cassandra query results. Use Merge, Sort by, Organize fields and other transformations to shape the table in any desirable way.
There are two ways to plot only the last (most recent) values rather than a whole time series.
- Inefficient way
If the table was created with the default ascending ordering, the most recent value is always stored at the end of the partition. To retrieve it, the ORDER BY and LIMIT clauses must be used in the query:
SELECT sensor_id, temperature, registered_at, location
FROM test.test
WHERE sensor_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0
AND registered_at > $__timeFrom and registered_at < $__timeTo
-- DESC reverses the default ascending order so that LIMIT 1 returns the newest row
ORDER BY registered_at DESC
LIMIT 1
Note that a WHERE ... IN () clause cannot be used with ORDER BY, so the query must be duplicated for each additional sensor_id.
- Efficient way
To query the most recent values efficiently, the ordering must be specified during table creation:
CREATE TABLE IF NOT EXISTS temperature (
    sensor_id uuid,
    registered_at timestamp,
    temperature int,
    location text,
    PRIMARY KEY ((sensor_id), registered_at)
) WITH CLUSTERING ORDER BY (registered_at DESC);
After that, the most recent value will always be stored at the beginning of the partition and can be queried with just a LIMIT clause:
SELECT sensor_id, temperature, registered_at, location
FROM test.test
WHERE sensor_id IN (99051fe9-6a9c-46c2-b949-38ef78858dd0, 99051fe9-6a9c-46c2-b949-38ef78858dd1)
AND registered_at > $__timeFrom AND registered_at < $__timeTo
PER PARTITION LIMIT 1
Note that PER PARTITION LIMIT 1 is used instead of LIMIT 1 to query one row for each partition and not just one row in total.
Grafana Variables documentation
Grafana Annotations documentation
Alerting is supported; however, it has some limitations. Grafana does not support long (narrow) series in alerting, so the query result must be converted to wide series before handing it over to Grafana. The datasource does this in a pretty simple way: it creates labels from all the non-timeseries fields and then removes those fields from the response. For example, this query (using the example table)
SELECT sensor_id, temperature, registered_at, location
FROM test.test
WHERE sensor_id IN (99051fe9-6a9c-46c2-b949-38ef78858dd0, 99051fe9-6a9c-46c2-b949-38ef78858dd1)
AND registered_at > $__timeFrom AND registered_at < $__timeTo
will produce two wide series for alerting:
99051fe9-6a9c-46c2-b949-38ef78858dd0 {location="kitchen", sensor_id="99051fe9-6a9c-46c2-b949-38ef78858dd0"}
99051fe9-6a9c-46c2-b949-38ef78858dd1 {location="bedroom", sensor_id="99051fe9-6a9c-46c2-b949-38ef78858dd1"}
More information on series types is available in the Grafana developers documentation.
Grafana Alerting documentation
Usually there are no problems, because Cassandra can store timestamps in different formats, as shown in the documentation. However, this is not always enough. One possible case is unix time, which is just a number of seconds or milliseconds and is usually stored as an integer type.
- If time is stored as a number of milliseconds in a bigint column, then it should be converted into the timestamp type before returning the data to Grafana:
SELECT sensor_id, temperature, dateOf(maxTimeuuid(registered_at)), location
FROM test.test WHERE sensor_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0
AND registered_at > $__timeFrom AND registered_at < $__timeTo
This query returns a proper timestamp even if the time is stored as a number of milliseconds.
- If time is stored as a number of seconds, then it is not possible to convert it into the timestamp natively, but there is a trick:
SELECT sensor_id, temperature, dateOf(maxTimeuuid(registered_at*1000)), location
FROM test.test WHERE sensor_id = 99051fe9-6a9c-46c2-b949-38ef78858dd0
AND registered_at > $__unixEpochFrom AND registered_at < $__unixEpochTo
There are two important parts in this query:
- dateOf(maxTimeuuid(registered_at*1000)) converts seconds to milliseconds (registered_at*1000) and then converts milliseconds to the timestamp type, which is handed over to Grafana.
- $__unixEpochFrom and $__unixEpochTo are variables holding unix time in seconds; they are used to fill out the conditions part of the query.
Cassandra stores data in partitions, which are the minimal storage units for the DB. It means that using the example table
CREATE TABLE IF NOT EXISTS temperature (
    sensor_id uuid,
    registered_at timestamp,
    temperature int,
    PRIMARY KEY ((sensor_id), registered_at)
);
will lead to partition bloating and performance degradation, because all the data for all time for a specific sensor_id is stored in just one partition (the first part of the PRIMARY KEY is the PARTITION KEY).
To avoid that, there is a technique called bucketing, which basically means that partitions are split up into smaller pieces.
For instance, we can split the example table's partitions by time: year, month, day, or even hour or less. What to choose depends on how much data is stored in each partition. To achieve that, the example table has to be modified like this:
CREATE TABLE IF NOT EXISTS temperature (
    sensor_id uuid,
    date date,
    registered_at timestamp,
    temperature int,
    PRIMARY KEY ((sensor_id, date), registered_at)
);
After that change, the database schema becomes more efficient because of bucketing by date, and queries will have the form of
SELECT sensor_id, temperature, registered_at
FROM temperature
WHERE sensor_id IN (99051fe9-6a9c-46c2-b949-38ef78858dd1, 99051fe9-6a9c-46c2-b949-38ef78858dd0)
AND date = '${__from:date:YYYY-MM-DD}'
AND registered_at > $__timeFrom
AND registered_at < $__timeTo
Note that the $__from / $__to variables are used. They are Grafana built-in variables, and they have formatting capabilities which are perfect for our case.
When the time range includes more than one day, each day has to be added to an AND date IN (...) predicate. To make this more convenient, consider using larger buckets, e.g. month-size instead of day-size.
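The multi-day case can be sketched as follows. This is a minimal sketch, assuming the bucketed temperature table above and a dashboard time range that crosses at most one midnight, so the start and end of the range fall on at most two different days. ${__to:date:YYYY-MM-DD} is Grafana's built-in $__to variable with the same date formatting as $__from; whether the rendered day matches the day stored in the date column depends on time zone handling, which is worth verifying in your setup.

```sql
SELECT sensor_id, temperature, registered_at
FROM temperature
WHERE sensor_id IN (99051fe9-6a9c-46c2-b949-38ef78858dd1, 99051fe9-6a9c-46c2-b949-38ef78858dd0)
-- One bucket for the day the range starts on and one for the day it ends on.
AND date IN ('${__from:date:YYYY-MM-DD}', '${__to:date:YYYY-MM-DD}')
AND registered_at > $__timeFrom
AND registered_at < $__timeTo
```

For longer ranges, every day in between has to be listed explicitly, e.g. date IN ('2020-04-01', '2020-04-02', '2020-04-03'), which is where larger buckets become the more practical choice.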