ubun3t/telegraf-erd-monitor

Monitor ERD Node with Telegraf+Grafana

In this guide we are going to see how to install and configure Grafana + Influxdb + Telegraf to monitor an Elrond node running on Ubuntu 18.04.

Watch the video

Prerequisites 📋

This document does not cover the installation of Ubuntu or of the Elrond node. There are very good guides for this.

Install an Elrond node:

https://docs.elrond.com/validators/system-requirements

Agenda

  1. Add necessary repositories to install Grafana + Influxdb + Telegraf

  2. Install packages. Depending on your design, everything runs on the same server as the node or on separate servers. Telegraf must always run on the node itself. Grafana and Influxdb can run on another server.

  3. Create database on Influxdb + login user.

  4. Configure Telegraf to read node information and send it to the newly created Influxdb database.

  5. Configure Grafana and add the newly created data source of Influxdb to query the data that is stored there.

  6. Import dashboard to have useful information on node status.

  7. Alerts via Telegram.

Starting 🚀

We are going to add the necessary repositories:

Influxdb + Telegraf :

wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
source /etc/lsb-release
echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list

Grafana :

We add the stable branch of the enterprise edition, which has the same features as the "open source" edition but allows us to subscribe at any time in the future without any further changes.

wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/enterprise/deb stable main"

If the add-apt-repository command is not available, install the software-properties-common package first and then repeat the previous command:

sudo apt-get install software-properties-common

2. Install packages

Influxdb + Telegraf :

sudo apt-get update && sudo apt-get install apt-transport-https
sudo apt-get update && sudo apt-get install telegraf influxdb
sudo service telegraf start 
sudo service influxdb start 

Grafana : (https://grafana.com/docs/grafana/latest/installation/debian/)

sudo apt-get install -y software-properties-common wget
sudo apt-get update
sudo apt-get install grafana
sudo service grafana-server start

Configure the Grafana server to start at boot:

sudo systemctl enable grafana-server.service

3. We are going to create the database in Influxdb so that Telegraf can save all the information related to the node

With this command we enter the influxdb console to be able to launch commands, create databases, users, etc.

     influx 

We create the database called "telegraf".

    create database telegraf   

We create the user "telegraf" with a password of your choice (replace "password-change" with your own). The exact user and password are not important, but they must match the ones in the telegraf.conf file, which Telegraf uses to write to the database.

    create user telegraf with password 'password-change'  

Show available databases, including ours :

     show databases                    
        > show databases
        name: databases
        name
        ----
        _internal
        telegraf

Users :

     show users                        
        user     admin
        ----     -----
        telegraf false

The default ports for InfluxDB are 8086 for HTTP and 8088 for RPC. If you run seven or more nodes on the same machine, the 7th node (and the 9th, if you have one) will conflict with InfluxDB because it tries to use the same port, 8086 or 8088. In that case, change the InfluxDB listening ports to other values by uncommenting and editing the bind-address lines:

    cd /etc/influxdb/
    vim /etc/influxdb/influxdb.conf
# Bind address to use for the RPC service for backup and restore.
# bind-address = "127.0.0.1:8088"

# The bind address used by the HTTP service.
# bind-address = ":8086"
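
To see which nodes would clash with InfluxDB, the port arithmetic can be sketched as follows. This assumes, as described in section 4.1 of this guide, that the first node listens on 8080, the second on 8081, and so on. The function names `node_port` and `conflicting_nodes` are illustrative only.

```shell
#!/bin/bash
# Assumption: node i (counting from 0) exposes its API on port 8080 + i.
node_port() {
    echo $((8080 + $1))
}

# Print every node (0-based index) whose port equals an InfluxDB default port.
conflicting_nodes() {
    local count="$1" i=0 port
    while [ "$i" -lt "$count" ]; do
        port=$(node_port "$i")
        if [ "$port" -eq 8086 ] || [ "$port" -eq 8088 ]; then
            echo "node index $i uses port $port"
        fi
        i=$((i + 1))
    done
}
```

For example, `conflicting_nodes 9` reports index 6 (the 7th node) on port 8086 and index 8 (the 9th node) on port 8088.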

4. Configure Telegraf

Now that we have influxdb waiting for data, we are going to configure telegraf to read node metrics and send them to the database. The telegraf configuration file is at "/etc/telegraf/telegraf.conf". By default this file has many "inputs" that allow metrics to be read from all kinds of services (mysql, apache, nginx, postfix, network, cpu, etc.). We are going to keep this file as a backup and create a cleaner file from scratch with only the inputs that we need. This will make everything easier :)

 cd /etc/telegraf
 mv telegraf.conf telegraf.conf_ori
 vim telegraf.conf
  ##################### Global Agent Configuration #########################
    [agent]
    hostname = "erd.node"           
    flush_interval = "60s"        
    interval = "60s"               

    # Input Plugins                
    [[inputs.cpu]]
        percpu = true
        totalcpu = true
        collect_cpu_time = false
        report_active = false
    [[inputs.disk]]
        ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
    [[inputs.diskio]]
    [[inputs.mem]]
    [[inputs.net]]
    [[inputs.system]]
    [[inputs.swap]]
    [[inputs.netstat]]
    [[inputs.processes]]
    [[inputs.kernel]]

    # Output Plugin InfluxDB       
    [[outputs.influxdb]]           
    database = "telegraf"          
    urls = [ "http://127.0.0.1:8086" ]
    username = "telegraf"        
    password = "password-change"    # must match the password created in InfluxDB
    
    [[inputs.exec]]                     
    commands = ["/etc/telegraf/check_erd_node_metrics_0"]
    timeout = "5s"                        
    name_override = "node0_stats"       
    data_format = "json"            
    json_string_fields = ["erd_node_type","erd_peer_type"]

Important points of the file :

Host name that is sent to the database. It is the one that you later use in the queries in Grafana.

hostname = "erd.node"  

Interval: how often you want to read the info.

interval = "60s"

InfluxDB connection. We declare an "output" based on InfluxDB to tell Telegraf to store metrics there. We use the data from point 3 of this guide.

If our InfluxDB server is on the same machine as Telegraf:

urls = [ "http://127.0.0.1:8086" ]   

If we have installed influxdb and grafana on another server

urls = [ "http://YOUR-SERVER-IP:8086" ]  

We are going to define an "input" of the exec type. This plugin tells Telegraf to execute a command at each interval and to parse the command's output in the format that we specify.

[[inputs.exec]]

Path to the script that will serve as input and will give us data to send to influxdb. You can put the name you want.

commands = ["/etc/telegraf/check_erd_node_metrics_0"]

Metric name. This name is the one that we are going to see in Grafana and the one that we are going to select to access all the metrics. Imagine it as a table within the database.

name_override = "node0_stats"

Important: format in which we will receive the script information. In our case it will be json.

data_format = "json"

This option allows us to store text strings. Without it, the fields "erd_node_type" and "erd_peer_type", which contain text strings, would not be stored, and we would not have them available in Grafana to show in our dashboards.

json_string_fields = ["erd_node_type","erd_peer_type"]

To configure more nodes, add one more input per node, changing the script that reads from the node and the name. For example:

  [[inputs.exec]]
    commands = ["/etc/telegraf/check_erd_node_metrics_1"]
    timeout = "5s"
    name_override = "node1_stats"
    data_format = "json"
    json_string_fields = ["erd_node_type","erd_peer_type"]

    [[inputs.exec]]
    commands = ["/etc/telegraf/check_erd_node_metrics_2"]
    timeout = "5s"
    name_override = "node2_stats"
    data_format = "json"
    json_string_fields = ["erd_node_type","erd_peer_type"]
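
With many nodes, writing these stanzas by hand gets repetitive. The following is a minimal sketch that prints them instead, assuming the naming scheme used above (check_erd_node_metrics_N and nodeN_stats). The helper names `exec_input_block` and `exec_input_blocks` are made up for this example.

```shell
#!/bin/bash
# Print one [[inputs.exec]] stanza for node number $1 (0, 1, 2, ...).
exec_input_block() {
    local n="$1"
    cat <<EOF
[[inputs.exec]]
commands = ["/etc/telegraf/check_erd_node_metrics_${n}"]
timeout = "5s"
name_override = "node${n}_stats"
data_format = "json"
json_string_fields = ["erd_node_type","erd_peer_type"]

EOF
}

# Print the stanzas for nodes 0 .. count-1, where count is $1.
exec_input_blocks() {
    local count="$1" i=0
    while [ "$i" -lt "$count" ]; do
        exec_input_block "$i"
        i=$((i + 1))
    done
}
```

Running `exec_input_blocks 3` prints the stanzas for node0, node1 and node2. Review the output and paste the ones you still need into telegraf.conf.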

4.1 Script to read node information.

By default, installing a node creates several directories in the home of the user used for the installation. One of these folders is "elrond-utils", which contains two tools that give a real-time view of the node from the CLI: logviewer and termui. When each node starts, it launches a service listening on port 8080 for the first node, 8081 for the second node, and 808X for the following ones. We can access that service using the following command:

cd /home/your-user/elrond-utils/
./termui -address localhost:8080

What the check_erd_node_metrics_X script does is make use of that information in a very simple way:

cd /etc/telegraf/
vim check_erd_node_metrics_0   

We paste the following content:

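#!/bin/bash

# Query the node API and keep only the "data" object.
OUTPUT=$(curl -s 127.0.0.1:8080/node/status 2>/dev/null | jq ".data // empty") # returns "" when null
ret=$?   # exit status of jq, the last command in the pipeline

# An empty result or a jq failure means the node did not answer.
if [ -z "${OUTPUT}" ] || [ "${ret}" -ne 0 ]; then
   echo "NODE NOT RUNNING!!"
   exit 2
fi
echo "${OUTPUT}"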

Save the changes, make the file executable and make the owner telegraf:

chmod +x check_erd_node_metrics_0
chown telegraf check_erd_node_metrics_0 
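
For additional nodes, the same script can be parameterized by port rather than copied and edited by hand. The sketch below is one possible way to do that. `node_status` and `report_status` are hypothetical helper names, and the port layout (8080, 8081, ...) is the one described above.

```shell
#!/bin/bash
# Fetch the "data" object from the node API on port $1 (default 8080).
node_status() {
    curl -s "127.0.0.1:${1:-8080}/node/status" 2>/dev/null | jq ".data // empty"
}

# Print the JSON when the fetch worked, otherwise the same error as the
# original script, with return code 2.
# Arguments: <output of node_status> <exit status of node_status>
report_status() {
    local output="$1" ret="$2"
    if [ -z "$output" ] || [ "$ret" -ne 0 ]; then
        echo "NODE NOT RUNNING!!"
        return 2
    fi
    echo "$output"
}

# Example for the second node (port 8081):
#   OUTPUT=$(node_status 8081)
#   report_status "$OUTPUT" $?
```

Splitting the check into `report_status` also makes it possible to try the error path without a running node.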

We test that everything works:

sudo telegraf --config telegraf.conf

If you see something like the following, everything went well:

2020-05-17T17:57:32Z I! Starting Telegraf 1.14.2
2020-05-17T17:57:32Z I! Using config file: /etc/telegraf/telegraf.conf
2020-05-17T17:57:32Z I! Loaded inputs: exec exec exec diskio net swap kernel netstat processes cpu disk mem system
2020-05-17T17:57:32Z I! Loaded aggregators: 
2020-05-17T17:57:32Z I! Loaded processors: 
2020-05-17T17:57:32Z I! Loaded outputs: influxdb
2020-05-17T17:57:32Z I! Tags enabled: host=erd.node
2020-05-17T17:57:32Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"erd.node", Flush Interval:1m0s
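
The "Loaded inputs" line is the quickest thing to check. As a rough sketch, assuming the startup output has been saved to a file, the helpers below (both names are invented for this example) confirm that a given plugin is loaded and count the exec inputs, which should equal the number of nodes you configured.

```shell
#!/bin/bash
# Read Telegraf startup output on stdin; succeed if plugin $1 was loaded.
input_loaded() {
    grep 'Loaded inputs:' | grep -qw -- "$1"
}

# Read Telegraf startup output on stdin; print the number of exec inputs.
exec_input_count() {
    grep 'Loaded inputs:' | grep -ow 'exec' | wc -l | tr -d ' '
}
```

For example, with the output stored in a hypothetical file telegraf-start.log, `exec_input_count < telegraf-start.log` prints 3 for the log shown above.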

5. Configure Grafana.

Grafana listens on port 3000 by default, so to access its web interface, browse to your server's IP address on port 3000:

 http://IP-address:3000

The default username and password are admin / admin. It will ask you to change the password.

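Now we have to add a data source: InfluxDB in our case. Use the values from the previous steps: the InfluxDB URL (for example http://127.0.0.1:8086 if InfluxDB runs on the same machine as Grafana), the database "telegraf", and the user and password created in point 3.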

6. Import the dashboard.

Use the erd_dashboard.json file that I share as a template to quickly get information on your dashboard. You will have to adjust the queries of the different graphs if you have given your node another name.

7. Alerts via Telegram.

To receive notifications on Telegram, we'll need to create a new Telegram bot.

Create your bot

Open your telegram app and search for the user @BotFather and write this message:

/newbot

This is a command that tells the @BotFather to create you a new bot.

Save your token. Now create a new group in Telegram, for example "Erd Alerts", and add your bot to it (in this example the bot was named "My first bot"). To find your chat ID, add @RawDataBot to the group. This bot sends the group a message with all the information related to the group, something like this:

"chat": {
        "id": -457484388,    <-- this is your chat-id
        "title": "Alerts ERD",
        "type": "group",

Now, in Grafana, create a new "Notification Channel" of type Telegram, using your bot token and chat ID. The screenshot is self-explanatory.

erd_node_telegram
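
If no alert arrives, it can help to check the token and chat ID outside Grafana. The sketch below sends a test message through the Telegram Bot API `sendMessage` method with curl. The function names are illustrative, and the token and chat ID are placeholders for your own values.

```shell
#!/bin/bash
# Build the Bot API sendMessage URL for the bot token given in $1.
telegram_send_url() {
    printf 'https://api.telegram.org/bot%s/sendMessage' "$1"
}

# Send a text message to a chat.
# Usage: send_test_alert <token> <chat-id> <text>
send_test_alert() {
    curl -s -X POST "$(telegram_send_url "$1")" \
        --data-urlencode "chat_id=$2" \
        --data-urlencode "text=$3"
}
```

Calling `send_test_alert "YOUR-TOKEN" "-457484388" "Test alert"` should make the message appear in the group. If it does not, check the JSON that the API returns for an error description.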

If you want to see an image in your Telegram alerts, you need to install a plugin and some dependencies on Ubuntu.

grafana-cli plugins install grafana-image-renderer
sudo apt install libx11-6 libx11-xcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrender1 libxtst6 libglib2.0-0 libnss3 libcups2 libdbus-1-3 libxss1 libxrandr2 libgtk-3-0 libasound2

Creating Alerts.

The .json already has some alerts configured, but if you want to know how this works, visit https://grafana.com/docs/grafana/latest/alerting/create-alerts/

Add or edit an alert rule

  1. Navigate to the panel you want to add or edit an alert rule for, click the title, and then click Edit.
  2. On the Alert tab, click Create Alert. If an alert already exists for this panel, then you can just edit the fields on the Alert tab.
  3. Fill out the fields. Descriptions are listed below in Alert rule fields.
  4. When you have finished writing your rule, click Save in the upper right corner to save alert rule and the dashboard.
  5. (Optional but recommended) Click Test rule to make sure the rule returns the results you expect.

Rule

  1. Name - Enter a descriptive name. The name will be displayed in the Alert Rules list.
  2. Evaluate every - Specify how often the scheduler should evaluate the alert rule. This is referred to as the evaluation interval.
  3. For - Specify how long the query needs to violate the configured thresholds before the alert notification triggers.
