Dor Laor edited this page Jun 25, 2015 · 10 revisions

Info

Collectd is a daemon plus a set of plug-ins that collect and aggregate counter metrics from various sources, for (semi-)realtime or later viewing and analysis.

Installing

  1. On Fedora (and probably other Red Hat derivatives): sudo yum install collectd
  2. In the office we have a preinstalled collectd+graphite you can use
    1. system.cloudius:18080
    2. run seastar with --collectd=1 --collectd-address=10.0.0.4:25826 --collectd-hostname=<your_host_name>
  3. Install locally on your desktop using docker
    1. sudo docker pull lopter/collectd-graphite
    2. sudo docker run --net=host lopter/collectd-graphite
    3. browse to localhost:8080
    4. run seastar with --collectd=1 --collectd-address=127.0.0.1:25826

Config (collectd server)

A vanilla installation will neither listen for nor record any data. Again, on Fedora-ish installs, edit a file /etc/collectd.d/<myconfig>.conf. A minimal setup (according to the manual) to get data collected and stored would be:

LoadPlugin network
LoadPlugin logfile
LoadPlugin rrdtool

<Plugin "network">
    Listen "0.0.0.0" "25826"
</Plugin>
<Plugin "rrdtool">
    DataDir "/var/lib/collectd/rrd"
    CacheFlush 120
    WritesPerSecond 50
</Plugin>

The network plugin section tells the daemon to listen for data packets on unicast port 25826 (you can also set it to listen on the default multicast address, but that is not yet supported by the seastar IP stack). Note that multiple "Listen" entries can be added to gather data from more than one IP address or interface.

The rrdtool section enables writing the data in RRD format, organized by host, plugin, plugin-instance, type and type-instance. The resulting files can later be plotted and analyzed.
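As a rough illustration of that layout, the sketch below builds the relative path under DataDir where a counter's file would be expected, assuming the usual collectd naming of `host/plugin[-plugin_instance]/type[-type_instance].rrd`. The helper `rrd_relative_path` is hypothetical and exists only for this example; it is not part of collectd or seastar.

```cpp
#include <string>

// Hypothetical helper (not a collectd or seastar API): builds the path,
// relative to DataDir, where the rrdtool plugin is assumed to store one
// counter. Optional parts are appended with '-' only when non-empty.
std::string rrd_relative_path(const std::string& host,
                              const std::string& plugin,
                              const std::string& plugin_instance,
                              const std::string& type,
                              const std::string& type_instance) {
    std::string path = host + "/" + plugin;
    if (!plugin_instance.empty()) {
        path += "-" + plugin_instance;
    }
    path += "/" + type;
    if (!type_instance.empty()) {
        path += "-" + type_instance;
    }
    return path + ".rrd";
}
```

For instance, `rrd_relative_path("my-host", "reactor", "0", "queue_length", "tasks-pending")` yields `my-host/reactor-0/queue_length-tasks-pending.rrd`, which is where to look below /var/lib/collectd/rrd when checking that data is arriving.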

(Note: the RRD files are written incrementally, with caching and periodic flushing, so "realtime" graphing from them lags noticeably. The delay can probably be tuned with the cache and write-rate parameters above, possibly at the cost of performance.)

Remember to permit 25826/udp through the server firewall (if any).

Taken from Collectd networking and Collectd rrd plugin

Starting

sudo systemctl start collectd

Config (seastar -- collectd client)

The command line options

--collectd 1 --collectd-address my.ip.ad.dr:25826 --collectd-hostname a-unique-id

will send data to the collectd server previously configured, assuming that my.ip.ad.dr is reachable from the seastar IP stack. With dpdk, this is limited to any machine on the same subnet (which excludes the host seastar is running on). With virtio, the seastar host is also reachable, as the IP address of the bridge seastar is using.

Note these options can be stored in ~/.config/seastar/seastar.conf.

Generate graphs

See related projects for a list of visualizers. Collectd also comes with a (sample) visualizer package that uses perl and a web server to graph collected data. On (again) Fedora-ish systems it can be installed with sudo yum install collectd-web and started with sudo systemctl start httpd. You can then navigate counter sets and get graphs by going to http://<your-host>/collectd/. To allow remote access to this URL, edit /etc/httpd/conf.d/collectd.conf and replace every Require local with Require all granted before starting the httpd server.

Adding counters

Metric data is recorded through primitive counters which are registered under an ID comprised of:

<plugin> The component or subsystem being collected for. For example "cpu", or "interface" (network).
<plugin-instance> (Optional) The individual instance of a plugin being collected. For example, for the cpu case, this would simply be 0, 1, 2... etc. For network interfaces this would be the interface name (eth0).
<type> The [data type](http://collectd.org/documentation/manpages/types.db.5.shtml) being collected. New data types can be defined, and you do not strictly need to use a defined type at all, but the RRD plugin, for example, will not work with types that are not pre-defined. Either select one of the built-in types from `/usr/share/collectd/types.db`, or create a new database (this requires all clients to use this db and enable it in their config, so using existing types is highly recommended).

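For cpu, the cpu type is the natural choice. It defines a counter consisting of a single value. (Recent collectd releases declare that value as DERIVE in types.db rather than as an absolute, instantaneous GAUGE, so check the entry shipped with your installation.)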

For a network interface, several types exist, for example if_packets, which is defined as two values, RX and TX. Both are of kind DERIVE, which means that when looking at the value we are interested in how much it changed since the previous sample, i.e. its derivative.

<type-instance> (Optional) For the cpu example: `idle`, `user`, `kernel` etc.
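To make the DERIVE kind more concrete, here is a minimal sketch of the arithmetic involved: the per-second rate between two samples of a growing counter. The function `derive_rate` is a hypothetical name used only for this illustration, and the sketch ignores details such as the min/max bounds declared in types.db.

```cpp
#include <cstdint>

// Illustrative only: per-second rate of change between two samples of a
// counter, which is roughly what a DERIVE data source ends up graphing.
// Both samples are converted to double before subtracting so that a
// decreasing counter gives a negative rate instead of wrapping around.
double derive_rate(std::uint64_t previous, std::uint64_t current,
                   double interval_seconds) {
    if (interval_seconds <= 0.0) {
        return 0.0;  // no meaningful rate without a positive interval
    }
    return (static_cast<double>(current) - static_cast<double>(previous))
            / interval_seconds;
}
```

With samples of 1000 and 1500 packets taken 10 seconds apart, `derive_rate(1000, 1500, 10.0)` gives 50 packets per second. A GAUGE value, by contrast, is stored as sampled.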

For a usage example, you can check out reactor.cc:

    uint64_t tasks_processed = 0;
    scollectd::registration regs[] = {
            // queue_length     value:GAUGE:0:U
            // Absolute value of num tasks in queue.
            scollectd::add_polled_metric(scollectd::type_instance_id("reactor"
                    , scollectd::per_cpu_plugin_instance
                    , "queue_length", "tasks-pending")
                    , scollectd::make_typed(scollectd::data_type::GAUGE
                            , std::bind(&decltype(_pending_tasks)::size, &_pending_tasks))
            ),
            // total_operations value:DERIVE:0:U
            scollectd::add_polled_metric(scollectd::type_instance_id("reactor"
                    , scollectd::per_cpu_plugin_instance
                    , "total_operations", "tasks-processed")
                    , scollectd::make_typed(scollectd::data_type::DERIVE, tasks_processed)
            ),
            // queue_length     value:GAUGE:0:U
            // Absolute value of num timers in queue.
            scollectd::add_polled_metric(scollectd::type_instance_id("reactor"
                    , scollectd::per_cpu_plugin_instance
                    , "queue_length", "timers-pending")
                    , scollectd::make_typed(scollectd::data_type::GAUGE
                            , std::bind(&decltype(_timers)::size, &_timers))
            ),
    };

scollectd::registration is an anchor type which ensures the counter is removed once the anchor goes out of scope. In the reactor loop case, the counters only exist for the duration of the run() function, so the anchors are local to it. In most other cases the counters and their anchors would be class members.
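The anchor idea can be sketched independently of seastar as plain RAII. Everything in the block below (`counter_registry`, `registration_anchor`, `count_while_anchored`) is a hypothetical stand-in written for this illustration and does not reflect the actual scollectd implementation; it only shows a counter being registered on construction and removed again on destruction.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a metric registry; the real scollectd
// internals are not shown here.
class counter_registry {
    std::map<std::string, const long*> _counters;
public:
    void add(const std::string& id, const long* value) { _counters[id] = value; }
    void remove(const std::string& id) { _counters.erase(id); }
    std::size_t size() const { return _counters.size(); }
};

// Hypothetical anchor: registers the counter when constructed and
// unregisters it when destroyed. Copying is disabled so that exactly
// one anchor owns each registration.
class registration_anchor {
    counter_registry* _registry;
    std::string _id;
public:
    registration_anchor(counter_registry& registry, std::string id, const long* value)
        : _registry(&registry), _id(std::move(id)) {
        _registry->add(_id, value);
    }
    registration_anchor(const registration_anchor&) = delete;
    registration_anchor& operator=(const registration_anchor&) = delete;
    ~registration_anchor() { _registry->remove(_id); }
};

// Returns how many counters are registered while the anchor is alive.
std::size_t count_while_anchored(counter_registry& registry) {
    long tasks_processed = 0;
    registration_anchor anchor(registry,
            "reactor-0/total_operations-tasks-processed", &tasks_processed);
    return registry.size();
}  // the anchor is destroyed here, so the counter is removed again
```

Calling `count_while_anchored` on an empty registry returns 1, and the registry is empty again afterwards. This mirrors why the anchor must live at least as long as the variable it reports: as a local next to a local counter, or as a member next to a member counter.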

Misc

Since UDP packets may get lost on the internet, you can run collectd locally (or on a nearby server) and access it remotely through an ssh tunnel: ssh -L 8080:localhost:8080 XYZ