Go-based tooling used to monitor processes.
- Project home
- Overview
- Features
- Changelog
- Requirements
- Installation
- Configuration
- Process states
- Examples
- License
- References
See our GitHub repo for the latest code, to file an issue or submit improvements for review and potential inclusion into the project.
This repo is intended to provide various tools used to monitor processes.
Tool Name | Overall Status | Description |
---|---|---|
`check_process` | Alpha | Nagios plugin used to monitor processes for problematic states. |
`lsps` | Alpha | Small CLI tool to list processes with known problematic states. |
Initial support has been added for emitting Performance Data / Metrics, but refinement suggestions are welcome.
Consult the table below for the metrics implemented thus far.
Please add to an existing Discussion thread (if applicable) or open a new one with any feedback that you may have. Thanks in advance!
Emitted Performance Data / Metric | Meaning |
---|---|
`time` | Runtime for plugin |
`problem_processes` | Number of overall "problem" processes |
`running` | Number of running processes |
`sleeping` | Number of sleeping processes |
`uninterruptible_disk_sleep` | Number of (uninterruptible) disk sleep processes |
`stopped` | Number of stopped processes |
`zombie` | Number of zombie processes |
`dead` | Number of dead processes |
`tracing_stop` | Number of tracing stop processes |
`wakekill` | Number of wakekill processes |
`waking` | Number of waking processes |
`idle` | Number of idle processes |
`parked` | Number of parked processes |
NOTE: Not all process types are available for all kernel versions. Consult the Known states section for more information.
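As a rough illustration of how per-state counts like the ones in the table could be rendered in the Nagios `'label'=value;warn;crit;min;max` performance data format, here is a minimal Go sketch. The `perfData` helper is hypothetical and not taken from this project's source; it assumes the threshold, min and max fields are left empty and that labels are emitted in alphabetical order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// perfData renders per-state counts as Nagios performance data.
// Hypothetical helper: threshold, min and max fields are left empty.
func perfData(counts map[string]int) string {
	// Sort labels so the output order is stable between runs.
	labels := make([]string, 0, len(counts))
	for label := range counts {
		labels = append(labels, label)
	}
	sort.Strings(labels)

	parts := make([]string, 0, len(labels))
	for _, label := range labels {
		parts = append(parts, fmt.Sprintf("'%s'=%d;;;;", label, counts[label]))
	}
	return strings.Join(parts, " ")
}

func main() {
	counts := map[string]int{"running": 1, "sleeping": 363, "zombie": 0}
	fmt.Println(perfData(counts))
	// prints: 'running'=1;;;; 'sleeping'=363;;;; 'zombie'=0;;;;
}
```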
Nagios plugin (`check_process`) used to monitor for problematic process states on Linux distros.
NOTE: The intent is to support multiple operating systems, but as of this writing Linux is the only supported OS.
- Optional branding "signature"
  - used to indicate what Nagios plugin (and what version) is responsible for the service check result
- Optional, leveled logging using the `rs/zerolog` package
  - `logfmt` format output (to `stderr`)
  - choice of `disabled`, `panic`, `fatal`, `error`, `warn`, `info` (the default), `debug` or `trace`
NOTE: This tool ignores its own process entry when reporting running processes.
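One way a tool could skip its own entry is to compare each PID against the value returned by `os.Getpid()`. The sketch below is an assumption about the general approach, not this project's code; `excludeSelf` and the sample PID list are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// excludeSelf returns a copy of pids without the entry equal to self.
// Hypothetical helper used only to illustrate the idea.
func excludeSelf(pids []int, self int) []int {
	kept := make([]int, 0, len(pids))
	for _, pid := range pids {
		if pid != self {
			kept = append(kept, pid)
		}
	}
	return kept
}

func main() {
	self := os.Getpid()
	pids := []int{1, self, 4242} // hypothetical PID list
	fmt.Println("evaluating PIDs:", excludeSelf(pids, self))
}
```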
Small CLI tool to list processes with known problematic states.
- Optional expanded or "all" listing of processes grouped by process state
  - NOTE: This may produce a LOT of output
- Optional branding "signature"
  - used to indicate what Nagios plugin (and what version) is responsible for the service check result
- Optional, leveled logging using the `rs/zerolog` package
  - `logfmt` format output (to `stderr`)
  - choice of `disabled`, `panic`, `fatal`, `error`, `warn`, `info` (the default), `debug` or `trace`
NOTE: This tool ignores its own process entry when reporting running processes.
See the `CHANGELOG.md` file for the changes associated with each release of this application. Changes that have been merged to `master`, but are not yet part of an official release, may also be noted in the file under the `Unreleased` section. A helpful link to the Git commit history since the last official release is also provided for further review.
The following is a loose guideline. Other combinations of Go and operating systems for building and running tools from this repo may work, but have not been tested.
- Go
  - see this project's `go.mod` file for preferred version
  - this project tests against officially supported Go releases
    - the most recent stable release (aka, "stable")
    - the prior, but still supported release (aka, "oldstable")
- GCC
  - if building with custom options (as the provided `Makefile` does)
- `make`
  - if using the provided `Makefile`
- Red Hat Enterprise Linux 6
- Red Hat Enterprise Linux 7
- Red Hat Enterprise Linux 8
- Ubuntu 20.04
- Download Go
- Install Go
- Clone the repo
  - `cd /tmp`
  - `git clone https://github.com/atc0005/check-process`
  - `cd check-process`
- Install dependencies (optional)
  - for Ubuntu Linux
    - `sudo apt-get install make gcc`
  - for CentOS Linux
    - `sudo yum install make gcc`
- Build
  - manually, explicitly specifying target OS and architecture
    - `GOOS=linux GOARCH=amd64 go build -mod=vendor ./cmd/check_process/`
    - `GOOS=linux GOARCH=amd64 go build -mod=vendor ./cmd/lsps/`
      - most likely this is what you want (if building manually)
      - substitute `amd64` with the appropriate architecture if using different hardware (e.g., `arm64` or `386`)
  - using Makefile `linux` recipe
    - `make linux`
      - generates x86 and x64 binaries
  - using Makefile `release-build` recipe
    - `make release-build`
      - generates the same release assets as provided by this project's releases
- Locate generated binaries
  - if using `Makefile`
    - look in `/tmp/check-process/release_assets/check_process/`
    - look in `/tmp/check-process/release_assets/lsps/`
  - if using `go build`
    - look in `/tmp/check-process/`
- Copy the applicable binaries to whatever systems need to run them so that they can be deployed
NOTE: Depending on which `Makefile` recipe you use, the generated binary may be compressed and have an `xz` extension. If so, you should decompress the binary first before deploying it (e.g., `xz -d check_process-linux-amd64.xz`).
- Download the latest release binaries
- Decompress binaries
  - e.g., `xz -d check_process-linux-amd64.xz`
- Copy the applicable binaries to whatever systems need to run them so that they can be deployed
NOTE: DEB and RPM packages are provided as an alternative to manually deploying binaries.
- Place `check_process` in a location where it can be executed by the monitoring agent
  - Usually the same place as other Nagios plugins
  - For example, on a default Red Hat Enterprise Linux system using `check_nrpe` the `check_process` plugin would be deployed to `/usr/lib64/nagios/plugins/check_process` or `/usr/local/nagios/libexec/check_process`
- Place `lsps` in a location where it can be easily accessed
  - Usually the same place as other custom tools installed outside of your package manager's control
  - e.g., `/usr/local/bin/lsps`
NOTE: DEB and RPM packages are provided as an alternative to manually deploying binaries.
- Use the `-h` or `--help` flag to display current usage information.
- Flags marked as `required` must be set via CLI flag.
- Flags not marked as required are for settings where a useful default is already defined, but may be overridden if desired.
Flag | Required | Default | Repeat | Possible | Description |
---|---|---|---|---|---|
`branding` | No | `false` | No | `branding` | Toggles emission of branding details with plugin status details. This output is disabled by default. |
`h`, `help` | No | `false` | No | `h`, `help` | Show Help text along with the list of supported flags. |
`version` | No | `false` | No | `version` | Whether to display application version and then immediately exit application. |
`ll`, `log-level` | No | `info` | No | `disabled`, `panic`, `fatal`, `error`, `warn`, `info`, `debug`, `trace` | Log message priority filter. Log messages with a lower level are ignored. |
Flag | Required | Default | Repeat | Possible | Description |
---|---|---|---|---|---|
`branding` | No | `false` | No | `branding` | Toggles emission of branding details with plugin status details. This output is disabled by default. |
`h`, `help` | No | `false` | No | `h`, `help` | Show Help text along with the list of supported flags. |
`version` | No | `false` | No | `version` | Whether to display application version and then immediately exit application. |
`show-all` | No | `false` | No | `show-all` | Toggles listing of all processes. WARNING: This may produce a LOT of output. Disabled by default. |
`ll`, `log-level` | No | `info` | No | `disabled`, `panic`, `fatal`, `error`, `warn`, `info`, `debug`, `trace` | Log message priority filter. Log messages with a lower level are ignored. |
Red Hat Enterprise Linux 6 running a 2.6.32 version kernel is the baseline test environment for this project.
The valid process states for a 2.6.32 kernel differ from the process states for a 3.10 kernel (RHEL 7), which in turn differ from those for a 4.18 (RHEL 8) and newer kernel. This project attempts to evaluate processes in all supported states. In an effort to simplify use, some assumptions are made regarding which process states map to which monitoring plugin state.
The state details in this section were pulled directly from the source code for each of the upstream kernel versions for RHEL releases that this project was tested against. See the References section for additional details.
/*
* The task state array is a strange "bitmap" of
* reasons to sleep. Thus "running" is zero, and
* you can test for combinations of others with
* simple bit tests.
*/
static const char *task_state_array[] = {
"R (running)", /* 0 */
"S (sleeping)", /* 1 */
"D (disk sleep)", /* 2 */
"T (stopped)", /* 4 */
"T (tracing stop)", /* 8 */
"Z (zombie)", /* 16 */
"X (dead)" /* 32 */
};
/*
* The task state array is a strange "bitmap" of
* reasons to sleep. Thus "running" is zero, and
* you can test for combinations of others with
* simple bit tests.
*/
static const char * const task_state_array[] = {
"R (running)", /* 0 */
"S (sleeping)", /* 1 */
"D (disk sleep)", /* 2 */
"T (stopped)", /* 4 */
"t (tracing stop)", /* 8 */
"Z (zombie)", /* 16 */
"X (dead)", /* 32 */
"x (dead)", /* 64 */
"K (wakekill)", /* 128 */
"W (waking)", /* 256 */
"P (parked)", /* 512 */
};
/*
* The task state array is a strange "bitmap" of
* reasons to sleep. Thus "running" is zero, and
* you can test for combinations of others with
* simple bit tests.
*/
static const char * const task_state_array[] = {
/* states in TASK_REPORT: */
"R (running)", /* 0x00 */
"S (sleeping)", /* 0x01 */
"D (disk sleep)", /* 0x02 */
"T (stopped)", /* 0x04 */
"t (tracing stop)", /* 0x08 */
"X (dead)", /* 0x10 */
"Z (zombie)", /* 0x20 */
"P (parked)", /* 0x40 */
/* states beyond TASK_REPORT: */
"I (idle)", /* 0x80 */
};
/*
* The task state array is a strange "bitmap" of
* reasons to sleep. Thus "running" is zero, and
* you can test for combinations of others with
* simple bit tests.
*/
static const char * const task_state_array[] = {
/* states in TASK_REPORT: */
"R (running)", /* 0x00 */
"S (sleeping)", /* 0x01 */
"D (disk sleep)", /* 0x02 */
"T (stopped)", /* 0x04 */
"t (tracing stop)", /* 0x08 */
"X (dead)", /* 0x10 */
"Z (zombie)", /* 0x20 */
"P (parked)", /* 0x40 */
/* states beyond TASK_REPORT: */
"I (idle)", /* 0x80 */
};
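The strings in the kernel arrays above are what the kernel reports in the `State:` field of `/proc/<pid>/status`. As a minimal sketch of reading that field (not this project's implementation), the hypothetical `parseState` helper below takes the file contents as a string; the sample input is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseState extracts the value of the "State:" field from the text of a
// /proc/<pid>/status file, e.g. "S (sleeping)". The second return value is
// false when no State line is present.
func parseState(status string) (string, bool) {
	scanner := bufio.NewScanner(strings.NewReader(status))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "State:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "State:")), true
		}
	}
	return "", false
}

func main() {
	// Hypothetical excerpt of a status file.
	sample := "Name:\trsync\nState:\tD (disk sleep)\nPid:\t16431\n"
	if state, ok := parseState(sample); ok {
		fmt.Println(state) // prints: D (disk sleep)
	}
}
```

Matching on the full string rather than the single letter keeps `T (stopped)` and `T (tracing stop)` distinct on a 2.6.32 kernel, where both share the letter `T`.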
// kernel 2.6.32 (RHEL 6)
"R (running)"
"S (sleeping)"
"D (disk sleep)"
"T (stopped)"
"T (tracing stop)"
"Z (zombie)"
"X (dead)"
// kernel 3.10 (RHEL 7)
"R (running)"
"S (sleeping)"
"D (disk sleep)"
"T (stopped)"
"t (tracing stop)"
"Z (zombie)"
"X (dead)"
"x (dead)"
"K (wakekill)"
"W (waking)"
"P (parked)"
// kernel 4.18/5.14 (RHEL 8/9)
"R (running)"
"S (sleeping)"
"D (disk sleep)"
"T (stopped)"
"t (tracing stop)"
"X (dead)"
"Z (zombie)"
"P (parked)"
"I (idle)"
Process State | Monitoring State |
---|---|
`D (disk sleep)` | CRITICAL |
`Z (zombie)` | WARNING |
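The mapping in the table can be sketched as a small Go function that returns a Nagios plugin exit code (0 for OK, 1 for WARNING, 2 for CRITICAL). This is an illustrative sketch only; `serviceState` is a hypothetical name, and the assumption that the most severe state wins is ours rather than something taken from the plugin's source.

```go
package main

import "fmt"

// Standard Nagios plugin exit codes.
const (
	exitOK       = 0
	exitWarning  = 1
	exitCritical = 2
)

// serviceState returns the most severe monitoring state for the given
// process states. Assumption: CRITICAL takes precedence over WARNING.
func serviceState(states []string) int {
	result := exitOK
	for _, state := range states {
		switch state {
		case "D (disk sleep)":
			return exitCritical
		case "Z (zombie)":
			result = exitWarning
		}
	}
	return result
}

func main() {
	states := []string{"S (sleeping)", "Z (zombie)", "R (running)"}
	fmt.Println(serviceState(states)) // prints: 1
}
```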
This output is emitted by the plugin when no problematic processes are found.
$ ./check_process
OK: No problematic processes found (364 evaluated)
Process Summary:
- R (running) [1]
- S (sleeping) [363]
--------------------------------------------------
Problems:
- None
| 'dead'=0;;;; 'idle'=0;;;; 'parked'=0;;;; 'problem_processes'=0;;;; 'running'=1;;;; 'sleeping'=363;;;; 'stopped'=0;;;; 'time'=18ms;;;; 'tracing_stop'=0;;;; 'uninterruptible_disk_sleep'=0;;;; 'wakekill'=0;;;; 'waking'=0;;;; 'zombie'=0;;;;
Regarding the output:
- The last line, beginning with a space and the `|` symbol, contains the performance data metrics emitted by the plugin. Depending on your monitoring system, these metrics may be collected and exposed as graphs/charts.
- This output was captured on a Red Hat Enterprise Linux 6 system (baseline OS for testing). The output is comparable to other Linux distros.
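To show how a consumer might pull the metrics out of that last line, here is a minimal Go sketch. `parsePerfData` is a hypothetical helper, not part of this project; it assumes labels contain no spaces (true for the metrics listed in this document) and keeps each value as a raw string, including any unit suffix such as `ms`.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePerfData returns a label-to-value map for the performance data that
// follows the last "|" in plugin output. Values keep any unit suffix.
func parsePerfData(output string) map[string]string {
	metrics := make(map[string]string)
	idx := strings.LastIndex(output, "|")
	if idx < 0 {
		return metrics // no performance data present
	}
	for _, field := range strings.Fields(output[idx+1:]) {
		label, rest, ok := strings.Cut(field, "=")
		if !ok {
			continue
		}
		// Drop the warn;crit;min;max fields that follow the value.
		value, _, _ := strings.Cut(rest, ";")
		metrics[strings.Trim(label, "'")] = value
	}
	return metrics
}

func main() {
	out := "OK: No problematic processes found\n | 'running'=1;;;; 'time'=18ms;;;;"
	metrics := parsePerfData(out)
	fmt.Println(metrics["running"], metrics["time"]) // prints: 1 18ms
}
```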
This output is emitted by the plugin when problematic processes of a WARNING state are found.
TODO: Provide example output when this scenario is encountered.
This output is emitted by the plugin when problematic processes of a CRITICAL state are found.
In the case of the `rsync` entries below, the activity is fairly normal for this system (daily, early AM backups). To work around this, you can either modify the timeperiod used for notifications to exclude this scenario (until `D` state processes are found outside of that window) or increase the number of retries so that an alert is not raised until after all retry attempts have been exceeded.
$ ./check_process
CRITICAL: 2 problematic processes found (D (disk sleep) [2], R (running) [7], S (sleeping) [368], evaluated [377])
Process Summary:
- D (disk sleep) [2]
- R (running) [7]
- S (sleeping) [368]
--------------------------------------------------
Problems:
- Name: rsync [Parent: backup.sh (6761), State: D (disk sleep), Pid: 16431, PPid: 6761, Threads: 1]
- Name: rsync [Parent: backup.sh (6761), State: D (disk sleep), Pid: 18321, PPid: 6761, Threads: 1]
| 'dead'=0;;;; 'idle'=0;;;; 'parked'=0;;;; 'problem_processes'=0;;;; 'running'=7;;;; 'sleeping'=368;;;; 'stopped'=0;;;; 'time'=18ms;;;; 'tracing_stop'=0;;;; 'uninterruptible_disk_sleep'=2;;;; 'wakekill'=0;;;; 'waking'=0;;;; 'zombie'=0;;;;
See the LICENSE file for details.
- `proc` filesystem (usually mounted at `/proc`)
- valid process states
  - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/array.c?h=v2.6.32#n136
  - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/array.c?h=v3.10#n135
  - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/array.c?h=v4.18#n130
  - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/array.c?h=v5.14#n130