Checks executed twice and no recovery notifications are sent #9995
Do those time frames correlate with your Director deployments?
We don't use Director. But in any case, no: it has also happened a few other times with fewer servers (even a single event at a time).
We have experienced this a few more times. We have no idea how to debug it (a DB query?) or how to reproduce it, so any suggestions are appreciated. To sum up, and to add two new pieces of information:
Describe the bug
A couple of times in the last 3 weeks, we have observed weird behaviour where checks are executed twice and notifications are sent twice (at least the Problem notification), while at the same time no Recovery notifications are ever sent.
Each time, it happened within a short time frame (e.g. between 8am and 9am), on a varying number of servers/services with no common pattern between them.
The checker and notification features are enabled in HA on both masters. On both of them, icinga2.log contains the following lines, where a Problem notification is sent but no Recovery one (is it normal that both masters log the same lines? Are they performing the same action in parallel?):
Screenshots
Your Environment
- Version (icinga2 --version): r2.14.1-1
- Enabled features (icinga2 feature list): api-users api checker command graphite ido-mysql mainlog notification
- Config validation (icinga2 daemon -C): OK
- zones.conf file:
Additional context
- Migration from SLES12.5 (Icinga 2.10.3) to RHEL9 (Icinga 2.14.0) around 2 months ago
- jemalloc-5.2.1-2.el9.x86_64
We started seeing the error in the last 3 weeks, but we don't know whether it was introduced by the latest minor update to 2.14.1 or has been present since the first migration; back then we had fewer and less important servers, so it may simply have gone unnoticed.