Harden against monitor db failures #1545
Merged
Do not crash if the monitor database is not available
While the monitoring database is important, it isn't critical for the entire system
to function. Even the monitor itself can continue to work in degraded mode. So,
if the monitoring database is not available, we should not crash the entire
system. This commit catches Sequel::DatabaseConnectionError that can be raised
while trying to connect to the monitoring database and logs an error message.
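A minimal sketch of the rescue-and-log pattern described above. It is not the code from this PR: `connect_monitor_db` is a hypothetical helper, and the `begin`/`rescue LoadError` stanza only defines a stand-in error class so the sketch runs when the sequel gem is not installed.

```ruby
require "logger"

# Use the real Sequel::DatabaseConnectionError when the gem is available;
# otherwise define a stand-in so this sketch still runs (assumption).
begin
  require "sequel"
rescue LoadError
  module Sequel
    class DatabaseConnectionError < StandardError; end
  end
end

# Hypothetical helper: runs the block that connects to the monitoring
# database. On a connection failure it logs an error and returns nil
# instead of letting the exception crash the process.
def connect_monitor_db(logger)
  yield
rescue Sequel::DatabaseConnectionError => e
  logger.error("Monitoring database is not available: #{e.message}")
  nil
end
```

Callers would then need to treat a `nil` result as "run in degraded mode" rather than assuming a live connection.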
Ignore errors while trying to save last_known_lsn
Even if the last_known_lsn cannot be saved (potentially due to unavailability of
the monitoring database), the monitor should still be able to record pulses.
Otherwise, pulse checking would stop for all PostgreSQL databases when the
monitoring database is down. This commit ensures that we properly handle the
exceptions that can be raised when trying to save the last_known_lsn.
Of course, we shouldn't perform failovers if the last_known_lsn is unknown.
That is still the case, because the last_known_lsn is only used at the
time of failover to determine the failover target.
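The idea can be sketched as follows, under stated assumptions: `check_pulse`, `read_lsn`, and `save_lsn` are hypothetical names, not the ones used in this PR, and rescuing `StandardError` is an illustrative choice. The point is only that a failed save is logged and the pulse is still returned.

```ruby
require "logger"

# Hypothetical pulse check. `read_lsn` and `save_lsn` are illustrative
# callables standing in for the real PostgreSQL query and the write to
# the monitoring database.
def check_pulse(read_lsn:, save_lsn:, logger:)
  lsn = read_lsn.call
  begin
    save_lsn.call(lsn)
  rescue StandardError => e
    # Failing to persist the LSN must not stop pulse checking.
    logger.warn("Could not save last_known_lsn: #{e.message}")
  end
  {reading: "up", lsn: lsn}
end
```

For example, calling it with a `save_lsn` that raises still yields a pulse hash with `reading: "up"`, and the failure appears only in the log.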