Harden against monitor db failures #1545

byucesoy · 2024-05-03T19:48:44Z

Do not crash if the monitor database is not available
While the monitoring database is important, it isn't critical for the entire system
to function. Even the monitor itself can continue to work in degraded mode. So,
if the monitoring database is not available, we should not crash the entire
system. This commit catches Sequel::DatabaseConnectionError that can be raised
while trying to connect to the monitoring database and logs an error message.
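
A minimal sketch of the rescue-and-log pattern described above, not the code from this commit. The helper name `connect_monitor_db` and the `err` parameter are hypothetical, and the stand-in `Sequel::DatabaseConnectionError` class exists only so the snippet runs without the sequel gem installed.

```ruby
# Use the real exception class when the sequel gem is available; otherwise
# define a stand-in with the same name (an assumption made for this sketch).
begin
  require "sequel"
rescue LoadError
  module Sequel
    class DatabaseConnectionError < StandardError; end
  end
end

# Hypothetical helper: runs the block that connects to the monitoring
# database. On a connection error it writes a message to `err` and returns
# nil instead of letting the exception crash the process.
def connect_monitor_db(err: $stderr)
  yield
rescue Sequel::DatabaseConnectionError => e
  err.puts("monitoring database is not available: #{e.message}")
  nil
end
```

Callers would then treat a `nil` result as "run in degraded mode" rather than as a fatal condition.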

Ignore errors while trying to save last_known_lsn
Even if the last_known_lsn cannot be saved (potentially due to unavailability of
the monitoring database), the monitor should still be able to record pulses.
Otherwise, pulse checking would stop for all PostgreSQL databases when the
monitoring database is down. This commit ensures that we properly handle the
exceptions that can be raised when trying to save the last_known_lsn.

Of course, we shouldn't perform failovers if the last_known_lsn is unknown.
That is still the case, because the last_known_lsn is only used at the
time of failover to determine the failover target.
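
The idea can be sketched as follows, assuming a pulse check that reads an LSN and then tries to persist it. The method name `check_pulse`, the `read_lsn` and `save_lsn` callables, and the shape of the returned hash are all hypothetical. Rescuing `StandardError` broadly is a choice made for this sketch, since the commit message does not name the exceptions it handles.

```ruby
# Hypothetical pulse check. `read_lsn` stands in for the query against the
# PostgreSQL server being monitored; `save_lsn` stands in for the write of
# last_known_lsn, which may hit the monitoring database.
def check_pulse(read_lsn:, save_lsn:, err: $stderr)
  lsn = read_lsn.call
  begin
    save_lsn.call(lsn)
  rescue StandardError => e
    # Saving is best effort: a failed write must not turn a healthy
    # reading into a failed pulse.
    err.puts("could not save last_known_lsn: #{e.message}")
  end
  {reading: "up", lsn: lsn}
rescue StandardError => e
  # Only a failure to reach the monitored server itself counts as "down".
  {reading: "down", error: e.message}
end
```

With this structure, an outage of the monitoring database only produces a log line, while the pulse itself is still recorded.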

byucesoy added 2 commits May 3, 2024 21:25

byucesoy requested a review from a team May 3, 2024 19:48

byucesoy changed the title ~~Harden against monitor db failure~~ Harden against monitor db failures May 3, 2024

byucesoy self-assigned this May 3, 2024

furkansahin approved these changes May 6, 2024

byucesoy merged commit 8e080ab into main May 6, 2024
6 checks passed

byucesoy deleted the harden-against-monitor-db-failure branch May 6, 2024 11:07

github-actions bot locked and limited conversation to collaborators May 6, 2024
