Maximum lag for replica access (REST API enhancement) #1249
Comments
Patroni on the master publishes its WAL position to the DCS (…). Implementing it should not be hard, but you have to understand that it will not be as precise as you expect.
The only way to get the time delay is to execute … In any case, feel free to implement it and open a PR.
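The byte lag discussed here is the difference between the leader's WAL position and the replica's own position. Below is a minimal sketch of that arithmetic. It assumes both positions are PostgreSQL LSN strings of the form `X/Y`, where `X` and `Y` are the high and low 32 bits in hex. The function names are hypothetical, not Patroni's API; on the database side, `pg_wal_lsn_diff()` computes the same difference. If the leader's published position is refreshed only periodically (an assumption about the update interval), the result can be stale, which makes any threshold on it approximate.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    # High part is the upper 32 bits, low part the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(leader_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica is behind the leader (clamped at zero)."""
    return max(0, lsn_to_int(leader_lsn) - lsn_to_int(replica_lsn))
```

For example, `lag_bytes("0/5000000", "0/4000000")` returns 16777216 (16 MiB).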
Yes, you are right! But it seems to me that this is a very necessary feature, even though it is not as easy to implement as we would like.
Yes. I check replication lag in seconds with the following query (on replica servers):
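A query often used for this purpose compares `now()` with `pg_last_xact_replay_timestamp()` on the replica. This is an assumption and not necessarily the query meant here. The sketch below shows the resulting arithmetic in Python. The helper name is hypothetical, and the two timestamps are assumed to come from `SELECT now(), pg_last_xact_replay_timestamp();`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def replay_lag_seconds(now: datetime, last_replay: Optional[datetime]) -> Optional[float]:
    """Seconds since the last replayed transaction was committed on the primary.

    Returns None when pg_last_xact_replay_timestamp() returned NULL
    (no transaction replayed). Negative values caused by clock skew are
    clamped to 0. On an idle primary this number keeps growing even
    when the replica is fully caught up, so it overstates lag without
    write traffic.
    """
    if last_replay is None:
        return None
    return max(0.0, (now - last_replay).total_seconds())


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
    print(replay_lag_seconds(t0 + timedelta(seconds=5), t0))
```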
What I understood from our conversation:
Trying to measure delay in Patroni is unlikely to provide any useful guarantees. At best, it results in a system that mostly works but fails under any kind of adverse conditions. Perhaps you could help push this patch along: https://commitfest.postgresql.org/23/1589/
Today I had a situation where one of the replicas was down for a long time. When it came back online, the master had already recycled the necessary WAL (FATAL: could not receive data from WAL stream: ERROR: requested WAL segment has already been removed), but Patroni reported this replica as Running with a lag of 5GB. Why? In this case, checking replica lag seems mandatory, or perhaps better checking of the Postgres state.
What about: select client_addr, write_lag, flush_lag, replay_lag from pg_stat_replication;
Great news! Patroni version 2.0.0 adds enhanced GET /replica and GET /async REST API health checks (#1599). However, it does not yet take spikes of replication lag into account (see maximum_lag_on_replica_delay).
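Below is a sketch of how a client might use these health checks. It assumes the optional `?lag=<max-lag>` query parameter described in the Patroni REST API documentation and Patroni's default REST port 8008. Both helper names are hypothetical. The sketch treats anything other than HTTP 200, including network errors, as "do not route reads here", which is how a load balancer check behaves.

```python
from http.client import HTTPException
from typing import Union
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def replica_check_url(host: str, max_lag: Union[int, str], port: int = 8008,
                      endpoint: str = "replica") -> str:
    """Build a URL such as http://host:8008/replica?lag=1048576 (lag assumed in bytes)."""
    if endpoint not in ("replica", "async"):
        raise ValueError("endpoint must be 'replica' or 'async'")
    query = urlencode({"lag": max_lag})
    return urlunsplit(("http", f"{host}:{port}", f"/{endpoint}", query, ""))


def is_readable(url: str, timeout: float = 2.0) -> bool:
    """True only if the endpoint answers 200.

    urlopen raises HTTPError (an OSError subclass) for non-2xx answers such
    as 503, and OSError for connection failures or timeouts.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, HTTPException):
        return False
```

In an HAProxy setup like the one described in this issue, the same path would go into the `option httpchk` line instead of being polled from Python.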
Dear colleagues!
We already have one wonderful parameter:
maximum_lag_on_failover: the maximum bytes a follower may lag to be able to participate in leader election.
I ask you to implement a parameter for the maximum lag of a replica behind the master, which would allow finer control of read access to the replicas in the cluster.
Example (something like this):
maximum_lag_on_replica: the maximum lag in bytes (default 1048576) a replica may have and still allow access to its databases.
maximum_lag_on_replica_delay: the time in milliseconds (default 100 ms) during which the Patroni REST API will keep returning response code "200" after that lag is exceeded, so that momentary lag spikes can be ignored if appropriate.
The logic is as follows:
If the lag exceeds "maximum_lag_on_replica" for longer than "maximum_lag_on_replica_delay", the Patroni REST API immediately stops returning response code "200" for the /replica and /async endpoints.
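A minimal sketch of the proposed logic follows. It is not Patroni code: the class is hypothetical, the parameter names come from this proposal, and returning 503 on failure is an assumption that mirrors how Patroni's health checks report an unhealthy node. It is a threshold check with a grace window. A spike above `maximum_lag_on_replica` is tolerated for up to `maximum_lag_on_replica_delay`, and the window resets as soon as lag drops back under the limit.

```python
from typing import Optional


class ReplicaLagGate:
    """Hypothetical sketch of maximum_lag_on_replica / maximum_lag_on_replica_delay."""

    def __init__(self, maximum_lag_on_replica: int = 1048576,
                 maximum_lag_on_replica_delay_ms: int = 100) -> None:
        self.max_lag = maximum_lag_on_replica
        self.delay_s = maximum_lag_on_replica_delay_ms / 1000.0
        # Monotonic time at which the lag first went over the threshold.
        self._over_since: Optional[float] = None

    def status(self, lag_bytes: int, now: float) -> int:
        """HTTP status /replica or /async would return; now is e.g. time.monotonic()."""
        if lag_bytes <= self.max_lag:
            self._over_since = None  # lag recovered: reset the grace window
            return 200
        if self._over_since is None:
            self._over_since = now  # a spike starts now
        if now - self._over_since < self.delay_s:
            return 200  # momentary spike, still within the grace period
        return 503  # lag exceeded the threshold for too long


if __name__ == "__main__":
    gate = ReplicaLagGate()
    print(gate.status(2_000_000, 1.0), gate.status(2_000_000, 1.2))
```

Passing `now` in explicitly, rather than reading the clock inside `status`, keeps the logic deterministic and easy to test.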
I use HAProxy to perform Patroni REST API checks and to provide read-only access for applications.
An example of the scheme (TypeA) and the configurations I use:
https://github.com/vitabaks/postgresql_cluster (if links are not allowed, you can delete it)
Thanks!