Maximum lag for replica access (REST API enhancement) #1249

Open
vitabaks opened this issue Oct 24, 2019 · 6 comments

@vitabaks

Dear colleagues!

Today we already have one wonderful parameter:
maximum_lag_on_failover: the maximum bytes a follower may lag to be able to participate in leader election.

I ask you to implement a parameter for the maximum lag of a replica behind the master, which would allow finer-grained control of read access to the replicas in the cluster.

Example (something like this):
maximum_lag_on_replica: the maximum bytes of lag (default 1048576) that a replica may have while still allowing access to the databases on this replica.

maximum_lag_on_replica_delay: the time in milliseconds (default 100 ms) during which the Patroni REST API continues to return response code "200" even though the lag threshold is exceeded, so that momentary lag spikes can be ignored where appropriate.

The logic is as follows:
if both the "maximum_lag_on_replica" and "maximum_lag_on_replica_delay" thresholds are exceeded, the Patroni REST API immediately stops returning response code "200" for the /replica and /async endpoints.

I use HAProxy to perform Patroni REST API checks and to provide read-only access for applications.
An example of the schema (TypeA) and the configurations that I use:
https://github.com/vitabaks/postgresql_cluster (if links are not allowed, you can delete it)
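
As an illustration only, a read-only HAProxy backend of the kind described above might look like the sketch below; the listener port, server names, addresses and timings are placeholders of mine, not values taken from the linked repository (8008 is the default Patroni REST API port):

listen replicas_ro
    bind *:5001
    # ask the Patroni REST API on each node; only a healthy replica
    # answers GET /replica with HTTP 200
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2
    server pgnode01 10.0.0.11:5432 check port 8008
    server pgnode02 10.0.0.12:5432 check port 8008
    server pgnode03 10.0.0.13:5432 check port 8008

Applications connect to port 5001 for read-only queries, and HAProxy routes them only to nodes whose /replica check currently returns 200.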

Thanks!

@CyberDem0n
Member

maximum_lag_on_replica

Patroni on the master publishes its WAL position to the DCS (/optime/leader) once per loop_wait, which means the value could be quite outdated. In most cases you will see that the replica is actually ahead of the master. Asking the master for it on every check looks like huge overkill.

Implementing it should not be hard, but you'll have to understand that it will not work as precisely as you expect it to.

maximum_lag_on_replica_delay

The only way to get the time delay is to execute now() - pg_catalog.pg_last_xact_replay_timestamp(), but it doesn't work very reliably: basically, replay_timestamp will not grow if there is no activity on the master.

In any case feel free to implement it and open a PR.

@vitabaks
Author

vitabaks commented Oct 24, 2019

Implementing it should not be hard, but you'll have to understand that it will not work as precisely as you expect it to.

Yes, you are right! But it seems to me that this is a very necessary feature, even though it cannot be implemented as precisely as we would like.
This leads to a thought: what if we reduce loop_wait to 1-5 seconds?

Basically replay_timestamp will not grow if there is no activity on the master.

Yes. I check replication lag in seconds with the following query (on replica servers):

-- returns the replay delay in seconds on a replica, and 0 on the primary
-- or when everything received has already been replayed
select case pg_is_in_recovery()
  when 't'
    then (select case
            when pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
              then 0
              else extract(epoch from now() - pg_last_xact_replay_timestamp())
            end)
    else 0
  end as lag_delay;

What I understood from our conversation:

  1. maximum_lag_on_replica - not so difficult to implement if you read the WAL position from the DCS (/optime/leader), although its accuracy depends heavily on loop_wait.

  2. maximum_lag_on_replica_delay - a little more difficult to implement, as it requires Patroni to run additional queries against PostgreSQL, although at first glance this should not put too much load on Patroni or the database.

  3. You don’t have time for this :)
    Well, I hope someone can take on this task, or give more recommendations...

@ants
Collaborator

ants commented Oct 24, 2019

Trying to measure delay in Patroni seems like it will not provide any useful guarantees and will at best result in a system that mostly works but fails under any kind of adverse conditions.

Perhaps you should take a look at helping to push this patch along: https://commitfest.postgresql.org/23/1589/

@mszpulak

Today I had a situation where one of the replicas went down for a longer period of time. When it came back online, the master had already recycled the necessary WAL (FATAL: could not receive data from WAL stream: ERROR: requested WAL segment has already been removed), but Patroni still reported this replica as Running with a lag of 5 GB. Why? In this case checking the replica lag seems mandatory, or perhaps better, checking the postgres state.

@mszpulak

What about: select client_addr, write_lag, flush_lag, replay_lag from pg_stat_replication;

@vitabaks
Author

vitabaks commented Sep 3, 2020

Great news!

Patroni version 2.0.0 adds enhanced GET /replica and GET /async REST API health checks (#1599):
checks now support the optional keyword ?lag=<max-lag> and will respond with 200 only if the lag is smaller than the supplied value.

However, it does not yet take spikes of replication lag into account (see maximum_lag_on_replica_delay above).
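
For example, the HAProxy check from a setup like the one sketched earlier could now pass the threshold straight to the endpoint (again my own sketch, not from the Patroni docs; the 1048576-byte value is only an illustration):

    # route read-only traffic only to replicas whose reported lag is below ~1 MiB
    option httpchk GET /replica?lag=1048576
    http-check expect status 200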
