What happened?
We want to bind a standby VIP to the sync standby with the highest sync_priority, according to the sync_standby list in DCS. However, the sync_standby list order is inconsistent with synchronous_standby_names: it is sorted by application name, not by sync_priority. Besides, synchronous_standby_names is not stable when changing synchronous_node_count.
To make the sync_standby and synchronous_standby_names ordering more stable, I suggest considering sync_priority as the third sort condition, after sync_state and lsn, when collecting ReplicaList from pg_stat_replication. Please help evaluate this proposal. Thanks.
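As a rough sketch of the proposal (names and shapes are illustrative, not Patroni's actual internals), the tie-break would look like this:

```python
from typing import List, NamedTuple

class Replica(NamedTuple):
    application_name: str
    sync_state: str     # 'sync', 'potential', or 'async' from pg_stat_replication
    lsn: int            # flushed/replayed LSN as an integer
    sync_priority: int  # position in synchronous_standby_names (1 = highest)

# Higher rank means a more preferred sync_state.
STATE_RANK = {'sync': 2, 'potential': 1, 'async': 0}

def order_replicas(rows: List[Replica]) -> List[Replica]:
    # Best-first: preferred sync_state, then highest LSN, then (the proposed
    # third condition) the lowest sync_priority number.
    return sorted(rows, key=lambda r: (-STATE_RANK[r.sync_state],
                                       -r.lsn,
                                       r.sync_priority))

rows = [Replica('ee_02', 'sync', 100, 2),
        Replica('ee_03', 'sync', 100, 1)]
# With equal sync_state and lsn, ee_03 (priority 1) now sorts first.
assert [r.application_name for r in order_replicas(rows)] == ['ee_03', 'ee_02']
```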
How can we reproduce it (as minimally and precisely as possible)?
1. Set up a cluster with 3 nodes.
2. Set synchronous_mode to true via patronictl edit-config; ee_03 was selected as the sync standby.
3. Set synchronous_node_count to 2 via patronictl edit-config; ee_03 and ee_02 were selected as sync standbys.
4. Check the sync standby list:
   postgresql.conf:
   synchronous_standby_names = '2 (ee_03,ee_02)'
   /sync key in etcd:
   {"leader":"ee_01","sync_standby":"ee_02,ee_03"}
5. Set synchronous_node_count to 1; sometimes ee_02 was selected as the sync standby, sometimes ee_03.
What did you expect to happen?
The sync_standby list order should be consistent with synchronous_standby_names.
synchronous_standby_names should change consistently when synchronous_node_count changes. For example, when decreasing synchronous_node_count, the sync standby with the higher sync_priority is expected to remain in the list.
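The expected shrink behavior can be illustrated with a small sketch (the helper name is hypothetical, not Patroni code):

```python
# Hypothetical helper: when synchronous_node_count decreases, keep the
# standbys with the best (numerically lowest) sync_priority.
def shrink_sync_list(standbys, node_count):
    # standbys: list of (application_name, sync_priority) pairs
    best = sorted(standbys, key=lambda s: s[1])[:node_count]
    return [name for name, _ in best]

# ee_03 has priority 1 in '2 (ee_03,ee_02)', so it should be the one kept.
assert shrink_sync_list([('ee_03', 1), ('ee_02', 2)], 1) == ['ee_03']
```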
Patroni/PostgreSQL/DCS version
Patroni version: 3.1.0
PostgreSQL version: 13.0
DCS (and its version): etcd3.5.9
Patroni configuration file
scope: postgres-cluster
namespace: /service/
name: ee_01
restapi:
  listen: 192.168.61.105:8008
  connect_address: 192.168.61.105:8008
etcd:
  hosts: 192.168.61.105:2379,192.168.61.106:2379,192.168.61.107:2379
log:
  level: INFO
  traceback_level: INFO
  dir: /home/postgres/patroni
  file_num: 10
  file_size: 104857600
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 100
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "off"
        archive_timeout: 1800s
        logging_collector: "on"
postgresql:
  database: postgres
  listen: 0.0.0.0:5432
  connect_address: 192.168.61.105:5432
  bin_dir: /usr/local/pgsql/bin
  data_dir: /usr/local/pgsql/data
  pgpass: /home/postgres/tmp/.pgpass
  authentication:
    replication:
      username: postgres
      password: postgres
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: postgres
      password: postgres
  pg_hba:
    - local all all trust
    - host all all 0.0.0.0/0 trust
    - host all all ::1/128 trust
    - local replication all trust
    - host replication all 0.0.0.0/0 trust
    - host replication all ::1/128 trust
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
Patroni log files
From the Patroni log, the list order changed when changing synchronous_node_count from 1 to 2, which caused some confusion. At the beginning it showed ['ee_03', 'ee_02'], because list(picked) was ['ee_03', 'ee_02'] as candidates. Then it showed ['ee_02', 'ee_03'], because list(allow_promote) had changed to ['ee_02', 'ee_03'] as sync_nodes. When sync_state and lsn are equal, using sync_priority as the third sort condition of ReplicaList would make list(allow_promote) equal to list(picked), i.e. ['ee_03', 'ee_02'].
2024-05-09 16:12:23,737 INFO: Assigning synchronous standby status to ['ee_03']
2024-05-09 16:12:26,024 INFO: Synchronous standby status assigned to ['ee_03']
2024-05-09 16:12:26,080 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:12:33,723 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:12:43,765 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:12:48,987 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:12:59,085 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:08,985 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:19,042 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:28,938 INFO: Lock owner: ee_01; I am ee_01
**2024-05-09 16:13:28,992 INFO: Assigning synchronous standby status to ['ee_03', 'ee_02']
2024-05-09 16:13:31,303 INFO: Synchronous standby status assigned to ['ee_02', 'ee_03']**
2024-05-09 16:13:31,356 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:39,031 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:49,091 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:59,009 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:59,010 INFO: Lock owner: ee_01; I am ee_01
2024-05-09 16:13:59,059 INFO: Updating synchronous privilege temporarily from ['ee_02', 'ee_03'] to ['ee_02']
2024-05-09 16:13:59,106 INFO: Assigning synchronous standby status to ['ee_02']
2024-05-09 16:13:59,431 INFO: no action. I am (ee_01), the leader with the lock
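The flip highlighted in the log above comes down to the final tie-breaker; a minimal illustration, with tuples of (name, sync_state, lsn, sync_priority) as an assumed shape rather than Patroni's real data structures:

```python
replicas = [('ee_02', 'sync', 100, 2), ('ee_03', 'sync', 100, 1)]

# Current behavior: with sync_state and lsn equal, the order falls back to
# application name, producing the ['ee_02', 'ee_03'] seen at 16:13:31.
by_name = sorted(replicas, key=lambda r: (r[1], -r[2], r[0]))
assert [r[0] for r in by_name] == ['ee_02', 'ee_03']

# Proposed: break the tie on sync_priority instead, matching list(picked).
by_priority = sorted(replicas, key=lambda r: (r[1], -r[2], r[3]))
assert [r[0] for r in by_priority] == ['ee_03', 'ee_02']
```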
PostgreSQL log files
From the PostgreSQL log, ee_02 with the lower priority remained when changing synchronous_node_count from 2 to 1. With priority-based synchronous replication, the standbys with the highest priorities are considered sync and the remaining standbys are considered potential. Although we set an exact node count in synchronous_standby_names, it would be more reasonable to keep the original higher-priority node in the list.
2024-05-09 16:12:23.865 CST [6977] LOG: parameter "synchronous_standby_names" changed to "ee_03"
**2024-05-09 16:12:23.975 CST [7013] LOG: standby "ee_03" is now a synchronous standby with priority 1**
2024-05-09 16:12:23.975 CST [7013] STATEMENT: START_REPLICATION SLOT "ee_03" 0/ED000000 TIMELINE 50
2024-05-09 16:12:49.111 CST [6977] LOG: received SIGHUP, reloading configuration files
2024-05-09 16:13:29.116 CST [6977] LOG: parameter "synchronous_standby_names" changed to "2 (ee_03,ee_02)"
**2024-05-09 16:13:29.235 CST [17473] LOG: standby "ee_02" is now a synchronous standby with priority 2**
2024-05-09 16:13:29.235 CST [17473] STATEMENT: START_REPLICATION SLOT "ee_02" 0/F3000000 TIMELINE 50
2024-05-09 16:13:59.235 CST [6977] LOG: received SIGHUP, reloading configuration files
2024-05-09 16:13:59.236 CST [6977] LOG: parameter "synchronous_standby_names" changed to "ee_02"
2024-05-09 16:13:59.550 CST [6977] LOG: received SIGHUP, reloading configuration files
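For reference, in the priority-based form of synchronous_standby_names the priority is simply the 1-based position in the list, which is why ee_03 got priority 1 and ee_02 priority 2 above. A simplified parser to illustrate (it ignores the FIRST/ANY keywords and name quoting):

```python
import re

def parse_ssn(value):
    # Matches the 'N (name, name, ...)' priority form; a bare name list
    # is treated as num_sync = 1.
    m = re.match(r'\s*(\d+)\s*\(([^)]*)\)\s*$', value)
    if m:
        num_sync = int(m.group(1))
        names = [n.strip() for n in m.group(2).split(',')]
    else:
        num_sync = 1
        names = [n.strip() for n in value.split(',')]
    # sync_priority is the 1-based position in the list.
    return num_sync, {name: i + 1 for i, name in enumerate(names)}

assert parse_ssn('2 (ee_03,ee_02)') == (2, {'ee_03': 1, 'ee_02': 2})
assert parse_ssn('ee_02') == (1, {'ee_02': 1})
```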
Have you tried to use GitHub issue search?
Yes
Anything else we need to know?
No response