test_lr_with_slow_safekeeper is flaky because logical_replication_sync does not wait for tablesync #10242

Open
alexanderlaw opened this issue Dec 25, 2024 · 0 comments
Labels
external A PR or Issue is created by an external user t/bug Issue Type: Bug

Multiple failures of test_lr_with_slow_safekeeper, e.g.:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10238/12486246775/index.html#/testresult/455c25d1ecd98aac
with the following diagnostics:

test_runner/regress/test_logical_replication.py:317: in test_lr_with_slow_safekeeper
    assert [r[0] for r in vanilla_pg.safe_psql("select * from t")] == [1]
E   assert [] == [1]
E     Right contains one more item: 1
E     Full diff:
E     - [1]
E     ?  -
E     + []

pgdata-vanilla/pg.log:

2024-12-24 21:29:09.658 UTC [92514] LOG:  logical replication apply worker for subscription "sub1" has started
2024-12-24 21:29:09.666 UTC [92226] LOG:  received fast shutdown request
2024-12-24 21:29:09.666 UTC [92576] LOG:  logical replication table synchronization worker for subscription "sub1", table "t" has started
2024-12-24 21:29:09.666 UTC [92226] LOG:  aborting any active transactions
2024-12-24 21:29:09.666 UTC [92576] FATAL:  terminating logical replication worker due to administrator command

Or https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10238/12486246775/index.html#/testresult/4efed0c5fa7cc95b

test_runner/regress/test_logical_replication.py:317: in test_lr_with_slow_safekeeper
    assert [r[0] for r in vanilla_pg.safe_psql("select * from t")] == [1]
E   assert [] == [1]
E     Right contains one more item: 1
E     Full diff:
E     - [1]
E     ?  -
E     + []

pgdata-vanilla/pg.log:

2024-12-24 21:29:04.111 UTC [88099] LOG:  logical replication apply worker for subscription "sub1" has started
2024-12-24 21:29:04.123 UTC [88067] LOG:  received fast shutdown request
2024-12-24 21:29:04.126 UTC [88067] LOG:  aborting any active transactions
2024-12-24 21:29:04.126 UTC [88099] FATAL:  terminating logical replication worker due to administrator command

Both logs show that the logical_replication_sync call in the test:

    vanilla_pg.safe_psql("create table t(a int)")
    connstr = endpoint.connstr().replace("'", "''")
    vanilla_pg.safe_psql(f"create subscription sub1 connection '{connstr}' publication pub")
    logical_replication_sync(vanilla_pg, endpoint)

    vanilla_pg.stop()

doesn't wait for the table synchronization to finish. On a successful run, pg.log contains:

2024-12-25 14:33:55.575 EET [2152526] LOG:  database system is ready to accept connections
2024-12-25 14:33:55.868 EET [2152610] LOG:  logical replication apply worker for subscription "sub1" has started
2024-12-25 14:33:55.918 EET [2152625] LOG:  logical replication table synchronization worker for subscription "sub1", table "t" has started
2024-12-25 14:33:55.952 EET [2152625] LOG:  logical replication table synchronization worker for subscription "sub1", table "t" has finished
2024-12-25 14:33:56.420 EET [2152526] LOG:  received fast shutdown request
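
One possible way to close this race, sketched under assumptions rather than taken from the Neon test suite: poll the subscriber's pg_subscription_rel catalog until no table is left in a pre-sync state before stopping vanilla_pg. The helper name wait_for_tablesync, its parameters, and the run_sql callable are hypothetical; the only PostgreSQL facts relied on are the srsubstate codes ('i' initialize, 'd' data copy, 'f' finished copy, 's' synchronized, 'r' ready).

```python
import time
from typing import Callable, Sequence

# Counts subscribed tables whose initial synchronization is not done yet.
# srsubstate codes: 'i' initialize, 'd' data copy, 'f' finished copy,
# 's' synchronized, 'r' ready.
PENDING_TABLESYNC_SQL = (
    "select count(*) from pg_subscription_rel "
    "where srsubstate not in ('r', 's')"
)


def wait_for_tablesync(
    run_sql: Callable[[str], Sequence[Sequence[int]]],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> None:
    """Block until every subscribed table has finished its initial sync.

    run_sql is a hypothetical callable that executes a query on the
    subscriber and returns rows as sequences (assumed here to behave like
    vanilla_pg.safe_psql in the test above).
    """
    deadline = time.monotonic() + timeout
    while True:
        pending = run_sql(PENDING_TABLESYNC_SQL)[0][0]
        if pending == 0:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{pending} table(s) still not synchronized")
        time.sleep(interval)
```

In the test this might be called as wait_for_tablesync(vanilla_pg.safe_psql) right after logical_replication_sync(vanilla_pg, endpoint), assuming safe_psql returns a list of row tuples as its use in the failing assertion suggests.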

This test failure can be easily reproduced with a sleep added inside TablesyncWorkerMain():

@@ -1714,6 +1714,7 @@ TablesyncWorkerMain(Datum main_arg)
 
     SetupApplyOrSyncWorker(worker_slot);
 
+    pg_usleep(1000000);
     run_tablesync_worker();
@alexanderlaw alexanderlaw added the t/bug Issue Type: Bug label Dec 25, 2024
@github-actions github-actions bot added the external A PR or Issue is created by an external user label Dec 25, 2024