Postgresql-17 hangs waiting on LWLock in pg_stat_monitor (PortalCleanup phase) #500

Open
1 task done
ayder opened this issue Dec 18, 2024 · 0 comments
Labels
bug Something isn't working

Description

We are experiencing an issue where a PostgreSQL instance becomes stuck while using the pg_stat_monitor extension. A core dump shows that the backend is waiting on a lightweight lock (LWLockAcquire()) invoked within pg_stat_monitor’s pgsm_store() function. The backend never completes, causing all queries on the backend to hang indefinitely.

•	Any guidance on what might cause pgsm_store() to block in LWLockAcquire()?
•	Suggestions for debug settings, patches, or configuration changes?
•	Is this a known issue with the current version of pg_stat_monitor?

Thank you for your help!

Expected Results

The backend should not become stuck. Queries should complete, and pg_stat_monitor should not cause indefinite blocking.

Actual Results

• All backend processes are stuck; queries hang and never complete.
• Taking a core dump with gcore and examining it with gdb shows the following stack trace snippet:

#1  0x00007fa9e516af18 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00000000007ce242 in PGSemaphoreLock ()
#3  0x000000000085528c in LWLockAcquire ()
#4  0x00007fa9de56e63d in pgsm_store () from /usr/pgsql-17/lib/pg_stat_monitor.so
#5  0x00007fa9de56fe1b in pgsm_ExecutorEnd () from /usr/pgsql-17/lib/pg_stat_monitor.so
#6  0x0000000000673491 in PortalCleanup ()
#7  0x00000000009cc7b4 in PortalDrop ()
#8  0x00000000008675ce in exec_simple_query ()
#9  0x0000000000868db4 in PostgresMain ()
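
Since every backend appears stuck, it may help to compare backtraces from several of them. The sketch below is one possible way to pull out which function is calling LWLockAcquire() in each dump. The helper names parse_backtrace and lwlock_caller are hypothetical (they are not part of gdb or pg_stat_monitor), and the regex assumes the symbol-only frame format shown in the trace above.

```python
import re

# Matches gdb frames in the symbol-only format seen in this report, e.g.
#   #4  0x00007fa9de56e63d in pgsm_store () from /usr/pgsql-17/lib/pg_stat_monitor.so
# Assumption: frames carry no source file/line info (no debug symbols).
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(?P<func>\S+)\s*\(.*?\)"
    r"(?:\s+from\s+(?P<lib>\S+))?"
)


def parse_backtrace(text):
    """Return a list of (frame_number, function, library_or_None) tuples."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(
                (int(match.group("num")), match.group("func"), match.group("lib"))
            )
    return frames


def lwlock_caller(frames):
    """Return the frame that called LWLockAcquire, or None if not present."""
    for index, (_, func, _) in enumerate(frames):
        if func == "LWLockAcquire" and index + 1 < len(frames):
            return frames[index + 1]
    return None


if __name__ == "__main__":
    sample = """\
#2  0x00000000007ce242 in PGSemaphoreLock ()
#3  0x000000000085528c in LWLockAcquire ()
#4  0x00007fa9de56e63d in pgsm_store () from /usr/pgsql-17/lib/pg_stat_monitor.so
#5  0x00007fa9de56fe1b in pgsm_ExecutorEnd () from /usr/pgsql-17/lib/pg_stat_monitor.so
"""
    print(lwlock_caller(parse_backtrace(sample)))
```

Running this over the backtrace of each stuck process would show whether they all block in pgsm_store() or whether one of them is in a different code path, which could point to the lock holder.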

Version

Environment:
• PostgreSQL Version: postgresql17-server-17.2-1PGDG.rhel8.x86_64
• pg_stat_monitor Version: pg_stat_monitor_17-2.1.0-1PGDG.rhel8.x86_64
• Operating System: Oracle Linux Server release 8.10 / 5.15.0-302.167.6.el8uek.x86_64
• Installation Method: RPM

Steps to reproduce

1.	Run queries under heavy load (many TRUNCATE and INSERT statements).
2.	The hang possibly occurs after certain procedure calls.
3.	Eventually, a backend hangs during PortalCleanup() with pg_stat_monitor code in the call stack.

Relevant logs

No response

Code of Conduct

  • I agree to follow Percona Community Code of Conduct
@ayder ayder added the bug Something isn't working label Dec 18, 2024