Too Many Pending Connections #5196

Open
MohammadrezaNasrabadi opened this issue Aug 22, 2024 · 4 comments

Comments

@MohammadrezaNasrabadi

Title: Too Many Pending Connections

On a CouchDB cluster with three nodes, the number of connections in
CLOSE-WAIT state to one of the nodes gradually increases. This is
what I get right now (CouchDB has been running for about four days):

$ ss -nt | grep CLOSE-WAIT | wc -l
4514
$ ss -nt | grep CLOSE-WAIT | head
CLOSE-WAIT 1      0       192.168.40.2:35938    192.168.40.2:5984
CLOSE-WAIT 1      0       192.168.40.2:44694    192.168.40.2:5984
CLOSE-WAIT 1      0       192.168.40.2:60596    192.168.40.2:5984
CLOSE-WAIT 1      0       192.168.40.2:34240    192.168.40.2:5984
CLOSE-WAIT 1      0       192.168.40.2:40818    192.168.40.2:5984
CLOSE-WAIT 1      0       192.168.40.2:43376    192.168.40.2:5984
CLOSE-WAIT 1      0       192.168.40.2:56616    192.168.40.2:5984
CLOSE-WAIT 1      0       192.168.40.2:47966    192.168.40.2:5984
CLOSE-WAIT 1      0       192.168.40.2:48366    192.168.40.2:5984
CLOSE-WAIT 1      0       192.168.40.2:59302    192.168.40.2:5984

Maybe CouchDB sometimes fails to close connections? We use CouchDB 3.3.3.
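
To see which end of each connection the CLOSE-WAIT sockets belong to, the `ss` output can be grouped by port. This is a minimal sketch, assuming the default `ss -nt` column order (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port) and port 5984 as in the output above. The function name and the category labels are made up for illustration.

```python
from collections import Counter


def close_wait_by_role(ss_output, service_port="5984"):
    """Count CLOSE-WAIT sockets by which side of the connection holds them.

    Keys of the returned Counter (hypothetical labels):
      local_is_service - the socket's local port is the service port
      peer_is_service  - the socket's peer port is the service port
      other            - neither port matches
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Expect: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or fields[0] != "CLOSE-WAIT":
            continue
        # rsplit keeps IPv6 addresses such as [::1]:5984 intact.
        local_port = fields[3].rsplit(":", 1)[-1]
        peer_port = fields[4].rsplit(":", 1)[-1]
        if local_port == service_port:
            counts["local_is_service"] += 1
        elif peer_port == service_port:
            counts["peer_is_service"] += 1
        else:
            counts["other"] += 1
    return counts
```

Passing it the text captured from `ss -nt` on each node gives one small table per node, which makes it easier to compare the affected node with the healthy ones.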

@nickva
Contributor

nickva commented Aug 22, 2024

From what I remember from a while back, CLOSE-WAIT means the remote client closed the connection, our (server) kernel found out about it, and it is now waiting on the application code (CouchDB) to notice as well and close it.
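
That sequence can be reproduced on loopback. The sketch below is only an illustration, not CouchDB code. It assumes Linux, where the `TCP_INFO` socket option exposes the kernel's connection state and CLOSE_WAIT is state number 8. `tcp_state` and `demo` are hypothetical helper names.

```python
import socket
import time

TCP_CLOSE_WAIT = 8  # numeric value of TCP_CLOSE_WAIT in the Linux kernel


def tcp_state(sock):
    """Return the kernel's TCP state number for a socket (Linux only)."""
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return info[0]  # first byte of struct tcp_info is tcpi_state


def demo():
    # Listening side on an ephemeral loopback port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()

    # The peer closes first; its kernel sends a FIN.
    client.close()

    # Give the kernel a moment to process the FIN on the accepted socket.
    deadline = time.monotonic() + 2.0
    while tcp_state(conn) != TCP_CLOSE_WAIT and time.monotonic() < deadline:
        time.sleep(0.01)
    state_while_held = tcp_state(conn)

    # The application learns about the close when recv() returns b"".
    data = conn.recv(1)

    # Only this close() lets the socket leave CLOSE_WAIT.
    conn.close()
    server.close()
    return state_while_held, data
```

As long as the accepting side holds `conn` without calling `close()`, the socket stays in CLOSE_WAIT, which is how a count like the one above can keep growing.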

So perhaps that one node is overwhelmed, is sometimes blocked (maybe by a firewall?), or is not getting enough CPU time to run. Check your resource usage (CPU, memory, open file handles) and the logs to see if there are exceptions or errors on that one node. At least you have a few other nodes to compare against where this doesn't happen.
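
For the open-file-handle check, here is a rough sketch that counts a process's descriptors through `/proc`. It is Linux only, `fd_summary` is a hypothetical helper name, and inspecting another user's process needs sufficient privileges.

```python
import os


def fd_summary(pid):
    """Return (total open descriptors, descriptors that are sockets) for pid."""
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were iterating
        total += 1
        # Socket descriptors appear as links of the form "socket:[inode]".
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets
```

The total can be compared with the "Max open files" row in `/proc/<pid>/limits` to see how close the process is to its limit.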

@MohammadrezaNasrabadi
Author

MohammadrezaNasrabadi commented Aug 27, 2024

We have no firewall configuration that limits or controls the connections.

We also didn't see high CPU or RAM usage on the nodes hosting CouchDB.

Do you have any benchmark results based on the number of shards, replicas, number of database requests, and the available resources? It would be useful to compare the state of our CouchDB infrastructure with the ideal resource requirements.

I will share the detailed information of our infrastructure if needed.

@nickva
Contributor

nickva commented Aug 28, 2024

It's hard to compare architectures; it depends on the requirements. For instance, some of our larger clusters have 30+ nodes and our load balancer handles 100k concurrent open connections. Even there, the load balancer usually has almost no CLOSE_WAIT connections. At most I have seen a short temporary increase in CLOSE_WAIT up to 3k out of 100k established connections, after which they clear back down to only a few hundred.

It's good that your resource usage seems low. Since you have other nodes to compare against, and they don't seem to show the same behavior, try to see what's different between them (configuration, hardware resources). Maybe check the logs for errors that appear on some nodes but not on others?

@MohammadrezaNasrabadi
Author

We ran experiments to find the root cause of this problem.
The problem (connections in CLOSE-WAIT state) happens even without
a CouchDB cluster and over a local loopback connection. It seems
that CouchDB does not close some of its connections after the
client closes them; therefore, those connections remain in
CLOSE-WAIT state.

We connect to CouchDB from an Elixir application using the hackney
library. After a few weeks of running, we have thousands of CouchDB
connections in CLOSE-WAIT state that disappear only when CouchDB is
restarted. We have not yet found which of the connections made by
hackney remain in CLOSE-WAIT state after hackney closes them.
