
"Too many open files" #282

Open
CogumelosMaravilha opened this issue Oct 15, 2018 · 2 comments

Comments

@CogumelosMaravilha

With a Slowloris stress test I'm getting:

thread 'gotham-worker-3' panicked at 'socket error = Os { code: 24, kind: Other, message: "Too many open files" }', gotham/src/lib.rs:128:22

With ab, httperf, and siege it holds up fine; congrats on the fantastic benchmarks.
Thanks

@whitfin
Contributor

whitfin commented Oct 16, 2018

@CogumelosMaravilha this might be the file descriptor limit on your machine. What does ulimit -n give you?

@Darksonn

In a previous experiment where I built a simple web server on top of hyper, I studied this issue quite a bit. The problem is that if someone connects and requests a file larger than a few TCP chunks, but never reads the response while leaving the socket open, they will tie up a file descriptor on your end indefinitely. While stress testing my server this way, I found that this makes it possible to render the server completely unresponsive by exhausting all of its file descriptors.

Increasing the file descriptor limit only postpones the problem: you cannot have infinitely many descriptors, so you have to actively close connections when you run out.

Note that if your server socket returns this error, the incoming connection is not removed from the stream of incoming connections. This means that if you immediately retry after receiving the error, you will end up in a loop that uses 100% CPU trying to retrieve the same connection again and again.
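
For illustration, here is a minimal sketch of an accept loop that handles this, written against current tokio APIs rather than the 2018-era ones; the function name and error handling are mine, not Gotham's. On EMFILE the pending connection stays queued, so the loop backs off instead of spinning.

```rust
use std::time::Duration;
use tokio::net::TcpListener;

// Hypothetical accept loop: on "Too many open files" (EMFILE, os error 24),
// back off briefly instead of retrying immediately and spinning at 100% CPU.
async fn accept_loop(listener: TcpListener) {
    loop {
        match listener.accept().await {
            Ok((stream, _addr)) => {
                tokio::spawn(async move {
                    // hand the stream to the HTTP connection handler here
                    drop(stream);
                });
            }
            Err(e) if e.raw_os_error() == Some(24) => {
                // Out of file descriptors: the connection is still queued, so
                // sleep before the next accept to avoid a busy loop. This is
                // also where the kill switch described below would be fired.
                tokio::time::sleep(Duration::from_millis(120)).await;
            }
            Err(e) => eprintln!("accept error: {}", e),
        }
    }
}
```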

The way I fixed this was the following:

  1. Wrap the connection future in something which can be killed when file descriptors run out. I used this code to implement the wrapper in my project (a minimal sketch of such a wrapper follows this list).
  2. When you receive the "Too many open files" error, call kill on the KillSwitch from the wrapper above.
  3. After calling kill, sleep for 120 ms before fetching the next new connection.
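
As a rough illustration of step 1, here is what such a wrapper could look like against current tokio APIs. The KillSwitch name comes from the steps above; the Killable type and the AtomicBool-plus-Interval design are only a sketch of the idea, not the code linked in the original comment.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll};
use std::time::Duration;
use tokio::time::{interval, Interval};

// Shared handle used to abort wrapped connection futures.
#[derive(Clone)]
pub struct KillSwitch {
    killed: Arc<AtomicBool>,
}

impl KillSwitch {
    pub fn new() -> Self {
        KillSwitch { killed: Arc::new(AtomicBool::new(false)) }
    }

    // Called from the accept loop when "Too many open files" is hit.
    pub fn kill(&self) {
        self.killed.store(true, Ordering::SeqCst);
    }

    pub fn wrap<F: Future>(&self, inner: F) -> Killable<F> {
        Killable {
            inner: Box::pin(inner),
            killed: self.killed.clone(),
            tick: interval(Duration::from_millis(100)),
        }
    }
}

// Wrapper future that resolves to None once the kill switch has fired.
// A real version would trigger graceful_shutdown on the connection and allow
// a short grace period (see below) instead of dropping it on the spot.
pub struct Killable<F> {
    inner: Pin<Box<F>>,
    killed: Arc<AtomicBool>,
    // Guarantees periodic wakeups even when the connection is idle, so the
    // kill flag is noticed within one tick.
    tick: Interval,
}

impl<F: Future> Future for Killable<F> {
    type Output = Option<F::Output>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        if this.killed.load(Ordering::SeqCst) {
            return Poll::Ready(None); // kill switch fired: abandon the connection
        }
        if let Poll::Ready(out) = this.inner.as_mut().poll(cx) {
            return Poll::Ready(Some(out));
        }
        // Drain any ready ticks so a wakeup is registered for the next one.
        while this.tick.poll_tick(cx).is_ready() {}
        Poll::Pending
    }
}
```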

An important thing to notice here is that with my implementation, kill doesn't immediately kill the connection tasks; instead, the graceful_shutdown method is called on the connection (which just disables keep-alive for it), and then a tokio timer is used to allow the connection 100 ms to finish.

The point of this timeout is that regular users won't have their connections abruptly killed, so the only inconvenience they see is the extra 120 ms sleep in the incoming-connections loop.
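
As a sketch of that shutdown path, assuming a recent (0.14-style) hyper server connection API rather than the versions current when this was written; the drive function and the oneshot shutdown signal here are illustrative, not part of the original code.

```rust
use std::time::Duration;
use hyper::server::conn::Http;
use hyper::service::service_fn;
use hyper::{Body, Response};
use tokio::net::TcpStream;

// Drive one connection; if a shutdown signal arrives, disable keep-alive via
// graceful_shutdown() and give in-flight requests 100 ms to finish.
async fn drive(stream: TcpStream, shutdown: tokio::sync::oneshot::Receiver<()>) {
    let service = service_fn(|_req| async {
        Ok::<_, hyper::Error>(Response::new(Body::from("hello")))
    });
    let conn = Http::new().serve_connection(stream, service);
    tokio::pin!(conn);

    tokio::select! {
        _ = conn.as_mut() => return,                  // finished on its own
        _ = shutdown => conn.as_mut().graceful_shutdown(),
    }
    // Grace period: 100 ms to wrap up, then the connection is dropped.
    let _ = tokio::time::timeout(Duration::from_millis(100), conn).await;
}
```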

A few comments:

The code currently includes a tokio Interval that is active even when the kill switch hasn't been triggered. This is necessary because the problematic connections have no activity going on, which means the future won't be polled and therefore won't notice the kill switch. The Interval fixes this by adding another source of polls.

You could probably enhance the code to only kill connections that have no activity, so that long but still-active downloads aren't killed. The linked snippet simply puts a timeout of 1000 seconds on all connections.
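
For completeness, such a blanket timeout is just tokio's timeout wrapped around the whole connection future; the helper name here is illustrative.

```rust
use std::future::Future;
use std::time::Duration;
use tokio::time::timeout;

// Cut off any connection still open after 1000 seconds, active or not.
// Tracking last-activity time and expiring only idle connections would be
// the friendlier refinement suggested above.
async fn with_blanket_timeout<F: Future>(conn: F) -> Option<F::Output> {
    timeout(Duration::from_secs(1000), conn).await.ok()
}
```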
