Crash due to RuntimeError: can't start new thread (Denial of Service after a few thousand incoming connections) #171

CrsiX · 2024-04-24T21:07:10Z

SSH-MITM Version

SSH-MITM 4.0.0

Platform detail

Linux gw 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux

Arguments used to start SSH-MITM

venv/bin/python3 -m sshmitm --paramiko-log-level debug server --session-log-dir session-logs --store-ssh-session --remote-host $REMOTE_HOST --remote-port 22 --host-key host_key_file --banner-name "$BANNER" --listen-port 22

SSH client used

All of them. MitM proxy was reachable by global traffic.

SSH server used

SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u2

What steps will reproduce the bug?

Run the server publicly so that it is reachable by global ingress traffic. The error shown under "Additional information" below occurred after the MitM proxy had been running for around 4 hours, proxying incoming traffic on port 22 to another machine.
I observed a steady stream of traffic, approximately one connection every 10 seconds. All of these connections were short-lived and terminated within a few seconds.
However, each of them allocates a socket and a thread on the server. My ss listed hundreds of open listening sockets spawned by the MitM proxy. At some point, the process could not spawn any more threads, and this error crashed the server when it tried to accept a new incoming connection.
I triggered the KeyboardInterrupt myself after I found out that the server was effectively down.
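
To watch this kind of growth from inside a Python process, a minimal sketch along the following lines could be logged periodically. It is not part of SSH-MITM; `resource_snapshot` is a hypothetical helper name, and the file descriptor count relies on the Linux-specific `/proc/self/fd` directory.

```python
import os
import threading


def resource_snapshot():
    """Return the number of live Python threads and open file descriptors.

    Hypothetical helper for illustration; not an SSH-MITM function.
    """
    fd_dir = "/proc/self/fd"  # Linux-specific; absent on other platforms
    if os.path.isdir(fd_dir):
        # Listing the directory briefly opens one extra descriptor,
        # so the count can be one higher than the steady-state value.
        open_fds = len(os.listdir(fd_dir))
    else:
        open_fds = None
    return {"threads": threading.active_count(), "open_fds": open_fds}


if __name__ == "__main__":
    print(resource_snapshot())
```

If both numbers keep climbing while the number of active sessions stays flat, that would point to threads and sockets not being released after a connection ends.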

Note:
I was using my own fork which had slight modifications.
Last commit from you was eabae16. The software was running 57c1001 from my fork.
I do not expect these modifications to be the cause of the error, but in summary they are: a few additional command line arguments, a tracking mechanism that makes HTTP requests with login credentials in one thread (it does not spawn new threads!), and a provisioning mechanism that creates the requested user on the target, combined with a retry mechanism. This way, it accepts all usernames and all passwords and provides a valid shell on the target machine (in my configuration, since the "provisioning command" is a useradd on the remote machine).

This is not the only such error; I may post a second issue later. TL;DR: Same behavior, but the error was OSError: Too many open files, because the number of open sockets (see above) used up all available file descriptors (default: 1024) before I increased the shell's ulimit.
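
For reference, the descriptor limit can also be inspected and raised from within a Python process using the standard `resource` module. This is a minimal sketch, not something SSH-MITM does; `raise_nofile_soft_limit` is a hypothetical helper name. It only postpones descriptor exhaustion and does nothing about the thread limit.

```python
import resource


def raise_nofile_soft_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit where that is possible.

    Hypothetical helper for illustration. Returns the (soft, hard) limits
    that are in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    # Leave "unlimited" settings alone; only handle the finite case.
    if soft != unlimited and hard != unlimited and soft < hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject the request; keep the old limits.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_soft_limit())
```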

What should have happened?

The server should not go down. When the proxied TCP connection ends, the socket and thread should be freed to avoid this problem (if the problem is what I suspect).
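
As an illustration of the expected behavior, here is a generic thread-per-connection sketch in which every handler closes its socket in a `finally` block and the number of concurrent handlers is capped. It is a plain echo server, not SSH-MITM code; `MAX_SESSIONS`, `handle` and `serve` are hypothetical names.

```python
import socket
import threading

MAX_SESSIONS = 100  # hypothetical cap on concurrent handler threads
_slots = threading.BoundedSemaphore(MAX_SESSIONS)


def handle(conn):
    """Echo one message, then always release the socket and the slot."""
    try:
        conn.settimeout(10)  # do not let an idle peer pin the thread forever
        data = conn.recv(1024)
        if data:
            conn.sendall(data)
    except OSError:
        pass  # peer vanished or timed out; cleanup happens below
    finally:
        conn.close()
        _slots.release()


def serve(listener, stop_event):
    """Accept connections until stop_event is set."""
    listener.settimeout(0.2)  # wake up regularly to check stop_event
    while not stop_event.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue
        if not _slots.acquire(blocking=False):
            conn.close()  # at capacity: refuse instead of exhausting threads
            continue
        try:
            threading.Thread(target=handle, args=(conn,), daemon=True).start()
        except RuntimeError:
            # "can't start new thread": drop this connection, keep serving
            conn.close()
            _slots.release()
```

Refusing connections at the cap and catching the `RuntimeError` around `Thread.start()` means a burst of traffic degrades service instead of taking the whole accept loop down.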

Additional information

If you need more details, I can provide more logs; this is an extract.

    ERROR    Unknown exception: can't start new thread          
    ERROR    Traceback (most recent call last):                 
    ERROR      File                                             
             "/home/mitm-nl/mitm/sshmitm/workarounds/transport.p
             y", line 160, in transport_run                     
    ERROR        self.packetizer.start_handshake(self.handshake_
             timeout)                                           
    ERROR      File                                             
             "/home/mitm-nl/mitm/venv/lib/python3.11/site-packag
             es/paramiko/packet.py", line 252, in               
             start_handshake                                    
    ERROR        self.__timer.start()                           
    ERROR      File "/usr/lib/python3.11/threading.py", line    
             957, in start                                      
    ERROR        _start_new_thread(self._bootstrap, ())         
    ERROR    RuntimeError: can't start new thread               
    ERROR                                                       
    ERROR    internal error, abort authentication!              
             ╭─────── Traceback (most recent call last) ───────╮
             │ /home/mitm-nl/mitm/sshmitm/authentication.py:36 │
             │ 6 in authenticate                               │
             │                                                 │
             │   363 │   │   │   │   │   self.session.remote_a │
             │   364 │   │   │   │   )                         │
             │   365 │   │   │   if self.session.password:     │
             │ ❱ 366 │   │   │   │   return self.auth_password │
             │   367 │   │   │   │   │   self.session.username │
             │   368 │   │   │   │   │   self.session.remote_a │
             │   369 │   │   │   │   │   self.session.remote_a │
             │                                                 │
             │ /home/mitm-nl/mitm/sshmitm/authentication.py:54 │
             │ 2 in auth_password                              │
             │                                                 │
             │   539 │   │   return self.connect(username, hos │
             │   540 │                                         │
             │   541 │   def auth_password(self, username: str │
             │ ❱ 542 │   │   return self.connect(username, hos │
             │       password=password)                        │
             │   543 │                                         │
             │   544 │   def auth_publickey(self, username: st │
             │   545 │   │   """                               │
             │                                                 │
             │ /home/mitm-nl/mitm/sshmitm/authentication.py:48 │
             │ 7 in connect                                    │
             │                                                 │
             │   484 │   │   )                                 │
             │   485 │   │   self.pre_auth_action()            │
             │   486 │   │   try:                              │
             │ ❱ 487 │   │   │   first_success = client.connec │
             │   488 │   │   │   if first_success:             │
             │   489 │   │   │   │   self.session.ssh_client = │
             │   490 │   │   │   │   auth_status = paramiko.co │
             │                                                 │
             │ /home/mitm-nl/mitm/sshmitm/clients/ssh.py:121   │
             │ in connect                                      │
             │                                                 │
             │   118 │   │                                     │
             │   119 │   │   try:                              │
             │   120 │   │   │   if self.method is Authenticat │
             │ ❱ 121 │   │   │   │   self.transport.connect(us │
             │   122 │   │   │   elif self.method is Authentic │
             │   123 │   │   │   │   self.transport.connect(us │
             │       pkey=self.key)                            │
             │   124 │   │   │   elif self.method is Authentic │
             │                                                 │
             │ /home/mitm-nl/mitm/venv/lib/python3.11/site-pac │
             │ kages/paramiko/transport.py:1351 in connect     │
             │                                                 │
             │   1348 │   │   │   gssapi_requested=gss_kex or  │
             │   1349 │   │   )                                │
             │   1350 │   │                                    │
             │ ❱ 1351 │   │   self.start_client()              │
             │   1352 │   │                                    │
             │   1353 │   │   # check host key if we were give │
             │   1354 │   │   # If GSS-API Key Exchange was pe │
             │                                                 │
             │ /home/mitm-nl/mitm/venv/lib/python3.11/site-pac │
             │ kages/paramiko/transport.py:704 in start_client │
             │                                                 │
             │    701 │   │   │   if not self.active:          │
             │    702 │   │   │   │   e = self.get_exception() │
             │    703 │   │   │   │   if e is not None:        │
             │ ❱  704 │   │   │   │   │   raise e              │
             │    705 │   │   │   │   raise SSHException("Nego │
             │    706 │   │   │   if event.is_set() or (       │
             │    707 │   │   │   │   timeout is not None and  │
             │                                                 │
             │ /home/mitm-nl/mitm/sshmitm/workarounds/transpor │
             │ t.py:160 in transport_run                       │
             │                                                 │
             │   157 │   │   │   # shell.                      │
             │   158 │   │   │   # Make sure we can specify a  │
             │   159 │   │   │   # Re-use the banner timeout f │
             │ ❱ 160 │   │   │   self.packetizer.start_handsha │
             │   161 │   │   │   self._send_kex_init()         │
             │   162 │   │   │   self._expect_packet(MSG_KEXIN │
             │   163                                           │
             │                                                 │
             │ /home/mitm-nl/mitm/venv/lib/python3.11/site-pac │
             │ kages/paramiko/packet.py:252 in start_handshake │
             │                                                 │
             │   249 │   │   """                               │
             │   250 │   │   if not self.__timer:              │
             │   251 │   │   │   self.__timer = threading.Time │
             │ ❱ 252 │   │   │   self.__timer.start()          │
             │   253 │                                         │
             │   254 │   def handshake_timed_out(self):        │
             │   255 │   │   """                               │
             │                                                 │
             │ /usr/lib/python3.11/threading.py:957 in start   │
             │                                                 │
             │    954 │   │   with _active_limbo_lock:         │
             │    955 │   │   │   _limbo[self] = self          │
             │    956 │   │   try:                             │
             │ ❱  957 │   │   │   _start_new_thread(self._boot │
             │    958 │   │   except Exception:                │
             │    959 │   │   │   with _active_limbo_lock:     │
             │    960 │   │   │   │   del _limbo[self]         │
             ╰─────────────────────────────────────────────────╯
             RuntimeError: can't start new thread
    INFO     ❗ Shutting down server ...             
^CTraceback (most recent call last):                                                                                    
  File "/home/mitm-nl/mitm/sshmitm/server/__init__.py", line 288, in start
    thread.start()
  File "/usr/lib/python3.11/threading.py", line 957, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/mitm-nl/mitm/sshmitm/__main__.py", line 3, in <module>
    main()
  File "/home/mitm-nl/mitm/sshmitm/cli.py", line 194, in main
    available_subcommands[args.subparser_name].run_func(args)
  File "/home/mitm-nl/mitm/sshmitm/server/cli.py", line 200, in run_server
    proxy.start()
  File "/home/mitm-nl/mitm/sshmitm/server/__init__.py", line 303, in start
    thread.join()
  File "/usr/lib/python3.11/threading.py", line 1112, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/usr/lib/python3.11/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1583, in _shutdown
    lock.acquire()
KeyboardInterrupt:

Thanks for your patience ;)


manfred-kaiser · 2024-04-25T05:16:33Z

Thanks for the bug report.

SSH-MITM has some known problems with closing connections (see #167).

During audits, this should not be a serious problem, because in such cases you have only a few connections open.

If you need to audit a large network, or if you use it during an exercise with a large number of automated connections, you can reach the limits.

As a quick workaround, you can increase the limits of the host system that is running SSH-MITM (https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/). If you have enough resources on your host system, this should give you some more hours to keep SSH-MITM running.

Another option is to create a cron job and restart SSH-MITM after some time, which will release the open files. This will also reset open connections, but since your connections are short-lived, this should not be a serious problem.
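
As an alternative to cron, the periodic restart could be done by a small supervisor script. The following is a minimal sketch under that assumption; `run_with_periodic_restart` is a hypothetical helper, and the command passed to it is whatever is normally used to start the proxy.

```python
import subprocess
import sys


def run_with_periodic_restart(cmd, max_runtime, restarts):
    """Run cmd, terminating and relaunching it every max_runtime seconds.

    Hypothetical helper for illustration. Returns how often cmd was launched.
    """
    launches = 0
    for _ in range(restarts):
        proc = subprocess.Popen(cmd)
        launches += 1
        try:
            proc.wait(timeout=max_runtime)  # returns early if cmd exits
        except subprocess.TimeoutExpired:
            proc.terminate()  # ask politely first
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()  # force it if it does not exit in time
                proc.wait()
    return launches


if __name__ == "__main__":
    # Placeholder child process; replace with the real start command.
    child = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_periodic_restart(child, max_runtime=1, restarts=2))
```

Because the child is relaunched as soon as it exits, this also covers the case where the process dies on its own before the interval is over.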

CrsiX · 2024-04-25T15:15:18Z

I looked through the issues, but #167 didn't catch my eye for this problem ;)

Anyway, I did exactly that and killed the process every 6 hours. But since the crash happened faster than that (after around 4 hours or so), this is not really ideal.

Also, this just should not happen. I do not expect a server to die because it keeps all connections open. Otherwise, please at least state somewhere in the documentation that this can happen. It was not an obvious fault; I didn't detect it at first, nor did I notice that the server had actually been down for a few days even though I assumed it was still running after I started it for the first time.

Note that I had already increased the ulimit beyond 10'000. That is how I addressed the error OSError: Too many open files. But with this many threads, the server ran into the issue mentioned above.

manfred-kaiser · 2024-05-01T06:58:05Z

I'm sorry to hear about the troubles you've been experiencing with SSH-MITM. It's crucial to note that this tool is intended for use in controlled environments, and isn't suitable for deployment on public networks where it might intercept traffic from unsuspecting clients.

For cases where you're looking to attract and analyze potentially malicious SSH traffic, a honeypot could be a more appropriate solution.

Just like Burp Suite serves for HTTP/S, SSH-MITM is designed for the interception and analysis of traffic primarily for security testing purposes. It's important to clarify that SSH-MITM is not a honeypot, but rather a tool for actively intercepting and modifying SSH traffic, similar to how Burp Suite intercepts and manipulates HTTP/S traffic.

SSH-MITM opens a large number of connections and files, which are used not only for intercepting the connection between the client and the server but also to allow the auditor access to the session. Additionally, multiple connections, such as those to SSH agents, are required to facilitate various authentication methods. Unfortunately, it’s not straightforward to properly close all these connections because an unexpected termination of a connection could cause the intercepted session to be dropped. For this reason, there is a tendency to keep unnecessary connections open rather than risk an unexpected disconnection.

I'm looking into this issue more closely to find a better solution. Thank you again for your detailed analysis of this error.

CrsiX · 2024-05-01T21:37:14Z

Thank you for these details. I should add that I specifically used SSH-MITM as a honeypot in my case; that's why I had it running publicly in the first place. And I got very valuable information about the traffic/behavior/sessions out of it, so I do think it can be used that way. I had two more somewhat unfortunate issues with it, but they may or may not find their ways to other tickets :D

As for the reason why I chose SSH-MITM, it serves the perfect purpose as a honeypot software. It is not the honeypot itself, though. It is the middle layer between the actual honeypot and Mallory; as such the honeypot itself is an entirely unmodified system. I tried Cowrie out before, but had other implementation-specific issues with it as well; I found SSH-MITM to be very capable for my use case. If you know other (FOSS) SSH honeypot software, I would be up to check them as well.

That said, if you say

> it's crucial to note that this tool is intended for use in controlled environments

then should you maybe add a hint about that in the README or the documentation? If such boundaries are not clarified, people like me may use it in an unintended way. Which may or may not work as expected by the user.

> Additionally, multiple connections, such as those to SSH agents, are required to facilitate various authentication methods. Unfortunately, it’s not straightforward to properly close all these connections because an unexpected termination of a connection could cause the intercepted session to be dropped. For this reason, there is a tendency to keep unnecessary connections open rather than risk an unexpected disconnection.

Thank you for the clarification. So, there are multiple connections for one session, if I understand it correctly. Is there a "main" connection in the session (= the first connection of a session, because it initiates anything after that)? If that "main" session was dropped (either by server or client), then all other connections belonging to that session could be dropped as well, right? I haven't used the session hijacking where you could tamper with connections that much, though, and I don't know if it may be desired to keep connections open if the "main" one is dropped for whatever reason; possibly as long as another such TCP connection exists, the session and related connections can not be dropped.

manfred-kaiser · 2024-05-05T07:41:44Z

> As for the reason why I chose SSH-MITM, it serves the perfect purpose as a honeypot software. It is not the honeypot itself, though. It is the middle layer between the actual honeypot and Mallory; as such the honeypot itself is an entirely unmodified system. I tried Cowrie out before, but had other implementation-specific issues with it as well; I found SSH-MITM to be very capable for my use case. If you know other (FOSS) SSH honeypot software, I would be up to check them as well.

That's an interesting use case 👍🏻 And it's true that in your setup the honeypot itself has no additional software installed, so an attacker cannot find out that it is only a honeypot and cannot alter the log data.

> it's crucial to note that this tool is intended for use in controlled environments
>
> then should you maybe add a hint about that in the README or the documentation? If such boundaries are not clarified, people like me may use it in an unintended way. Which may or may not work as expected by the user.

The reason is that audits require consent, and in most countries you are not allowed to intercept all connections on a public network.

> So, there are multiple connections for one session, if I understand it correctly. Is there a "main" connection in the session (= the first connection of a session, because it initiates anything after that)? If that "main" session was dropped (either by server or client), then all other connections belonging to that session could be dropped as well, right? I haven't used the session hijacking where you could tamper with connections that much, though, and I don't know if it may be desired to keep connections open if the "main" one is dropped for whatever reason; possibly as long as another such TCP connection exists, the session and related connections can not be dropped.

There are at least 2 main connections: one for the traffic to the client and one for the traffic to the server. Handling both sessions is not a trivial task, because an unintended connection abort on one side can break the other session. There are also some additional sessions, such as the pre-authentication session. This session is needed because SSH-MITM must check whether a user is allowed to log in with public key authentication against the remote server. With the next release, the behavior will change and this connection will be kept open until the main connections are closed. This is done because security tools like "fail2ban" can block logins if too many failed login attempts happen. At the moment, the default rules of fail2ban do not match the pre-authentication phase of SSH-MITM, but a lot of entries are still written to the authentication logs. When the pre-authentication session is kept open, the log entries are made after the main session has ended. This reduces noise in the logs and keeps SSH-MITM undetected for a longer time.

I will try to find a solution to reduce the number of open connections and to close them properly. This will require a lot of testing, which takes time.
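
The idea of tying auxiliary connections to the lifetime of the main connections can be sketched generically as follows. This is only an illustration of the bookkeeping, not SSH-MITM's actual session handling; `ProxySession`, `register` and `close` are hypothetical names, and deciding when both main connections have really ended is left to the caller.

```python
import socket
import threading


class ProxySession:
    """Track every socket that belongs to one proxied session.

    Hypothetical class for illustration. The owner calls close() once it
    has decided that both main connections have ended.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._conns = []
        self.closed = False

    def register(self, conn):
        """Attach a main or auxiliary socket to this session."""
        with self._lock:
            if not self.closed:
                self._conns.append(conn)
                return
        # The session is already over: do not keep late arrivals open.
        conn.close()

    def close(self):
        """Close all registered sockets exactly once."""
        with self._lock:
            if self.closed:
                return
            self.closed = True
            conns, self._conns = self._conns, []
        for conn in conns:
            try:
                conn.close()
            except OSError:
                pass  # already broken; nothing left to release


if __name__ == "__main__":
    client_side, server_side = socket.socketpair()
    session = ProxySession()
    session.register(client_side)
    session.register(server_side)
    session.close()
    print(client_side.fileno(), server_side.fileno())
```

Keeping one registry per session means that a pre-authentication connection can stay open for as long as the main connections do and is still guaranteed to be released afterwards.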

Thanks again for the bug report.

CrsiX changed the title ~~Crash due to RuntimeError: can't start new thread (Denial of Service after a few incoming thousand connections)~~ Crash due to RuntimeError: can't start new thread (Denial of Service after a few thousand incoming connections) Apr 24, 2024

manfred-kaiser added the bug Something isn't working label May 1, 2024
