Multi-process concurrency errors #2196

krypterro · 2023-10-16T06:12:47Z


I've had to switch a large application over from undetected_chromedriver to SeleniumBase, essentially by replacing:

import undetected_chromedriver as uc
driver = uc.Chrome(
                use_subprocess=True,
                version_main=chrome_version,
                service=ser,
                options=options)

with

from seleniumbase import Driver
driver = Driver(
                uc=True,
                headed=headed,
                devtools=False,
                remote_debug=False,
                driver_version=chrome_version,
                binary_location=chrome_path,
                proxy=f"socks5://{use_proxy}",
                no_sandbox=True
                )

Everything works great, at least for a single process. However, once we get up to a few hundred concurrently running webdriver instances, we get this error:

[Mon, 16 Oct 2023 00:10:37] ERROR [Bot2.py.create_driver:125] Error in driver creation

[Mon, 16 Oct 2023 00:10:37] ERROR [Bot2.py.create_driver:127] Message: unknown error: cannot connect to chrome at 127.0.0.1:48175
from chrome not reachable
Stacktrace:
#0 0x55e317026fb3 <unknown>
#1 0x55e316cfa2f6 <unknown>
#2 0x55e316ce5ffa <unknown>
#3 0x55e316d31a3c <unknown>
#4 0x55e316d292a9 <unknown>

I think it may be a process or port collision. With UC we use the subprocess feature, and I'm running hundreds of simultaneous instances on a large Ubuntu 22.04 server, each from its own directory (so we can track and kill rogue processes by path).
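One way a port collision could arise, offered as a hypothesis rather than a diagnosis of this error: if a launcher picks a "free" port by binding to port 0, closes that socket, and only later hands the number to Chrome, another process can claim the same port in between. The standard-library sketch below shows that pattern; find_free_port and port_is_taken are hypothetical helpers, not how SeleniumBase or undetected_chromedriver necessarily choose Chrome's debugging port.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0.

    The socket is closed before returning, so the port is only known to be
    free at the moment of the check; another process can bind it before
    Chrome does (a time-of-check/time-of-use race).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def port_is_taken(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0
```

With a few hundred bots each picking ports this way, the chance of two of them hitting that window can grow; serializing the pick-and-launch step across bots narrows the window but does not close it.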

So the question is: what's the best practice for large-scale production use of SeleniumBase as a replacement for WebDriver, without pytest multiprocessing?

Answered by mdmintz


mdmintz · 2023-10-16T13:41:21Z

Maintainer

For large-scale production multithreading, pytest-xdist is so far the only reliable way to make it work (e.g., pytest -n8 for 8 parallel SeleniumBase workers). That plugin contains some powerful code to keep parallel workers' resources from overlapping each other.
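As a small illustration of keeping per-worker resources apart under pytest-xdist: the plugin sets the PYTEST_XDIST_WORKER environment variable (e.g. gw0, gw1) in each worker process. The hypothetical helper below derives a distinct scratch directory from it; the base location and naming scheme are assumptions for illustration, not something SeleniumBase does for you.

```python
import os
import tempfile


def worker_scratch_dir(base=None):
    """Return a scratch directory unique to the current pytest-xdist worker.

    pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") in each worker process;
    outside xdist the variable is absent and "main" is used instead.
    """
    # Assumed location; any path that all workers can write to will do
    base = base or os.path.join(tempfile.gettempdir(), "sb_workers")
    worker_id = os.environ.get("PYTEST_XDIST_WORKER", "main")
    path = os.path.join(base, worker_id)
    os.makedirs(path, exist_ok=True)
    return path
```

Such a directory could, for example, be passed as user_data_dir to Driver(...) so that no two workers share a Chrome profile.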

I talked about that here: #2006 (comment)

If I find a way to make large-scale multithreading with UC Mode work reliably without requiring pytest, I'll definitely post about it.

I do have a question for others: What's wrong with using pytest as a test runner? There shouldn't be any limitations with using it.


krypterro · 2023-10-16T17:16:02Z

Author

I'm using cron to launch a Python application every few minutes. Each application is standalone and lives in its own directory, parallel to dozens of others. The processes all run on the same server, but they aren't launched from the same parent process or at the same time. I had this same problem with uc years ago.

I'm not sure what Chrome is doing, but it leaves a lot of processes hung when the bot crashes, and they eventually pile up and kill the application, or its ability to talk to Chrome.

Here's my brute-force fix. It works, but I'd certainly prefer better process management to this:

import time

import psutil
from loguru import logger

# Process names that crashed bots tend to leave behind. Note that "sh" and
# "python" also match unrelated processes, including healthy long-running bots.
PROCESS_NAMES = {"sh", "chrome", "python", "uc_driver", "rsync", "chrome_crashpad_handler"}
MAX_AGE_SECONDS = 30 * 60  # kill anything on the list older than 30 minutes


def get_process_info_and_kill():
    current_time = time.time()
    # Prefetch name and start time for every running process
    for proc in psutil.process_iter(["name", "create_time"]):
        name = proc.info["name"]
        created = proc.info["create_time"]
        if name not in PROCESS_NAMES or created is None:
            continue
        uptime = current_time - created
        logger.info(f"Process {name} (pid {proc.pid}) has been running for {uptime:.0f} seconds")
        if uptime > MAX_AGE_SECONDS:
            try:
                proc.kill()
                logger.warning(f"Killed process {name} (pid {proc.pid})")
            except (psutil.NoSuchProcess, psutil.AccessDenied) as exc:
                # The process already exited, or belongs to another user
                logger.error(f"Could not kill {name} (pid {proc.pid}): {exc}")


if __name__ == "__main__":
    get_process_info_and_kill()

As you can see, I'm literally just killing old processes, regardless of whether they might still be doing their job, but these are the ones left hanging open.
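A less blunt alternative to killing every old chrome or python process on the server, sketched under assumptions that weren't tested in this thread: have cron start each bot through a small wrapper that runs it in a new session (and therefore a new process group), waits up to a timeout, and then kills whatever is still in that group, whether the bot finished, crashed, or hung. Child processes such as chromedriver and Chrome inherit the bot's process group unless they explicitly move to a new one; a launcher that starts Chrome detached in its own session would escape, so it's worth confirming with ps -o pid,pgid,sid,cmd that Chrome's processes share the bot's PGID on your setup. This is Linux/Unix only, and run_bot and its arguments are hypothetical names.

```python
import os
import signal
import subprocess
import sys
import time


def run_bot(args, timeout_s, cwd=None, poll_interval=0.2):
    """Run one Python bot in its own session, then kill anything left in its group.

    args are interpreter arguments, e.g. ["bot.py"]. Returns the bot's exit
    code, or None if it was still running at the timeout and had to be killed.
    """
    # start_new_session=True runs setsid() in the child, so the bot leads a new
    # process group whose ID equals its PID. Assumption: chromedriver and Chrome
    # stay in this group (check with: ps -o pid,pgid,sid,cmd).
    proc = subprocess.Popen([sys.executable, *args], cwd=cwd, start_new_session=True)
    deadline = time.monotonic() + timeout_s
    finished = False
    while True:
        # WNOWAIT detects the exit without reaping the bot, so its PID (which is
        # also the group ID) cannot be reused before killpg() below.
        if os.waitid(os.P_PID, proc.pid, os.WEXITED | os.WNOHANG | os.WNOWAIT) is not None:
            finished = True
            break
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_interval)
    # Finished, crashed, or timed out: kill everything still in the bot's group,
    # e.g. hung Chrome or driver processes it left behind.
    os.killpg(proc.pid, signal.SIGKILL)
    returncode = proc.wait()  # reap the bot itself
    return returncode if finished else None
```

A crontab entry would then call a tiny script that runs, say, run_bot(["bot.py"], timeout_s=1800, cwd=<bot directory>) for one bot, so each cleanup only touches that bot's own process tree.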

How does pytest manage Chrome's processes within SeleniumBase differently?

There is a lot of interest in SeleniumBase as a more robust solution than uc and Selenium, especially since it currently works against Cloudflare when uc doesn't. The fact that it's basically a drop-in replacement for those of us coming from raw Selenium or uc is awesome, but many of us aren't using SeleniumBase for "testing" and so may not be inclined to refactor around pytest.


mdmintz Oct 16, 2023
Maintainer

SeleniumBase uses fasteners.InterProcessLock in multiple places to keep parallel processes from using the same resource at the same time. It works great when all your SeleniumBase subprocesses are spawned by the same parent process, but not as well when they come from separately launched processes.
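A file lock only coordinates processes that open the same lock file. If a lock file's path were resolved relative to each process's working directory (an assumption; the thread doesn't say where SeleniumBase keeps its lock files), bots started from different directories would each lock a different file and never wait for one another. The sketch below is a standard-library analog of the same idea, not the fasteners implementation: it uses fcntl.flock (Unix only) on one absolute path shared by every bot. SHARED_LOCK_PATH and interprocess_lock are hypothetical names.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical shared location: one absolute path that every bot uses,
# no matter which directory cron started it from.
SHARED_LOCK_PATH = os.path.join(tempfile.gettempdir(), "sb_bots", "driver_start.lock")


@contextmanager
def interprocess_lock(path=SHARED_LOCK_PATH):
    """Hold an exclusive flock() on path for the duration of the with-block."""
    path = os.path.abspath(path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Every os.open() creates its own open file description, so flock() makes
    # separately launched processes using the same path wait for each other.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no one else holds it
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

One way to use it would be to wrap only driver start-up, along the lines of with interprocess_lock(): driver = Driver(uc=True, ...), so bots launch Chrome one at a time but run concurrently afterwards. Whether that avoids the "cannot connect to chrome" errors is untested here.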

On top of that, there appears to be a bug involving ThreadPoolExecutor, which several people have run into with undetected-chromedriver: https://github.com/search?q=repo%3Aultrafunkamsterdam%2Fundetected-chromedriver%20ThreadPoolExecutor&type=code
SeleniumBase tests run in parallel with pytest-xdist avoid that issue completely, because pytest-xdist runs each worker in its own process and doesn't use ThreadPoolExecutor at all. I don't fully understand the internals of pytest-xdist, but I do know that it works, and it works rather well.

It appears you have a workaround, so if it works, it's probably best to keep using it in the meantime.

krypterro · 2023-10-16T20:07:53Z

Author

Actually, I'm a glutton for punishment and efficiency, so I'm experimenting with building a single application launcher that takes advantage of pytest-xdist's parallelism. However, I don't fully understand exactly which processes headless Chrome spins up, or for what purpose. But you've seen pytest successfully handle very large concurrency with SeleniumBase? Like more than a few hundred instances?
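For a single launcher that doesn't depend on pytest, one option (a sketch, not something benchmarked in this thread) is a parent process that starts every bot as its own child process and caps how many run at once. Each bot still drives its own Chrome from its own process, no threads share a driver, and all bots share one parent. launch_all, max_parallel, and the example job arguments are hypothetical.

```python
import subprocess
import sys
import time
from collections import deque


def launch_all(jobs, max_parallel, poll_interval=0.2):
    """Run each job as its own Python process, with at most max_parallel alive at once.

    jobs is a list of interpreter argument lists, e.g. [["bot.py"], ["bot.py", "--id", "7"]].
    Returns the exit codes in the same order as jobs.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    pending = deque(enumerate(jobs))
    running = {}  # job index -> Popen handle
    results = [None] * len(jobs)
    while pending or running:
        # Fill free slots with new bot processes
        while pending and len(running) < max_parallel:
            idx, args = pending.popleft()
            running[idx] = subprocess.Popen([sys.executable, *args])
        # Record exit codes of bots that have finished and free their slots
        for idx, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                results[idx] = code
                del running[idx]
        if running:
            time.sleep(poll_interval)
    return results
```

Because the children inherit the launcher's working directory unless a cwd is passed to Popen, they also resolve any relative paths the same way, which is closer to the single-parent setup the maintainer describes as working well with SeleniumBase's locks.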


mdmintz Oct 16, 2023
Maintainer

Headless Chrome uses Xvfb on Linux machines to provide a virtual display in a headless environment. That's probably the only difference, process-wise. As for multiple processes, I've personally seen 64 simultaneous SeleniumBase Chrome processes, and I've heard from others who successfully ran over a hundred simultaneous processes by using pytest-xdist with SeleniumBase.
