
Commit

Merge pull request #2882 from seleniumbase/uc-mode-updates-and-refactoring

UC Mode updates and refactoring
mdmintz authored Jun 28, 2024
2 parents 0fb6618 + 8681140 commit 5a23445
Showing 14 changed files with 125 additions and 133 deletions.
73 changes: 47 additions & 26 deletions Dockerfile
@@ -1,11 +1,34 @@
# SeleniumBase Docker Image
FROM ubuntu:22.04
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

#======================
# Locale Configuration
#======================
RUN apt-get update
RUN apt-get install -y --no-install-recommends tzdata locales
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
ENV TZ="America/New_York"

#======================
# Install Common Fonts
#======================
RUN apt-get update
RUN apt-get install -y \
fonts-liberation \
fonts-open-sans \
fonts-mononoki \
fonts-roboto \
fonts-lato

#============================
# Install Linux Dependencies
#============================
RUN apt-get update && apt-get install -y \
fonts-liberation \
RUN apt-get update
RUN apt-get install -y \
libasound2 \
libatk-bridge2.0-0 \
libatk1.0-0 \
@@ -17,60 +40,57 @@ RUN apt-get update && apt-get install -y \
libgtk-3-0 \
libnspr4 \
libnss3 \
libu2f-udev \
libvulkan1 \
libwayland-client0 \
libxcomposite1 \
libxdamage1 \
libxfixes3 \
libxkbcommon0 \
libxrandr2 \
libu2f-udev \
libvulkan1 \
xdg-utils
libxrandr2

#==========================
# Install useful utilities
#==========================
RUN apt-get update
RUN apt-get install -y xdg-utils

#=================================
# Install Bash Command Line Tools
#=================================
RUN apt-get update
RUN apt-get -qy --no-install-recommends install \
curl \
sudo \
unzip \
vim \
wget \
xvfb \
&& rm -rf /var/lib/apt/lists/*
xvfb

#================
# Install Chrome
#================
RUN curl -LO https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN apt-get install -y ./google-chrome-stable_current_amd64.deb
RUN apt-get update
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i google-chrome-stable_current_amd64.deb
RUN apt-get -fy --no-install-recommends install
RUN rm google-chrome-stable_current_amd64.deb

#================
# Install Python
#================
RUN apt-get update -y
RUN apt-get install -y python3 python3-pip python3-setuptools python3-dev
RUN apt-get update
RUN apt-get install -y python3 python3-pip python3-setuptools python3-dev python3-tk
RUN alias python=python3
RUN echo "alias python=python3" >> ~/.bashrc
RUN apt-get -qy --no-install-recommends install python3.10
RUN rm /usr/bin/python3
RUN ln -s python3.10 /usr/bin/python3

#=============================================
# Allow Special Characters in Python Programs
#=============================================
RUN export PYTHONIOENCODING=utf8
RUN echo "export PYTHONIOENCODING=utf8" >> ~/.bashrc

#===========================
# Configure Virtual Display
#===========================
RUN set -e
RUN echo "Starting X virtual framebuffer (Xvfb) in background..."
RUN Xvfb -ac :99 -screen 0 1280x1024x16 > /dev/null 2>&1 &
RUN export DISPLAY=:99
RUN exec "$@"
#===============
# Cleanup Lists
#===============
RUN rm -rf /var/lib/apt/lists/*

#=====================
# Set up SeleniumBase
@@ -89,6 +109,7 @@ RUN find . -name '*.pyc' -delete
RUN pip install --upgrade pip setuptools wheel
RUN cd /SeleniumBase && ls && pip install -r requirements.txt --upgrade
RUN cd /SeleniumBase && pip install .
RUN pip install pyautogui

#=======================
# Download chromedriver
3 changes: 1 addition & 2 deletions examples/raw_cdp_logging.py
@@ -5,8 +5,7 @@
try:
url = "seleniumbase.io/apps/turnstile"
driver.uc_open_with_reconnect(url, 2)
driver.switch_to_frame("iframe")
driver.uc_click("span")
driver.uc_gui_handle_cf()
driver.sleep(3)
pprint(driver.get_log("performance"))
finally:
6 changes: 3 additions & 3 deletions examples/raw_form_turnstile.py
@@ -2,7 +2,7 @@

with SB(uc=True, test=True) as sb:
url = "seleniumbase.io/apps/form_turnstile"
sb.driver.uc_open_with_reconnect(url, 2)
sb.uc_open_with_reconnect(url, 2)
sb.press_keys("#name", "SeleniumBase")
sb.press_keys("#email", "[email protected]")
sb.press_keys("#phone", "1-555-555-5555")
@@ -12,8 +12,8 @@
sb.click('span:contains("9:00 PM")')
sb.highlight_click('input[value="AR"] + span')
sb.click('input[value="cc"] + span')
sb.switch_to_frame("iframe")
sb.driver.uc_click("span")
sb.scroll_to("iframe")
sb.uc_gui_handle_cf()
sb.highlight("img#captcha-success", timeout=3)
sb.highlight_click('button:contains("Request & Pay")')
sb.highlight("img#submit-success")
18 changes: 5 additions & 13 deletions examples/raw_nopecha.py
@@ -1,19 +1,11 @@
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
sb.driver.uc_open_with_reconnect("nopecha.com/demo/turnstile", 4)
if sb.is_element_visible("#example-container0 iframe"):
sb.switch_to_frame("#example-container0 iframe")
if not sb.is_element_visible("circle.success-circle"):
sb.driver.uc_click("span", reconnect_time=3)
sb.switch_to_frame("#example-container0 iframe")
sb.switch_to_default_content()

sb.switch_to_frame("#example-container5 iframe")
sb.driver.uc_click("span", reconnect_time=2.5)
sb.switch_to_frame("#example-container5 iframe")
sb.assert_element("svg#success-icon", timeout=3)
sb.switch_to_parent_frame()
sb.uc_open_with_disconnect("nopecha.com/demo/turnstile", 3.5)
sb.uc_gui_press_keys("\t\t ")
sb.sleep(3.5)
sb.connect()
sb.uc_gui_handle_cf("#example-container5 iframe")

if sb.is_element_visible("#example-container0 iframe"):
sb.switch_to_frame("#example-container0 iframe")
20 changes: 3 additions & 17 deletions examples/raw_turnstile.py
@@ -1,23 +1,9 @@
from seleniumbase import SB


def open_the_turnstile_page(sb):
with SB(uc=True, test=True) as sb:
url = "seleniumbase.io/apps/turnstile"
sb.driver.uc_open_with_reconnect(url, reconnect_time=2)


def click_turnstile_and_verify(sb):
sb.driver.switch_to_frame("iframe")
sb.driver.uc_click("span")
sb.uc_open_with_reconnect(url, reconnect_time=2)
sb.uc_gui_handle_cf()
sb.assert_element("img#captcha-success", timeout=3)


with SB(uc=True, test=True) as sb:
open_the_turnstile_page(sb)
try:
click_turnstile_and_verify(sb)
except Exception:
open_the_turnstile_page(sb)
click_turnstile_and_verify(sb)
sb.set_messenger_theme(location="top_left")
sb.post_message("SeleniumBase wasn't detected", duration=3)
5 changes: 2 additions & 3 deletions examples/uc_cdp_events.py
@@ -13,16 +13,15 @@ def add_cdp_listener(self):
)

def click_turnstile_and_verify(sb):
sb.switch_to_frame("iframe")
sb.driver.uc_click("span")
sb.uc_gui_handle_cf()
sb.assert_element("img#captcha-success", timeout=3)
sb.highlight("img#captcha-success", loops=8)

def test_display_cdp_events(self):
if not (self.undetectable and self.uc_cdp_events):
self.get_new_driver(undetectable=True, uc_cdp_events=True)
url = "seleniumbase.io/apps/turnstile"
self.driver.uc_open_with_reconnect(url, 2)
self.uc_open_with_reconnect(url, 2)
self.add_cdp_listener()
self.click_turnstile_and_verify()
self.sleep(1)
85 changes: 32 additions & 53 deletions help_docs/uc_mode.md
@@ -20,7 +20,7 @@

* Automatically changing user agents to prevent detection.
* Automatically setting various chromium args as needed.
* Has special methods. Eg. `driver.uc_click(selector)`
* Has special `uc_*()` methods.

👤 Here's an example with the <b><code translate="no">Driver</code></b> manager:

@@ -67,22 +67,11 @@ with SB(uc=True, test=True) as sb:
```python
from seleniumbase import SB

def open_the_turnstile_page(sb):
with SB(uc=True, test=True) as sb:
url = "seleniumbase.io/apps/turnstile"
sb.driver.uc_open_with_reconnect(url, reconnect_time=2)

def click_turnstile_and_verify(sb):
sb.switch_to_frame("iframe")
sb.driver.uc_click("span")
sb.uc_open_with_reconnect(url, reconnect_time=2)
sb.uc_gui_handle_cf()
sb.assert_element("img#captcha-success", timeout=3)

with SB(uc=True, test=True) as sb:
open_the_turnstile_page(sb)
try:
click_turnstile_and_verify(sb)
except Exception:
open_the_turnstile_page(sb)
click_turnstile_and_verify(sb)
sb.set_messenger_theme(location="top_left")
sb.post_message("SeleniumBase wasn't detected", duration=3)
```
@@ -129,6 +118,27 @@ with SB(uc=True, test=True, ad_block_on=True) as sb:

<img src="https://seleniumbase.github.io/other/ttm_bypass.png" title="SeleniumBase" width="540">

👤 <b>On Linux</b>, use `sb.uc_gui_handle_cf()` to handle Cloudflare Turnstiles:

```python
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
url = "https://www.virtualmanager.com/en/login"
sb.uc_open_with_reconnect(url, 4)
print(sb.get_page_title())
sb.uc_gui_handle_cf() # Ready if needed!
print(sb.get_page_title())
sb.assert_element('input[name*="email"]')
sb.assert_element('input[name*="login"]')
sb.set_messenger_theme(location="bottom_center")
sb.post_message("SeleniumBase wasn't detected!")
```

<a href="https://github.com/mdmintz/undetected-testing/actions/runs/9637461606/job/26576722411"><img width="540" alt="uc_gui_handle_cf on Linux" src="https://github.com/seleniumbase/SeleniumBase/assets/6788579/6aceb2a3-2a32-4521-b30a-f79446d2ce28"></a>

The 2nd `print()` should output "Virtual Manager", which means that the automation successfully passed the Turnstile.

--------

👤 In <b translate="no">UC Mode</b>, <code translate="no">driver.get(url)</code> has been modified from its original version: If anti-bot services are detected from a <code translate="no">requests.get(url)</code> call that's made before navigating to the website, then <code translate="no">driver.uc_open_with_reconnect(url)</code> will be used instead. To open a URL normally in <b translate="no">UC Mode</b>, use <code translate="no">driver.default_get(url)</code>.
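A minimal sketch of that dispatch, assuming a driver object that exposes the `uc_open_with_reconnect()` and `default_get()` methods named above. The detection heuristic (`looks_like_antibot_page`) is hypothetical: it treats an HTTP 403 or 503 whose body contains Cloudflare's "Just a moment..." interstitial title as a sign of an anti-bot service. That is an illustrative assumption, not SeleniumBase's actual check. The preflight response is passed in as plain values so the sketch runs without network access or a browser.

```python
from dataclasses import dataclass


@dataclass
class PreflightResponse:
    """Plain stand-in for the result of a requests.get(url) call."""
    status_code: int
    text: str


def looks_like_antibot_page(resp):
    """Hypothetical heuristic, not SeleniumBase's real detection logic."""
    blocked_status = resp.status_code in (403, 503)
    challenge_title = "Just a moment..." in resp.text
    return blocked_status and challenge_title


def uc_style_get(driver, url, preflight):
    """Navigate stealthily only when the preflight response looks protected."""
    if looks_like_antibot_page(preflight):
        return driver.uc_open_with_reconnect(url)
    return driver.default_get(url)


class RecordingDriver:
    """Stub that records which navigation method was chosen."""

    def __init__(self):
        self.calls = []

    def uc_open_with_reconnect(self, url):
        self.calls.append(("uc_open_with_reconnect", url))

    def default_get(self, url):
        self.calls.append(("default_get", url))


driver = RecordingDriver()
uc_style_get(driver, "https://example.com", PreflightResponse(200, "<title>Home</title>"))
uc_style_get(
    driver,
    "https://protected.example",
    PreflightResponse(403, "<title>Just a moment...</title>"),
)
print(driver.calls)
```

Keeping the decision in a separate predicate makes it easy to swap in a different heuristic without touching the navigation code.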
@@ -144,6 +154,7 @@ with SB(uc=True, test=True, ad_block_on=True) as sb:
<img src="https://seleniumbase.github.io/other/pixelscan.jpg" title="SeleniumBase" width="540">

### 👤 Here are some UC Mode examples that bypass CAPTCHAs when clicking is required:
* [SeleniumBase/examples/raw_pyautogui.py](https://github.com/seleniumbase/SeleniumBase/blob/master/examples/raw_pyautogui.py)
* [SeleniumBase/examples/raw_turnstile.py](https://github.com/seleniumbase/SeleniumBase/blob/master/examples/raw_turnstile.py)
* [SeleniumBase/examples/raw_form_turnstile.py](https://github.com/seleniumbase/SeleniumBase/blob/master/examples/raw_form_turnstile.py)
* [SeleniumBase/examples/uc_cdp_events.py](https://github.com/seleniumbase/SeleniumBase/blob/master/examples/uc_cdp_events.py)
@@ -214,11 +225,6 @@ driver.reconnect("breakpoint")

(Note that while the special <b><code translate="no">UC Mode</code></b> breakpoint is active, you can't use <b><code translate="no">Selenium</code></b> commands in the browser, and the browser can't detect <b><code translate="no">Selenium</code></b>.)

👤 The two main causes of getting detected in <b translate="no">UC Mode</b> (which are both easily handled) are:

<li>Timing. (<b translate="no">UC Mode</b> methods let you customize default values that aren't good enough for your environment.)</li>
<li>Not using <b><code translate="no">driver.uc_click(selector)</code></b> when you need to remain undetected while clicking something.</li>

👤 On Linux, you may need to use `driver.uc_gui_handle_cf()` to successfully bypass a Cloudflare CAPTCHA. If there's more than one iframe on that website (and Cloudflare isn't the first one) then put the CSS Selector of that iframe as the first arg to `driver.uc_gui_handle_cf()`. This method uses `pyautogui`. In order for `pyautogui` to focus on the correct element, use `xvfb=True` / `--xvfb` to activate a special virtual display on Linux.
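The iframe rule above can be expressed as a small decision helper. This is a hypothetical sketch: `cf_handle_args` is not a SeleniumBase function, and it assumes you already know the CSS selectors of the page's iframes (in page order) and which one hosts the Cloudflare challenge. It returns the positional arguments to pass to `uc_gui_handle_cf()`: none when the challenge is in the first iframe, otherwise that iframe's selector. The page layout below reuses the selectors from `examples/raw_nopecha.py`; their order on the page is an assumption.

```python
def cf_handle_args(iframe_selectors, cf_selector):
    """Hypothetical helper: positional args for uc_gui_handle_cf().

    iframe_selectors: CSS selectors of the page's iframes, in page order.
    cf_selector: the selector of the iframe that holds the Cloudflare check.
    """
    if cf_selector not in iframe_selectors:
        raise ValueError(f"unknown iframe selector: {cf_selector!r}")
    if iframe_selectors[0] == cf_selector:
        return ()  # No argument needed: the first iframe is the default target.
    return (cf_selector,)  # Otherwise name the Cloudflare iframe explicitly.


# Assumed page order, using the selectors from examples/raw_nopecha.py.
frames = ["#example-container0 iframe", "#example-container5 iframe"]
args = cf_handle_args(frames, "#example-container5 iframe")
# In a real UC Mode script (requires Chrome): sb.uc_gui_handle_cf(*args)
print(args)
```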

👤 To find out if <b translate="no">UC Mode</b> will work at all on a specific site (before adjusting for timing), load your site with the following script:
@@ -268,46 +274,15 @@ with ThreadPoolExecutor(max_workers=len(urls)) as executor:

--------

👥 <b>Double Duty:</b> Here's an example of handling two CAPTCHAs on one page:

<img src="https://seleniumbase.github.io/other/nopecha.png" title="SeleniumBase" align="center" width="630">

```python
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
sb.driver.uc_open_with_reconnect("nopecha.com/demo/turnstile", 3.4)
if sb.is_element_visible("#example-container0 iframe"):
sb.switch_to_frame("#example-container0 iframe")
if not sb.is_element_visible("circle.success-circle"):
sb.driver.uc_click("span", reconnect_time=3)
sb.switch_to_frame("#example-container0 iframe")
sb.switch_to_default_content()

sb.switch_to_frame("#example-container5 iframe")
sb.driver.uc_click("span", reconnect_time=2.5)
sb.switch_to_frame("#example-container5 iframe")
sb.assert_element("svg#success-icon", timeout=3)
sb.switch_to_parent_frame()

if sb.is_element_visible("#example-container0 iframe"):
sb.switch_to_frame("#example-container0 iframe")
sb.assert_element("circle.success-circle")
sb.switch_to_parent_frame()

sb.set_messenger_theme(location="top_center")
sb.post_message("SeleniumBase wasn't detected!", duration=3)
```

--------

👤 <b>What makes UC Mode work?</b>

Here are the 3 primary things that <b translate="no">UC Mode</b> does to make bots appear human:

<ul>
<li>Modifies <b><code translate="no">chromedriver</code></b> to rename <b translate="no">Chrome DevTools Console</b> variables.</li>
<li>Launches <b translate="no">Chrome</b> browsers before attaching <b><code translate="no">chromedriver</code></b> to them.</li>
<li>Disconnects <b><code translate="no">chromedriver</code></b> from <b translate="no">Chrome</b> during stealthy actions.</li>
</ul>
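To illustrate the first point, here is a simplified, hypothetical sketch of renaming a marker inside a binary buffer. It assumes the identifiers to hide share a `cdc_` prefix (the prefix commonly associated with chromedriver's injected variables; treat this as an assumption) and works on an in-memory `bytes` object rather than a real `chromedriver` file. The replacement keeps the same length so byte offsets inside an executable would not shift. This is not SeleniumBase's actual patcher.

```python
import random
import string


def rename_markers(binary, prefix=b"cdc_", rng=None):
    """Swap every `prefix` in `binary` for a random marker of equal length.

    Hypothetical illustration only; assumes `prefix` ends with b"_".
    Equal length matters for executables: inserting or removing bytes
    would shift every offset after the edit.
    """
    rng = rng or random.Random()
    letters = len(prefix) - 1  # random letters, then keep the trailing underscore
    replacement = prefix
    while replacement == prefix:
        body = "".join(rng.choice(string.ascii_lowercase) for _ in range(letters))
        replacement = (body + "_").encode("ascii")
    return binary.replace(prefix, replacement), replacement


# A fake buffer containing one cdc_-style variable name.
fake_binary = b"\x00window.cdc_adoQpoasnfa76pfcZLmcfl_Array\x00"
patched, marker = rename_markers(fake_binary, rng=random.Random(0))
print(len(patched) == len(fake_binary))  # size is unchanged
print(b"cdc_" in patched)  # the original prefix is gone
```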

For example, if the <b translate="no">Chrome DevTools Console</b> variables aren't renamed, you can expect to find them easily when using <b><code translate="no">selenium</code></b> for browser automation:

@@ -321,13 +296,17 @@ While <b><code translate="no">chromedriver</code></b> is connected to <b transla

Links to those <a href="https://github.com/SeleniumHQ/selenium">raw <b>Selenium</b></a> method definitions have been provided for reference (but you don't need to call those methods directly):

<ul>
<li><b><code translate="no"><a href="https://github.com/SeleniumHQ/selenium/blob/9c6ccdbf40356284fad342f70fbdc0afefd27bd3/py/selenium/webdriver/common/service.py#L135">driver.service.stop()</a></code></b></li>
<li><b><code translate="no"><a href="https://github.com/SeleniumHQ/selenium/blob/9c6ccdbf40356284fad342f70fbdc0afefd27bd3/py/selenium/webdriver/common/service.py#L91">driver.service.start()</a></code></b></li>
<li><b><code translate="no"><a href="https://github.com/SeleniumHQ/selenium/blob/9c6ccdbf40356284fad342f70fbdc0afefd27bd3/py/selenium/webdriver/remote/webdriver.py#L284">driver.start_session(capabilities)</a></code></b></li>
</ul>
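A minimal sketch of how those three calls combine into a disconnect/reconnect cycle. `chromedriver_detached` is a hypothetical helper, not SeleniumBase's implementation, and the stub classes stand in for a real `driver` so the sketch runs without Chrome. It assumes `capabilities` points chromedriver back at the already-running browser, for example through a `debuggerAddress` entry in `goog:chromeOptions`; the address and port used here are placeholders.

```python
from contextlib import contextmanager


@contextmanager
def chromedriver_detached(driver, capabilities):
    """Hypothetical helper: run a block while chromedriver is stopped.

    Uses the raw Selenium calls listed above, in order:
    driver.service.stop() before the block, then driver.service.start()
    and driver.start_session(capabilities) afterwards (even on error).
    """
    driver.service.stop()  # chromedriver exits; the Chrome process keeps running
    try:
        yield
    finally:
        driver.service.start()  # relaunch the chromedriver process
        driver.start_session(capabilities)  # reattach to the existing browser


class _StubService:
    def __init__(self, log):
        self._log = log

    def stop(self):
        self._log.append("service.stop")

    def start(self):
        self._log.append("service.start")


class StubDriver:
    """Records calls instead of driving a real browser."""

    def __init__(self):
        self.log = []
        self.service = _StubService(self.log)

    def start_session(self, capabilities):
        self.log.append("start_session")


# Placeholder debugger address; a real session would use Chrome's actual port.
caps = {"goog:chromeOptions": {"debuggerAddress": "127.0.0.1:9222"}}
driver = StubDriver()
with chromedriver_detached(driver, caps):
    driver.log.append("stealthy action")  # nothing is attached to Chrome here
print(driver.log)
```

The `finally` clause is the design choice that matters: the session is restored even if the stealthy step raises.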

Also note that <b><code translate="no">chromedriver</code></b> isn't detectable in a browser tab if it never touches that tab. Here's a JS command that lets you open a URL in a new tab (from your current tab):

<ul>
<li><b><code translate="no">window.open("URL");</code></b> --> (Info: <a href="https://www.w3schools.com/jsref/met_win_open.asp" target="_blank">W3Schools</a>)</li>
</ul>

The above JS method is used within <b translate="no"><code>SeleniumBase</code></b> <b translate="no">UC Mode</b> methods for opening URLs in a stealthy way. Since some websites try to detect if your browser is a bot on the initial page load, this allows you to bypass detection in those situations. After a few seconds (customizable), <b translate="no">UC Mode</b> tells <b><code translate="no">chromedriver</code></b> to connect to that tab so that automated commands can now be issued. At that point, <b><code translate="no">chromedriver</code></b> could be detected if websites are looking for it (but generally websites only look for it during specific events, such as page loads, form submissions, and button clicks).
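The flow described above starts by injecting a `window.open(...)` call. A minimal sketch of building that script string is shown below. `build_window_open_script` is a hypothetical helper; treating scheme-less URLs (like the `seleniumbase.io/apps/turnstile` strings in the examples above) as HTTPS is an assumption about normalization, not confirmed library behavior. `json.dumps()` is used because it emits a valid JavaScript string literal, so quotes or backslashes in the URL cannot break the script.

```python
import json


def build_window_open_script(url):
    """Hypothetical helper: JS that opens `url` in a new tab.

    json.dumps() escapes quotes and backslashes, producing a safe
    JavaScript string literal for the URL.
    """
    if "://" not in url:
        url = "https://" + url  # assumption: scheme-less URLs mean HTTPS
    return "window.open(%s);" % json.dumps(url)


print(build_window_open_script("seleniumbase.io/apps/turnstile"))
```

In a live session, the resulting string would be passed to Selenium's `driver.execute_script()` before chromedriver is detached for the customizable delay.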

4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,9 +1,9 @@
pip>=24.0;python_version<"3.8"
pip>=24.1;python_version>="3.8"
pip>=24.1.1;python_version>="3.8"
packaging>=24.0;python_version<"3.8"
packaging>=24.1;python_version>="3.8"
setuptools>=68.0.0;python_version<"3.8"
setuptools>=70.1.0;python_version>="3.8"
setuptools>=70.1.1;python_version>="3.8"
wheel>=0.42.0;python_version<"3.8"
wheel>=0.43.0;python_version>="3.8"
attrs>=23.2.0
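The `;python_version...` suffixes above are PEP 508 environment markers: pip installs a line only when its marker is true for the running interpreter. The sketch below is a deliberately simplified, stdlib-only evaluator that handles just the `python_version` comparisons used in this file (the `packaging` library's `packaging.markers` module is the full implementation). It lists the post-commit pins, so Python 3.8+ gets `pip>=24.1.1` and `setuptools>=70.1.1`.

```python
import operator

# Post-commit lines from the top of requirements.txt.
REQUIREMENTS = [
    'pip>=24.0;python_version<"3.8"',
    'pip>=24.1.1;python_version>="3.8"',
    'packaging>=24.0;python_version<"3.8"',
    'packaging>=24.1;python_version>="3.8"',
    'setuptools>=68.0.0;python_version<"3.8"',
    'setuptools>=70.1.1;python_version>="3.8"',
    'wheel>=0.42.0;python_version<"3.8"',
    'wheel>=0.43.0;python_version>="3.8"',
    "attrs>=23.2.0",
]

# Two-character operators come first so ">=" is not read as ">".
_OPS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
        ("!=", operator.ne), (">", operator.gt), ("<", operator.lt)]


def _version_tuple(text):
    return tuple(int(part) for part in text.split("."))


def marker_matches(marker, python_version):
    """Evaluate one `python_version <op> "X.Y"` marker (simplified)."""
    marker = marker.strip()
    if not marker.startswith("python_version"):
        raise ValueError(f"unsupported marker: {marker!r}")
    rest = marker[len("python_version"):].strip()
    for symbol, compare in _OPS:
        if rest.startswith(symbol):
            wanted = rest[len(symbol):].strip().strip("\"'")
            return compare(python_version, _version_tuple(wanted))
    raise ValueError(f"unsupported operator in: {marker!r}")


def applicable(requirements, python_version):
    """Return the specs that apply to `python_version`, e.g. (3, 12)."""
    selected = []
    for line in requirements:
        spec, _, marker = line.partition(";")
        if not marker or marker_matches(marker, python_version):
            selected.append(spec.strip())
    return selected


print(applicable(REQUIREMENTS, (3, 7)))
print(applicable(REQUIREMENTS, (3, 12)))
```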
2 changes: 1 addition & 1 deletion seleniumbase/__version__.py
@@ -1,2 +1,2 @@
# seleniumbase package
__version__ = "4.28.0"
__version__ = "4.28.1"