Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend ProxyHandler to support CIDR ranges for no_proxy #83085

Open
wants to merge 7 commits into
base: devel
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions changelogs/fragments/urls-cidr-no-proxy.yml
@@ -0,0 +1,2 @@
minor_changes:
- urls.py - Support CIDR ranges for no_proxy
52 changes: 50 additions & 2 deletions lib/ansible/module_utils/urls.py
Expand Up @@ -37,6 +37,7 @@
import email.policy
import email.utils
import http.client
import ipaddress
import mimetypes
import netrc
import os
Expand Down Expand Up @@ -67,6 +68,7 @@
from ansible.module_utils.basic import missing_required_lib
from ansible.module_utils.common.collections import Mapping, is_sequence
from ansible.module_utils.common.text.converters import to_bytes, to_native, to_text
from ansible.module_utils.common import warnings

try:
import ssl
Expand Down Expand Up @@ -300,6 +302,51 @@ def http_open(self, req):
return self.do_open(UnixHTTPConnection(self._unix_socket), req)


class ProxyHandler(urllib.request.ProxyHandler):
_SPLITPORT_RE = re.compile('(.*):([0-9]*)', re.DOTALL)
sivel marked this conversation as resolved.
Show resolved Hide resolved

@classmethod
def _splitport(cls, host):
# Copied from cpython urllib.parse
match = cls._SPLITPORT_RE.fullmatch(host)
if match:
host, port = match.groups()
if port:
return host, port
return host, None
sivel marked this conversation as resolved.
Show resolved Hide resolved

@staticmethod
def _matches(host, port, bypass_network, bypass_port, scheme):
if not port:
port = '443' if scheme == 'https' else '80'
if bypass_port and port != bypass_port:
return False
return host in bypass_network

def proxy_open(self, req, *args):
Copy link
Member Author

@sivel sivel Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest concern I have about this implementation, is in overriding this method. proxy_open is not a documented method at https://docs.python.org/3/library/urllib.request.html

At closest it the <protocol>_open methods are described as part of https://docs.python.org/3/library/urllib.request.html#protocol-open but proxy isn't really a protocol like http, https, file, etc are.

And further more, it's highly dependent on urllib.request.ProxyHandler.__init__: https://github.com/python/cpython/blob/8f25cc992021d6ffc62bb110545b97a92f7cb295/Lib/urllib/request.py#L770-L774

However, the use of proxy_open has been around, unchanged for 25+ years, so it seems relatively safe and stable: python/cpython@6d7e47b

I guess at a minimum, this is why we have tests, which this PR includes.

"""Implements proxy bypassing for cidr ranges"""
hostonly, port = self._splitport(req.host)
try:
req_ip = ipaddress.ip_address(hostonly)
except ValueError:
return super().proxy_open(req, *args)

no_proxy = self.proxies.get('no') or ''
for bypass in map(str.strip, no_proxy.split(',')):
if '/' not in bypass:
continue
bypass_host, bypass_port = self._splitport(bypass)
try:
bypass_network = ipaddress.ip_network(bypass_host)
except ValueError as e:
warnings.warn(f'no_proxy entry appears to be a CIDR range, but could not be parsed: {e}')
sivel marked this conversation as resolved.
Show resolved Hide resolved
continue
if self._matches(req_ip, port, bypass_network, bypass_port, req.type):
return None

return super().proxy_open(req, *args)


class ParseResultDottedDict(dict):
'''
A dict that acts similarly to the ParseResult named tuple from urllib
Expand Down Expand Up @@ -844,9 +891,10 @@ def open(self, method, url, data=None, headers=None, use_proxy=None,
headers.update(auth_headers)
handlers.extend(auth_handlers)

proxies = None
if not use_proxy:
proxyhandler = urllib.request.ProxyHandler({})
handlers.append(proxyhandler)
proxies = {}
handlers.append(ProxyHandler(proxies))

if not context:
context = make_context(
Expand Down
12 changes: 12 additions & 0 deletions test/integration/targets/setup_proxy/files/hamsandwich.py
@@ -0,0 +1,12 @@
from __future__ import annotations

from proxy.http.proxy import HttpProxyBasePlugin


class HamSandwichPlugin(HttpProxyBasePlugin):

def handle_upstream_chunk(self, chunk):
headers, sep, body = chunk.tobytes().partition(b'\r\n\r\n')
if not sep:
return chunk
return memoryview(bytearray(headers + b'\r\nX-Sandwich: ham' + sep + body))
2 changes: 2 additions & 0 deletions test/integration/targets/setup_proxy/handlers/main.yml
@@ -0,0 +1,2 @@
- name: stop proxy.py
command: kill {{ proxy_py_pid.content|b64decode }}
sivel marked this conversation as resolved.
Show resolved Hide resolved
2 changes: 2 additions & 0 deletions test/integration/targets/setup_proxy/meta/main.yml
@@ -0,0 +1,2 @@
dependencies:
- setup_remote_tmp_dir
35 changes: 35 additions & 0 deletions test/integration/targets/setup_proxy/tasks/main.yml
@@ -0,0 +1,35 @@
- name: install proxy.py
pip:
name: proxy.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to pin it to the most recent release candidate (otherwise it may end up being pretty old). FYI.

virtualenv: '{{ remote_tmp_dir }}/proxy_py'
virtualenv_command: "{{ ansible_python_interpreter }} -m venv"
notify: stop proxy.py

- name: get venv site-packages
command: >-
{{ remote_tmp_dir }}/proxy_py/bin/python -c 'import site; print(site.getsitepackages()[0])'
register: proxy_py_site_packages

- name: install proxy.py plugin
copy:
src: hamsandwich.py
dest: '{{ proxy_py_site_packages.stdout }}/hamsandwich.py'

- name: start proxy.py
command: >-
{{ remote_tmp_dir }}/proxy_py/bin/proxy --port 8080 --log-file "{{ remote_tmp_dir }}/proxy_py/proxy_py.log"
--plugins hamsandwich.HamSandwichPlugin --pid-file "{{ remote_tmp_dir }}/proxy_py/proxy_py.pid"
async: 120
poll: 0
register: proxy_py

- name: wait for proxy.py to start
wait_for:
port: 8080
connect_timeout: 1
timeout: 10

- name: get proxy.py pid
slurp:
path: '{{ remote_tmp_dir }}/proxy_py/proxy_py.pid'
register: proxy_py_pid
1 change: 1 addition & 0 deletions test/integration/targets/uri/aliases
@@ -1,3 +1,4 @@
destructive
shippable/posix/group1
needs/httptester
needs/target/setup_proxy
3 changes: 3 additions & 0 deletions test/integration/targets/uri/tasks/main.yml
Expand Up @@ -727,6 +727,9 @@
- name: Test unix socket
import_tasks: install-socat-and-test-unix-socket.yml

- name: Test proxy
import_tasks: proxy.yml

- name: ensure skip action
uri:
url: http://example.com
Expand Down
117 changes: 117 additions & 0 deletions test/integration/targets/uri/tasks/proxy.yml
@@ -0,0 +1,117 @@
- include_role:
name: setup_proxy

- name: Get IP address for ansible.http.tests
command: >-
{{ ansible_python_interpreter }} -c 'import socket; print(socket.gethostbyname("{{ httpbin_host }}"))'
register: httpbin_ip

- name: Test http over http proxy
uri:
url: http://{{ httpbin_host }}/get
environment:
http_proxy: http://127.0.0.1:8080
register: http_over_http
failed_when: http_over_http.x_sandwich is undefined

- name: Test https over http proxy
uri:
url: https://{{ httpbin_host }}/get
environment:
https_proxy: http://127.0.0.1:8080
register: https_over_http
# failed_when:
# failure checking is handled by the assert at the bottom comparing logs
# because we aren't running a proxy that can inspect the https stream
# there won't be added headers

- name: Test request without a proxy
uri:
url: http://{{ httpbin_host }}/get
register: request_without_proxy
failed_when: request_without_proxy.x_sandwich is defined

- name: Test request with proxy and no_proxy=hostname
uri:
url: http://{{ httpbin_host }}/get
environment:
http_proxy: http://127.0.0.1:8080
no_proxy: '{{ httpbin_host }}'
register: no_proxy_hostname
failed_when: no_proxy_hostname.x_sandwich is defined

- name: Test request with proxy and no_proxy=ip
uri:
url: http://{{ httpbin_ip.stdout }}/get
environment:
http_proxy: http://127.0.0.1:8080
no_proxy: '{{ httpbin_ip.stdout }}'
register: no_proxy_ip
failed_when: no_proxy_ip.x_sandwich is defined

- name: Test request with proxy and no_proxy=cidr/32
uri:
url: http://{{ httpbin_ip.stdout }}/get
environment:
http_proxy: http://127.0.0.1:8080
no_proxy: '{{ httpbin_ip.stdout }}/32'
register: no_proxy_cidr_32
failed_when: no_proxy_cidr_32.x_sandwich is defined

- name: Test request with proxy and no_proxy=cidr/24
uri:
url: http://{{ httpbin_ip.stdout }}/get
environment:
http_proxy: http://127.0.0.1:8080
no_proxy: '{{ httpbin_cidr }}'
register: no_proxy_cidr_24
vars:
httpbin_cidr: "{{ httpbin_ip.stdout.split('.')[:3]|join('.') }}.0/24"
failed_when: no_proxy_cidr_24.x_sandwich is defined

- name: Test request with proxy and non-matching no_proxy=cidr
uri:
url: http://{{ httpbin_ip.stdout }}/get
environment:
http_proxy: http://127.0.0.1:8080
no_proxy: 1.2.3.0/24
register: no_proxy_non_matching_cidr
failed_when: no_proxy_non_matching_cidr.x_sandwich is undefined

- name: Test request with proxy and no_proxy=cidr:port
uri:
url: http://{{ httpbin_ip.stdout }}/get
environment:
http_proxy: http://127.0.0.1:8080
no_proxy: '{{ httpbin_ip.stdout }}/32:80'
register: no_proxy_cidr_port
failed_when: no_proxy_cidr_port.x_sandwich is defined

- name: Test request with proxy and non-matching no_proxy=cidr:port
uri:
url: http://{{ httpbin_ip.stdout }}/get
environment:
http_proxy: http://127.0.0.1:8080
no_proxy: '{{ httpbin_ip.stdout }}/32:8080'
register: no_proxy_non_matching_cidr_port
failed_when: no_proxy_non_matching_cidr_port.x_sandwich is undefined

- slurp:
path: "{{ remote_tmp_dir }}/proxy_py/proxy_py.log"
register: proxy_py_logs

- debug:
msg: '{{ proxy_py_logs.content|b64decode }}'

- assert:
that:
- >-
log_content is contains "CONNECT " ~ httpbin_host ~ ":443"
# https over http
- >-
log_content|regex_findall("CONNECT ")|length == 1
# 3 http over http
- >-
log_content|regex_findall('GET')|length == 3
vars:
log_content: '{{ proxy_py_logs.content|b64decode }}'