Feature/impersonate 6.0 #163
Changes from 5 commits
5ae2216
04b4671
b150a56
efb322c
d424260
e1d486d
530455d
c7f360c
e745862
5069ea7
63b277b
fd3bfa7
b7203ad
6799e09
92a2ddf
5e8a477
905cf94
f361e91
1748982
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,6 +40,8 @@ To install beta releases: | |
|
||
## Usage | ||
|
||
Use the latest impersonate versions, do NOT copy `chrome110` here without changing. | ||
Review comment: I think this section could be improved.
||
|
||
### requests-like | ||
|
||
```python | ||
|
@@ -74,14 +76,21 @@ print(r.json()) | |
# {'cookies': {'foo': 'bar'}} | ||
``` | ||
|
||
Supported impersonate versions, as supported by [curl-impersonate](https://github.com/lwthiker/curl-impersonate): | ||
Supported impersonate versions, as supported by my [fork](https://github.com/yifeikong/curl-impersonate) of [curl-impersonate](https://github.com/lwthiker/curl-impersonate): | ||
|
||
However, only Chrome-like browsers are supported. | ||
Review comment: You could be more explicit here about why the others are not supported.
Review comment: I agree, it would be great to have some documentation explaining why we don't support Firefox or the other browsers supported by curl-impersonate.
Review comment: Possibly simply adding a link to #59 (comment) will be enough.
Review comment: Apologies for the misunderstanding, I'll make sure it's well documented in the new version.
Review comment: Please never apologize! Thank you for the work on this, mate, it's an amazing repository! Only reviewing to try to help somewhat ^^
||
|
||
- chrome99 | ||
- chrome100 | ||
- chrome101 | ||
- chrome104 | ||
- chrome107 | ||
- chrome110 | ||
- chrome116 | ||
- chrome117 | ||
- chrome118 | ||
- chrome119 | ||
- chrome120 | ||
- chrome99_android | ||
- edge99 | ||
- edge101 | ||
|
@@ -140,7 +149,10 @@ print(body.decode()) | |
|
||
See the [docs](https://curl-cffi.readthedocs.io) for more details. | ||
|
||
If you are using scrapy, check out this middleware: [tieyongjie/scrapy-fingerprint](https://github.com/tieyongjie/scrapy-fingerprint) | ||
If you are using scrapy, check out these middlewares: | ||
|
||
- [tieyongjie/scrapy-fingerprint](https://github.com/tieyongjie/scrapy-fingerprint) | ||
- [jxlil/scrapy-impersonate](https://github.com/jxlil/scrapy-impersonate) | ||
|
||
## Acknowledgement | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,7 @@ | |
from typing import Any, List, Tuple, Union | ||
|
||
from ._wrapper import ffi, lib # type: ignore | ||
from .const import CurlHttpVersion, CurlInfo, CurlOpt | ||
from .const import CurlHttpVersion, CurlInfo, CurlOpt, CurlWsFlag | ||
|
||
try: | ||
import certifi | ||
|
@@ -107,6 +107,10 @@ def _set_error_buffer(self): | |
self.setopt(CurlOpt.VERBOSE, 1) | ||
lib._curl_easy_setopt(self._curl, CurlOpt.DEBUGFUNCTION, lib.debug_function) | ||
|
||
def debug(self): | ||
self.setopt(CurlOpt.VERBOSE, 1) | ||
lib._curl_easy_setopt(self._curl, CurlOpt.DEBUGFUNCTION, lib.debug_function) | ||
|
||
def __del__(self): | ||
self.close() | ||
|
||
|
@@ -335,3 +339,25 @@ def close(self): | |
self._curl = None | ||
ffi.release(self._error_buffer) | ||
self._resolve = ffi.NULL | ||
|
||
def ws_recv(self, n: int = 1024): | ||
buffer = ffi.new("char[]", n) | ||
n_recv = ffi.new("int *") | ||
p_frame = ffi.new("struct curl_ws_frame **") | ||
|
||
ret = lib.curl_ws_recv(self._curl, buffer, n, n_recv, p_frame) | ||
self._check_error(ret, "WS_RECV") | ||
frame = p_frame[0] | ||
# print(frame.offset, frame.bytesleft) | ||
Review comment: nit: remove the commented-out print.
||
|
||
return ffi.buffer(buffer)[: n_recv[0]], frame | ||
|
||
def ws_send(self, payload: bytes, flags: CurlWsFlag = CurlWsFlag.BINARY) -> int: | ||
n_sent = ffi.new("int *") | ||
buffer = ffi.from_buffer(payload) | ||
ret = lib.curl_ws_send(self._curl, buffer, len(buffer), n_sent, 0, flags) | ||
self._check_error(ret, "WS_SEND") | ||
return n_sent[0] | ||
|
||
def ws_close(self): | ||
self.ws_send(b"", CurlWsFlag.CLOSE) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
import asyncio | ||
from curl_cffi.const import CurlECode, CurlWsFlag | ||
from curl_cffi.curl import CurlError | ||
|
||
|
||
class WebSocket: | ||
def __init__(self, session, curl): | ||
self.session = session | ||
self.curl = curl | ||
self._loop = None | ||
|
||
def recv_fragment(self): | ||
return self.curl.ws_recv() | ||
|
||
def recv(self): | ||
chunks = [] | ||
# TODO use select here | ||
while True: | ||
try: | ||
chunk, frame = self.curl.ws_recv() | ||
chunks.append(chunk) | ||
if frame.bytesleft == 0: | ||
break | ||
except CurlError as e: | ||
if e.code == CurlECode.AGAIN: | ||
pass | ||
else: | ||
raise | ||
|
||
return b"".join(chunks) | ||
|
||
def send(self, payload: bytes, flags: CurlWsFlag = CurlWsFlag.BINARY): | ||
return self.curl.ws_send(payload, flags) | ||
|
||
def close(self): | ||
# FIXME how to reset. or can a curl handle connect to two websockets? | ||
self.curl.close() | ||
|
||
@property | ||
def loop(self): | ||
if self._loop is None: | ||
self._loop = asyncio.get_running_loop() | ||
return self._loop | ||
|
||
async def arecv(self): | ||
return await self.loop.run_in_executor(None, self.recv) | ||
|
||
async def asend(self, payload: bytes, flags: CurlWsFlag = CurlWsFlag.BINARY): | ||
return await self.loop.run_in_executor(None, self.send, payload, flags) | ||
|
||
async def aclose(self): | ||
await self.loop.run_in_executor(None, self.close) | ||
self.curl.reset() | ||
self.session.push_curl(self.curl) |
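The fragment loop in `recv` can be exercised without a server or a curl handle. The following is a minimal sketch of that loop, not the PR's implementation: `FakeCurl`, `Frame`, and `AgainError` are hypothetical stand-ins introduced here for illustration (they are not curl_cffi names), and the only behavior assumed from the diff is that `ws_recv()` returns a `(chunk, frame)` pair and that `frame.bytesleft == 0` marks the last fragment.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


class AgainError(Exception):
    """Hypothetical stand-in for a CurlError whose code is CurlECode.AGAIN."""


@dataclass
class Frame:
    """Hypothetical stand-in for curl_ws_frame; only bytesleft is modelled."""

    bytesleft: int


class FakeCurl:
    """Hypothetical test double that replays scripted ws_recv() results."""

    def __init__(self, script: List[Optional[Tuple[bytes, int]]]):
        self._script = list(script)

    def ws_recv(self) -> Tuple[bytes, Frame]:
        item = self._script.pop(0)
        if item is None:  # None means "no data yet, try again"
            raise AgainError()
        chunk, bytesleft = item
        return chunk, Frame(bytesleft)


def recv_message(curl) -> bytes:
    """Collect fragments until the frame reports no bytes left."""
    chunks: List[bytes] = []
    while True:
        try:
            chunk, frame = curl.ws_recv()
        except AgainError:
            continue  # busy-wait, as in the diff; waiting on the socket would be kinder
        chunks.append(chunk)
        if frame.bytesleft == 0:
            return b"".join(chunks)


curl = FakeCurl([None, (b"Hello ", 4), None, (b"Foo!", 0)])
message = recv_message(curl)
print(message)
```

The retry branch spins until data arrives, which is what the `TODO use select here` note in the diff is about.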
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
import asyncio | ||
from curl_cffi import requests | ||
|
||
with requests.Session() as s: | ||
w = s.connect("ws://localhost:8765") | ||
w.send(b"Foo") | ||
reply = w.recv() | ||
print(reply) | ||
assert reply == b"Hello Foo!" | ||
|
||
|
||
async def async_examples(): | ||
async with requests.AsyncSession() as s: | ||
w = await s.connect("ws://localhost:8765") | ||
await w.asend(b"Bar") | ||
reply = await w.arecv() | ||
print(reply) | ||
assert reply == b"Hello Bar!" | ||
|
||
|
||
asyncio.run(async_examples()) |
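The async methods used in this example (`asend`, `arecv`) delegate to their blocking counterparts through `loop.run_in_executor`. A minimal sketch of that pattern in isolation, assuming a blocking call that simply sleeps; `blocking_recv` is a hypothetical placeholder, not a curl_cffi function:

```python
import asyncio
import time


def blocking_recv() -> bytes:
    """Hypothetical blocking call standing in for WebSocket.recv()."""
    time.sleep(0.05)
    return b"Hello Bar!"


async def arecv() -> bytes:
    # Run the blocking call in the default thread pool so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_recv)


async def main() -> list:
    # Two receives are scheduled together; each runs in its own worker thread.
    return await asyncio.gather(arecv(), arecv())


replies = asyncio.run(main())
print(replies)
```

The trade-off of this design is that every pending receive occupies a worker thread, so it is simple but does not scale like a socket-driven implementation would.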
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
import asyncio | ||
import websockets | ||
|
||
async def hello(websocket): | ||
name = (await websocket.recv()).decode() | ||
print(f"<<< {name}") | ||
|
||
greeting = f"Hello {name}!" | ||
|
||
await websocket.send(greeting) | ||
print(f">>> {greeting}") | ||
|
||
async def main(): | ||
async with websockets.serve(hello, "localhost", 8765): | ||
await asyncio.Future() # run forever | ||
|
||
if __name__ == "__main__": | ||
asyncio.run(main()) |
Review comment:
Currently curl-impersonate only supports versions up to 116. Are we not worried about already advertising support for 120 when it doesn't handle this yet?
Ref. https://github.com/lwthiker/curl-impersonate?tab=readme-ov-file#supported-browsers
On that note, would it not make more sense to offer support for Firefox?
Review comment:
According to the usage table here, most users are using the latest versions of Chrome and Safari. Under a strict blocking strategy, it's reasonable for a site to simply block any older browser version.
116 is much newer than 110, but it does not make things significantly better, especially since their fingerprints are actually the same. The interesting part is in 117, when ECH was added.
Actually, I have been working on this in my fork of curl-impersonate. Hopefully I can get it landed before Chrome 120 becomes mainstream. I've just been too busy with other stuff recently.
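Since the advice in this PR is to use the latest impersonate version rather than copying `chrome110`, here is a minimal sketch of picking the newest target by name. The `SUPPORTED` list is copied from the README diff in this PR; `latest_target` is a hypothetical helper, not part of curl_cffi. Versions are compared numerically because a plain string sort would rank `chrome99` above `chrome120`.

```python
import re
from typing import Iterable, Optional

# Target names as listed in the README diff of this PR.
SUPPORTED = [
    "chrome99", "chrome100", "chrome101", "chrome104", "chrome107",
    "chrome110", "chrome116", "chrome117", "chrome118", "chrome119",
    "chrome120", "chrome99_android", "edge99", "edge101",
]

# Matches plain "<browser><version>" names only.
_TARGET = re.compile(r"^(?P<browser>[a-z]+)(?P<version>\d+)$")


def latest_target(targets: Iterable[str], browser: str = "chrome") -> Optional[str]:
    """Return the highest-numbered target for a browser, or None if there is none."""
    best, best_version = None, -1
    for name in targets:
        match = _TARGET.match(name)
        if match is None or match.group("browser") != browser:
            continue  # skips other browsers and variants such as chrome99_android
        version = int(match.group("version"))
        if version > best_version:
            best, best_version = name, version
    return best


print(latest_target(SUPPORTED))
print(latest_target(SUPPORTED, "edge"))
```

The returned name is the kind of value the README passes as the `impersonate` argument.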
Review comment:
As for Firefox, it's really challenging to pack an additional `.so` file in a Python wheel. There are two options to bypass this:
At least one of them should work; I just haven't had time to try them out. Maybe I can experiment with them during the Chinese New Year.
Review comment:
I do agree that it would be nice to have Chrome version 117+ available sooner, as this will help a lot more with the more challenging sites. (I already saw you are well on the way through all the different versions on your fork.)
As for Firefox, probably the easier of the two options you mentioned would be to simply release a new package for Firefox curl, but this would require maintaining both packages simultaneously, which seems like a lot more effort on your part.
I would love to get closer to this project, although I am extremely new to it. If there are any smaller issues for me to explore and help out on, let me know and I will try to tackle them in my free time.
Review comment:
I would be open to trying out the multiple package option. The opencv_python project builds 4 different packages (each with slightly different configurations) out of the same base repo, so I think it should be possible to minimize the maintenance overhead.
Review comment:
A related option could be to factor out the ffi binding portion into its own standalone package, build Chrome/Firefox versions of that, and have curl_cffi import the bindings packages. This way, the requests/async interfaces that curl_cffi provides don't need to be duplicated.
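The "import whichever bindings package is installed" idea can be sketched with `importlib`. This is only an illustration under stated assumptions: no such split packages exist in this PR, the first candidate name below is made up, and the standard-library `json` module stands in for an installed bindings package so the snippet runs anywhere.

```python
import importlib
from types import ModuleType
from typing import Sequence


def load_backend(candidates: Sequence[str]) -> ModuleType:
    """Import and return the first candidate module that is installed."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
    raise ImportError(f"no bindings package found among: {', '.join(candidates)}")


# Hypothetical package name first; "json" is a stand-in for an installed backend.
backend = load_backend(["hypothetical_curl_bindings_firefox", "json"])
print(backend.__name__)
```

With this shape, the high-level requests/async layer would live in one package and depend only on whatever `load_backend` returns.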
Review comment:
IMO, simulating NSS on BoringSSL could take priority over maintaining multiple packages.
This may require some patches to BoringSSL, but I think it's worth investigating.
Review comment:
Try whatever you like. I'm open to merging both, since there is actually no conflict between them.