EPG parser is broken because cfscrape was broken by urllib3 update #27

Closed
cmrtdev opened this issue Jul 2, 2023 · 4 comments
Labels
good first issue Good for newcomers

@cmrtdev

cmrtdev commented Jul 2, 2023

cfscrape (https://github.com/Anorov/cloudflare-scrape) no longer appears to be maintained. Meanwhile, urllib3 has kept changing, and version 2.0.0 removed DEFAULT_CIPHERS, which cfscrape relies on.

As a result, running the EPG parser fails with:

$ python3 main.py 
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    from epg_sources.teleboy import teleboy
  File "epg_sources/teleboy.py", line 7, in <module>
    import cfscrape
  File "lib/python3.11/site-packages/cfscrape/__init__.py", line 19, in <module>
    from urllib3.util.ssl_ import create_urllib3_context, DEFAULT_CIPHERS
ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_' (TV7_EPG_Parser/lib/python3.11/site-packages/urllib3/util/ssl_.py)
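The failing line is inside cfscrape itself, so the error only says a name is missing. A minimal sketch of how an import site could turn it into an actionable message follows; load_cfscrape is a hypothetical helper (the parser's teleboy.py just does import cfscrape), and the suggested pin mirrors the workaround described in this issue:

```python
import importlib


def load_cfscrape():
    """Import cfscrape, re-raising an ImportError with a hint about urllib3.

    Hypothetical helper for illustration; not part of the parser.
    """
    try:
        return importlib.import_module("cfscrape")
    except ImportError as exc:
        # With urllib3>=2.0.0, cfscrape's own
        # `from urllib3.util.ssl_ import create_urllib3_context, DEFAULT_CIPHERS`
        # fails, and that failure surfaces here as an ImportError.
        # A missing cfscrape package (ModuleNotFoundError) is caught here too.
        raise RuntimeError(
            f"cfscrape could not be imported ({exc}); if urllib3>=2.0.0 is "
            "installed, try pinning it below 2 (e.g. pip3 install urllib3==1.26.16)"
        ) from exc
```

Chaining with "from exc" keeps the original traceback attached, so the DEFAULT_CIPHERS detail is still visible when debugging.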

The best solution would be to stop depending on cfscrape. I am not familiar with it, but among its 118 open issues there seem to be quite a few reporting that it's broken and no longer bypasses Cloudflare (e.g. issues 455, 458 and 459).

As a temporary workaround, I got the parser working with pip3 install urllib3==1.26.16 (the latest release below 2.0.0). This pin may be a good addition to requirements.txt: urllib3 is already installed as a dependency of requests, but pinning it explicitly doesn't seem to cause a conflict. Yet. I expect this to break for good once requests starts requiring urllib3>=2.0.0 :(
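To fail fast instead of hitting the ImportError, the installed urllib3 version can be compared against the 2.0.0 threshold. A minimal standard-library sketch follows; the function names are hypothetical, and the parsing assumes a plain dotted release (pre-release and local suffixes are ignored):

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple:
    """Return the numeric release part of `ver`, padded to three fields.

    Assumption: a plain dotted release such as '1.26.16'; any suffix
    (e.g. the 'a1' in '2.0.0a1') is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions like '1.26' to (1, 26, 0) so comparisons are consistent.
    return parts + (0,) * (3 - len(parts))


def predates_urllib3_2(ver: str) -> bool:
    """True if `ver` sorts below 2.0.0, the release that removed DEFAULT_CIPHERS."""
    return parse_release(ver) < (2, 0, 0)


if __name__ == "__main__":
    try:
        installed = version("urllib3")
    except PackageNotFoundError:
        print("urllib3 is not installed")
    else:
        if predates_urllib3_2(installed):
            print(f"urllib3 {installed} is below 2.0.0; DEFAULT_CIPHERS should still exist")
        else:
            print(f"urllib3 {installed} is 2.0.0 or newer; cfscrape will fail to import")
```

Tuple comparison keeps the check dependency-free; a project that already depends on the packaging library could use its Version class instead.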

Hope this helps!

@mathewmeconry added the good first issue label Jul 4, 2023
@mathewmeconry
Owner

For now, the parser still seems to work and there is data in the data repo, but this is definitely something I need to look at in the near future. Thanks for the report!

Meanwhile, if somebody has experience with Cloudflare scraping, I am open to pull requests.

@rbkn
Collaborator

rbkn commented Aug 8, 2023

Since the endpoint the scraper hits is an API endpoint (at least for teleboy), I wouldn't have expected it to have any Cloudflare anti-bot measures, so cfscrape would likely not be required in any case.
EDIT: I see, a direct request returns 403.
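The 403 from a direct request can be checked without cfscrape. A rough standard-library sketch follows; probe_status is a hypothetical helper, the URL in the usage line is a placeholder (the actual teleboy endpoint is not given in this thread), and the browser-like User-Agent is an arbitrary assumption, not a known way past the block:

```python
import urllib.error
import urllib.request


def probe_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a plain GET of `url`, with no anti-bot handling."""
    # Assumption: a generic browser-like User-Agent; urllib's default is "Python-urllib/3.x".
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; a blocked request
        # such as the 403 reported above ends up here.
        return err.code
```

For example, probe_status("https://example.invalid/epg") with the real endpoint substituted: a 403 rather than 200 suggests the request is rejected before it reaches the API.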

@mathewmeconry
Owner

cfscrape was removed by @rbkn, and the parser seems to work after a small fix:
https://app.circleci.com/pipelines/github/mathewmeconry/TV7_EPG_Parser/1273/workflows/88f1205d-0ebb-4191-9dae-69bc479bba32/jobs/2981

@cmrtdev
Author

cmrtdev commented Aug 14, 2023

Thanks both! :D
