Describe the Bug
It appears that the Firecrawl server treats non-English links as invalid and does not even attempt to load them. As a result, I cannot scrape URLs that contain Cyrillic characters or URLs in Punycode format.
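To illustrate the two URL forms mentioned above, here is a minimal sketch using only the Python standard library. It shows that a Cyrillic hostname and its Punycode form are two encodings of the same domain, which is why both would be expected to pass validation. The domain `пример.испытание` is only a placeholder and is not one of the URLs from this report; the helper names `to_ascii_url` and `to_unicode_host` are hypothetical and are not part of the Firecrawl SDK.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return the URL with its hostname converted to Punycode (ASCII) form."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Python's built-in "idna" codec implements IDNA 2003;
    # plain ASCII labels pass through unchanged.
    ascii_host = host.encode("idna").decode("ascii")
    # Userinfo (user:password@) is not handled in this sketch.
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def to_unicode_host(ascii_host: str) -> str:
    """Decode a Punycode hostname back to its Unicode form."""
    return ascii_host.encode("ascii").decode("idna")


if __name__ == "__main__":
    # Placeholder domain, assumed for illustration only.
    cyrillic_url = "https://пример.испытание/news"
    ascii_url = to_ascii_url(cyrillic_url)
    print(ascii_url)
    print(to_unicode_host(urlsplit(ascii_url).hostname))
```

Note that this is not a workaround for the issue itself, since the Punycode form is rejected as well; it only demonstrates that both forms name the same host.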
To Reproduce
Steps to reproduce the issue:
Expected Behavior
Both links are recognized as valid and processed.
Environment (please complete the following information):
Logs
Example from Python SDK
  File "C:\Users\user\Desktop\migr_web2table.venv\Lib\site-packages\firecrawl\firecrawl.py", line 88, in scrape_url
    self._handle_error(response, 'scrape URL')
  File "C:\Users\user\Desktop\migr_web2table.venv\Lib\site-packages\firecrawl\firecrawl.py", line 391, in _handle_error
    raise requests.exceptions.HTTPError(message, response=response)
requests.exceptions.HTTPError: Unexpected error during scrape URL: Status code 400. Bad Request - [{'code': 'custom', 'message': 'URL must have a valid top-level domain or be a valid path', 'path': ['url']}]
Additional Context
We scrape many sites with different domain names, so it is critical for us that all kinds of domains are recognized.