[Bug] important: non-English domains are not recognized (including punycode format) #702

Open
Hitreno opened this issue Sep 24, 2024 · 0 comments
Labels
bug Something isn't working

Comments

Hitreno commented Sep 24, 2024

Describe the Bug
The Firecrawl server appears to treat non-English (internationalized) domain names as invalid and does not even try to load data from them. As a result, I cannot get data from links with Cyrillic domains, whether they are written in Unicode or in punycode format.

To Reproduce
Steps to reproduce the issue:

  1. Go to https://firecrawl.dev
  2. Enter the Cyrillic URL https://дом.рф
  3. Get the response "Please enter a valid URL"
  4. Enter the same URL in punycode: https://xn--d1aqf.xn--p1ai
  5. Get the response "Please enter a valid URL"
  6. Repeat both requests through the API and get the same errors
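
For reference, the two URLs in the steps above name the same host. A minimal sketch using only Python's standard-library `idna` codec (not anything from Firecrawl) shows the conversion in both directions; the variable names are illustrative.

```python
# Convert the Unicode host from step 2 to its ASCII (punycode) form
# using Python's built-in "idna" codec.
unicode_host = "дом.рф"
ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)  # xn--d1aqf.xn--p1ai

# Decode the punycode form back to Unicode to confirm the round trip.
round_trip = ascii_host.encode("ascii").decode("idna")
print(round_trip == unicode_host)  # True
```

Since both spellings resolve to the same domain, a validator that accepts one should accept the other.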

Expected Behavior
Both links are recognized as valid and processed.

Environment (please complete the following information):

Logs
Example from Python SDK

File "C:\Users\user\Desktop\migr_web2table.venv\Lib\site-packages\firecrawl\firecrawl.py", line 88, in scrape_url
self._handle_error(response, 'scrape URL')
File "C:\Users\user\Desktop\migr_web2table.venv\Lib\site-packages\firecrawl\firecrawl.py", line 391, in _handle_error
raise requests.exceptions.HTTPError(message, response=response)
requests.exceptions.HTTPError: Unexpected error during scrape URL: Status code 400. Bad Request - [{'code': 'custom', 'message': 'URL must have a valid top-level domain or be a valid path', 'path': ['url']}]
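
The server's validation code is not shown in this issue, so the following is only a guess at the kind of check that could produce the "valid top-level domain" error above. It assumes a hypothetical validator that accepts only ASCII letters in the TLD, and contrasts it with an IDN-aware variant. The function names `strict_check` and `idn_aware_check` are invented for illustration and are not part of Firecrawl.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: TLD must be two or more ASCII letters.
# This is an assumption, NOT Firecrawl's actual validation code.
ASCII_LETTERS_TLD = re.compile(r"[a-z]{2,}")

# Hypothetical IDN-aware pattern: also allow punycode ("xn--") labels.
IDN_AWARE_TLD = re.compile(r"xn--[a-z0-9-]+|[a-z]{2,}")


def tld_of(url: str) -> str:
    """Return the last dot-separated label of the URL's host (lowercased)."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1]


def strict_check(url: str) -> bool:
    """Accept only TLDs made of ASCII letters (rejects IDN TLDs)."""
    return ASCII_LETTERS_TLD.fullmatch(tld_of(url)) is not None


def idn_aware_check(url: str) -> bool:
    """Normalize the TLD to its ASCII form first, then validate it."""
    try:
        ascii_tld = tld_of(url).encode("idna").decode("ascii")
    except UnicodeError:
        return False
    return IDN_AWARE_TLD.fullmatch(ascii_tld) is not None


for url in ("https://firecrawl.dev",
            "https://дом.рф",
            "https://xn--d1aqf.xn--p1ai"):
    print(url, strict_check(url), idn_aware_check(url))
```

Under these assumptions, the strict check rejects both the Cyrillic and the punycode URL (the punycode TLD `xn--p1ai` contains digits and hyphens), while the IDN-aware check accepts all three.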

Additional Context
We parse a lot of sites with different domain names, so it is critical for us that all kinds of domains are recognized.

Hitreno added the bug label on Sep 24, 2024