Special characters in image file names #33
Hi @GitHub-Mike, please provide sample URLs directly from the HTML code of the source website. I need to know whether this is related to query strings and the recently added option not to hash query strings. Alternatively, if you can, please provide the URL of the tested website. That would help me debug this on a specific site. Thanks.
No, the problem has existed since my first crawl on 06.12.2024 and has nothing to do with Issue #30. Here are 2 examples:
I don't want to publish the URL of the website here, but I can send it to you by e-mail.
I have a similar issue: the crawler treated the links with space in
Crawler version: 1.0.8.20240824
Example below:
URL parsed in the 404 URLs report
Actual link in the page under
Some images were not saved because the file name contains special characters. These are spaces and German umlauts, but other special characters will certainly also cause problems.
I would suggest using URL Encoding (Percent-Encoding) according to RFC3986 to store file names. This should cause the fewest problems with the most common operating systems and file systems.
What do you think?
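To make the suggestion concrete, here is a minimal sketch of percent-encoding a file name before writing it to disk. It is written in Python purely for illustration and is not the crawler's actual code. The helper names `encode_file_name` and `decode_file_name` and the sample file name are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_file_name(name: str) -> str:
    """Percent-encode a file name (hypothetical helper, not crawler code).

    With safe="" every character outside the RFC 3986 unreserved set
    (letters, digits, "-", ".", "_", "~") is replaced by the
    percent-encoded bytes of its UTF-8 representation.
    """
    return quote(name, safe="")


def decode_file_name(encoded: str) -> str:
    """Reverse encode_file_name to recover the original name."""
    return unquote(encoded)


if __name__ == "__main__":
    original = "Größe 1.jpg"  # made-up name with umlaut, sharp s and a space
    stored = encode_file_name(original)
    print(stored)  # Gr%C3%B6%C3%9Fe%201.jpg
    print(decode_file_name(stored) == original)  # True
```

One trade-off to keep in mind: encoded names are longer than the originals (each umlaut becomes six characters), so very long file names could run into file system length limits.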