Set User-Agent: header field in HTTP request for curl downloads
Some servers (for example wikimedia.org) reject downloads made with
libcurl's default settings, which send no User-Agent header, and
respond with HTTP status 403, so OCR for images on such servers fails.

Setting the user agent to "Tesseract OCR" allows OCR for images
on those servers.

Signed-off-by: Stefan Weil <[email protected]>
stweil committed Jan 18, 2024
1 parent bcd6144 commit 1bb7250
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions src/api/baseapi.cpp
@@ -1184,6 +1184,10 @@ bool TessBaseAPI::ProcessPagesInternal(const char *filename, const char *retry_c
 if (curlcode != CURLE_OK) {
   return error("curl_easy_setopt");
 }
+curlcode = curl_easy_setopt(curl, CURLOPT_USERAGENT, "Tesseract OCR");
+if (curlcode != CURLE_OK) {
+  return error("curl_easy_setopt");
+}
 curlcode = curl_easy_perform(curl);
 if (curlcode != CURLE_OK) {
   return error("curl_easy_perform");
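
To illustrate what this change means on the wire, here is a minimal sketch of the request text a server might see with and without the header. It is not Tesseract or libcurl code: `build_get_request` is a hypothetical helper, the host and path passed to it are placeholders, and the exact set of headers libcurl really sends may differ. The only point is that `CURLOPT_USERAGENT` adds one `User-Agent:` line to the request.

```cpp
#include <string>

// Hypothetical helper for illustration only (not part of Tesseract or libcurl).
// Builds the text of a minimal HTTP/1.1 GET request.
// An empty user_agent means the User-Agent header is left out entirely,
// which mirrors a libcurl handle on which CURLOPT_USERAGENT was never set.
std::string build_get_request(const std::string &host, const std::string &path,
                              const std::string &user_agent) {
  std::string request = "GET " + path + " HTTP/1.1\r\n";
  request += "Host: " + host + "\r\n";
  if (!user_agent.empty()) {
    // This is the line the commit effectively adds to every download request.
    request += "User-Agent: " + user_agent + "\r\n";
  }
  request += "Accept: */*\r\n";
  request += "\r\n";  // An empty line terminates the header section.
  return request;
}
```

Calling `build_get_request("example.org", "/image.png", "")` yields a request with no `User-Agent:` line, the kind of request some servers answer with 403, while passing `"Tesseract OCR"` as the third argument yields the same request plus `User-Agent: Tesseract OCR`.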