Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: PDF file upload failed - Could not initialize tesseract #618

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

azaylamba
Copy link
Contributor

Issue #: #614

Description of changes:
Was getting error unstructured_pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /usr/share/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

Added tesseract-eng in the docker file so that PDF files produced by Microsoft: Print to PDF can be processed.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Was getting error unstructured_pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /usr/share/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

1 participant