PDF to Text

A PDF text extraction app that takes a PDF document as input and returns either a single txt file containing all pages or a ZIP archive of txt files, one per page. OCR can also be enabled for scanned documents.

How does it work?

flowchart LR

A[PDF] --> |text conversion / OCR| B(Text)
B --> |Option 1| D[txt file]
B --> |Option 2| E[ZIP folder of txt files for pages]

1. Upload your PDF.
2. Enable OCR (for scanned documents).
3. Select the PDF language.
4. Download your output file (zip/txt).
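
The two output options in the flowchart can be sketched with the Python standard library alone. This is a minimal illustration, not the code in app.py or functions.py: the names pages_to_txt and pages_to_zip, the page_N.txt naming scheme, and the blank-line page separator are all assumptions, and the extraction or OCR step is assumed to have already produced one string per page.

```python
import io
import zipfile


def pages_to_txt(pages: list[str]) -> bytes:
    """Option 1: join all page texts into one UTF-8 encoded txt payload."""
    # A blank line between pages is an assumed separator, not the app's.
    return "\n\n".join(pages).encode("utf-8")


def pages_to_zip(pages: list[str], prefix: str = "page") -> bytes:
    """Option 2: write one txt file per page into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, text in enumerate(pages, start=1):
            # Hypothetical naming scheme: page_1.txt, page_2.txt, ...
            archive.writestr(f"{prefix}_{number}.txt", text)
    return buffer.getvalue()


# Placeholder page texts standing in for real extraction or OCR output.
pages = ["First page text.", "Second page text."]
single_txt = pages_to_txt(pages)
zipped = pages_to_zip(pages)
```

Building both payloads in memory means either one can be handed directly to a download control without writing temporary files to disk.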

How to support the project

You can support the project by sharing feedback or buying me a coffee.

nainiayoub/pdf-text-data-extractor
