FSOCIETY06/pdf2textlib

pdf2textlib

Simple multilingual PDF text extraction. Also extracts text from images.

import pdf2textlib

# Argument 1: path to the PDF file.
# Argument 2: language codes joined with a '+' sign
#             (here English, Telugu, and Urdu).
print(pdf2textlib.getText("Demo.pdf", "eng+tel+urd"))
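
If the language codes come from a list rather than a hand-written string, a small helper can build the '+'-separated argument. The sketch below is only an illustration: `build_lang_string` and `extract_text` are hypothetical names that are not part of pdf2textlib, and the only library call assumed is `getText(path, languages)` as shown above.

```python
from pathlib import Path


def build_lang_string(codes):
    """Join language codes with '+', e.g. ["eng", "tel"] -> "eng+tel"."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def extract_text(pdf_path, codes):
    """Hypothetical wrapper: check the path, then call pdf2textlib.getText."""
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"no such PDF: {path}")
    # Imported lazily so the helpers above load even without the package.
    import pdf2textlib
    return pdf2textlib.getText(str(path), build_lang_string(codes))
```

With the package installed, `extract_text("Demo.pdf", ["eng", "tel", "urd"])` would pass the same arguments as the call above.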

OS Dependencies

Debian, Ubuntu, and friends

sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python3-dev

Fedora, Red Hat, and friends

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel redhat-rpm-config

macOS

brew install pkg-config poppler

Conda users may also need libgcc:

conda install -c anaconda libgcc

Windows

Currently tested only when using conda:

  • Install the Microsoft Visual C++ Build Tools
  • Install poppler through conda:
    conda install -c conda-forge poppler
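
On Linux and macOS, one way to confirm that the poppler development files are visible before running pip is to ask pkg-config. This is a minimal sketch, assuming the poppler C++ bindings are registered under the pkg-config module name `poppler-cpp`. The helper name `pkg_config_has` is hypothetical, and the check may not apply to conda setups on Windows.

```python
import shutil
import subprocess


def pkg_config_has(module: str) -> bool:
    """Return True if pkg-config is on PATH and reports that `module` exists."""
    pkg_config = shutil.which("pkg-config")
    if pkg_config is None:
        return False
    # `pkg-config --exists NAME` exits with status 0 when the module is found.
    result = subprocess.run([pkg_config, "--exists", module], check=False)
    return result.returncode == 0


if __name__ == "__main__":
    print("poppler-cpp found:", pkg_config_has("poppler-cpp"))
```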
    

Install

pip install pdf2textlib
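
To check that the install succeeded without extracting anything, you can test whether the module is importable. The sketch below uses only the standard library, and `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("pdf2textlib installed:", is_installed("pdf2textlib"))
```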
