RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Updated May 21, 2024 - Python
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Python script to translate a PDF file to DOCX or ODT
C# and VB.NET samples for Docotic.Pdf library
Sample code for the Datalogics Java interface of the Adobe PDF Library, set up to build with Maven
Aspose.PDF for JavaScript via C++
I/O for nocodefunctions: csv, txt, pdf, and xlsx so far
The code base of the front-end of nocodefunctions.com
Sample code for the Datalogics .NET interface of the Adobe PDF Library
Sample code for the Datalogics .NET Framework interface of the Adobe PDF Library
Sample code for the Datalogics C++ interface of the Adobe PDF Library
Build a RAG preprocessing pipeline
Python project that converts tables inside PDFs to CSV for convenient data manipulation. It includes logging and exception handling.
Convert PDFs to text, then transform that text into structured JSON objects for Threat Intelligence.
Extract structured text and data from documents like invoices, book pages, tables, etc., using OpenCV and Tesseract OCR
Pure JavaScript cross-platform module to extract text from PDFs.
Graphlit Platform
A Multi Purpose PDF Toolkit
CLI for extracting text from PDF files (and possibly tables)
"PDF To Audio" is a Python tool that transforms PDF documents into audio files using OCR and Text-to-Speech technology. Ideal for accessibility and auditory learning, it supports multiple languages, parallel processing, and smart rate limit handling.