⚡ Cloud-native, AI-powered, document processing pipelines on AWS.
Updated Jun 12, 2024 - TypeScript
A full-featured Document Layer for your application, providing the functionality of a flexible document management system, including storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. 🌟 Star to support our work!
XmlDocumentProcessor: A .NET component for XML document processing. It analyzes XML content, performs keyword-based queries, and transforms data into HTML. Emphasizes design patterns like Strategy pattern, with a focus on class diagramming. Implements penalty for non-compliance.
A Python framework for multi-modal document understanding with Amazon Bedrock
Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.
Retrieval of fully structured data made easy. Use LLMs or custom models. Specializes in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.
TU Dublin Computer Science MSc. Final Project Group 3 - Accessibilator
AI-powered chatbot designed to simplify the job search process
An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.
ClearCouncil: Automated tools for collecting, organizing, and embedding publicly available local, state, and county council documents (minutes, agendas) into LLMs. Python, JS, and wget scripts included for easy data retrieval and integration.
Use data from MongoDB in LangChain, Llama and OpenAI
Text line detection for Urdu OCR (UTRNet)
A module for creating stopword lists for any language, based on a set of documents.
Document Templater is a powerful tool for automated document generation. Streamline the process of creating standard documents, such as contracts, reports, and forms, using predefined templates. This repository contains the source code for Document Templater, allowing you to easily integrate this functionality into your projects and automate docs.
Pdf2xNet is a .NET library for seamless integration with Xpdf tools, enabling easy conversion of PDF documents to text, images, and HTML formats within your .NET applications.
Generative intent detection with Magick
Python tool for converting PDF files to text. Simplify your document processing tasks.
Merge two relatively incomplete documents into one complete document via a Python script
Minimize the time required for audit report analysis with a containerized file conversion and scraping system
Unofficial mirror of git://git.lyx.org/lyx.git (updates daily; not affiliated with lyx.org)