The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Updated May 20, 2024 - Java
Extract files from any kind of container format
[UNOFFICIAL MIRROR] A file archiver with a high compression ratio
A microservice for document conversion at scale
A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Research Data Management Platform (RDMP) is an open-source application for the loading, linking, anonymisation, and extraction of datasets stored in relational databases.
⚡️ Build quick LLM pipelines for AI applications
DocILE: Document Information Localization and Extraction Benchmark
Tools for whole slide image (WSI) processing, especially (pairwise) patch extraction, annotation parsing, and data preparation for deep learning.
An R package for multivariate signal extraction
MATLAB integration of WinRAR allowing opening (compression) and creation (extraction) of RAR archive file types.
Extract your filaments from Spoolman to be compatible with SpoolmanDB 🎉
Line Chart Data Extraction: Official code for LineFormer - ICDAR23 Paper
Library for patching destination data with source data only if the destination data remains valid after the patch
Extract internal monitoring data from application logs for collection in a time-series database
Dump Discord's cache and identify files
Go Library for Queuing and Extracting Archives: Rar, Zip, 7zip, Gz, Tar, Tgz, Bz2, Tbz2
Analysis over years for real estate across Canada