USC DSCI 550 Assignment 3 - Spring 2021
Updated May 1, 2021 (Jupyter Notebook)
🚴‍♂️⛷️ Data lake: performance tuning for text extraction from a very large number of files.
A web application that predicts a document's type from its content 📝
This project showcases the application of LDA topic modelling and KMeans clustering to extract information from PDF documents.
Extracting information from PDF files.
The Distributed Release Audit Tool (DRAT) for code analysis and verification.
tika-python packaged for Debian GNU/Linux and Ubuntu Linux
Python module for extracting text from URLs and PDFs
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
Interactive image similarity and visual search and retrieval application
Tika-Similarity uses the Tika-Python package (a Python binding to Apache Tika) to compute file similarity based on metadata features.
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
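The Tika-Similarity entry computes file similarity from metadata features. One simple measure of this kind is Jaccard similarity over the sets of metadata keys two files expose. The sketch below is illustrative only and is not Tika-Similarity's actual implementation, which may use different features or measures. It assumes metadata has already been extracted into plain dicts, for example from the `"metadata"` entry of the dict that tika-python's `parser.from_file()` returns. The helper names `jaccard` and `metadata_similarity` and the sample file names, keys, and values are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| between two feature sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 1.0  # two empty feature sets are treated as identical
    return len(sa & sb) / len(union)


def metadata_similarity(
    docs: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str, float]]:
    """Pairwise Jaccard similarity over metadata *keys*, highest first."""
    names = sorted(docs)
    scores = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            scores.append((x, y, jaccard(docs[x].keys(), docs[y].keys())))
    # Stable sort: ties keep their generation order.
    return sorted(scores, key=lambda t: t[2], reverse=True)


# Hypothetical, made-up metadata for three files (keys are illustrative).
docs = {
    "a.pdf": {"Content-Type": "application/pdf", "Author": "x", "pdf:PDFVersion": "1.4"},
    "b.pdf": {"Content-Type": "application/pdf", "Author": "y", "pdf:PDFVersion": "1.7"},
    "c.jpg": {"Content-Type": "image/jpeg", "tiff:ImageWidth": "640"},
}

for x, y, s in metadata_similarity(docs):
    print(f"{x} {y} {s:.2f}")
# a.pdf b.pdf 1.00
# a.pdf c.jpg 0.25
# b.pdf c.jpg 0.25
```

Comparing keys rather than values groups files by format and producing software. The two PDFs score 1.0 even though their authors differ. A value-aware variant could compare `(key, value)` pairs instead.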