news-please - an integrated web crawler and information extractor for news that just works
Updated May 15, 2024 - Python
Universal Extractor 2 is a tool to extract files from any type of archive or installer.
Free Zip / Unzip software and Rar file extractor. Cross-platform file and archive manager. Features volume spanning, compression, authenticated encryption. Supports 7Z, 7-Zip sfx, ACE, ARJ, Brotli, BZ2, CAB, CHM, CPIO, DEB, GZ, ISO, JAR, LHA/LZH, NSIS, OOo, PAQ/LPAQ, PEA, QUAD, RAR, RPM, split, TAR, Z, ZIP, ZIPX, Zstandard.
SEO & Security Audit for Websites. Lighthouse & Security Headers crawler, Sitemap/Keywords/Images Extractor, Summarizer, etc.
Wiktionary dump file parser and multilingual data extractor
Python-based open-source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, with an ingestor to a Solr or Elasticsearch index and a linked data graph database
URLExtract is a Python class for collecting (extracting) URLs from a given text by locating TLDs.
Apache Anything To Triples (Any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents.
A self-hosted, drag-and-drop, NoSQL file conversion server and sharing tool that supports 88 file formats in 13 languages.
A framework for creating semi-automatic web content extractors
Collect / retrieve Office365, AzureAD and DLP audit logs and output to PRTG, Azure Log Analytics Workspace, SQL, Graylog, Fluentd, and/or file output.
Babel plugin that statically extracts i18next and react-i18next translation keys.
A fast tool to fetch URLs from HTML attributes by crawling.