    Repositories list

    • estela

      Public
      estela, an elastic web scraping cluster 🕸
      TypeScript
      MIT License
      14000 Updated Jan 22, 2025
    • Proxy server to bypass Cloudflare protection
      Python
      MIT License
      731000 Updated Jan 21, 2025
    • More routines for operating on iterables, beyond itertools
      Python
      MIT License
      292000 Updated Jan 21, 2025
    • Python
      27000 Updated Jan 21, 2025
    • Scrapy+Splash for JavaScript integration
      Python
      BSD 3-Clause "New" or "Revised" License
      455000 Updated Jan 21, 2025
    • A special service that runs Puppeteer instances.
      JavaScript
      BSD 3-Clause "New" or "Revised" License
      5000 Updated Jan 21, 2025
    • net/http.Client like HTTP Client with options to select specific client TLS Fingerprints to use for requests.
      Go
      BSD 4-Clause "Original" or "Old" License
      173000 Updated Jan 21, 2025
    • w3lib

      Public
      Python library of web-related functions
      Python
      BSD 3-Clause "New" or "Revised" License
      106000 Updated Jan 21, 2025
    • iplist

      Public
      IP Address Collection and Management Service with multiple output formats: mikrotik, json, text, ipset, nfset, clashx, keenetic, switchy, amnezia
      PHP
      MIT License
      13000 Updated Jan 21, 2025
    • Python client for Zyte Data API
      Python
      BSD 3-Clause "New" or "Revised" License
      5000 Updated Jan 20, 2025
    • Zyte Data API integration for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      20000 Updated Jan 20, 2025
    • Luminati HTTP/HTTPS Proxy manager
      JavaScript
      194000 Updated Jan 20, 2025
    • A list of the most common User Agents used on the Internet.
      JavaScript
      MIT License
      16000 Updated Jan 20, 2025
    • creepjs

      Public
      Creepy device and browser fingerprinting
      TypeScript
      MIT License
      204000 Updated Jan 18, 2025
    • lexbor

      Public
      Lexbor is an open source HTML Renderer library under development. http://lexbor.com
      C
      Apache License 2.0
      106000 Updated Jan 18, 2025
    • hero

      Public
      The web browser that’s nearly impossible for bot blockers to block
      TypeScript
      MIT License
      47000 Updated Jan 18, 2025
    • Page Object pattern for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      28000 Updated Jan 17, 2025
    • A Python-based HTML to text conversion library, command line client and Web service.
      Python
      Apache License 2.0
      29000 Updated Jan 16, 2025
    • TLS implementation in pure python, focused on interoperability testing
      Python
      Other
      81000 Updated Jan 16, 2025
    • Web data extraction tool implemented as chrome extension
      JavaScript
      GNU Lesser General Public License v3.0
      70000 Updated Jan 16, 2025
    • Contains the common item definitions used in Zyte.
      Python
      BSD 3-Clause "New" or "Revised" License
      8000 Updated Jan 16, 2025
    • fast python port of arc90's readability tool, updated to match latest readability.js!
      Python
      Apache License 2.0
      459000 Updated Jan 16, 2025
    • TrackMe

      Public
      Go
      GNU General Public License v3.0
      40000 Updated Jan 15, 2025
    • Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
      JavaScript
      Apache License 2.0
      1450010 Updated Jan 15, 2025
    • Python
      5000 Updated Jan 14, 2025
    • Spider templates for automatic crawlers.
      Python
      BSD 3-Clause "New" or "Revised" License
      4000 Updated Jan 13, 2025
    • normality

      Public
      A tiny library for Python text normalisation. Useful for ad-hoc text processing.
      Python
      MIT License
      18000 Updated Jan 9, 2025
    • calamus

      Public
      A JSON-LD Serialization Library for Python
      Python
      Apache License 2.0
      12000 Updated Jan 8, 2025
    • lol-html

      Public
      Low output latency streaming HTML parser/rewriter with CSS selector-based API
      Rust
      BSD 3-Clause "New" or "Revised" License
      85000 Updated Jan 6, 2025
    • Python binding to Modest engine (fast HTML5 parser with CSS selectors).
      Cython
      MIT License
      71000 Updated Jan 5, 2025