0.1.0

vdusek released this 09 Jul 06:49


b13b89a

Crawlee is a web scraping and browser automation library.
See the blog post: Launching Crawlee for Python.

Features

Why is Crawlee the preferred choice for web scraping and crawling?

Why use Crawlee instead of just a random HTTP library with an HTML parser?

Unified interface for HTTP & headless browser crawling.
Automatic parallel crawling based on available system resources.
Written in Python with type hints, which improves the developer experience (IDE autocompletion) and reduces bugs (static type checking).
Automatic retries on errors or when you’re getting blocked.
Integrated proxy rotation and session management.
Configurable request routing - direct URLs to the appropriate handlers.
Persistent queue for URLs to crawl.
Pluggable storage of both tabular data and files.
Robust error handling.
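
To make "configurable request routing" concrete, here is a minimal sketch of the idea using only the standard library. It is not Crawlee's API: `LabelRouter`, `handle_listing`, `handle_detail`, and the example URLs are hypothetical names chosen for illustration. The sketch assumes each request carries a label and that the label selects which handler processes the URL.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# An async handler takes a URL and returns some result (a string here).
Handler = Callable[[str], Awaitable[str]]


class LabelRouter:
    """Hypothetical router (not Crawlee's class): maps a label to a handler."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        # Decorator factory that registers a handler under the given label.
        def register(func: Handler) -> Handler:
            self._handlers[label] = func
            return func

        return register

    async def dispatch(self, label: str, url: str) -> str:
        # Look up the handler for this label and await it.
        return await self._handlers[label](url)


router = LabelRouter()


@router.handler("listing")
async def handle_listing(url: str) -> str:
    return f"listing page: {url}"


@router.handler("detail")
async def handle_detail(url: str) -> str:
    return f"detail page: {url}"


async def route_all(requests: List[Tuple[str, str]]) -> List[str]:
    # Process requests one by one; each goes to the handler for its label.
    return [await router.dispatch(label, url) for label, url in requests]


routed = asyncio.run(
    route_all(
        [
            ("listing", "https://example.com/products"),
            ("detail", "https://example.com/products/1"),
        ]
    )
)
```

Keeping the label-to-handler mapping in one place means adding a new page type only requires registering one more handler, without touching the dispatch logic.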

Why use Crawlee rather than Scrapy?

Crawlee has out-of-the-box support for headless browser crawling (Playwright).
Crawlee has a minimalistic and elegant interface: set up your scraper with fewer than 10 lines of code.
Complete type hint coverage.
Based on standard asyncio.
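
Because the library builds on standard asyncio, the parallel crawling and automatic retries listed among the features can be pictured with plain asyncio primitives. The following is a rough sketch under stated assumptions, not Crawlee's implementation: `fake_fetch` stands in for a real network call, the concurrency limit is a fixed constant (Crawlee is described as scaling it from available system resources), and all names and URLs are hypothetical.

```python
import asyncio
from typing import Dict, List

MAX_CONCURRENCY = 2  # assumed fixed limit, chosen only for this sketch
MAX_RETRIES = 3      # assumed retry budget per URL

# Counts how many times each URL was attempted.
attempts: Dict[str, int] = {}


async def fake_fetch(url: str) -> str:
    """Stand-in for a network call; 'flaky' URLs fail on the first attempt."""
    attempts[url] = attempts.get(url, 0) + 1
    await asyncio.sleep(0)  # yield control, as a real request would
    if "flaky" in url and attempts[url] == 1:
        raise ConnectionError(f"simulated failure for {url}")
    return f"ok:{url}"


async def fetch_with_retries(url: str, limit: asyncio.Semaphore) -> str:
    # The semaphore caps how many fetches run at the same time.
    async with limit:
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                return await fake_fetch(url)
            except ConnectionError:
                if attempt == MAX_RETRIES:
                    raise  # retry budget exhausted, surface the error
    raise RuntimeError("unreachable when MAX_RETRIES >= 1")


async def crawl(urls: List[str]) -> List[str]:
    limit = asyncio.Semaphore(MAX_CONCURRENCY)
    # gather runs the tasks concurrently and keeps results in input order.
    return await asyncio.gather(*(fetch_with_retries(u, limit) for u in urls))


results = asyncio.run(
    crawl(
        [
            "https://example.com/a",
            "https://example.com/flaky",
            "https://example.com/b",
        ]
    )
)
```

Nothing here needs a custom event loop or reactor, so code like this composes with any other asyncio-based library.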
