scrapy-poet is the web-poet Page Object pattern implementation for Scrapy.
scrapy-poet lets you write spiders whose extraction logic is separated from the crawling logic,
making it possible to build a single spider that supports many sites with
different layouts.
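To illustrate the idea behind the pattern (not scrapy-poet's actual API), here is a minimal library-free sketch: each site gets its own page object class that knows that site's layout, while the crawling logic stays identical and simply delegates extraction. All class and function names below are hypothetical, and the string parsing is deliberately toy-grade.

```python
from abc import ABC, abstractmethod


class ProductPage(ABC):
    """A page object: knows how to extract data from one site's layout."""

    def __init__(self, html: str):
        self.html = html

    @abstractmethod
    def to_item(self) -> dict:
        ...


class ShopAProductPage(ProductPage):
    # Site A puts the product name after a "name:" marker (toy parsing).
    def to_item(self) -> dict:
        name = self.html.split("name:")[1].split(";")[0]
        return {"name": name.strip()}


class ShopBProductPage(ProductPage):
    # Site B uses a different layout; only this class needs to know about it.
    def to_item(self) -> dict:
        name = self.html.split("<h1>")[1].split("</h1>")[0]
        return {"name": name.strip()}


def crawl(html: str, page_cls: type[ProductPage]) -> dict:
    """Crawling logic: identical for every site, delegates extraction."""
    return page_cls(html).to_item()


print(crawl("name: Chair ;", ShopAProductPage))   # {'name': 'Chair'}
print(crawl("<h1>Chair</h1>", ShopBProductPage))  # {'name': 'Chair'}
```

In scrapy-poet, the page objects come from web-poet and the "crawl" side is your Scrapy spider, but the division of responsibilities is the same.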
Read the documentation for more information.
License is BSD 3-clause.
- Documentation: https://scrapy-poet.readthedocs.io
- Source code: https://github.com/scrapinghub/scrapy-poet
- Issue tracker: https://github.com/scrapinghub/scrapy-poet/issues
pip install scrapy-poet
Requires Python 3.9+ and Scrapy >= 2.6.0.
Add the following inside Scrapy's settings.py file:
DOWNLOADER_MIDDLEWARES = {
"scrapy_poet.InjectionMiddleware": 543,
"scrapy.downloadermiddlewares.stats.DownloaderStats": None,
"scrapy_poet.DownloaderStatsMiddleware": 850,
}
SPIDER_MIDDLEWARES = {
"scrapy_poet.RetryMiddleware": 275,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
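With the middleware enabled, spider callbacks can declare the page objects they need via type annotations, and scrapy-poet builds and injects them. The following is a rough, self-contained sketch of that injection mechanism, not scrapy-poet's real implementation: every name here (`ProductPage`, `inject_and_call`, the dict-based "response") is hypothetical, and the real middleware does considerably more.

```python
import inspect
from typing import get_type_hints


class ProductPage:
    """A toy page object built from a (fake) response."""

    def __init__(self, response: dict):
        self.response = response

    def to_item(self) -> dict:
        return {"url": self.response["url"], "title": self.response["body"].upper()}


def inject_and_call(callback, response: dict):
    """Inspect the callback's annotations and build the page objects it asks for,
    which is roughly what an injection middleware does before invoking a callback."""
    kwargs = {}
    for name, cls in get_type_hints(callback).items():
        if name != "return" and inspect.isclass(cls) and issubclass(cls, ProductPage):
            kwargs[name] = cls(response)
    return callback(response, **kwargs)


def parse(response: dict, page: ProductPage) -> dict:
    # The callback declares the page object it needs via a type hint.
    return page.to_item()


item = inject_and_call(parse, {"url": "https://example.com", "body": "chair"})
print(item)  # {'url': 'https://example.com', 'title': 'CHAIR'}
```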
Set up your local Python environment via:
- pip install -r requirements-dev.txt
- pre-commit install
Now, every time you perform a git commit, these tools will run against the staged files:
- black
- isort
- flake8
You can also run them directly, without committing, by invoking pre-commit run --all-files or tox -e linters.