Kafka-based components for Scrapy. There are 2 components:
- A custom `Spider` that waits for URLs to crawl via a Kafka topic. When there are no more messages to read for the topic, the `Spider` just stays idle.
- A custom `ItemPipeline` component that stores a JSON-ified `Item` back into another Kafka topic.
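
As a rough illustration of the pipeline side, the sketch below serializes an item to JSON and hands it to a producer object. It is not the code shipped in this package: `JsonKafkaPipeline`, `InMemoryProducer`, and the `scraped-items` topic name are hypothetical. The stand-in producer only records messages, so the example runs without a broker; a real Kafka client's producer would take its place.

```python
import json


class InMemoryProducer:
    """Hypothetical stand-in for a Kafka producer: records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        # Keep (topic, payload) pairs so they can be inspected afterwards.
        self.sent.append((topic, value))


class JsonKafkaPipeline:
    """Illustrative item pipeline (assumed name): publish each item as JSON to one topic."""

    def __init__(self, producer, topic):
        self.producer = producer
        self.topic = topic

    def process_item(self, item, spider):
        # Scrapy calls process_item(item, spider) for every scraped item.
        # dict(item) works for plain dicts and dict-like items.
        payload = json.dumps(dict(item), sort_keys=True).encode("utf-8")
        self.producer.send(self.topic, payload)
        # Return the item so any later pipeline stage still receives it.
        return item


producer = InMemoryProducer()
pipeline = JsonKafkaPipeline(producer, topic="scraped-items")
result = pipeline.process_item(
    {"url": "http://example.com", "title": "Example"}, spider=None
)
```

After this runs, `producer.sent` holds one `(topic, payload)` pair whose payload is the UTF-8 encoded JSON form of the item. Returning the item unchanged lets other pipeline stages keep processing it.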
Please see the example directory for how to use this.
Contributors to scrapy-kafka, listed alphabetically: