This project is an example of a Python Scrapy project. It scrapes book data from books.toscrape.com.
To use this scraper, you need to install the Apify CLI. Follow the instructions here.
Make sure you have Python installed. If not, download it here. Any version supported by both the Apify SDK and Scrapy should be fine.
Additionally, install Virtualenv using the following command:
pip install virtualenv
Create a Python virtual environment by running the following command (substitute your installed interpreter for python3.12 if it differs):
python3.12 -m virtualenv .venv
Activate the virtual environment:
source .venv/bin/activate
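To confirm that the activation took effect before installing dependencies, a small check like the following can help. This is a minimal sketch rather than part of the project; the function name `in_virtualenv` is hypothetical, and it relies only on the standard `sys` module.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases point sys.prefix at the environment
    # while sys.base_prefix keeps the base installation; older virtualenv
    # releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the first line reports False, the shell is probably still using the system interpreter and the activation step should be repeated.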
Install Python dependencies:
pip install -r requirements.txt -r requirements-dev.txt
The project can still be run as a plain Scrapy project. The following command runs the book_spider spider and writes the scraped items to books.json:
scrapy crawl book_spider -o books.json
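Once the crawl finishes, the output can be inspected with a few lines of Python. The sketch below assumes books.json holds a JSON array of item objects, which is what Scrapy's JSON feed export normally produces. The helper names `load_items` and `field_names` are hypothetical, and no particular item fields are assumed because this README does not list them.

```python
import json
from pathlib import Path


def load_items(path):
    """Load the list of scraped items from a Scrapy JSON feed export."""
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(items, list):
        raise ValueError(f"expected a JSON array in {path}")
    return items


def field_names(items):
    """Collect every key that appears in at least one item."""
    return sorted({key for item in items for key in item})


if __name__ == "__main__":
    output = Path("books.json")
    if output.exists():
        books = load_items(output)
        print(f"{len(books)} items scraped")
        print("fields:", ", ".join(field_names(books)))
    else:
        print("books.json not found; run the crawl first")
```

Note that `-o` appends to an existing file, which for the JSON format can leave the file unparseable after a second run. Deleting books.json between runs, or using Scrapy's `-O` option to overwrite it, avoids that.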
Run the scraper as an Apify Actor using:
apify run --purge
Before deploying, log in to Apify with the command below. You will need to provide your Apify API token to complete this action.
apify login
The following command deploys and builds the Actor on the Apify Platform. You can find your newly created Actor under Actors -> My Actors.
apify push
To learn more about Apify and Actors, take a look at the following resources: