This project extracts data from the Ozon website in JSON format using the Scrapy and Selenium libraries.
Download the program to your computer:
git clone https://github.com/murreds/ParserOzon.git
Then install the required libraries:
python3 -m pip install -r requirements.txt
Note: the Firefox browser must be installed.
- Extracting product card data from the first pages of a category.
- Obtaining the characteristics of each product in a category.
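A quick way to confirm the Firefox requirement before running anything is to look for the executable on `PATH`. This is only a minimal sketch: the helper name `find_firefox` and the candidate executable names are assumptions, not part of this project, and on some systems (macOS, Windows) Firefox may be installed without being on `PATH`.

```python
import shutil


def find_firefox(candidates=("firefox", "firefox-esr")):
    """Return the path of the first candidate executable found on PATH, or None.

    The candidate names are assumptions; adjust them for your system.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


print(find_firefox() or "Firefox was not found on PATH")
```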
First, go to the project directory and enter the command:
cd ./ozonscraper
To get data about each product card from the first pages, enter:
scrapy crawl cardproduct -a category={category} [-a page=page] [-a mode=full]
To get product characteristics data, enter:
scrapy crawl chproduct -a category={category}
The category must be one of: smartphone, tv, tablets, laptop.
Data is stored in the directory: /path/to/project/directory/ozonscraper/data
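Once the spiders have finished, the results can be read back with the standard `json` module. The sketch below assumes the output files are valid JSON and carry a `.json` extension; the actual file names are not documented here, so the helper (a hypothetical `load_results`) simply loads every matching file in the given directory.

```python
import json
from pathlib import Path


def load_results(data_dir):
    """Load every *.json file in data_dir into a dict keyed by file name.

    Assumes each file holds one valid JSON document (an assumption about
    the output format, not something this README specifies).
    """
    results = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results[path.name] = json.load(fh)
    return results
```

For example, `load_results("ozonscraper/data")` would return one entry per JSON file found, or an empty dict if the directory contains none.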
Important: run the first command before the second, for example:
scrapy crawl cardproduct -a category=laptop -a page=2 && scrapy crawl chproduct -a category=laptop
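The same two-step run can be driven from Python, which makes it easier to validate the category up front and to stop if the first spider fails. This is a sketch under assumptions: the function names `build_commands` and `run_pipeline` are hypothetical, the working directory defaults to `ozonscraper` relative to the repository root, and only the spider names and `-a` arguments shown above are used.

```python
import subprocess

# Categories listed in this README.
CATEGORIES = ("smartphone", "tv", "tablets", "laptop")


def build_commands(category, page=None, mode=None):
    """Return the two scrapy commands, in the order they must be run."""
    if category not in CATEGORIES:
        raise ValueError(f"category must be one of {CATEGORIES}, got {category!r}")
    card = ["scrapy", "crawl", "cardproduct", "-a", f"category={category}"]
    if page is not None:
        card += ["-a", f"page={page}"]
    if mode is not None:
        card += ["-a", f"mode={mode}"]
    characteristics = ["scrapy", "crawl", "chproduct", "-a", f"category={category}"]
    return [card, characteristics]


def run_pipeline(category, page=None, mode=None, cwd="ozonscraper"):
    """Run cardproduct, then chproduct; check=True aborts on the first failure."""
    for cmd in build_commands(category, page, mode):
        subprocess.run(cmd, cwd=cwd, check=True)


# Example (not executed here): run_pipeline("laptop", page=2)
```

Passing the commands as argument lists avoids shell quoting issues, and `check=True` gives the same stop-on-failure behavior as `&&` in the shell example.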