This project scrapes quotes from the Quotes to Scrape website and stores them in a SQLite database. The script walks through every page of the site, extracts each quote along with its author and tags, and saves the data in a relational database with a table for each entity.
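The page-by-page flow described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual scraper.py: the start URL `https://quotes.toscrape.com/`, the CSS class names (`quote`, `text`, `author`, `tag`, `next`), and the names `QuoteParser`, `fetch_page`, and `scrape_all` are all assumptions, and `html.parser` plus `urllib` stand in for whatever libraries requirements.txt lists.

```python
import random
import time
import urllib.error
import urllib.request
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin


class QuoteParser(HTMLParser):
    """Collects quotes, authors, tags, and the next-page link from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.quotes: list[dict] = []
        self.next_href: Optional[str] = None
        self._field: Optional[str] = None  # field the current text belongs to
        self._in_next = False              # inside <li class="next">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "quote" in classes:
            self.quotes.append({"text": "", "author": "", "tags": []})
        elif tag == "span" and "text" in classes:
            self._field = "text"
        elif tag == "small" and "author" in classes:
            self._field = "author"
        elif tag == "a" and "tag" in classes:
            self._field = "tag"
        elif tag == "li" and "next" in classes:
            self._in_next = True
        elif tag == "a" and self._in_next:
            self.next_href = attrs.get("href")
            self._in_next = False

    def handle_endtag(self, tag):
        if tag in ("span", "small", "a"):
            self._field = None

    def handle_data(self, data):
        if not self.quotes or self._field is None:
            return
        quote = self.quotes[-1]
        if self._field == "tag":
            quote["tags"].append(data.strip())
        else:
            quote[self._field] += data


def fetch_page(url: str) -> Optional[str]:
    """Return the page body, or None when the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        print(f"HTTP {exc.code} for {url}")
    except (urllib.error.URLError, TimeoutError) as exc:
        print(f"Could not reach {url}: {exc}")
    return None


def scrape_all(
    start_url: str,
    fetch: Callable[[str], Optional[str]] = fetch_page,
    pause: tuple[float, float] = (1.0, 3.0),  # assumed delay range, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Follow "next" links from start_url and collect every quote found."""
    url: Optional[str] = start_url
    results: list[dict] = []
    while url:
        html = fetch(url)
        if html is None:  # stop on a failed request instead of retrying
            break
        parser = QuoteParser()
        parser.feed(html)
        results.extend(parser.quotes)
        url = urljoin(url, parser.next_href) if parser.next_href else None
        if url:
            sleep(random.uniform(*pause))  # random pause between requests
    return results
```

Calling `scrape_all("https://quotes.toscrape.com/")` would return a list of dictionaries with `text`, `author`, and `tags` keys. Passing `fetch` and `sleep` as parameters is a design choice made here so the loop can be exercised without network access or real delays.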
- main.py: Entry point that coordinates the complete process flow. It creates the database if it does not already exist, then scrapes the quotes and stores the data in the database.
- create_db.py: Script that defines and creates the database structure (tables for authors, quotes, and tags, plus the relationships between them).
- db.py: Contains the functions that interact with the database: creating the connection, inserting authors, quotes, and tags, and associating quotes with tags.
- scraper.py: Performs the web scraping. It navigates through all available pages, extracts the required information (quotes, authors, tags), and organizes it for storage. It handles common HTTP errors and pauses for a random interval between requests to avoid overloading the server.
- get_quotes.py: Script that queries the database and displays the quotes of a specific author. It prompts the user for the author's name (falling back to a default value if none is entered) and displays all quotes associated with that author.
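To make the roles of create_db.py and db.py more concrete, here is a minimal sketch of a schema and an insert helper using Python's built-in `sqlite3` module. The table and column names (`authors`, `quotes`, `tags`, `quote_tags`) and the functions `create_schema`, `get_or_create`, and `save_quote` are hypothetical; the project's real files may be organized differently.

```python
import sqlite3

# Assumed layout: one table per entity plus a join table for quote/tag pairs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS quotes (
    id        INTEGER PRIMARY KEY,
    text      TEXT NOT NULL UNIQUE,
    author_id INTEGER NOT NULL REFERENCES authors(id)
);
CREATE TABLE IF NOT EXISTS tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS quote_tags (
    quote_id INTEGER NOT NULL REFERENCES quotes(id),
    tag_id   INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (quote_id, tag_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all tables; safe to call on an existing database."""
    conn.executescript(SCHEMA)


def get_or_create(conn: sqlite3.Connection, table: str, column: str, value: str) -> int:
    """Return the id of the row holding value, inserting it first if needed.

    table and column are fixed identifiers chosen by the caller in this
    module, never user input, so interpolating them is acceptable here.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = conn.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,)).fetchone()
    return row[0]


def save_quote(conn: sqlite3.Connection, text: str, author: str, tags: list[str]) -> int:
    """Store one quote with its author and tags; returns the quote id."""
    author_id = get_or_create(conn, "authors", "name", author)
    conn.execute(
        "INSERT OR IGNORE INTO quotes (text, author_id) VALUES (?, ?)",
        (text, author_id),
    )
    quote_id = conn.execute("SELECT id FROM quotes WHERE text = ?", (text,)).fetchone()[0]
    for tag in tags:
        tag_id = get_or_create(conn, "tags", "name", tag)
        conn.execute(
            "INSERT OR IGNORE INTO quote_tags (quote_id, tag_id) VALUES (?, ?)",
            (quote_id, tag_id),
        )
    conn.commit()
    return quote_id
```

The `UNIQUE` constraints combined with `INSERT OR IGNORE` mean that running the scraper twice would not duplicate rows. That is a choice made for this sketch rather than something the project states.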
- Requires Python 3.11+
- Set up virtual environment:
  - On macOS/Linux:
    python3 -m venv venv
    source venv/bin/activate
  - On Windows:
    python -m venv venv
    .\venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Run scraping:
    cd src
    python main.py
- Get quotes by author:
    python get_quotes.py
- Deactivate virtual environment (optional):
    deactivate
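For reference, the lookup performed by the "Get quotes by author" step might resemble the sketch below. It assumes hypothetical tables `authors(id, name)` and `quotes(id, text, author_id)`, and the default author name is only a placeholder; the actual get_quotes.py may differ.

```python
import sqlite3
from typing import Callable


def prompt_author(default: str = "Albert Einstein", ask: Callable[[str], str] = input) -> str:
    """Ask for an author name; an empty answer selects the default."""
    answer = ask(f"Author name [{default}]: ").strip()
    return answer or default


def quotes_by_author(conn: sqlite3.Connection, author: str) -> list[str]:
    """Return every stored quote whose author name matches exactly."""
    rows = conn.execute(
        """
        SELECT q.text
        FROM quotes AS q
        JOIN authors AS a ON a.id = q.author_id
        WHERE a.name = ?
        ORDER BY q.id
        """,
        (author,),
    ).fetchall()
    return [row[0] for row in rows]
```

A caller would open the database with `sqlite3.connect(...)`, pass the result of `prompt_author()` to `quotes_by_author`, and print each returned string.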