Web Scraping and Database Storage

Description

This project scrapes quotes from the Quotes to Scrape website and stores them in a SQLite database. The script walks through every page of the site, extracts each quote together with its author and tags, and saves the data in a relational database with one table per entity.

Project Files

main.py: Entry point that coordinates the whole process. It checks whether the database exists and creates it if it does not, then scrapes the quotes and stores them in the database.
create_db.py: Defines and creates the database structure (tables for authors, quotes, and tags, plus the relationships between them).
db.py: Functions for interacting with the database: opening the connection, inserting authors, quotes, and tags, and associating quotes with tags.
scraper.py: Performs the web scraping. It navigates through all available pages, extracts the quotes, authors, and tags, and organizes them for storage. It handles common HTTP errors and pauses for a random interval between requests to avoid overloading the server.
get_quotes.py: Queries the database and displays the quotes of a specific author. It prompts for the author's name (a default value is available) and displays all quotes associated with that author.
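
The scraping loop described for scraper.py can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual code: the start URL, the CSS class names (`quote`, `text`, `author`, `tag`, `next`), and the names `QuotePageParser`, `fetch`, and `scrape_all` are all assumptions, and the real script may well use other libraries.

```python
import random
import time
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://quotes.toscrape.com/"  # assumed start URL


class QuotePageParser(HTMLParser):
    """Collects quotes and the "next" link from one page (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.quotes = []       # dicts with "text", "author", "tags"
        self.next_href = None  # href of the next-page link, if present
        self._depth = 0        # <div> nesting depth inside the current quote
        self._field = None     # field whose text is currently being read
        self._in_next = False  # True while inside <li class="next">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif "quote" in classes:
                self._depth = 1
                self.quotes.append({"text": "", "author": "", "tags": []})
        elif self._depth and tag == "span" and "text" in classes:
            self._field = "text"
        elif self._depth and tag == "small" and "author" in classes:
            self._field = "author"
        elif self._depth and tag == "a" and "tag" in classes:
            self._field = "tags"
        elif tag == "li" and "next" in classes:
            self._in_next = True
        elif tag == "a" and self._in_next:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
        elif tag in ("span", "small", "a"):
            self._field = None
        elif tag == "li":
            self._in_next = False

    def handle_data(self, data):
        if self._field == "tags":
            self.quotes[-1]["tags"].append(data.strip())
        elif self._field:
            self.quotes[-1][self._field] += data.strip()


def fetch(url):
    """Download one page; return None on common HTTP or network errors."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8")
    except (HTTPError, URLError) as exc:
        print(f"Request failed for {url}: {exc}")
        return None


def scrape_all(fetch_page=fetch, start_url=BASE_URL, pause=(1.0, 3.0)):
    """Follow "next" links from start_url and return every quote found."""
    quotes, url = [], start_url
    while url:
        html = fetch_page(url)
        if html is None:  # stop on a failed request
            break
        parser = QuotePageParser()
        parser.feed(html)
        quotes.extend(parser.quotes)
        url = urljoin(url, parser.next_href) if parser.next_href else None
        if url:
            time.sleep(random.uniform(*pause))  # random pause between requests
    return quotes
```

Passing `fetch_page` as a parameter keeps the pagination logic testable without network access; calling `scrape_all()` with no arguments would hit the live site with a pause of one to three seconds between pages.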

Database Schema
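
The structure described above (authors, quotes, tags, and the quote-to-tag relationship) might look like the sketch below. The table names, column names, and the helpers `create_connection`, `get_or_create`, and `save_quote` are illustrative assumptions, not taken from create_db.py or db.py.

```python
import sqlite3

# Assumed layout: one table per entity plus a join table for quote/tag pairs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS quotes (
    id        INTEGER PRIMARY KEY,
    text      TEXT NOT NULL UNIQUE,
    author_id INTEGER NOT NULL REFERENCES authors(id)
);
CREATE TABLE IF NOT EXISTS tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS quote_tags (
    quote_id INTEGER NOT NULL REFERENCES quotes(id),
    tag_id   INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (quote_id, tag_id)
);
"""


def create_connection(db_path=":memory:"):
    """Open the database and make sure the tables exist."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def get_or_create(conn, table, column, value):
    """Return the id of the row holding value, inserting it if missing."""
    # table and column are fixed names from this module, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    return row[0]


def save_quote(conn, text, author, tags):
    """Store one quote with its author and tags; safe to call repeatedly."""
    author_id = get_or_create(conn, "authors", "name", author)
    conn.execute(
        "INSERT OR IGNORE INTO quotes (text, author_id) VALUES (?, ?)",
        (text, author_id),
    )
    quote_id = conn.execute(
        "SELECT id FROM quotes WHERE text = ?", (text,)
    ).fetchone()[0]
    for tag in tags:
        tag_id = get_or_create(conn, "tags", "name", tag)
        conn.execute(
            "INSERT OR IGNORE INTO quote_tags (quote_id, tag_id) VALUES (?, ?)",
            (quote_id, tag_id),
        )
    conn.commit()
    return quote_id
```

The `UNIQUE` constraints combined with `INSERT OR IGNORE` are one way to keep a second scraping run from duplicating rows; whether the project deduplicates this way is not stated.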

Requirements

Python 3.11+

Execution Instructions

Set up a virtual environment:

On macOS/Linux:

python3 -m venv venv
source venv/bin/activate

On Windows:

python -m venv venv
.\venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```
Run the scraper:
```
cd src
python main.py
```
Get quotes by author:
```
python get_quotes.py
```
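For orientation, the lookup that get_quotes.py performs could be approximated as below. This is a sketch under assumptions: the `authors` and `quotes` tables with `name`, `text`, and `author_id` columns, the default author value, and the function names are all hypothetical.

```python
import sqlite3

DEFAULT_AUTHOR = "Albert Einstein"  # hypothetical default value


def resolve_author(user_input, default=DEFAULT_AUTHOR):
    """Use the typed name, or fall back to the default on empty input."""
    name = user_input.strip()
    return name or default


def quotes_by_author(conn, author):
    """Return the text of every quote stored for the given author name."""
    rows = conn.execute(
        """
        SELECT q.text
        FROM quotes AS q
        JOIN authors AS a ON a.id = q.author_id
        WHERE a.name = ?
        ORDER BY q.id
        """,
        (author,),
    ).fetchall()
    return [text for (text,) in rows]
```

A caller would typically pass the result of `input()` through `resolve_author` and print each string returned by `quotes_by_author`; the `?` placeholder keeps the typed name out of the SQL text itself.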
Deactivate virtual environment (optional):
```
deactivate
```

Author

Juan De Luca
