PSavvateev / JobScrapingApp_Indeed.com


Web scraper to get information about posted jobs in the US from Indeed.com

Indeed.com Jobs Scraping App

Overview

A program that scrapes job postings in the United States from www.indeed.com and stores them.

It collects the following information for each posting:

original ID generated by Indeed
job title (job_title)
posting date (job_date)
location (job_loc)
short description (job_summary)
salary (or salary range) in a list format (job_salary)
URL of the job posting (job_url)
company name (company_name)
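
The field list above could be modelled as one record per posting. What follows is a minimal sketch, not the repository's code: the `JobRecord` class, the `job_id` name (the README gives no variable name for the Indeed ID), the `parse_salary` helper, and the assumed salary text format (`"$50,000 - $70,000 a year"`) are all illustrative assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobRecord:
    """One scraped posting. Field names follow the README, except job_id,
    which is a hypothetical name for the Indeed-generated ID."""
    job_id: str
    job_title: str
    job_date: str
    job_loc: str
    job_summary: str
    job_url: str
    company_name: str
    job_salary: List[int] = field(default_factory=list)


def parse_salary(text: str) -> List[int]:
    """Extract whole-dollar amounts from a salary string as a list of ints.

    Assumes amounts are written like "$50,000" or "$25". A range yields two
    values, a single figure yields one, and text without amounts yields [].
    Anything after a decimal point is ignored.
    """
    amounts = re.findall(r"\$(\d[\d,]*)", text)
    return [int(amount.replace(",", "")) for amount in amounts]
```

Under these assumptions, `parse_salary("$50,000 - $70,000 a year")` returns `[50000, 70000]`, and a posting without a salary gets an empty list, which keeps the `job_salary` column in one consistent type.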

Getting Started

Install all required packages from requirements.txt.
$ pip install -r requirements.txt

How to use

Set the search parameters in parameters.py:

positions must be a list of strings holding the position names or keywords to search for. Keep it a list even for a single keyword: positions = ["auditor"]
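
As an illustration of why `positions` stays a list, here is a hedged sketch of how the keywords might be turned into search URLs. Only the `positions` name comes from this README; the `build_search_urls` helper, the base URL path, and the `q` / `l` query parameter names are assumptions and may not match what indeed_com_scraper.py actually does.

```python
from urllib.parse import urlencode

# Search keywords: always a list, even when there is only one term.
positions = ["auditor", "data analyst"]

# Assumed search endpoint; "q" (keywords) and "l" (location) are assumptions.
BASE_URL = "https://www.indeed.com/jobs"


def build_search_urls(keywords, location="United States"):
    """Return one search URL per keyword (hypothetical helper)."""
    if isinstance(keywords, str):
        # Iterating over a bare string would yield single characters.
        raise TypeError("positions must be a list of strings, not a string")
    urls = []
    for keyword in keywords:
        query = urlencode({"q": keyword, "l": location})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = build_search_urls(positions)
```

The type check makes the failure explicit: passing `"auditor"` instead of `["auditor"]` would otherwise loop over the letters of the word and run seven meaningless searches.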

Run app.py:
$ python3 app.py

Functionality:

Scrapes jobs by the key parameters (search keywords).
Cleans and formats the data.
Each scraping session saves its results as a CSV data dump in the data_dumps/ folder.
Each step of the scraping is logged to log.txt, and the outcomes are printed to the console.
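
The last two points can be sketched as follows. This is an assumption-laden illustration rather than the contents of logger.py or dumping.py: the function names `build_logger` and `save_dump`, the logger name, the log format, and the timestamped file-naming scheme are all hypothetical. Only log.txt, data_dumps/, and the use of pandas come from this README.

```python
import logging
from datetime import datetime
from pathlib import Path

import pandas as pd


def build_logger(log_file="log.txt"):
    """Return a logger that writes each message to a file and to the console."""
    logger = logging.getLogger("scraper_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        for handler in (logging.FileHandler(log_file), logging.StreamHandler()):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger


def save_dump(records, folder="data_dumps"):
    """Write one session's records to a timestamped CSV; return the file path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = out_dir / f"dump_{stamp}.csv"  # hypothetical naming scheme
    pd.DataFrame(records).to_csv(path, index=False)
    return path
```

A session would then call something like `path = save_dump(records)` followed by `build_logger().info("saved %s", path)`. Giving each dump its own timestamped name means one session does not overwrite another, assuming sessions start at least a second apart.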

Architecture:

app.py - entry point
main.py - the main workflow of the program
indeed_com_scraper.py - scraping functionality module
dumping.py - data cleaning / formatting module + saving data dumps
logger.py - logging functionality
parameters.py - scraping parameters, kept in a separate module for easy access
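
One way the modules listed above might fit together is sketched below. It is not taken from main.py: `run_session` and `drop_duplicates` are hypothetical names, the scrape, clean, and dump steps are passed in as plain functions so the flow can be read (and tried) without network access, and de-duplicating on a `job_id` field is an assumed example of the "cleaning" step.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def run_session(
    positions: Iterable[str],
    scrape: Callable[[str], List[Record]],
    clean: Callable[[List[Record]], List[Record]],
    dump: Callable[[List[Record]], str],
    log: Callable[[str], None] = print,
) -> str:
    """Scrape every keyword, clean the combined results, and save them once."""
    collected: List[Record] = []
    for keyword in positions:
        found = scrape(keyword)
        log(f"{keyword}: {len(found)} postings scraped")
        collected.extend(found)
    cleaned = clean(collected)
    log(f"{len(cleaned)} postings kept after cleaning")
    return dump(cleaned)


def drop_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first record seen for each job_id (hypothetical field name)."""
    seen = set()
    unique = []
    for record in records:
        if record["job_id"] not in seen:
            seen.add(record["job_id"])
            unique.append(record)
    return unique
```

Passing the steps in as arguments mirrors the separation described in the list: the workflow does not need to know how scraping or saving is implemented, only the order in which they run.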

Additional:

db_scheme.py or db_scheme.sql - initial database setup
requirements.txt - required Python packages

Requirements:

Python 3

Packages:

pandas 1.4.2
requests 2.28.0
beautifulsoup4 4.11.1
