Ironhack Logo

Web Scraping "Welcome to the Jungle" website by Leo

Leonardo Cavalcante Araújo

Data Analytics Full-Time FEB2021 (Paris), February 22nd, 2021

Content

Welcome to the Jungle Logo

Project Description

Amidst the difficult context of finding a job during the Covid-19 pandemic, and given all the frustration a job search can bring, I decided to use a data-driven approach to structure my job search strategy.

Therefore, I decided to scrape my preferred job search website: "Welcome to the Jungle" (WTTJ).

The scraped data is stored in two databases:

  1. Jobs: one line per open job position, with all the details related to it.
  2. Companies: one line per company offering a position in Data, with all the relevant details from that company's page on the WTTJ website.
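The two tables above can be sketched with pandas; apart from `organization_slug` (used later to build company URLs), the column names below are illustrative assumptions, not the project's actual schema.

```python
import pandas as pd

# Hypothetical Jobs table: one line per open position.
jobs = pd.DataFrame(
    {
        "title": ["Data Analyst"],
        "organization_slug": ["acme"],
        "contract_type": ["CDI"],
        "location": ["Paris"],
    }
)

# Hypothetical Companies table: one line per company.
companies = pd.DataFrame(
    {
        "organization_slug": ["acme"],
        "name": ["Acme"],
        "sector": ["Tech"],
        "nb_employees": [120],
    }
)

# The two tables join on the company identifier, enriching each
# job row with the details of the company offering it.
merged = jobs.merge(companies, on="organization_slug", how="left")
print(merged.shape)
```

Keeping jobs and companies in separate tables avoids re-scraping the same company page for every one of its open positions.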

Objective

Obtain exploitable data for analysis to guide my job search, prioritizing the job listings that best match my profile and give me the highest chance of success.

Workflow

First, it was necessary to get and clean the Jobs data from the job search results:

  1. Web scraping "Welcome to the Jungle - Data Analyst Jobs" and storing the results in the Jobs dataframe.
  2. Cleaning the Jobs dataframe.
  3. Exporting the Jobs table to SQL.
  4. Saving the data in a pickle to avoid re-scraping every time the Jupyter Notebook session is restarted.
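Steps 2 to 4 above can be sketched on a toy Jobs dataframe; the real data comes from scraping WTTJ, and the column names and cleaning rules here are assumptions for illustration.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Toy scraped data: messy whitespace and a duplicate row.
jobs = pd.DataFrame(
    {
        "title": ["  Data Analyst ", "Data Analyst"],
        "organization_slug": ["acme", "acme"],
    }
)

# Step 2: cleaning, e.g. trimming whitespace and dropping duplicates.
jobs["title"] = jobs["title"].str.strip()
jobs = jobs.drop_duplicates().reset_index(drop=True)

# Step 3: exporting the Jobs table to SQL (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
jobs.to_sql("jobs", conn, index=False, if_exists="replace")

# Step 4: saving to a pickle so the notebook can reload the data
# without re-scraping after a session restart.
path = os.path.join(tempfile.gettempdir(), "jobs.pkl")
jobs.to_pickle(path)
restored = pd.read_pickle(path)
print(len(restored))
```

The pickle preserves dtypes exactly as they were in memory, which is why it is convenient for resuming a notebook session, while the SQL export makes the table queryable outside Python.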

Secondly, it was necessary to get and clean the Companies data for every company found in the previous scraping:

  1. Looping through each value of "https://welcometothejungle.com/en/companies/" + Jobs["organization_slug"], scraping every company page, then storing all the collected data in the Companies dataframe.
  2. Cleaning the Companies dataframe.
  3. Exporting the Companies data to SQL.
  4. Saving the data in a pickle to avoid re-scraping every time the Jupyter Notebook session is restarted.
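The URL-building part of step 1 can be sketched as follows; the slugs are invented, and only the unique ones are kept so each company page is fetched once.

```python
import pandas as pd

# Toy Jobs dataframe: two jobs at the same company, one at another.
jobs = pd.DataFrame({"organization_slug": ["acme", "beta-corp", "acme"]})

base = "https://welcometothejungle.com/en/companies/"

# One URL per unique company; duplicates would mean scraping the
# same page several times.
urls = (base + jobs["organization_slug"].drop_duplicates()).tolist()
print(urls)
```

The actual scraping of each URL (with requests and an HTML parser, for instance) would then loop over this list.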

Thirdly, it was necessary to formalize everything into functions so the whole pipeline could run smoothly by calling fewer than ten functions.

Lastly, it was time to analyze the results using Python, Pandas and data visualization tools.
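As one illustrative example of this final analysis step, on invented data, a single pandas call can answer a question like "which contract type dominates the scraped offers?":

```python
import pandas as pd

# Toy Jobs data; the values are made up for illustration.
jobs = pd.DataFrame(
    {"contract_type": ["CDI", "CDI", "Internship", "CDD"]}
)

# Count offers per contract type and pick the most frequent one.
counts = jobs["contract_type"].value_counts()
print(counts.idxmax())
```

The same pattern (group, count, rank) applies to locations, sectors or required skills, which is what makes the cleaned dataframes useful for prioritizing applications.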

Organization

P.S.: this is an individual project.

Links

Here you may find the relevant links for my repository, my main code and presentation slides.

Jobs_and_Companies_vFinal
Another_way_of_scraping_Companies
GitHub Repository
Final Presentation - Google Slides
