Ironhack Logo

Web Scraping "Welcome to the Jungle" website by Leo

Leonardo Cavalcante Araújo

Data Analytics Full-Time FEB2021 (Paris), February 22nd, 2021

Content

Welcome to the Jungle Logo

Project Description

Amidst the difficult context of finding a job during the Covid-19 pandemic, and given all the frustration a job search can bring, I decided to use a data-driven approach to structure my job search strategy.

Therefore, I decided to scrape my preferred job search website: "Welcome to the Jungle" (WTTJ).

The scraped data is stored in two databases:

  1. Jobs: one line per open job position, with all the details related to it.
  2. Companies: one line per company offering a position in Data, with all the relevant details from that company's page on the WTTJ website.
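The two tables above can be sketched with pandas; apart from `organization_slug` (used later to build company URLs), the column names below are illustrative assumptions, not the project's actual schema.

```python
import pandas as pd

# Hypothetical Jobs table: one line per open position.
jobs = pd.DataFrame(
    {
        "title": ["Data Analyst"],
        "organization_slug": ["acme"],
        "contract_type": ["CDI"],
        "location": ["Paris"],
    }
)

# Hypothetical Companies table: one line per company.
companies = pd.DataFrame(
    {
        "organization_slug": ["acme"],
        "name": ["Acme"],
        "sector": ["Tech"],
        "nb_employees": [120],
    }
)

# The two tables join on the company identifier, enriching each
# job row with the details of the company offering it.
merged = jobs.merge(companies, on="organization_slug", how="left")
print(merged.shape)
```

Keeping jobs and companies in separate tables avoids re-scraping the same company page for every one of its open positions.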

Objective

Obtain exploitable data for analysis to guide my job search, prioritizing the job listings that best match my profile and give me the highest chance of success.

Workflow

First, it was necessary to get and clean the Jobs data from the job search results:

  1. Web scraping "Welcome to the Jungle - Data Analyst Jobs" and storing the results in the Jobs dataframe.
  2. Cleaning the Jobs dataframe.
  3. Exporting the Jobs table to SQL.
  4. Saving the data in a pickle to avoid re-scraping every time the Jupyter Notebook session is restarted.
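Steps 2 to 4 above can be sketched on a toy Jobs dataframe; the real data comes from scraping WTTJ, and the column names and cleaning rules here are assumptions for illustration.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Toy scraped data: messy whitespace and a duplicate row.
jobs = pd.DataFrame(
    {
        "title": ["  Data Analyst ", "Data Analyst"],
        "organization_slug": ["acme", "acme"],
    }
)

# Step 2: cleaning, e.g. trimming whitespace and dropping duplicates.
jobs["title"] = jobs["title"].str.strip()
jobs = jobs.drop_duplicates().reset_index(drop=True)

# Step 3: exporting the Jobs table to SQL (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
jobs.to_sql("jobs", conn, index=False, if_exists="replace")

# Step 4: saving to a pickle so the notebook can reload the data
# without re-scraping after a session restart.
path = os.path.join(tempfile.gettempdir(), "jobs.pkl")
jobs.to_pickle(path)
restored = pd.read_pickle(path)
print(len(restored))
```

The pickle preserves dtypes exactly as they were in memory, which is why it is convenient for resuming a notebook session, while the SQL export makes the table queryable outside Python.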

Secondly, it was necessary to get and clean the Companies data for every company found in the previous scraping:

  1. Looping through each value of "https://welcometothejungle.com/en/companies/" + Jobs["organization_slug"], scraping every company page, then storing all the collected data in the Companies dataframe.
  2. Cleaning the Companies dataframe.
  3. Exporting the Companies data to SQL.
  4. Saving the data in a pickle to avoid re-scraping every time the Jupyter Notebook session is restarted.
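The URL-building part of step 1 can be sketched as follows; the slugs are invented, and only the unique ones are kept so each company page is fetched once.

```python
import pandas as pd

# Toy Jobs dataframe: two jobs at the same company, one at another.
jobs = pd.DataFrame({"organization_slug": ["acme", "beta-corp", "acme"]})

base = "https://welcometothejungle.com/en/companies/"

# One URL per unique company; duplicates would mean scraping the
# same page several times.
urls = (base + jobs["organization_slug"].drop_duplicates()).tolist()
print(urls)
```

The actual scraping of each URL (with requests and an HTML parser, for instance) would then loop over this list.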

Thirdly, it was necessary to formalize everything into functions so the whole pipeline could run smoothly by calling fewer than ten functions.

Lastly, it was time to analyze the results using Python, Pandas and data visualization tools.
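As one illustrative example of this final analysis step, on invented data, a single pandas call can answer a question like "which contract type dominates the scraped offers?":

```python
import pandas as pd

# Toy Jobs data; the values are made up for illustration.
jobs = pd.DataFrame(
    {"contract_type": ["CDI", "CDI", "Internship", "CDD"]}
)

# Count offers per contract type and pick the most frequent one.
counts = jobs["contract_type"].value_counts()
print(counts.idxmax())
```

The same pattern (group, count, rank) applies to locations, sectors or required skills, which is what makes the cleaned dataframes useful for prioritizing applications.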

Organization

P.S.: this is an individual project.

Links

Here you may find the relevant links for my repository, my main code and presentation slides.

Jobs_and_Companies_vFinal
Another_way_of_scraping_Companies
GitHub Repository
Final Presentation - Google Slides
