Analyzer of technology statistics based on job descriptions.

AndriyKy/tech-trend-stat

TechTrendStat

Technology Trends Statistician is your go-to tool for real-time insights into the ever-changing technology landscape, combining web scraping and data analysis to track and analyze the latest trends in development job descriptions.

Features

Linux Installation

NOTE: Python version >3.8 is required.

Clone the repository:

git clone https://github.com/AndriyKy/tech-trend-stat.git
cd tech-trend-stat

Create a virtual environment, install dependencies and set the PYTHONPATH environment variable:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export PYTHONPATH="$(pwd):$(pwd)/techtrendanalysis"

Copy the file .env.copy to .env and set the appropriate variables (only needed if you work with MongoDB).
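As a quick sanity check after editing .env, a minimal sketch like the one below can list which variables are still empty. It assumes the file uses plain KEY=VALUE lines; the variable name MONGODB_URI is hypothetical and only stands in for whatever keys .env.copy actually defines.

```python
import io


def read_env(stream):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional double quotes.
        values[key.strip()] = value.strip().strip('"')
    return values


# MONGODB_URI is a hypothetical name; use the keys listed in .env.copy.
sample = io.StringIO(
    '# database settings\nMONGODB_URI="mongodb://localhost:27017"\n'
)
env = read_env(sample)
# Collect the keys that were left without a value.
missing = [key for key, value in env.items() if not value]
print(env, missing)
```

To check the real file, pass `open(".env")` instead of the in-memory sample.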

Getting Started

MongoDB

If you decide to work with MongoDB, here is a tutorial on how to install it locally in a Docker container.

Here are also instructions on how to create a cluster in the cloud.

Once the database is installed, run the following command to scrape vacancies with the Scrapy spider and the Mongo pipeline:

scrapy crawl djinni -a categories="Python"

You can replace "Python" with any other category, or with several categories separated by " | ". See the available specializations (categories) on the Djinni website.
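With the separator described above, a multi-category value would look like `categories="Python | Java"` (the second category is only an illustration). The sketch below shows one way such a string could be split into individual names. The helper `parse_categories` is hypothetical and is not taken from the project's spider.

```python
def parse_categories(raw: str) -> list[str]:
    """Split a pipe-separated category string into clean category names."""
    # Split on "|" and strip whitespace so both "A | B" and "A|B" work;
    # empty fragments (e.g. from a trailing "|") are dropped.
    return [part.strip() for part in raw.split("|") if part.strip()]


print(parse_categories("Python | Java"))
```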

To extract statistics from job descriptions, run the wrangler file, passing the desired category name.

CSV File

If you can't install MongoDB, just run the crawler script. It will scrape jobs in the category you passed and save them to the appropriate CSV file. After that, it will pull job descriptions from the generated file, extract the technology stack and write it to another CSV file.
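The CSV flow described above (read descriptions from one file, write technology counts to another) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's crawler script: the column name `description`, the `TECHNOLOGIES` list, and both function names are hypothetical, and in-memory streams stand in for the two CSV files.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical technology list; the project's real list may differ.
TECHNOLOGIES = ["Python", "Django", "PostgreSQL", "Docker"]


def count_technologies(descriptions, technologies=TECHNOLOGIES):
    """Count in how many descriptions each technology is mentioned."""
    counts = Counter()
    for text in descriptions:
        for tech in technologies:
            # Whole-word, case-insensitive match; re.escape protects
            # names that contain regex metacharacters, such as "C++".
            pattern = rf"(?<!\w){re.escape(tech)}(?!\w)"
            if re.search(pattern, text, re.IGNORECASE):
                counts[tech] += 1
    return counts


def vacancies_to_stats(vacancies_csv, stats_csv, column="description"):
    """Read descriptions from one CSV stream, write counts to another."""
    reader = csv.DictReader(vacancies_csv)
    counts = count_technologies(row[column] for row in reader)
    writer = csv.writer(stats_csv)
    writer.writerow(["technology", "vacancies"])
    for tech, n in counts.most_common():
        writer.writerow([tech, n])


# In-memory streams stand in for the scraped and the generated CSV files.
source = io.StringIO(
    "title,description\n"
    "Backend dev,Python and Django with Docker\n"
    "Data engineer,Python plus PostgreSQL\n"
)
target = io.StringIO()
vacancies_to_stats(source, target)
print(target.getvalue())
```

Counting each technology once per description, rather than once per mention, keeps a single verbose posting from skewing the totals.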

Data Analysis

To see a visualization of the extracted statistics, open the analysis file and follow the instructions given there.

Here is an example of a visualized result:

[Image: Python technology statistics]