TechTrendStat

Technology Trends Statistician (TechTrendStat) combines web scraping and data analysis to track the technologies that appear in development job descriptions, giving you an up-to-date view of the ever-changing technology landscape.

Features

Scraping jobs from Djinni by several specialization categories (e.g. Python, Java, DevOps, etc.).
Mongo client singleton.
Ability to work with local and cloud MongoDB, as well as with regular CSV files.
Using Pydantic models instead of standard items for better data validation.
Database templates to simplify connection to MongoDB.
Two pipelines (Mongo and CSV).
CSV pipeline that covers the entire ETL process.
Data Wrangling. Clean up text and extract technology statistics.
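
To illustrate the "Mongo client singleton" feature, here is a minimal sketch of the singleton pattern in plain Python. It is not the project's actual implementation: `SingletonMeta`, `DatabaseClient`, and the default URI are hypothetical names chosen for this example, and a real wrapper would open a connection (for instance with pymongo) where the comment indicates.

```python
import threading


class SingletonMeta(type):
    """Metaclass that hands back one shared instance per class."""

    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # Double-checked locking so concurrent first calls create one instance.
        if cls not in cls._instances:
            with cls._lock:
                if cls not in cls._instances:
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class DatabaseClient(metaclass=SingletonMeta):
    """Hypothetical stand-in for a wrapper around a MongoDB client."""

    def __init__(self, uri="mongodb://localhost:27017"):
        # Assumption: a real wrapper would create the connection here.
        self.uri = uri


first = DatabaseClient()
second = DatabaseClient("mongodb://example.invalid:27017")
print(first is second)  # True: the second call reuses the first instance
print(second.uri)       # mongodb://localhost:27017 (later arguments are ignored)
```

The point of the pattern is that every pipeline or script asking for the client shares one connection instead of opening its own.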

Linux Installation

NOTE: Python version >3.8 is required.

Clone the repository:

git clone https://github.com/AndriyKy/tech-trend-stat.git
cd tech-trend-stat

Create a virtual environment, install dependencies and set the PYTHONPATH environment variable:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export PYTHONPATH="$(pwd):$(pwd)/techtrendscrape:$(pwd)/techtrendanalysis"

Copy the file .env.copy to .env and set the appropriate variables (only needed when working with MongoDB).

Getting Started

MongoDB

If you decide to work with MongoDB, here is a tutorial on how to install it locally in a Docker container.

There are also instructions on how to create a cluster in the cloud.

Once the database is set up, run the following command to scrape vacancies using the Scrapy spider together with the Mongo pipeline:

scrapy crawl djinni -a categories="Python"

You can replace "Python" with any other category, or with several categories separated by " | " (for example, "Python | Java"). See the available specializations (categories) on the Djinni website.
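
As a rough illustration of the " | " convention, the sketch below splits such an argument into individual category names. How the spider actually parses its `categories` argument is not shown here, so `parse_categories` is a hypothetical helper rather than the project's code.

```python
def parse_categories(raw):
    """Split a 'Python | Java' style argument into clean category names."""
    # Splitting on the bare pipe and stripping tolerates missing or extra spaces.
    return [part.strip() for part in raw.split("|") if part.strip()]


single = parse_categories("Python")
several = parse_categories("Python | Java | DevOps")
print(single)   # ['Python']
print(several)  # ['Python', 'Java', 'DevOps']
```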

To extract statistics from job descriptions, run the wrangler file, passing the desired category name.
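
The general idea behind this step can be sketched as follows: normalize each description, then count how many descriptions mention each technology. This is an assumption about the approach, not the wrangler's real logic; the `TECHNOLOGIES` vocabulary, `clean_text`, and `count_technologies` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# Hypothetical vocabulary; the project's real list of technologies may differ.
TECHNOLOGIES = ("python", "django", "postgresql", "docker", "aws")


def clean_text(text):
    """Lowercase the text and replace runs of other characters with spaces.

    Letters, digits, '+' and '#' are kept so names like 'c++' or 'c#' survive.
    """
    return re.sub(r"[^a-z0-9+#]+", " ", text.lower()).strip()


def count_technologies(descriptions):
    """Count in how many descriptions each known technology appears."""
    counts = Counter()
    for description in descriptions:
        tokens = set(clean_text(description).split())
        counts.update(tech for tech in TECHNOLOGIES if tech in tokens)
    return counts


descriptions = [
    "Senior Python developer: Django, PostgreSQL, Docker.",
    "Python/AWS engineer. Docker is a plus!",
]
stats = count_technologies(descriptions)
print(stats.most_common())
```

Counting each technology once per description (via the token set) keeps a single verbose vacancy from skewing the statistics.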

CSV File

If you can't install MongoDB, just run the crawler script. It will scrape jobs in the category you passed and save them to the appropriate CSV file. After that, it will pull job descriptions from the generated file, extract the technology stack and write it to another CSV file.
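
The two-file flow described above can be sketched with the standard `csv` module. The column names, the `KEYWORDS` tuple, and the functions `save_jobs` and `write_statistics` are all assumptions made for this example; the actual crawler script may structure its files differently.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path

KEYWORDS = ("python", "docker", "sql")  # hypothetical technology list


def save_jobs(jobs, jobs_path):
    """Write scraped job records to the first CSV file."""
    with open(jobs_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["title", "description"])
        writer.writeheader()
        writer.writerows(jobs)


def write_statistics(jobs_path, stats_path):
    """Read descriptions back and write keyword counts to a second CSV file."""
    counts = Counter()
    with open(jobs_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            text = row["description"].lower()
            # Simple substring match, so 'sql' would also match 'postgresql'.
            counts.update(word for word in KEYWORDS if word in text)
    with open(stats_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["technology", "count"])
        writer.writerows(counts.most_common())
    return counts


jobs = [
    {"title": "Backend Developer", "description": "Python and Docker"},
    {"title": "Data Engineer", "description": "Python, SQL"},
]
with tempfile.TemporaryDirectory() as tmp:
    jobs_csv = Path(tmp) / "jobs.csv"
    stats_csv = Path(tmp) / "stats.csv"
    save_jobs(jobs, jobs_csv)
    counts = write_statistics(jobs_csv, stats_csv)
    stats_rows = stats_csv.read_text(encoding="utf-8").splitlines()

print(stats_rows)
```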

Data Analysis

To see a visualization of the extracted statistics, open the analysis file and follow the instructions given there.

Here is an example of a visualized result
