No libraries beyond the Anaconda distribution of Python should be needed to run the code here. The code should run without issues on Python 3.
For this project, I was interested in using Stack Overflow data from 2020 to better understand:
- What is the educational background of data scientists compared to other developers?
- What are the programming languages most used by data scientists?
- What is the gender distribution among data scientists compared to other developer types?
- Who earns more, data scientists or other software developers?
- Are data scientists more satisfied with their jobs than other developers?
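Each of these questions comes down to splitting respondents into data scientists and everyone else, then comparing one column across the two groups. The following is a minimal sketch of that split using only the standard library, not the code from the notebooks. It assumes the column names `DevType` and `ConvertedComp`, a semicolon-separated role list, the role label shown in `DS_LABEL`, and `NA` as the missing-value marker; check all of these against the actual survey file. The sample rows are made up for illustration and are not survey data.

```python
from statistics import median

# Assumed role label; verify it against the DevType values in the survey file.
DS_LABEL = "Data scientist or machine learning specialist"
MISSING = (None, "", "NA")  # assumed markers for a missing answer


def is_data_scientist(row, column="DevType"):
    """Return True if the semicolon-separated role list contains DS_LABEL."""
    roles = (row.get(column) or "").split(";")
    return DS_LABEL in (role.strip() for role in roles)


def median_by_group(rows, value_column="ConvertedComp", role_column="DevType"):
    """Median of value_column for data scientists versus other respondents."""
    groups = {"data_scientist": [], "other": []}
    for row in rows:
        raw = row.get(value_column)
        # Skip respondents who did not answer either question.
        if raw in MISSING or row.get(role_column) in MISSING:
            continue
        key = "data_scientist" if is_data_scientist(row, role_column) else "other"
        groups[key].append(float(raw))
    return {key: (median(values) if values else None)
            for key, values in groups.items()}


# Made-up rows standing in for csv.DictReader output (illustrative only).
sample_rows = [
    {"DevType": DS_LABEL + ";Developer, back-end", "ConvertedComp": "90000"},
    {"DevType": "Developer, front-end", "ConvertedComp": "60000"},
    {"DevType": DS_LABEL, "ConvertedComp": "70000"},
    {"DevType": "Developer, full-stack", "ConvertedComp": "NA"},
    {"DevType": "", "ConvertedComp": "50000"},
]

result = median_by_group(sample_rows)
print(result)
```

To run this on real data, the rows could come from `csv.DictReader` opened on the survey CSV. A respondent who lists several roles counts as a data scientist if any one of them matches the label, which is one possible definition among several.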
There are 3 notebooks available here to showcase work related to the above questions. Each notebook is an exploratory analysis of the data for the questions named in its title. Markdown cells walk through the thought process behind individual steps.
The main findings of the analysis can be found in the post available here.
Credit goes to Stack Overflow for the data. You can find the licensing for the data and other descriptive information at the Stack Overflow link available here. Otherwise, feel free to use the code here as you would like!