Welcome to my project overview, with a short description of each project. The technical requirements are listed at the end of this file.
This notebook covers my first EDA project in the Data Science bootcamp at neue fische. The task was to recommend suitable properties to a pre-selected client. The dataset included the number of rooms and bathrooms, the year of construction, the size of the house and of the property, and the budget. Before starting the EDA, each bootcamp participant chose one client, who came with a stated budget and preferences regarding, among other things, size and the number of rooms and bathrooms.
Notebook: Jupyter Notebook
Images: Images, plots and maps.
Dataset: I used the King County House Sales dataset. The focus is on EDA, although the project was required to demonstrate an entire data science lifecycle using linear regression. The task was to perform an extensive EDA and to train an explanatory linear regression model, and not only to explain the data but also to evaluate how well the model fits it.
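As a rough illustration of what "train an explanatory linear regression model and evaluate the fit" can look like, here is a minimal sketch of a one-feature least-squares fit with R² as the goodness-of-fit measure. It is not the notebook's code: the numbers are made up, and the names `sqft_living` and `price` are only assumed stand-ins for a size feature and the sale price.

```python
# Illustrative sketch only: synthetic values, not rows from the King County dataset.
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y that the fitted line explains."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Assumed feature/target names and invented values, for illustration only.
sqft_living = [1000, 1500, 2000, 2500, 3000]
price = [300_000, 410_000, 500_000, 610_000, 700_000]

intercept, slope = fit_simple_ols(sqft_living, price)
r2 = r_squared(sqft_living, price, intercept, slope)
print(f"price = {intercept:.0f} + {slope:.1f} * sqft_living, R^2 = {r2:.4f}")
```

In an explanatory setting the slope is read as the average price change per additional unit of living area, while R² indicates how much of the price variation the model accounts for.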
Stakeholder Presentation: PDF file
This notebook is about my second project during the Data Science bootcamp at neue fische.
The task was to predict air quality in the Ugandan capital Kampala, which is relevant for local TV stations, for example, and thereby to give early warning of harmful fine dust concentrations. The dataset included weather and wind data as well as measurements from the external provider AirQo. The project comprised EDA and data visualization, a "small time series", the selection of different machine learning models, and a presentation of the results at the end.
In contrast to the first project, this was group work, partly with pair programming and error analysis, as is part of the normal working day of a Data Analyst or Data Scientist. The bootcamp participants could choose from a set of given projects and form groups.
Notebook: Jupyter Notebook
Images: Images, plots and maps.
Dataset: The dataset comes from a challenge hosted on Zindi, a data science competition platform whose mission is to build the data science ecosystem in Africa. The objective of the challenge is to accurately forecast air quality (measured as PM2.5 in µg/m³) for each hour of the coming 25 hours across five locations in Kampala, Uganda. Forecasts are based on the past 5 days of hourly air quality measurements at each site. Zindi provided .csv files with train and test data as well as metadata with location details. The metadata was not used in this project.
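To make the forecasting setup concrete, the following sketch cuts an hourly series into pairs of "past 5 days" (120 values) and "next 25 hours", and scores a naive persistence baseline on them. This is an assumption about how such a task can be framed, not the format of the Zindi files or the models used in the project; the function names and the synthetic series are hypothetical.

```python
# Illustrative sketch only: hypothetical helper names and a synthetic series.

def make_windows(series, n_in=120, n_out=25):
    """Split an hourly series into (past n_in values, next n_out values) pairs."""
    pairs = []
    last_start = len(series) - n_in - n_out
    for start in range(last_start + 1):
        history = series[start:start + n_in]
        target = series[start + n_in:start + n_in + n_out]
        pairs.append((history, target))
    return pairs


def persistence_forecast(history, n_out=25):
    """Naive baseline: repeat the last observed value for every forecast hour."""
    return [history[-1]] * n_out


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic hourly PM2.5-like values with a daily pattern (invented data).
hourly_pm25 = [40 + 10 * ((hour % 24) / 24) for hour in range(200)]

windows = make_windows(hourly_pm25)
errors = [
    mean_absolute_error(target, persistence_forecast(history))
    for history, target in windows
]
print(f"{len(windows)} windows, baseline MAE = {sum(errors) / len(errors):.2f}")
```

A baseline like this gives a reference error that any trained model should beat before it is worth presenting.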
Stakeholder Presentation: PDF file
Requirements: pyenv, python==3.9.4
Setup: Run the following commands to create the virtual environment and install the dependencies:
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
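After the setup, a quick check from inside the activated environment can confirm that the interpreter matches the required version. The snippet below is a small sketch; the helper name `matches_expected` is hypothetical and not part of this repository.

```python
import sys

# Version taken from the requirements stated in this README.
EXPECTED_VERSION = (3, 9, 4)


def matches_expected(version_info, expected=EXPECTED_VERSION):
    """Compare the (major, minor, micro) part of a version tuple with the expected one."""
    return tuple(version_info[:3]) == expected


if matches_expected(sys.version_info):
    print("Python version matches the requirements.")
else:
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Expected Python 3.9.4 but found {found}.")
```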