Machine Learning & Data Mining Coursework (5DATA002W)

University of Westminster

School of Computer Science & Engineering

Module Leader: Dr. V.S. Kontogiannis
Academic Year: 2021/22

Overview

This repository contains the coursework implementation for the module 5DATA002W: Machine Learning & Data Mining. The project focuses on clustering and regression techniques using real-world datasets and is implemented in the R programming environment.

Key Objectives

Clustering Analysis
- Perform k-means clustering on a wine dataset with pre-processing steps.
- Apply Principal Component Analysis (PCA) for dimensionality reduction.
- Compare clustering performance with and without PCA.
Energy Forecasting
- Use a Multi-Layer Perceptron (MLP) Neural Network to predict electricity consumption based on time-series data.
- Evaluate models using statistical indices such as RMSE, MAE, and MAPE.

Learning Outcomes

This coursework demonstrates:

Preparation of realistic datasets for machine learning and data mining.
Evaluation, validation, and optimization of models.
Effective communication of models and analyses to diverse audiences.

Coursework Breakdown

Clustering Part

Dataset: White wine dataset containing 4710 samples with chemical properties and quality ratings.
Tasks:
1. Pre-process the dataset (scaling, outlier removal).
2. Define the optimal number of clusters using various methods (e.g., Elbow, Gap statistics, Silhouette).
3. Perform k-means clustering for (k = 2, 3, 4) and evaluate results.
4. Use PCA to reduce dimensions and repeat k-means clustering.
5. Compare clustering results before and after PCA.
Deliverables:
- R scripts and outputs for k-means clustering.
- Confusion matrix and metrics: accuracy, precision, recall.
- PCA analysis and transformed dataset clustering.

Energy Forecasting Part

Dataset: Daily electricity consumption data for the University Building at 115 New Cavendish Street (2018-2019).
Tasks:
1. Implement MLP Neural Networks using autoregressive (AR) and NARX approaches.
2. Normalize input/output matrices.
3. Experiment with different network structures (hidden layers, nodes, activation functions).
4. Evaluate models using RMSE, MAE, and MAPE indices.
5. Visualize prediction results and compare efficiency of different models.
Deliverables:
- R scripts for MLP implementation.
- Performance comparison tables for various models.
- Graphical plots of predictions vs actual data.

Repository Structure

|-- datasets/
    |-- whitewine_v2.xls
    |-- UoW_load.xlsx
|-- src/
    |-- clustering_analysis.R
    |-- energy_forecasting.R
|-- results/
    |-- clustering_outputs/
    |-- forecasting_outputs/
|-- docs/
    |-- coursework_report.pdf
    |-- appendices/
        |-- full_code.R
|-- README.md

Getting Started

Prerequisites

Software: R version 4.0+ and RStudio.
R Libraries:
- ggplot2
- cluster
- factoextra
- NbClust
- neuralnet

Installation

Clone this repository:

git clone https://github.com/your-username/ml-datamining-coursework.git

Navigate to the repository directory:
```
cd ml-datamining-coursework
```
Install required R libraries using the provided script:
```
source("src/install_packages.R")
```

Usage

Clustering Analysis

Open src/clustering_analysis.R in RStudio.
Run the script to:
- Pre-process the white wine dataset.
- Perform k-means clustering.
- Apply PCA and re-run clustering.
View outputs in the results/clustering_outputs/ folder.

Energy Forecasting

Open src/energy_forecasting.R in RStudio.
Run the script to:
- Train and test MLP models using AR and NARX approaches.
- Generate statistical performance indices.
View outputs in the results/forecasting_outputs/ folder.

Evaluation

The coursework will be evaluated based on:

Clustering implementation and results.
MLP model development and testing.
Discussion and justification of methodological decisions.
Presentation of findings in the coursework report.

References

Relevant literature and resources are cited within the report and code comments.
Dataset references: Provided by University of Westminster Estates Planning & Services Department.

License

This project is for academic use only and is subject to University of Westminster assessment regulations.

Contact

For queries, please contact the module leader or teaching assistant via the University of Westminster Blackboard portal.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Machine Learning & Data Mining Coursework (5DATA002W)

University of Westminster

Overview

Key Objectives

Learning Outcomes

Coursework Breakdown

Clustering Part

Energy Forecasting Part

Repository Structure

Getting Started

Prerequisites

Installation

Usage

Clustering Analysis

Energy Forecasting

Evaluation

References

License

Contact

Files

README.md

Latest commit

History

README.md

File metadata and controls

Machine Learning & Data Mining Coursework (5DATA002W)

University of Westminster

Overview

Key Objectives

Learning Outcomes

Coursework Breakdown

Clustering Part

Energy Forecasting Part

Repository Structure

Getting Started

Prerequisites

Installation

Usage

Clustering Analysis

Energy Forecasting

Evaluation

References

License

Contact