INTRUSION DETECTION PREDICTION

Predicting the type of Cyber attack based on Network Packets (Intrusion) using Machine Learning models

🌟 EXPERIENCE HERE 🌟

https://huggingface.co/spaces/raghavtwenty/ids-prediction

PROTOTYPE VIDEO

video.mov

HOW TO EXECUTE

Terminal

git clone https://github.com/raghavtwenty/ids-prediction.git

cd ids-prediction/

pip install -r requirements.txt

cd gradio/

gradio ids_ml_gradio.py

Web Browser

http://127.0.0.1:7860/

PROBLEM

A firewall alone does not provide adequate protection against modern cyber threats. Malware and other malicious content are often delivered over legitimate types of traffic, such as email or web traffic. To address this, the network traffic itself has to be examined more closely, and this is where an Intrusion Detection System plays a major role.

WHAT IS IDS?

An Intrusion Detection System (IDS) is a network security technology originally built for detecting vulnerability exploits against a target application or computer. The IDS is a listen-only device. The IDS monitors traffic and reports results to an administrator.

WORKING OF IDS

Typical intrusion detection systems look for known attack signatures or abnormal deviations from set norms. A signature-based IDS monitors inbound network traffic, looking for specific patterns and sequences that match known attack signatures. These anomalous patterns in the network traffic are then sent up the stack for further investigation at the protocol and application layers of the OSI (Open Systems Interconnection) model.

An IDS is placed out of the real-time communication band (the path between the information sender and receiver) within your network infrastructure, so it works purely as a detection system. Instead of sitting inline, it uses a SPAN or TAP port for network monitoring and analyzes a copy of the inline network packets (obtained through port mirroring) to check that the traffic is not malicious or spoofed in any way. The IDS detects infected elements with the potential to impact your overall network performance, such as malformed packets, DNS poisoning, port scans and more.
An IDS is installed either on your network (network-based IDS) or on a client system (host-based IDS).
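
As a rough illustration of the signature-based matching described above, the sketch below scans a packet payload for known byte patterns. The signature names and patterns are made up for illustration; they are not taken from this project or from any real rule set, and a production IDS uses far richer rules.

```python
from typing import Dict, List

# Hypothetical signatures: name -> byte pattern (illustrative only, not a real rule set).
SIGNATURES: Dict[str, bytes] = {
    "sql-injection-probe": b"' OR '1'='1",
    "path-traversal": b"../../",
}


def match_signatures(payload: bytes) -> List[str]:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


if __name__ == "__main__":
    # A mirrored copy of a packet payload, as an IDS on a SPAN or TAP port would see it.
    packet = b"GET /download?file=../../etc/passwd HTTP/1.1"
    print(match_signatures(packet))
```

An empty result means no known signature matched, which is not the same as the traffic being safe.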

OBJECTIVE

To predict the type of cyber attack that is most likely to have occurred in a network. Given past network logs from a server, several machine learning models are compared and the most suitable one is chosen for the prediction. For a new input, the model classifies the type of cyber attack that has the highest chance of occurrence.

END USERS

Security operations center (SOC) analysts.
Incident responders.
Cyber Security analysts.
Anyone with adequate knowledge of networking can experiment with this.

OVERVIEW OF INITIAL DATASET

This dataset contains 5000 records of features extracted from network port statistics, intended to help protect modern computer networks from cyber attacks. Each record is classified into one of 5 classes.

Switch ID - The switch through which the network flow passed.
Port Number - The switch port through which the flow passed.
Received Packets - Number of packets received by the port.
Received Bytes - Number of bytes received by the port.
Sent Bytes - Number of bytes sent by the port.
Sent Packets - Number of packets sent by the port.
Port alive Duration (S) - The time the port has been alive, in seconds.
Packets Rx Dropped - Number of packets dropped by the receiver.
Packets Tx Dropped - Number of packets dropped by the sender.
Packets Rx Errors - Number of receive errors.
Packets Tx Errors - Number of transmit errors.
Delta Received Packets - Change in the number of packets received by the port.
Delta Received Bytes - Change in the number of bytes received by the port.
Delta Sent Bytes - Change in the number of bytes sent by the port.
Delta Sent Packets - Change in the number of packets sent by the port.
Delta Port alive Duration (S) - Change in the time the port has been alive, in seconds.
Delta Packets Rx Dropped - Change in the number of packets dropped by the receiver.
Delta Packets Tx Dropped - Change in the number of packets dropped by the sender.
Delta Packets Rx Errors - Change in the number of receive errors.
Delta Packets Tx Errors - Change in the number of transmit errors.
Connection Point - Network connection point expressed as a pair of the network element identifier and port number.
Total Load/Rate - Obtain the current observed total load/rate (in bytes/s) on a link.
Total Load/Latest - Obtain the latest total load bytes counter viewed on that link.
Unknown Load/Rate - Obtain the current observed unknown-sized load/rate (in bytes/s) on a link.
Unknown Load/Latest - Obtain the latest unknown-sized load bytes counter viewed on that link.
Latest bytes counter - Latest bytes counted in the switch port.
is_valid - Indicates whether this load was built on valid values.
Table ID - Returns the Table ID values.
Active Flow Entries - Returns the number of active flow entries in this table.
Packets Looked Up - Returns the number of packets looked up in the table.
Packets Matched - Returns the number of packets that successfully matched in the table.
Max Size - Returns the maximum size of this table.

TARGET --- Label - Label types for intrusions - Normal:0, Blackhole:1, TCP-SYN:2, PortScan:3, Diversion:4
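
The label encoding above can be kept in a small lookup table so that a numeric prediction is turned back into a readable name. This is a minimal sketch; `decode_label` is a hypothetical helper name and not necessarily how the project does it.

```python
from typing import Dict

# Mapping taken from the target description above.
LABELS: Dict[int, str] = {
    0: "Normal",
    1: "Blackhole",
    2: "TCP-SYN",
    3: "PortScan",
    4: "Diversion",
}


def decode_label(class_id: int) -> str:
    """Translate a predicted class ID into its label name."""
    if class_id not in LABELS:
        raise ValueError(f"unknown class ID: {class_id}")
    return LABELS[class_id]


if __name__ == "__main__":
    print(decode_label(2))
```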

PREPROCESSING (Techniques)

Exploratory Data Analysis (EDA)
Cleaning
Sampling
Scaling
Visualization
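
The list above does not say which cleaning or scaling method was used. The sketch below shows one common combination as an assumption: dropping rows with missing values, then min-max scaling each column to the range [0, 1]. It uses plain Python lists to stay self-contained; the project itself may rely on a library scaler or a different technique.

```python
from typing import List, Optional


def drop_incomplete(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Cleaning step: keep only rows without missing (None) values."""
    return [list(row) for row in rows if all(v is not None for v in row)]


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Scaling step: rescale every column to the range [0, 1]."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column carries no information; map it to 0.0.
            0.0 if hi == lo else (v - lo) / (hi - lo)
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled


if __name__ == "__main__":
    # Hypothetical rows: [received packets, received bytes].
    raw = [[10.0, 2000.0], [None, 500.0], [30.0, 1000.0]]
    print(min_max_scale(drop_incomplete(raw)))
```

In practice the minimum and maximum should be computed on the training split only and reused for the test split, so that no information leaks from the test data.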

PREPROCESSING (Visualization)

Heatmap before scaling the columns
Heatmap after scaling the columns
Heatmap after cleaning

MACHINE LEARNING MODELS USED

Naive Bayes
Random Forest
XG Boost

MODEL BUILDING TECHNIQUES USED

Cross Validation
Hyperparameter Tuning
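
The two techniques above can be sketched in plain Python: splitting sample indices into k folds, and expanding a set of candidate values into every hyperparameter combination to try. The function names and the candidate values are hypothetical; the project may well use library helpers for both steps, and a real run would usually shuffle or stratify the samples first.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def parameter_grid(grid: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Expand a dict of candidate values into every parameter combination."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


if __name__ == "__main__":
    # Hypothetical candidate values, for illustration only.
    candidates = {"max_depth": [3, 5, 7], "learning_rate": [0.01, 0.1]}
    for params in parameter_grid(candidates):
        for train_idx, val_idx in k_fold_indices(n_samples=10, k=5):
            # Train on train_idx, score on val_idx, then average the scores per params.
            pass
    print(len(parameter_grid(candidates)), "combinations")
```

Each combination is scored on every fold, and the combination with the best average validation score is kept.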

EVALUATION METRICS USED

Accuracy
Confusion Matrix
Precision
Recall

RESULTS (Confusion Matrix)

Naive Bayes
Random Forest
XG Boost

PERFORMANCE

Naive Bayes
Random Forest
XG Boost

INFERENCE

Best hyperparameters for XG Boost
gamma: 0
learning_rate: 0.1
max_depth: 7
min_child_weight: 1
subsample: 0.9
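
The values above can be collected into a parameter dictionary. The sketch below assumes the model is built with the scikit-learn style `XGBClassifier` from the `xgboost` package; `build_model` is a hypothetical helper name, and the project's actual training code may construct the model differently.

```python
from typing import Dict, Union

# Best hyperparameters as listed above.
BEST_PARAMS: Dict[str, Union[int, float]] = {
    "gamma": 0,
    "learning_rate": 0.1,
    "max_depth": 7,
    "min_child_weight": 1,
    "subsample": 0.9,
}


def build_model(params: Dict[str, Union[int, float]]):
    """Create an untrained classifier from a parameter dictionary."""
    # Imported lazily so the dictionary can be inspected without xgboost installed.
    from xgboost import XGBClassifier

    return XGBClassifier(**params)


if __name__ == "__main__":
    for name, value in BEST_PARAMS.items():
        print(f"{name}: {value}")
```

Calling `build_model(BEST_PARAMS)` returns an untrained classifier, which still has to be fitted on the preprocessed training data.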

After preprocessing the dataset, the Naive Bayes, Random Forest and XG Boost algorithms were used to classify the test dataset. Over multiple trials, XG Boost classified the test dataset with an average accuracy of 94 %, while the other algorithms were less accurate. Because XG Boost performed better than the other models, and because of its high scalability, robustness and stable performance, it was chosen for the deployment process.

OUTPUTS

Home Screen
Predefined Examples
Prediction Label: NORMAL
Prediction Label: BLACKHOLE Attack
Prediction Label: TCP-SYN Attack
Prediction Label: PORTSCAN Attack
Prediction Label: DIVERSION Attack

FUTURE SCOPE

Companies realize the limitations of a standard IDS, and some are responding by building bigger and better products for their customers. New IDS solutions may come with a lower administrative burden. They may rely on machine learning to lower the risk of false positives, so staff have less to examine every day, and vendors may update them simultaneously, so the system always has access to up-to-date information in real time.

END OF README
