A graph-based machine learning approach to bot mitigation systems.
Just run `setup.sh` to download the training and testing datasets. The datasets get downloaded into the `./datasets` folder.
```python
b = Build(['42.csv', '43.csv', '46.csv', '47.csv', '48.csv', '52.csv', '53.csv'])
```

Just pass the file names; `Build` reads the files from the `./datasets` directory and loads the data.
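As a rough, hypothetical sketch of what a `Build`-style loader does (the function name and internals here are assumptions, not the repo's actual implementation):

```python
import csv
import os

def load_datasets(filenames, base_dir="./datasets"):
    """Read each CSV from base_dir and return a flat list of row tuples."""
    rows = []
    for name in filenames:
        path = os.path.join(base_dir, name)
        with open(path, newline="") as fh:
            for row in csv.reader(fh):
                rows.append(tuple(row))
    return rows
```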
To perform both training and then testing, use the `--e2e` flag:

```shell
python3 model.py --e2e
```
Configuration used in the `--e2e` mode:

```python
# Training dataset
b = Build(['42.csv', '43.csv', '46.csv', '47.csv', '48.csv', '52.csv', '53.csv'])
b.data = b.build_train_set(b.non_bot_tuples, b.bot_tuples)
b.preprocess()
train_p1()
train_p2()

# Testing dataset
t = Build(['50.csv', '51.csv'])
t.data = t.build_test_set(t.non_bot_tuples, t.bot_tuples, 50)
t.preprocess()
test()
```
Total time: ~45m on average.
Perform k-fold cross-validation on the 9 datasets using the `--kfold` flag. We use:

```python
datasets = ['42.csv', '43.csv', '46.csv', '47.csv', '48.csv', '50.csv', '51.csv', '52.csv', '53.csv']
```

In each iteration, one of the datasets is held out for testing and the rest are used for training.
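The hold-one-out scheme described above can be sketched as follows (the `kfold_splits` helper is illustrative, not the repo's actual code):

```python
datasets = ['42.csv', '43.csv', '46.csv', '47.csv', '48.csv',
            '50.csv', '51.csv', '52.csv', '53.csv']

def kfold_splits(files):
    """Yield (train_files, test_file) pairs, holding out one file per fold."""
    for i, test_file in enumerate(files):
        train_files = files[:i] + files[i + 1:]
        yield train_files, test_file
```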
```shell
$ python3 model.py --kfold
```

Takes roughly 8 hours in total to complete (check the logs).
At the end, it prints the average accuracy for Logistic Regression and Naive Bayes, using DBSCAN in phase 1:

| DBSCAN + LR | DBSCAN + NB |
|---|---|
| 97.46% | 97.29% |
You can train the model in two ways, since it has PHASE 1 (UNSUPERVISED) and PHASE 2 (SUPERVISED).

The following performs both phases one after the other:

```shell
python3 model.py --train
```

If you want to perform the two phases separately (given the feature vectors are already saved in `f.json` and `fvecs.json`):

```shell
python3 model.py --phase1
```

and

```shell
python3 model.py --phase2
```
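The two-phase idea can be sketched as follows, assuming scikit-learn: phase 1 clusters unlabeled feature vectors (unsupervised), and the cluster assignments become pseudo-labels for phase 2's classifier (supervised). `phase1` and `phase2` here are illustrative stand-ins for the repo's `train_p1`/`train_p2`, whose details may differ.

```python
from sklearn.cluster import DBSCAN
from sklearn.linear_model import LogisticRegression

def phase1(fvecs, eps=0.4, min_samples=4):
    """Unsupervised: cluster feature vectors into pseudo-labels."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(fvecs)

def phase2(fvecs, labels):
    """Supervised: fit a classifier on the phase-1 pseudo-labels."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(fvecs, labels)
    return clf
```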
Once trained, it pickles the model and saves it in the `saved` folder, which is then used for testing.
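As a minimal sketch of that save/load step (the file name `model.pkl` and the helper functions are assumptions, not the repo's actual API):

```python
import os
import pickle

def save_model(model, path="saved/model.pkl"):
    """Pickle a trained model into the saved/ folder."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(model, fh)

def load_model(path="saved/model.pkl"):
    """Reload the pickled model for testing."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```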
NOTE: You can directly use the saved feature vectors, stored in JSON format in the `saved_train` folder, and train only phase 2 of the training process in order to speed up training. The saved weights above were trained on the following data files: `['42.csv', '43.csv', '46.csv', '47.csv', '48.csv', '52.csv', '53.csv']`. If you want to modify this, you'll have to train phase 1 first; its weights, once trained, will be saved in the `/saved` folder.
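A minimal sketch of loading those saved vectors, assuming `fvecs.json` holds a plain JSON array (the exact layout the repo uses is an assumption):

```python
import json
import os

def load_fvecs(folder="saved_train"):
    """Load feature vectors previously saved as JSON, skipping re-extraction."""
    with open(os.path.join(folder, "fvecs.json")) as fh:
        return json.load(fh)
```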
The command below uses the pre-trained classifier saved as a pickle file in the `saved` folder:

```shell
python3 model.py --test
```
| Kmeans (n_clusters=2, random_state=0) | DBScan (eps=0.4, min_samples=4) |
|---|---|
| Tested on the data files `50.csv` and `51.csv`. Test time: ~7m avg | Tested on the data files `50.csv` and `51.csv`. Test time: ~6m avg |
Using only unsupervised learning as our learning technique, we test the clusters on the data files `50.csv` and `51.csv`.