A Django-based service for rating content with features to detect and handle suspicious rating activities.
- Installation
- Environment Setup
- Running the Project
- Load Initial Data
- APIs
- Suspicious Rating Detection
- Running Tests
- Clone the repository:
```bash
git clone https://github.com/mhdi01/BitPin.git
cd BitPin
```
- Create a virtual environment and install the requirements:
```bash
python -m venv env
source env/bin/activate
pip install -r requirements.txt
```
- To use the testing environment, run:
```bash
export DJANGO_ENV=testing
```
If DJANGO_ENV is not set, it defaults to testing.
- To use the production environment, run:
```bash
export DJANGO_ENV=production
```
- Configure PostgreSQL in /bitpin/settings/production.py as needed.
- Create and apply the migrations:
```bash
python manage.py makemigrations
python manage.py migrate
```
- Then run the project:
```bash
python manage.py runserver
```
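The steps above do not show how DJANGO_ENV selects a settings module or what the PostgreSQL block in production.py might contain. A minimal sketch, assuming the settings live in modules named `bitpin.settings.testing` and `bitpin.settings.production` and that `manage.py` maps DJANGO_ENV onto them. The helper names and the `POSTGRES_*` environment variables are hypothetical; only the `ENGINE` string is Django's real PostgreSQL backend.

```python
import os
from typing import Mapping

# Hypothetical: the two environments this README documents, "testing" being the default.
SUPPORTED_ENVS = {"testing", "production"}


def settings_module(environ: Mapping[str, str] = os.environ) -> str:
    """Map DJANGO_ENV to a dotted settings module path (assumed layout)."""
    env = environ.get("DJANGO_ENV", "testing")
    if env not in SUPPORTED_ENVS:
        raise ValueError(f"Unsupported DJANGO_ENV: {env!r}")
    return f"bitpin.settings.{env}"


def production_databases(environ: Mapping[str, str] = os.environ) -> dict:
    """Possible DATABASES value for bitpin/settings/production.py.

    The POSTGRES_* variable names and their defaults are assumptions.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",  # Django's built-in PostgreSQL backend
            "NAME": environ.get("POSTGRES_DB", "bitpin"),
            "USER": environ.get("POSTGRES_USER", "postgres"),
            "PASSWORD": environ.get("POSTGRES_PASSWORD", ""),
            "HOST": environ.get("POSTGRES_HOST", "localhost"),
            "PORT": environ.get("POSTGRES_PORT", "5432"),
        }
    }
```

In `manage.py` this could be wired as `os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module())`. Rejecting unknown values turns a typo in DJANGO_ENV into a clear error instead of an import failure deep inside Django.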
Load the data needed by the different tables with these commands:
```bash
python manage.py load_users
python manage.py load_contents
python manage.py load_ratings
```
Login:
- Endpoint: /auth/login/
- Method: POST
- Request body:
```json
{
  "username": "test",
  "password": "test"
}
```
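A sketch of calling the login endpoint from Python with only the standard library. It builds the request without sending it. The base URL assumes the default `runserver` address, and the `"access"` key used to read the token from the response is an assumption (it is what djangorestframework-simplejwt returns), not something this README confirms.

```python
import json
import urllib.request

# Default address of `python manage.py runserver`; change it for other deployments.
BASE_URL = "http://127.0.0.1:8000"


def build_login_request(
    username: str, password: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (without sending) the POST request for /auth/login/."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/login/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body: bytes, key: str = "access") -> str:
    """Read the token from the login response; the "access" key is an assumption."""
    return json.loads(response_body)[key]
```

To actually send it: `with urllib.request.urlopen(build_login_request("test", "test")) as resp: token = extract_token(resp.read())`.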
Submit a rating:
- Endpoint: /api/ratings/
- Headers: Authorization: Bearer <token>
- Method: POST
- Request body:
```json
{
  "content_id": 1,
  "rating": 4.5
}
```
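The rating endpoint expects the token in an `Authorization` header using the Bearer scheme. A minimal sketch, standard library only and again not sent; `build_rating_request` is a hypothetical helper and the base URL assumes a local `runserver`.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default runserver address (assumption)


def build_rating_request(
    token: str, content_id: int, rating: float, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (without sending) an authenticated POST to /api/ratings/."""
    body = json.dumps({"content_id": content_id, "rating": rating}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/ratings/",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # Bearer scheme as documented above
        },
        method="POST",
    )
```

`urllib.request.urlopen(build_rating_request(token, 1, 4.5))` would send it.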
List contents:
- Endpoint: /api/contents/list/
- Method: GET
To prevent manipulation of content ratings by coordinated groups, a task is run to detect suspicious rating activities. This task checks for patterns such as a sudden influx of ratings within a short time frame and a significant deviation from the average rating.
- High Volume of Ratings: The number of ratings within a specified time window exceeds a threshold.
- Low/High Ratings: The ratings are significantly lower or higher than the content's average rating.
- Rating Difference: The difference between the average rating and the ratings given in the time window exceeds a threshold.
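The High Volume criterion comes down to counting ratings inside a moving time window. The project's task presumably queries the database for this; as an alternative illustration, here is an in-memory sliding-window counter for a single content. `SlidingWindowCounter` is hypothetical and assumes timestamps are added in non-decreasing order.

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingWindowCounter:
    """Count ratings for one content whose timestamp falls within the last `window`."""

    def __init__(self, window: timedelta = timedelta(hours=1)) -> None:
        self.window = window
        self._times: deque = deque()

    def add(self, ts: datetime) -> None:
        # Assumes ts is never earlier than the previously added timestamp.
        self._times.append(ts)
        self._evict(ts)

    def count(self, now: datetime) -> int:
        self._evict(now)
        return len(self._times)

    def _evict(self, now: datetime) -> None:
        # Drop timestamps that are at least `window` old.
        cutoff = now - self.window
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
```

A volume check would then be `counter.count(now) > 100`. Each timestamp is appended and evicted once, so the cost per rating is amortized constant.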
A Celery task named `detect_suspicious_activity` is called whenever a rating is saved to the database. It relies on three criteria:
- Time window
- Number of ratings
- Average rating

All three are considered in the `detect_suspicious_activity` task. There are separate thresholds for high ratings, low ratings, and the number of ratings. These thresholds may be temporary: once enough data has been collected, a different threshold could be chosen for each content.
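Since the thresholds may later differ per content, one way to hold them is a default set plus per-content overrides. A sketch with hypothetical names: only the 1-hour window, the count of 100, and the 1.5 average difference come from this README; the high, low, and percentage values are placeholders, not the project's settings.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Thresholds:
    window_hours: float = 1.0            # from the README
    min_ratings: int = 100               # from the README
    max_average_difference: float = 1.5  # from the README
    high_rating: float = 4.5             # placeholder value
    low_rating: float = 1.5              # placeholder value
    suspicious_percentage: float = 50.0  # placeholder value


class ThresholdRegistry:
    """Default thresholds plus optional per-content overrides (hypothetical design)."""

    def __init__(self, default: Thresholds = Thresholds()) -> None:
        self.default = default
        self._overrides: Dict[int, Thresholds] = {}

    def set_for_content(self, content_id: int, **changes) -> None:
        # dataclasses.replace copies the defaults and swaps in only the given fields.
        self._overrides[content_id] = replace(self.default, **changes)

    def for_content(self, content_id: int) -> Thresholds:
        return self._overrides.get(content_id, self.default)
```

Keeping `Thresholds` frozen means an override can never accidentally mutate the shared defaults.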
When the number of ratings in a one-hour time window exceeds the threshold of 100, further conditions are checked. The task counts the recent ratings that are above the high-rating threshold or below the low-rating threshold and computes what percentage of the recent ratings they make up. If that percentage is greater than SUSPICIOUS_RATING_PERCENTAGE and the average of the recently added ratings differs from the content's actual average rating by more than 1.5, the situation is considered suspicious: the number of ratings rose suddenly and their average is far from the actual rating. In that case the is_suspicious field is set on these ratings, and they are not counted as valid ratings.
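The decision described above can be sketched as a pure function over in-memory ratings. This is not the project's Celery task: `find_suspicious_ratings`, the `Rating` dataclass, and the values of HIGH_RATE_THRESHOLD, LOW_RATE_THRESHOLD and SUSPICIOUS_RATING_PERCENTAGE are placeholders; only the 100-rating threshold, the one-hour window, and the 1.5 average difference come from the text. It also assumes the percentage is taken over all ratings in the window, that the average difference is absolute, and that only the out-of-range ratings are flagged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

TIME_WINDOW = timedelta(hours=1)     # from the README
RATING_COUNT_THRESHOLD = 100         # from the README
AVERAGE_DIFFERENCE_THRESHOLD = 1.5   # from the README
HIGH_RATE_THRESHOLD = 4.5            # placeholder
LOW_RATE_THRESHOLD = 1.5             # placeholder
SUSPICIOUS_RATING_PERCENTAGE = 50.0  # placeholder


@dataclass
class Rating:
    score: float
    created_at: datetime
    is_suspicious: bool = False


def find_suspicious_ratings(
    ratings: List[Rating], actual_average: float, now: datetime
) -> List[Rating]:
    """Flag out-of-range ratings from the last hour when the burst looks coordinated."""
    recent = [r for r in ratings if now - TIME_WINDOW <= r.created_at <= now]
    # Condition 1: the number of ratings in the window must exceed the threshold.
    if len(recent) <= RATING_COUNT_THRESHOLD:
        return []
    # Condition 2: share of ratings above the high or below the low threshold.
    extreme = [
        r for r in recent
        if r.score > HIGH_RATE_THRESHOLD or r.score < LOW_RATE_THRESHOLD
    ]
    percentage = 100.0 * len(extreme) / len(recent)
    # Condition 3: the recent average must be far from the content's actual average.
    recent_average = sum(r.score for r in recent) / len(recent)
    if (
        percentage > SUSPICIOUS_RATING_PERCENTAGE
        and abs(recent_average - actual_average) > AVERAGE_DIFFERENCE_THRESHOLD
    ):
        for r in extreme:
            r.is_suspicious = True  # no longer counted as valid ratings
        return extreme
    return []
```

In the project, the same checks presumably run against database querysets inside the task, with the flag persisted by saving the ratings.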
I tried to use behavioral analysis in this task. I focused on the data and criteria available, and each of them turned out to be important, so all of the needed criteria are used, each with its own threshold, to better analyze users' rating behavior. To improve the task further, more data is needed for each content; then expectations about ratings per hour and the average rating can be added, making the analysis of user behavior more accurate.
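One possible way to add an expectation about ratings per hour, once enough history exists, is to compare the current hour's count with the mean and spread of past hourly counts for the same content. This is a swapped-in statistical heuristic (a "mean plus k standard deviations" rule), not something the project implements; the function name, `k=3`, and the 24-hour minimum history are arbitrary assumptions.

```python
import statistics
from typing import Sequence


def exceeds_expected_rate(
    hourly_history: Sequence[int], observed: int, k: float = 3.0, min_history: int = 24
) -> bool:
    """Return True if `observed` ratings this hour is unusually high for this content."""
    if len(hourly_history) < min_history:
        return False  # not enough data yet to form an expectation
    mean = statistics.mean(hourly_history)
    spread = statistics.pstdev(hourly_history)
    # Clamp the spread so perfectly flat history does not make any increase suspicious.
    return observed > mean + k * max(spread, 1.0)
```

A per-content expected rate like this could replace the fixed threshold of 100 for contents whose normal traffic is much lower or much higher.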
To run the tests, run:
```bash
python manage.py test
```