markolalovic/pedestrian-detection

Pedestrian detection for autonomous driving applications

This repository contains the source code of a research project to develop a machine learning model that detects pedestrians in images with a focus on autonomous driving applications.

Check the testing video to see the model's detections.

Overview

The main model is a modified PyTorch implementation of Faster R-CNN with a ResNet50 backbone pre-trained on the COCO dataset. The architecture was modified for pedestrian detection and fine-tuned on annotations from the CityPersons dataset and images from the Cityscapes dataset. See the technical overview in the figure below.

Technical overview

How-to

You can download the fitted model from here: final-model.pt

The code was written in Python and tested on Ubuntu 20.04.1 LTS using Python 3.8.10 and PyTorch library version 1.8.0. To install all the dependencies run:

pip install --upgrade pip
pip install torch==1.8.0
pip install torchvision==0.9.0
pip install pycocotools

Import the modules of the main model:

import os
import sys

module_path = os.path.abspath(os.path.join('../src/main-model/'))
if module_path not in sys.path:
    sys.path.append(module_path)

import fasterutils
import fasterrcnn

Import other modules to run the model:

import torch
import torch.utils.data
from PIL import Image
from torchvision import transforms

# Convert a PIL image to a float tensor with values in [0, 1]
transform = transforms.Compose([transforms.ToTensor()])

Then, load the model:

# Adjust the path to wherever you saved the downloaded model
model = torch.load(
    f='/home/marko/data/models/final-model-fit.pt',
    map_location=torch.device('cpu'))

As an example, let's pick an image from Hamburg (in ../figures/ directory):

img = Image.open('../figures/hamburg.png')

model.eval()
img_tr = torch.unsqueeze(transform(img), 0)
with torch.no_grad():
    pred = model(img_tr)[0]

The returned object pred contains the predicted bounding boxes around objects that the model detected as pedestrians. Each bounding box has a corresponding score, which is the model's estimated likelihood that the box contains a pedestrian:

pred
{'boxes': array([[ 807.6329 ,  343.27094,  890.54095,  541.85565] ...
 'scores': array([0.99913067, ... , 0.05334361], dtype=float32)}

Then convert the results to NumPy arrays and pass them to the show() function from the fasterutils module:

bboxes, labels, scores = pred['boxes'], pred['labels'], pred['scores']
bboxes, labels, scores = [arr.cpu().detach().numpy() for arr in [bboxes, labels, scores]]
fasterutils.show(img, bboxes, scores)

This produces the image below:

Hamburg example
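The scores in the output above range from about 0.999 down to about 0.05, so you will usually want to drop low-scoring boxes before displaying them. Below is a minimal sketch of such a filter. The helper name filter_detections and the cut-off of 0.5 are assumptions made for illustration; they are not part of fasterutils.

```python
def filter_detections(boxes, scores, min_score=0.5):
    """Keep only the detections whose score is at least min_score.

    boxes  : sequence of [x1, y1, x2, y2] boxes
    scores : sequence of scores, one per box
    The name and the default threshold are hypothetical.
    """
    kept_boxes, kept_scores = [], []
    for box, score in zip(boxes, scores):
        if score >= min_score:
            kept_boxes.append(box)
            kept_scores.append(score)
    return kept_boxes, kept_scores


# Two illustrative detections: one confident, one weak
boxes = [[807.6, 343.3, 890.5, 541.9], [10.0, 20.0, 30.0, 40.0]]
scores = [0.999, 0.053]
kept_boxes, kept_scores = filter_detections(boxes, scores, min_score=0.5)
print(kept_boxes, kept_scores)
```

If the boxes and scores are already NumPy arrays, a boolean mask such as bboxes[scores >= 0.5] does the same thing in one step.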

Evaluation results

The CityPersons validation set of 500 images and the corresponding annotations was used for testing. In particular, the evaluation used the 441 images that contain at least one person, from the cities of Munster, Frankfurt, and Lindau.

Below are the results for AP and AR measures:

| Measure | IoU Range | Area | MaxDets | Value |
| --- | --- | --- | --- | --- |
| Average Precision (AP) | 0.50:0.95 | all | 100 | 0.461 |
| Average Precision (AP) | 0.50 | all | 100 | 0.754 |
| Average Precision (AP) | 0.75 | all | 100 | 0.492 |
| Average Precision (AP) | 0.50:0.95 | small | 100 | 0.087 |
| Average Precision (AP) | 0.50:0.95 | medium | 100 | 0.392 |
| Average Precision (AP) | 0.50:0.95 | large | 100 | 0.616 |
| Average Recall (AR) | 0.50:0.95 | all | 1 | 0.095 |
| Average Recall (AR) | 0.50:0.95 | all | 10 | 0.417 |
| Average Recall (AR) | 0.50:0.95 | all | 100 | 0.550 |
| Average Recall (AR) | 0.50:0.95 | small | 100 | 0.340 |
| Average Recall (AR) | 0.50:0.95 | medium | 100 | 0.506 |
| Average Recall (AR) | 0.50:0.95 | large | 100 | 0.663 |
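The IoU column refers to intersection over union between a predicted box and a ground-truth box. A detection counts as correct only if its IoU with a ground-truth box reaches the given threshold, and 0.50:0.95 means the measure is averaged over thresholds from 0.50 to 0.95 in steps of 0.05. As a rough illustration, here is a sketch of IoU for two boxes in the [x1, y1, x2, y2] format used in pred['boxes']. The function name box_iou is an assumption for this example, and the evaluation tooling may handle pixel conventions slightly differently.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes that overlap on half their width: 50 / 150
print(box_iou([0, 0, 10, 10], [5, 0, 15, 10]))
```

In this example the two boxes share a third of their combined area, so the detection would not count as correct at the 0.50 threshold.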

For comparison with the CityPersons benchmark table, below are the results using the log-average miss rate (MR) measure, where lower is better:

| Method | External training data | MR (Reasonable) | MR (Reasonable_small) | MR (Reasonable_occ=heavy) | MR (All) |
| --- | --- | --- | --- | --- | --- |
| APD-pretrain | ✓ | 7.31% | 10.81% | 28.07% | 32.71% |
| Pedestron | ✓ | 7.69% | 9.16% | 27.08% | 28.33% |
| APD | × | 8.27% | 11.03% | 35.45% | 35.65% |
| YT-PedDet | × | 8.41% | 10.60% | 37.88% | 37.22% |
| STNet | × | 8.92% | 11.13% | 34.31% | 29.54% |
| MGAN | × | 9.29% | 11.38% | 40.97% | 38.86% |
| DVRNet | × | 11.17% | 15.62% | 42.52% | 40.99% |
| HBA-RCNN | × | 11.26% | 15.68% | 39.54% | 38.77% |
| OR-CNN | × | 11.32% | 14.19% | 51.43% | 40.19% |
| AdaptiveNMS | × | 11.40% | 13.64% | 46.99% | 38.89% |
| Repulsion Loss | × | 11.48% | 15.67% | 52.59% | 39.17% |
| Cascade MS-CNN | × | 11.62% | 13.64% | 47.14% | 37.63% |
| Adapted FasterRCNN | × | 12.97% | 37.24% | 50.47% | 43.86% |
| MS-CNN | × | 13.32% | 15.86% | 51.88% | 39.94% |
| This Model | × | 25.33% | 50.96% | 63.56% | 41.96% |
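The log-average miss rate is commonly computed as the geometric mean of the miss rate sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space between 0.01 and 1. The sketch below illustrates that definition under stated assumptions: the function name, the input format (a miss-rate curve sorted by increasing FPPI), and the choice to use a miss rate of 1.0 where the curve has no point at or below a reference FPPI are all illustrative. The benchmark's official evaluation code may differ in such details.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate at log-spaced FPPI reference points.

    fppi      : false positives per image, sorted in increasing order
    miss_rate : miss rate at each fppi value (same length)
    Hypothetical helper, not the benchmark's evaluation code.
    """
    # Reference points from 10^-2 to 10^0, evenly spaced in log space
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    sampled = []
    for ref in refs:
        mr = 1.0  # assumption: no curve point at or below ref means everything is missed
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m  # keep the last curve point with fppi <= ref
            else:
                break
        sampled.append(max(mr, 1e-10))  # avoid log(0)

    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


# A flat curve with a miss rate of 0.25 everywhere gives 0.25 (up to rounding)
print(log_average_miss_rate([0.001, 0.1, 1.0], [0.25, 0.25, 0.25]))
```

Because the mean is taken in log space, the measure weights relative changes in miss rate equally across the low-FPPI and high-FPPI ends of the curve.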

Extension

The figure below shows 3 consecutive frames from the testing video, where I ran the model on all frames of the Pedestrian Challenge video.

Say we are detecting pedestrians at time t, represented by the first image in the figure below. At time t, because of occlusions, brightness variations, or some other reason, the model can fail to detect a pedestrian. But perhaps the model detected the same person in one or more of the previous frames. The idea is to use this information, which has accumulated over time (at frames t-1, t-2, ...), to achieve better detection at time t.

My proposed solution is similar to a moving average. For a chosen number of time steps, say 2 for simplicity, we do the following. First, we solve the classic matching problem of computer vision and match region proposals at time t to the region proposals in the previous 2 frames. This way, we know which detected bounding boxes should represent the same person through time. Second, we take the mean of the corresponding scores (the estimated likelihoods that the region contains a person). If this mean is greater than some threshold, we show the bounding box at time t.
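The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the scripts in this repository: it matches boxes greedily by best IoU overlap, treats a frame with no matching box as contributing a score of 0, and uses hypothetical names and default thresholds throughout.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_match_score(box, prev_boxes, prev_scores, min_iou=0.5):
    """Score of the best-overlapping box in an earlier frame, or 0.0 if none matches."""
    best_iou, best_score = 0.0, 0.0
    for prev_box, prev_score in zip(prev_boxes, prev_scores):
        overlap = iou(box, prev_box)
        if overlap >= min_iou and overlap > best_iou:
            best_iou, best_score = overlap, prev_score
    return best_score


def smooth_detections(current, history, threshold=0.5, min_iou=0.5):
    """Keep boxes at time t whose mean score over t and earlier frames exceeds threshold.

    current : (boxes, scores) at time t
    history : list of (boxes, scores) for frames t-1, t-2, ...
    """
    boxes, scores = current
    kept = []
    for box, score in zip(boxes, scores):
        past = [best_match_score(box, pb, ps, min_iou) for pb, ps in history]
        mean_score = (score + sum(past)) / (1 + len(past))
        if mean_score > threshold:
            kept.append((box, mean_score))
    return kept


# A weak detection at time t is kept because the same region scored high earlier
current = ([[0, 0, 10, 10]], [0.4])
history = [([[1, 0, 11, 10]], [0.9]), ([[2, 0, 12, 10]], [0.8])]
print(smooth_detections(current, history))
```

A more careful matcher would enforce one-to-one assignments between frames; the greedy version here is only meant to show how the score averaging and thresholding fit together.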

I tried different variations of this idea with different numbers of time steps, with interesting results. In the figure below, we can see the car approaching a pedestrian on the left. For each time t, I colored the proposed regions red for the current frame, green for the previous frame, and blue for the frame before it. For example, we see that at time t-1 only the region proposed at time t-1 was detected as a pedestrian. By thresholding the mean of all scores, we can detect pedestrians we otherwise wouldn't. Conversely, if a region proposal containing, say, a bicycle received a high score at time t but not in the previous 2 frames, the mean would be smaller than the chosen threshold and we wouldn't falsely detect the proposed region as a pedestrian at time t.

For details, see the scripts: bus-video-pred.py, bus-video-dets.py and bus-video-dets-ma3.py in ../scripts/.

Hamburg example
