Pedestrian detection for autonomous driving applications

This repository contains the source code of a research project to develop a machine learning model that detects pedestrians in images with a focus on autonomous driving applications.

Check the testing video to see the model's detections.

Overview

The main model is a modified PyTorch implementation of Faster R-CNN with a ResNet50 backbone pre-trained on the COCO dataset. The architecture was modified for pedestrian detection and fine-tuned on annotations from the CityPersons dataset and images from the Cityscapes dataset. See the technical overview in the figure below.

How-to

You can download the fitted model from here: final-model.pt

The code was written in Python and tested on Ubuntu 20.04.1 LTS using Python 3.8.10 and PyTorch library version 1.8.0. To install all the dependencies run:

pip install --upgrade pip
pip install torch==1.8.0
pip install torchvision==0.9.0
pip install pycocotools

Import the modules of the main model:

import os
import sys

module_path = os.path.abspath(os.path.join('../src/main-model/'))
if module_path not in sys.path:
    sys.path.append(module_path)

import fasterutils
import fasterrcnn

Import other modules to run the model:

import torch
import torch.utils.data
from PIL import Image
from torchvision import transforms

# Convert a PIL image to a float tensor with values in [0, 1]
transform = transforms.Compose([transforms.ToTensor()])

Then, load the model:

# Adjust the path to wherever you saved the downloaded model file
model = torch.load(
    f='/home/marko/data/models/final-model-fit.pt',
    map_location=torch.device('cpu'))

As an example, let's pick an image from Hamburg (in ../figures/ directory):

img = Image.open('../figures/hamburg.png')

model.eval()
img_tr = torch.unsqueeze(transform(img), 0)
with torch.no_grad():
    pred = model(img_tr)[0]

The returned object pred contains the predicted bounding boxes around objects that the model detected as pedestrians. Each bounding box has a corresponding score, which is the estimated likelihood that the box contains a pedestrian:

pred
{'boxes': array([[ 807.6329 ,  343.27094,  890.54095,  541.85565] ...
 'scores': array([0.99913067, ... , 0.05334361], dtype=float32)}
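The scores above range from almost 1 down to about 0.05, so in practice you would usually keep only the confident detections. A minimal sketch of such a filter is shown here; the function name filter_detections and the default threshold of 0.5 are assumptions for illustration and are not part of fasterutils. It assumes boxes are in [x1, y1, x2, y2] order, as in the output above.

```python
def filter_detections(boxes, scores, threshold=0.5):
    """Keep only the boxes whose score is at least `threshold`.

    Works on plain lists as well as NumPy arrays or tensors,
    because it only iterates over the inputs.
    """
    kept = [(list(map(float, box)), float(score))
            for box, score in zip(boxes, scores)
            if float(score) >= threshold]
    kept_boxes = [box for box, _ in kept]
    kept_scores = [score for _, score in kept]
    return kept_boxes, kept_scores


# Illustrative values only, laid out like pred['boxes'] and pred['scores'].
example_boxes = [[807.6, 343.3, 890.5, 541.9], [100.0, 120.0, 130.0, 200.0]]
example_scores = [0.999, 0.053]
kept_boxes, kept_scores = filter_detections(example_boxes, example_scores)
print(kept_boxes, kept_scores)
```

The right threshold depends on how you trade missed pedestrians against false alarms, so treat 0.5 as a starting point only.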

Then, you can convert the results to NumPy arrays and pass them to the show() function from our fasterutils module:

bboxes, labels, scores = pred['boxes'], pred['labels'], pred['scores']
bboxes, labels, scores = [arr.cpu().detach().numpy() for arr in [bboxes, labels, scores]]
fasterutils.show(img, bboxes, scores)

This produces the image below:

Evaluation results

The CityPersons validation set of 500 images and the corresponding annotations was used for testing. In particular, the evaluation used the 441 images that contain at least one person, from the cities of Münster, Frankfurt, and Lindau.

Below are the results for AP and AR measures:

Measure	IoU Range	Area	MaxDets	Value
Average Precision (AP)	0.50:0.95	all	100	0.461
Average Precision (AP)	0.50	all	100	0.754
Average Precision (AP)	0.75	all	100	0.492
Average Precision (AP)	0.50:0.95	small	100	0.087
Average Precision (AP)	0.50:0.95	medium	100	0.392
Average Precision (AP)	0.50:0.95	large	100	0.616
Average Recall (AR)	0.50:0.95	all	1	0.095
Average Recall (AR)	0.50:0.95	all	10	0.417
Average Recall (AR)	0.50:0.95	all	100	0.550
Average Recall (AR)	0.50:0.95	small	100	0.340
Average Recall (AR)	0.50:0.95	medium	100	0.506
Average Recall (AR)	0.50:0.95	large	100	0.663
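The IoU column in the table refers to intersection over union between a predicted box and a ground-truth box; the range 0.50:0.95 means the measure is averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05, following the COCO convention. As a rough illustration of the quantity itself, here is a minimal sketch of IoU for two boxes in [x1, y1, x2, y2] order. The function name box_iou is an assumption for this example; the reported numbers come from the evaluation tooling, not from this snippet.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes that share half of their area
iou_half_overlap = box_iou([0, 0, 2, 2], [1, 0, 3, 2])
print(iou_half_overlap)
```

At an IoU threshold of 0.50, the pair in this example would not count as a match, because their IoU is one third.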

The log-average miss rate (MR) measures are shown below, for comparison with the CityPersons benchmark table:

Method	External training data	MR (Reasonable)	MR (Reasonable_small)	MR (Reasonable_occ=heavy)	MR (All)
APD-pretrain	√	7.31%	10.81%	28.07%	32.71%
Pedestron	√	7.69%	9.16%	27.08%	28.33%
APD	×	8.27%	11.03%	35.45%	35.65%
YT-PedDet	×	8.41%	10.60%	37.88%	37.22%
STNet	×	8.92%	11.13%	34.31%	29.54%
MGAN	×	9.29%	11.38%	40.97%	38.86%
DVRNet	×	11.17%	15.62%	42.52%	40.99%
HBA-RCNN	×	11.26%	15.68%	39.54%	38.77%
OR-CNN	×	11.32%	14.19%	51.43%	40.19%
AdaptiveNMS	×	11.40%	13.64%	46.99%	38.89%
Repulsion Loss	×	11.48%	15.67%	52.59%	39.17%
Cascade MS-CNN	×	11.62%	13.64%	47.14%	37.63%
Adapted FasterRCNN	×	12.97%	37.24%	50.47%	43.86%
MS-CNN	×	13.32%	15.86%	51.88%	39.94%
This Model	×	25.33%	50.96%	63.56%	41.96%
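For readers unfamiliar with the MR measure, the sketch below shows one common way to compute a log-average miss rate from a miss rate versus false positives per image (FPPI) curve. It assumes the Caltech-style convention of nine reference points evenly spaced in log space between 10^-2 and 10^0 FPPI, combined by a geometric mean. The function name and the handling of curves that start above a reference point (miss rate taken as 1.0) are assumptions here; the official CityPersons evaluation code is the authoritative definition.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at log-spaced FPPI values.

    `fppi` must be sorted in ascending order and `miss_rate` must have
    the same length, one miss rate per FPPI value.
    """
    # Reference FPPI values: 10^-2, ..., 10^0, evenly spaced in log space
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    samples = []
    for ref in refs:
        # Miss rate at the largest FPPI not exceeding the reference point;
        # assumed to be 1.0 if the curve has no point at or below it.
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f > ref:
                break
            mr = m
        # Clamp to avoid log(0) for a perfect detector
        samples.append(max(mr, 1e-10))
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


# Illustrative curve only: a constant miss rate of 25% at every FPPI
flat_curve_mr = log_average_miss_rate([0.001, 0.1, 1.0], [0.25, 0.25, 0.25])
print(flat_curve_mr)
```

Lower values are better, since the measure summarizes how many pedestrians are missed across a range of false-positive budgets.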

Extension

The figure below shows 3 consecutive frames from the testing video, where I ran the model on all frames of the Pedestrian Challenge video.

Say we are detecting pedestrians at time t, represented by the first image in the figure below. At time t, the model can fail to detect a pedestrian because of occlusions, brightness variations, or some other reason. But perhaps the model detected the same person in one or more of the previous frames. The idea is to use this information, which has accumulated over time (at frames t-1, t-2, ...), to achieve better detection at time t.

My proposed solution is similar to a moving average. For a chosen number of time steps, say 2 for simplicity, we do the following. First, we solve the classic matching problem of computer vision and match the region proposals at time t to the region proposals in the previous 2 frames. This way, we know which detected bounding boxes should represent the same person through time. Second, we take the mean of the corresponding scores (the estimated likelihoods that the region contains a person). If this mean is greater than some threshold, we show the bounding box at time t.

I tried variations of this idea with different numbers of time steps, with interesting results. In the figure below, we can see the car approaching a pedestrian on the left. For each time t, I colored the proposed regions red for the current frame, and green and blue for the two previous frames. For example, we see that at time t-1 only the region proposed at time t-1 was detected as a pedestrian. By thresholding the mean of all scores, we can detect pedestrians we otherwise wouldn't. Conversely, if, say, a bicycle in one region proposal received a high score at time t but not in the previous 2 frames, the mean would be smaller than the chosen threshold and we wouldn't falsely detect the proposed region as a pedestrian at time t.
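The moving-average idea can be sketched roughly as follows. This is not the code from the scripts in this repository: the function names, the greedy best-IoU matching, the IoU threshold of 0.5, and the choice to count an unmatched previous frame as score 0 are all assumptions made for illustration. It also works on final boxes and scores rather than on region proposals inside the network.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def smooth_detections(frames, score_threshold=0.5, iou_threshold=0.5):
    """Keep current-frame boxes whose mean score over the window is high enough.

    `frames` is a list of (boxes, scores) pairs, oldest first, with the
    current frame (time t) last.
    """
    current_boxes, current_scores = frames[-1]
    kept = []
    for box, score in zip(current_boxes, current_scores):
        window_scores = [score]
        for prev_boxes, prev_scores in frames[:-1]:
            # Greedy matching: take the previous box with the highest IoU
            best_iou, best_score = 0.0, 0.0
            for prev_box, prev_score in zip(prev_boxes, prev_scores):
                iou = box_iou(box, prev_box)
                if iou > best_iou:
                    best_iou, best_score = iou, prev_score
            # Assumption: a frame without a good match contributes a score of 0
            window_scores.append(best_score if best_iou >= iou_threshold else 0.0)
        mean_score = sum(window_scores) / len(window_scores)
        if mean_score > score_threshold:
            kept.append((box, mean_score))
    return kept


# Illustrative values: a pedestrian scored weakly at time t but strongly before,
# and a second object (for example a bicycle) scored strongly only at time t.
frames = [
    ([[10, 10, 50, 100]], [0.9]),                             # time t-2
    ([[12, 10, 52, 100]], [0.8]),                             # time t-1
    ([[14, 10, 54, 100], [200, 50, 260, 120]], [0.4, 0.9]),   # time t
]
kept = smooth_detections(frames)
print(kept)
```

In this example the weakly scored pedestrian is kept because its mean score over the three frames exceeds the threshold, while the object that appears only at time t is dropped.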

For details, see the scripts: bus-video-pred.py, bus-video-dets.py and bus-video-dets-ma3.py in ../scripts/.
