tensorrt_demos


Installation for YOLOv3 & YOLOv4

Prerequisite

The code in this repository was tested on Jetson Nano, TX2, and Xavier NX DevKits. In order to run the demos below, first make sure you have the proper version of the JetPack image installed on the target Jetson system. For example, refer to Setting up Jetson Nano: The Basics and Setting up Jetson Xavier NX.

More specifically, the target Jetson system must have TensorRT libraries installed.

  • For yolov3 & yolov4, TensorRT 6.x+ is required.
  • Demo #6 part 1: INT8 requires TensorRT 6.x+ and only works on GPUs with CUDA compute capability 6.1+.
  • Demo #6 part 2: DLA core requires TensorRT 7.x+ (only tested on Jetson Xavier NX).

You could check which version of TensorRT has been installed on your Jetson system by looking at the file names of the libraries. For example, TensorRT v5.1.6 (JetPack-4.2.2) was present on one of my Jetson Nano DevKits:

$ ls /usr/lib/aarch64-linux-gnu/libnvinfer.so*
/usr/lib/aarch64-linux-gnu/libnvinfer.so
/usr/lib/aarch64-linux-gnu/libnvinfer.so.5
/usr/lib/aarch64-linux-gnu/libnvinfer.so.5.1.6

Furthermore, all demo programs in this repository require the "cv2" (OpenCV) module for python3. You could use the "cv2" module which comes with JetPack. Or, if you would prefer to build your own, refer to Installing OpenCV 3.4.6 on Jetson Nano for how to build from source and install opencv-3.4.6 on your Jetson system.
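
As a quick sanity check, you could also query both versions directly from python3. This is a minimal sketch; it assumes the JetPack-provided "tensorrt" and "cv2" bindings are importable.

#!/usr/bin/env python3
# Print the TensorRT and OpenCV versions visible to python3.
import tensorrt as trt
import cv2

print('TensorRT version:', trt.__version__)
print('OpenCV version:', cv2.__version__)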

Demo #4: YOLOv3

(Merged with Demo #5: YOLOv4 below.)

Demo #5: YOLOv4

Assuming this repository has been cloned at "${HOME}/project/tensorrt_demos", follow these steps:

  1. Install "pycuda" in case you haven't done so already. Note that the installation script resides in the "ssd" folder.

    $ cd ${HOME}/project/tensorrt_demos/ssd
    $ ./install_pycuda.sh
  2. Install version "1.4.1" (not the latest version) of the python3 "onnx" module. Note that the "onnx" module depends on "protobuf". Reference: information provided by NVIDIA.

    $ sudo pip3 install onnx==1.4.1
  3. Go to the "plugins/" subdirectory and build the "yolo_layer" plugin. When done, a "libyolo_layer.so" would be generated.

    $ cd ${HOME}/project/tensorrt_demos/plugins
    $ make
  4. Download the pre-trained yolov3/yolov4 COCO models and convert the targeted model to ONNX and then to TensorRT engine. I use "yolov4-416" as example below. (Supported models: "yolov3-tiny-288", "yolov3-tiny-416", "yolov3-288", "yolov3-416", "yolov3-608", "yolov3-spp-288", "yolov3-spp-416", "yolov3-spp-608", "yolov4-tiny-288", "yolov4-tiny-416", "yolov4-288", "yolov4-416", "yolov4-608", "yolov4-csp-256", "yolov4-csp-512", "yolov4x-mish-320", "yolov4x-mish-640", and custom models such as "yolov4-416x256".)

    $ cd ${HOME}/project/tensorrt_demos/yolo
    $ ./download_yolo.sh
    $ python3 yolo_to_onnx.py -m yolov4-416
    $ python3 onnx_to_tensorrt.py -m yolov4-416

    The last step ("onnx_to_tensorrt.py") takes a little more than half an hour to complete on my Jetson Nano DevKit. When that is done, the optimized TensorRT engine would be saved as "yolov4-416.trt". (A quick verification sketch follows this list.)

    In case "onnx_to_tensorrt.py" fails (process "Killed" by the Linux kernel), it is likely that the Jetson platform ran out of memory during conversion of the TensorRT engine. This problem might be solved by adding a larger swap file to the system. Reference: Process killed in onnx_to_tensorrt.py Demo#5.
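
Before moving on to inference, you could sanity-check the generated engine by deserializing it and listing its bindings. This is a minimal sketch, not part of the repo; it assumes you run it from the repo root, that the engine file sits under "yolo/", and that the "libyolo_layer.so" plugin from step 3 is present.

#!/usr/bin/env python3
# Deserialize a TensorRT engine and print its input/output bindings.
import ctypes
import tensorrt as trt

# The engine uses the custom yolo_layer plugin, so load it first.
ctypes.cdll.LoadLibrary('./plugins/libyolo_layer.so')

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, '')
with open('./yolo/yolov4-416.trt', 'rb') as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
for i in range(engine.num_bindings):
    kind = 'input' if engine.binding_is_input(i) else 'output'
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))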

Inference

Using the simple Python scripts

Set the paths of the required arguments inside each .py file before running (a sketch of such a script follows the commands below).

$ cd ${HOME}/project/tensorrt_demos
$ python3 trt_ad_yolo_image.py
$ python3 trt_ad_yolo_video.py 
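
For reference, here is a minimal sketch of what such a script might look like. It assumes the TrtYOLO helper class from "utils/yolo_with_plugins.py" (the same class "trt_yolo.py" uses); the constructor and detect() signatures and the hard-coded paths below are illustrative assumptions and may differ from the actual scripts.

#!/usr/bin/env python3
# Minimal single-image inference with a TensorRT YOLO engine.
import cv2
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context TrtYOLO needs
from utils.yolo_with_plugins import TrtYOLO

MODEL = 'yolov4-416'     # expects yolo/yolov4-416.trt to exist
IMAGE = 'dog.jpg'        # hard-coded input path: edit as needed
CONF_THRESH = 0.3

trt_yolo = TrtYOLO(MODEL, (416, 416), category_num=80)
img = cv2.imread(IMAGE)
boxes, confs, clss = trt_yolo.detect(img, CONF_THRESH)
for box, conf, cls in zip(boxes, confs, clss):
    print('class %2d  conf %.2f  box %s' % (int(cls), conf, box))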
        

Using command-line arguments

  1. Test the TensorRT "yolov4-416" engine with the "dog.jpg" image.

    $ cd ${HOME}/project/tensorrt_demos
    $ wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/dog.jpg -O ${HOME}/Pictures/dog.jpg
    $ python3 trt_yolo.py --image ${HOME}/Pictures/dog.jpg \
                          -m yolov4-416

    This is a screenshot of the demo against JetPack-4.4, i.e. TensorRT 7:

    [Screenshot: "yolov4-416" detection result on dog.jpg]

  2. The "trt_yolo.py" demo program could also take various other inputs (e.g. video files or a live camera) instead of a still image.

    For example, I tested my own custom trained "yolov4-crowdhuman-416x416" TensorRT engine with the "Avengers: Infinity War" movie trailer:

    [Video: testing with the "Avengers: Infinity War" trailer]

  3. (Optional) Test other models than "yolov4-416".

  4. (Optional) If you would like to stream TensorRT YOLO detection output over the network and view the results on a remote host, check out my trt_yolo_mjpeg.py example.

  5. I also created an "eval_yolo.py" script for evaluating mAP of the TensorRT yolov3/yolov4 engines. Refer to README_mAP.md for details. (A pycocotools sketch follows this list.)

    $ python3 eval_yolo.py -m yolov3-tiny-288
    $ python3 eval_yolo.py -m yolov4-tiny-416
    ......
    $ python3 eval_yolo.py -m yolov4-608
    $ python3 eval_yolo.py -l -m yolov4-csp-256
    ......
    $ python3 eval_yolo.py -l -m yolov4x-mish-640

    I evaluated all these TensorRT yolov3/yolov4 engines with COCO "val2017" data and got the following results. I also checked the FPS (frames per second) numbers on my Jetson Nano DevKit with JetPack-4.4 (TensorRT 7).

    TensorRT engine           mAP @IoU=0.5:0.95   mAP @IoU=0.5   FPS on Nano
    yolov3-tiny-288 (FP16)    0.077               0.158          35.8
    yolov3-tiny-416 (FP16)    0.096               0.202          25.5
    yolov3-288 (FP16)         0.331               0.601          8.16
    yolov3-416 (FP16)         0.373               0.664          4.93
    yolov3-608 (FP16)         0.376               0.665          2.53
    yolov3-spp-288 (FP16)     0.339               0.594          8.16
    yolov3-spp-416 (FP16)     0.391               0.664          4.82
    yolov3-spp-608 (FP16)     0.410               0.685          2.49
    yolov4-tiny-288 (FP16)    0.179               0.344          36.6
    yolov4-tiny-416 (FP16)    0.196               0.387          25.5
    yolov4-288 (FP16)         0.376               0.591          7.93
    yolov4-416 (FP16)         0.459               0.700          4.62
    yolov4-608 (FP16)         0.488               0.736          2.35
    yolov4-csp-256 (FP16)     0.336               0.502          12.8
    yolov4-csp-512 (FP16)     0.436               0.630          4.26
    yolov4x-mish-320 (FP16)   0.400               0.581          4.79
    yolov4x-mish-640 (FP16)   0.470               0.668          1.46
  6. Check out my blog posts for implementation details.
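
The mAP numbers above come from the standard COCO evaluation protocol. Here is a minimal sketch of that protocol using pycocotools, assuming detections have already been written to a COCO-format results JSON (the file names below are illustrative):

#!/usr/bin/env python3
# Standard COCO bbox evaluation, the protocol behind the mAP tables above.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('annotations/instances_val2017.json')             # ground truth
coco_dt = coco_gt.loadRes('results/yolov4-416_detections.json')  # detections
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP @IoU=0.50:0.95 and @IoU=0.50, among others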

Demo #6: Using INT8 and DLA core

NVIDIA supports INT8 TensorRT inferencing on GPUs with CUDA compute capability 6.1+. For the embedded Jetson product line, INT8 is available on Jetson AGX Xavier and Xavier NX. In addition, Jetson Xavier NX features NVIDIA's Deep Learning Accelerator (NVDLA) cores. I tested both features on my Jetson Xavier NX DevKit and shared the source code in this repo.

Please make sure you have gone through the steps of Demo #5 and are able to run TensorRT yolov3/yolov4 engines successfully, before following along:

  1. In order to use INT8 TensorRT, you'll first have to prepare some images for "calibration". These calibration images should cover the distribution of possible image inputs at inference time. According to the official documentation, roughly 500 such images are suggested by NVIDIA. As an example, I used 1,000 images from the COCO "val2017" dataset for that purpose. Note that I had previously downloaded the "val2017" images for mAP evaluation. (A sketch of the kind of calibrator these images feed follows this list.)

    $ cd ${HOME}/project/tensorrt_demos/yolo
    $ mkdir calib_images
    ### randomly pick and copy over 1,000 images from "val2017"
    ### (the "ls" already yields full paths, so copy "${jpg}" directly)
    $ for jpg in $(ls -1 ${HOME}/data/coco/images/val2017/*.jpg | sort -R | head -1000); do \
        cp ${jpg} calib_images/; \
      done

    When this is done, the 1,000 images for calibration should be present in the "${HOME}/project/tensorrt_demos/yolo/calib_images/" directory.

  2. Build the INT8 TensorRT engine. I use the "yolov3-608" model in the example commands below. (I've also created a "build_int8_engines.sh" script to facilitate building multiple INT8 engines at once.) Note that building an INT8 TensorRT engine on Jetson Xavier NX takes quite a long time. By enabling verbose logging ("-v"), you would be able to monitor the progress more closely.

    $ ln -s yolov3-608.cfg yolov3-int8-608.cfg
    $ ln -s yolov3-608.onnx yolov3-int8-608.onnx
    $ python3 onnx_to_tensorrt.py -v --int8 -m yolov3-int8-608
    
  3. (Optional) Build the TensorRT engines for the DLA cores. I use the "yolov3-608" model as the example again. (I've also created a "build_dla_engines.sh" script for building multiple DLA engines at once. A sketch of the corresponding builder settings follows this list.)

    $ ln -s yolov3-608.cfg yolov3-dla0-608.cfg
    $ ln -s yolov3-608.onnx yolov3-dla0-608.onnx
    $ python3 onnx_to_tensorrt.py -v --int8 --dla_core 0 -m yolov3-dla0-608
    $ ln -s yolov3-608.cfg yolov3-dla1-608.cfg
    $ ln -s yolov3-608.onnx yolov3-dla1-608.onnx
    $ python3 onnx_to_tensorrt.py -v --int8 --dla_core 1 -m yolov3-dla1-608
    
  4. Test the INT8 TensorRT engine with the "dog.jpg" image.

    $ cd ${HOME}/project/tensorrt_demos
    $ python3 trt_yolo.py --image ${HOME}/Pictures/dog.jpg \
                          -m yolov3-int8-608

    (Optional) Also test the DLA0 and DLA1 TensorRT engines.

    $ python3 trt_yolo.py --image ${HOME}/Pictures/dog.jpg \
                          -m yolov3-dla0-608
    $ python3 trt_yolo.py --image ${HOME}/Pictures/dog.jpg \
                          -m yolov3-dla1-608
  5. Evaluate mAP of the INT8 and DLA TensorRT engines.

    $ python3 eval_yolo.py -m yolov3-int8-608
    $ python3 eval_yolo.py -m yolov3-dla0-608
    $ python3 eval_yolo.py -m yolov3-dla1-608
  6. I tested the original yolov3/yolov4 models listed below on my Jetson Xavier NX DevKit with JetPack-4.4 (TensorRT 7.1.3.4). Here are the results.

    The following FPS numbers were measured under "15W 6CORE" mode, with CPU/GPU clocks set to maximum value (sudo jetson_clocks).

    TensorRT engine    FP16   INT8   DLA0   DLA1
    yolov3-tiny-416    58     65     42     42
    yolov3-608         15.2   23.1   14.9   14.9
    yolov3-spp-608     15.0   22.7   14.7   14.7
    yolov4-tiny-416    57     60     X      X
    yolov4-608         13.8   20.5   8.97   8.97
    yolov4-csp-512     19.8   27.8   --     --
    yolov4x-mish-640   9.01   14.1   --     --

    ("X" marks DLA engines that could not be built; see Issues below.)

    And the following are "mAP@IoU=0.5:0.95" / "mAP@IoU=0.5" of those TensorRT engines.

    TensorRT engine    FP16            INT8            DLA0            DLA1
    yolov3-tiny-416    0.096 / 0.202   0.094 / 0.198   0.096 / 0.199   0.096 / 0.199
    yolov3-608         0.376 / 0.665   0.378 / 0.670   0.378 / 0.670   0.378 / 0.670
    yolov3-spp-608     0.410 / 0.685   0.407 / 0.681   0.404 / 0.676   0.404 / 0.676
    yolov4-tiny-416    0.196 / 0.387   0.190 / 0.376   X               X
    yolov4-608         0.488 / 0.736   0.317 / 0.507   0.474 / 0.727   0.473 / 0.726
    yolov4-csp-512     0.436 / 0.630   0.391 / 0.577   --              --
    yolov4x-mish-640   0.470 / 0.668   0.434 / 0.631   --              --
  7. Issues:

    • For some reason, I'm not able to build DLA TensorRT engines for the "yolov4-tiny-416" model. I have reported the issue to NVIDIA.
    • There is no method in the TensorRT 7.1 Python API to specifically select the DLA core at inference time. I also reported this issue to NVIDIA. When testing, I simply deserialized the TensorRT engines on Jetson Xavier NX, so I'm not 100% sure whether the engine was really executed on DLA core 0 or DLA core 1.
    • mAP of the INT8 TensorRT engine of the "yolov4-608" model is not good. Originally, I thought it was an issue in the TensorRT library's handling of "Concat" nodes. But after some more investigation, I found that was not the case. Currently, I'm still not sure what the problem is...
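
As referenced in step 1 above, here is a minimal sketch of the kind of INT8 entropy calibrator that "onnx_to_tensorrt.py --int8" would plug into the TensorRT builder. The class name, file names, and the simplified preprocessing are illustrative assumptions, not the repo's actual implementation.

#!/usr/bin/env python3
# Illustrative INT8 entropy calibrator fed by JPEG files in a directory.
import os

import cv2
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt


class ImageFolderCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, image_dir, net_hw=(608, 608), cache_file='calib.cache'):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.images = [os.path.join(image_dir, f)
                       for f in os.listdir(image_dir) if f.endswith('.jpg')]
        self.net_hw = net_hw
        self.cache_file = cache_file
        self.idx = 0
        # device buffer for one CHW float32 image (4 bytes per element)
        self.d_input = cuda.mem_alloc(3 * net_hw[0] * net_hw[1] * 4)

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        if self.idx >= len(self.images):
            return None  # no more data: calibration is done
        img = cv2.imread(self.images[self.idx])
        self.idx += 1
        img = cv2.resize(img, (self.net_hw[1], self.net_hw[0]))  # (w, h)
        # BGR -> RGB, HWC -> CHW, scale to [0, 1]
        img = img[:, :, ::-1].transpose(2, 0, 1).astype(np.float32) / 255.0
        cuda.memcpy_htod(self.d_input, np.ascontiguousarray(img))
        return [int(self.d_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, 'rb') as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, 'wb') as f:
            f.write(cache)

The builder would pick such a calibrator up via "config.int8_calibrator = ImageFolderCalibrator('calib_images')" together with the INT8 builder flag.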
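
And as referenced in step 3, this is roughly how a "--dla_core" option maps onto the TensorRT 7 builder config. These are assumed-equivalent settings, not the repo's exact code.

#!/usr/bin/env python3
# Illustrative TensorRT 7 builder-config settings for INT8 + DLA.
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.max_workspace_size = 1 << 28
config.set_flag(trt.BuilderFlag.INT8)            # DLA needs INT8 or FP16
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)    # GPU runs unsupported layers
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0                              # 0 or 1 on Xavier NX
# config.int8_calibrator = ImageFolderCalibrator('calib_images')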

Licenses

  1. I referenced source code of NVIDIA/TensorRT samples to develop most of the demos in this repository. Those NVIDIA samples are under Apache License 2.0.
  2. GoogLeNet: "This model is released for unrestricted use."
  3. MTCNN: license not specified. Note the original MTCNN is under MIT License.
  4. TensorFlow Object Detection Models: Apache License 2.0.
  5. YOLOv3/YOLOv4 models (DarkNet): YOLO LICENSE.
  6. MODNet: Creative Commons Attribution NonCommercial ShareAlike 4.0 license.
  7. For the rest of the code (developed by jkjung-avt and other contributors): MIT License.
