Refactor Under Development #1592

Closed
wants to merge 3 commits into from
20 changes: 20 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
cmake_minimum_required(VERSION 3.14)

project(
tensorrtx
VERSION 0.1
LANGUAGES C CXX CUDA)

set(TensorRT_7_8_10_TARGETS mlp lenet alexnet vgg)

set(TensorRT_8_TARGETS)

set(TensorRT_10_TARGETS)

set(ALL_TARGETS ${TensorRT_7_8_10_TARGETS} ${TensorRT_8_TARGETS}
${TensorRT_10_TARGETS})

foreach(sub_dir ${ALL_TARGETS})
message(STATUS "Add subdirectory: ${sub_dir}")
add_subdirectory(${sub_dir})
endforeach()
77 changes: 68 additions & 9 deletions README.md
Expand Up @@ -16,7 +16,7 @@ The basic workflow of TensorRTx is:
## News

- `22 Oct 2024`. [lindsayshuo](https://github.com/lindsayshuo): YOLOv8-obb
- `18 Oct 2024`. [zgjja](https://github.com/zgjja): Rafactor docker image.
- `18 Oct 2024`. [zgjja](https://github.com/zgjja): Refactor docker image.
- `11 Oct 2024`. [mpj1234](https://github.com/mpj1234): YOLO11
- `9 Oct 2024`. [Phoenix8215](https://github.com/Phoenix8215): GhostNet V1 and V2.
- `21 Aug 2024`. [Lemonononon](https://github.com/Lemonononon): real-esrgan-general-x4v3
Expand All @@ -38,7 +38,7 @@ The basic workflow of TensorRTx is:
- [A guide for quickly getting started, taking lenet5 as a demo.](./tutorials/getting_started.md)
- [The .wts file content format](./tutorials/getting_started.md#the-wts-content-format)
- [Frequently Asked Questions (FAQ)](./tutorials/faq.md)
- [Migrating from TensorRT 4 to 7](./tutorials/migrating_from_tensorrt_4_to_7.md)
- [Migration Guide](./tutorials/migration_guide.md)
- [How to implement multi-GPU processing, taking YOLOv4 as example](./tutorials/multi_GPU_processing.md)
- [Check if Your GPU support FP16/INT8](./tutorials/check_fp16_int8_support.md)
- [How to Compile and Run on Windows](./tutorials/run_on_windows.md)
Expand All @@ -47,21 +47,80 @@ The basic workflow of TensorRTx is:

## Test Environment

1. TensorRT 7.x
2. TensorRT 8.x(Some of the models support 8.x)
1. (**NOT recommended**) TensorRT 7.x
2. (**Recommended**) TensorRT 8.x
3. (**NOT recommended**) TensorRT 10.x

### Note

1. For historical reasons, some models are limited to a specific TensorRT version; please check the README.md or code of the model you want to use.
2. Currently, TensorRT 8.x has the best compatibility and supports most features.

## How to run

Each folder has a readme inside, which explains how to run the models inside.
**Note**: this project supports building each network via the `CMakeLists.txt` in its subfolder, or building them all together via the top-level `CMakeLists.txt` of this project.

* General procedures before building and running:

```bash
# 1. generate xxx.wts from https://github.com/wang-xinyu/pytorchx/tree/master/lenet
# ...

# 2. put xxx.wts on top of this folder
# ...
```
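The `xxx.wts` file mentioned above is a plain-text weight dump (one tensor per line). A minimal Python sketch of such a writer, assuming a simple `name -> list of floats` mapping in place of a real PyTorch `state_dict` (see the linked tutorial for the authoritative format):

```python
import struct

def write_wts(weights, path):
    """Serialize float32 weights in the plain-text .wts layout:
    first line is the tensor count, then one line per tensor:
    '<name> <num_values> <big-endian float32 hex> ...'."""
    with open(path, "w") as f:
        f.write(f"{len(weights)}\n")
        for name, values in weights.items():
            f.write(f"{name} {len(values)}")
            for v in values:
                f.write(" " + struct.pack(">f", float(v)).hex())
            f.write("\n")

# Hypothetical toy weights standing in for model.state_dict()
write_wts({"fc1.weight": [0.5, -1.0], "fc1.bias": [0.0]}, "toy.wts")
```

In the real generator scripts the `values` come from flattening each tensor of a loaded PyTorch model.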

* (*Option 1*) To build a single subproject in this project, do:

```bash
## enter the subfolder
cd tensorrtx/xxx

## configure & build
cmake -S . -B build
make -C build
```

* (*Option 2*) To build multiple subprojects, first **comment out**, in the top `CMakeLists.txt`, any project you don't want to build or that is not supported by your TensorRT version, e.g., you cannot build the subprojects in `${TensorRT_8_TARGETS}` if your TensorRT is `7.x`. Then:

```bash
## enter the top of this project
cd tensorrtx

## configure & build
# you may use "Ninja" rather than "make" to significantly boost the build speed
cmake -G Ninja -S . -B build
ninja -C build
```

**WARNING**: This part is still under development, most subprojects are not adapted yet.

* run the generated executable, e.g.:

```bash
# serialize model to plan file i.e. 'xxx.engine'
build/xxx -s

# deserialize plan file and run inference
build/xxx -d

# (Optional) check if the output is same as pytorchx/lenet
# ...

# (Optional) customize the project
# ...
```

For more details, each subfolder may contain a `README.md` inside, which explains more.

## Models

Following models are implemented.

|Name | Description |
|-|-|
|[mlp](./mlp) | the very basic model for starters, properly documented |
|[lenet](./lenet) | the simplest, as a "hello world" of this project |
| Name | Description | Supported TensorRT Version |
|---------------|---------------|---------------|
|[mlp](./mlp) | the very basic model for starters, properly documented | 7.x/8.x/10.x |
|[lenet](./lenet) | the simplest, as a "hello world" of this project | 7.x/8.x/10.x |
|[alexnet](./alexnet)| easy to implement, all layers are supported in tensorrt |
|[googlenet](./googlenet)| GoogLeNet (Inception v1) |
|[inception](./inception)| Inception v3, v4 |
Expand Down
56 changes: 38 additions & 18 deletions alexnet/CMakeLists.txt
@@ -1,25 +1,45 @@
cmake_minimum_required(VERSION 2.6)
cmake_minimum_required(VERSION 3.14)

project(alexnet)
project(
alexnet
VERSION 0.1
LANGUAGES C CXX CUDA)

add_definitions(-std=c++11)
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES
60
70
72
75
80
86
89)
endif()

option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Debug)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(CMAKE_INCLUDE_CURRENT_DIR TRUE)
set(CMAKE_BUILD_TYPE
"Debug"
CACHE STRING "Build type for this project" FORCE)

include_directories(${PROJECT_SOURCE_DIR}/include)
# include and link dirs of cuda and tensorrt, you need adapt them if yours are different
# cuda
include_directories(/usr/local/cuda/include)
link_directories(/usr/local/cuda/lib64)
# tensorrt
include_directories(/usr/include/x86_64-linux-gnu/)
link_directories(/usr/lib/x86_64-linux-gnu/)
option(CUDA_USE_STATIC_CUDA_RUNTIME "Use static cudaruntime library" OFF)

add_executable(alexnet ${PROJECT_SOURCE_DIR}/alex.cpp)
target_link_libraries(alexnet nvinfer)
target_link_libraries(alexnet cudart)
find_package(Threads REQUIRED)
find_package(CUDAToolkit REQUIRED)

add_definitions(-O2 -pthread)
if(NOT TARGET TensorRT::TensorRT)
include(FindTensorRT.cmake)
else()
message("TensorRT has been found, skipping for ${PROJECT_NAME}")
endif()

add_executable(${PROJECT_NAME} alex.cpp)

target_include_directories(${PROJECT_NAME} PUBLIC ${CMAKE_SOURCE_DIR}/include)

target_link_libraries(${PROJECT_NAME} PUBLIC Threads::Threads m
TensorRT::TensorRT CUDA::cudart)
78 changes: 78 additions & 0 deletions alexnet/FindTensorRT.cmake
@@ -0,0 +1,78 @@
cmake_minimum_required(VERSION 3.17.0)

set(TRT_VERSION
$ENV{TRT_VERSION}
CACHE STRING
"TensorRT version, e.g. \"8.6.1.6\" or \"8.6.1.6+cuda12.0.1.011\"")

# find TensorRT include folder
if(NOT TensorRT_INCLUDE_DIR)
if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
set(TensorRT_INCLUDE_DIR
"/usr/local/cuda/targets/aarch64-linux/include"
CACHE PATH "TensorRT_INCLUDE_DIR")
else()
set(TensorRT_INCLUDE_DIR
"/usr/include/x86_64-linux-gnu"
CACHE PATH "TensorRT_INCLUDE_DIR")
endif()
message(STATUS "TensorRT: ${TensorRT_INCLUDE_DIR}")
endif()

# find TensorRT library folder
if(NOT TensorRT_LIBRARY_DIR)
if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
set(TensorRT_LIBRARY_DIR
"/usr/lib/aarch64-linux-gnu/tegra"
CACHE PATH "TensorRT_LIBRARY_DIR")
else()
set(TensorRT_LIBRARY_DIR
"/usr/lib/x86_64-linux-gnu"
CACHE PATH "TensorRT_LIBRARY_DIR")
endif()
message(STATUS "TensorRT: ${TensorRT_LIBRARY_DIR}")
endif()

set(TensorRT_LIBRARIES)

# process for different TensorRT version
if(DEFINED TRT_VERSION AND NOT TRT_VERSION STREQUAL "")
string(REGEX MATCH "([0-9]+)" _match ${TRT_VERSION})
set(TRT_MAJOR_VERSION "${_match}")
set(_modules nvinfer nvinfer_plugin)

if(TRT_MAJOR_VERSION GREATER_EQUAL 8)
list(APPEND _modules nvinfer_vc_plugin nvinfer_dispatch nvinfer_lean)
endif()
else()
message(FATAL_ERROR "Please set the environment variable \"TRT_VERSION\"")
endif()

# find and add all modules of TensorRT into list
foreach(lib IN LISTS _modules)
find_library(
TensorRT_${lib}_LIBRARY
NAMES ${lib}
HINTS ${TensorRT_LIBRARY_DIR})
list(APPEND TensorRT_LIBRARIES ${TensorRT_${lib}_LIBRARY})
endforeach()

# make the "TensorRT target"
add_library(TensorRT IMPORTED INTERFACE)
add_library(TensorRT::TensorRT ALIAS TensorRT)
target_link_libraries(TensorRT INTERFACE ${TensorRT_LIBRARIES})

set_target_properties(
TensorRT
PROPERTIES C_STANDARD 17
CXX_STANDARD 17
POSITION_INDEPENDENT_CODE ON
SKIP_BUILD_RPATH TRUE
BUILD_WITH_INSTALL_RPATH TRUE
INSTALL_RPATH "$\{ORIGIN\}"
INTERFACE_INCLUDE_DIRECTORIES "${TensorRT_INCLUDE_DIR}")

unset(TRT_MAJOR_VERSION)
unset(_modules)
72 changes: 53 additions & 19 deletions alexnet/README.md
@@ -1,33 +1,67 @@
# alexnet
# AlexNet

AlexNet model architecture from the "One weird trick..." <https://arxiv.org/abs/1404.5997>`_ paper.
## Introduction

For the details, you can refer to [pytorchx/alexnet](https://github.com/wang-xinyu/pytorchx/tree/master/alexnet)
The AlexNet architecture comes from the paper [One weird trick for parallelizing convolutional neural networks](https://arxiv.org/abs/1404.5997). To generate the `.wts` file, refer to [pytorchx/alexnet](https://github.com/wang-xinyu/pytorchx/tree/master/alexnet); for the PyTorch implementation of AlexNet, see [torchvision](https://github.com/pytorch/vision/blob/main/torchvision/models/alexnet.py#L17).

This alexnet is just several `conv-relu-pool` blocks followed by several `fc-relu`, nothing special. All layers can be implemented by tensorrt api, including `addConvolution`, `addActivation`, `addPooling`, `addFullyConnected`.
AlexNet consists of 3 major parts: features, adaptive average pooling, and classifier:
* features: several stacked `CRP` (conv-relu-pool) and `CR` blocks
* adaptive average pooling: PyTorch infers its parameters automatically, but with the TensorRT API we need to calculate them ourselves
* classifier: several `fc-relu` layers. All layers can be implemented with the TensorRT API, including `addConvolution`, `addActivation`, `addPooling`, `addMatrixMultiply`, `addElementWise`, etc.
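For the adaptive-average-pooling bullet: since a fixed-kernel pooling layer must stand in for PyTorch's `AdaptiveAvgPool2d`, the kernel and stride have to be derived by hand. A Python sketch of that arithmetic (the formula is the common adaptive-to-fixed equivalence, which is exact for AlexNet's 13x13 -> 6x6 case, not code from this repo):

```python
def adaptive_avgpool_params(in_size, out_size):
    # Fixed (kernel, stride) pair reproducing AdaptiveAvgPool2d when the
    # adaptive windows are evenly spaced (true for 13 -> 6).
    stride = in_size // out_size
    kernel = in_size - (out_size - 1) * stride
    return kernel, stride

# alexnet: features output is 256x13x13, avgpool target is 6x6
print(adaptive_avgpool_params(13, 6))  # -> (3, 2)
```

So a 3x3 average pooling with stride 2 reproduces `AdaptiveAvgPool2d(output_size=(6, 6))` here.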

```
// 1. generate alexnet.wts from [pytorchx/alexnet](https://github.com/wang-xinyu/pytorchx/tree/master/alexnet)

// 2. put alexnet.wts into tensorrtx/alexnet

// 3. build and run

cd tensorrtx/alexnet
## Use AlexNet from PyTorch

mkdir build
We can use torchvision to load the pretrained alexnet model:

cd build
```python
alexnet = torchvision.models.alexnet(pretrained=True)
```

cmake ..
The model structure is:

```txt
AlexNet(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
(1): ReLU(inplace=True)
(2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(4): ReLU(inplace=True)
(5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU(inplace=True)
(8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): ReLU(inplace=True)
(10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
(classifier): Sequential(
(0): Dropout(p=0.5, inplace=False)
(1): Linear(in_features=9216, out_features=4096, bias=True)
(2): ReLU(inplace=True)
(3): Dropout(p=0.5, inplace=False)
(4): Linear(in_features=4096, out_features=4096, bias=True)
(5): ReLU(inplace=True)
(6): Linear(in_features=4096, out_features=1000, bias=True)
)
)
```
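The classifier's `in_features=9216` in the dump above follows from the feature extractor's output shape. A quick sketch tracing the spatial sizes with the standard conv/pool formula `floor((in + 2p - k) / s) + 1`, assuming a 224x224 input:

```python
def out_size(in_size, k, s, p=0):
    # standard conv/pool output-size formula
    return (in_size + 2 * p - k) // s + 1

s = 224                    # assumed ImageNet-style input
s = out_size(s, 11, 4, 2)  # Conv2d(3, 64, 11, stride=4, pad=2) -> 55
s = out_size(s, 3, 2)      # MaxPool2d(3, 2)                    -> 27
s = out_size(s, 5, 1, 2)   # Conv2d(64, 192, 5, pad=2)          -> 27
s = out_size(s, 3, 2)      # MaxPool2d(3, 2)                    -> 13
# the three 3x3 pad-1 convs keep the 13x13 size
s = out_size(s, 3, 2)      # final MaxPool2d(3, 2)              -> 6
print(256 * s * s)         # -> 9216, the classifier in_features
```

The same arithmetic is what you need when sizing the pooling replacement for `AdaptiveAvgPool2d` in the TensorRT network definition.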

make
## FAQ

sudo ./alexnet -s // serialize model to plan file i.e. 'alexnet.engine'
### How to align the output with Pytorch?

sudo ./alexnet -d // deserialize plan file and run inference
If your output differs from PyTorch, you have to find which TensorRT API call or which part of your code causes the difference. A simple approach is to check the `.engine` output part by part, e.g., you can set an early layer of alexnet as the output:

// 4. see if the output is same as pytorchx/alexnet
```c++
fc3_1->getOutput(0)->setName(OUTPUT_NAME);
network->markOutput(*pool3->getOutput(0)); // original is: "*fc3_1->getOutput(0)"
```

With this line of code, I take the output from the "features" part of alexnet and ignore the rest of the model. Don't forget to also change the `OUTPUT_SIZE` macro at the top of the file, then rebuild the `.engine` file to apply the changes.

You can sum up all the output values in the C++ code and compare the result with the PyTorch output; on the PyTorch side you can run `torch.sum(x)` while debugging. An acceptable deviation between the two sums would be around $[10^{-2}, 10^{-1}]$; for this example, since the "features" output has $256 \times 6 \times 6$ elements (batch = 1), that corresponds to a per-element error of roughly $10^{-4}$.

Note: this is only a quick check. For a more accurate comparison you would have to save the output tensors to files and compare them value by value, but that is rarely necessary.
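The sum-based check described above can be sketched in a few lines (pure Python, with hypothetical flat lists standing in for the real engine and PyTorch outputs):

```python
def quick_compare(trt_out, torch_out, sum_tol=1e-1):
    # quick check: compare the sums of the two flattened outputs
    trt_sum, torch_sum = sum(trt_out), sum(torch_out)
    diff = abs(trt_sum - torch_sum)
    print(f"trt={trt_sum:.6f} torch={torch_sum:.6f} |diff|={diff:.6f}")
    return diff <= sum_tol

# toy stand-ins for the 256*6*6 "features" outputs (batch = 1)
a = [0.001 * i for i in range(9216)]
b = [0.001 * i + 1e-6 for i in range(9216)]
assert quick_compare(a, b)  # per-element 1e-6 -> sum diff ~9.2e-3, passes
```

On the C++ side the equivalent is a plain accumulation loop over the output buffer before comparing against the printed `torch.sum(x)` value.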