trt accelerator #7238
Description
I have converted a PyTorch model to ONNX with FP16 precision.
Triton Information
24.03
Are you using the Triton container or did you build it yourself?
container
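For context, the TensorRT accelerator from the title is enabled through the model's `config.pbtxt`. A minimal sketch of such a config (the parameter values here are illustrative assumptions, not the reporter's exact settings); per the onnxruntime backend docs, enabling the TRT engine cache may reduce the repeated engine builds that Model Analyzer triggers for each config it profiles:

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        parameters { key: "max_workspace_size_bytes" value: "1073741824" }
        parameters { key: "trt_engine_cache_enable" value: "true" }
      }
    ]
  }
}
```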
To Reproduce
I am using Model Analyzer to generate reports for different configs, but it prints the warnings below and gets stuck there forever.
```
I0517 17:00:33.419397 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_0 (GPU device 0)
I0517 17:00:33.419416 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_1 (GPU device 0)
I0517 17:00:33.419458 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_2 (GPU device 0)
I0517 17:00:33.419473 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_3 (GPU device 0)
I0517 17:00:33.419514 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_4 (GPU device 0)
I0517 17:00:33.419526 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_5 (GPU device 0)
I0517 17:00:33.419580 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_6 (GPU device 0)
I0517 17:00:33.419601 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_7 (GPU device 0)
2024-05-17 17:00:41.008249791 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:00:41.008285569 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:00:41.008290949 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:00:41.008297602 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:00:41.008343438 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:00:42.421015291 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:42 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:00:43.797671162 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:43 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:03:19.817791037 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:03:19.817834209 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:03:19.817839830 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:03:19.817845841 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:03:19.817889874 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:03:21.216239987 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:21 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:03:22.564165174 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:22 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:06:00.961948435 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:06:00.961992879 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:06:00.961998901 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:06:00.962005373 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:06:00.962053173 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:06:02.351065690 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:02 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:06:03.729500906 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:03 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:08:41.788991283 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:08:41.789050785 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:08:41.789056105 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:08:41.789062357 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:08:41.789107623 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:08:43.198704153 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:43 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:08:44.570473931 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:44 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:11:28.251322741 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:11:28.251357717 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:11:28.251363007 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:11:28.251369129 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:11:28.251412792 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:11:29.643603875 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:29 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:11:31.028186788 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:31 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
```
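For what it's worth, the weight-conversion warnings above are about FP16's numeric range: the largest finite FP16 value is 65504, and nonzero magnitudes below about 6.1e-5 become subnormal. A minimal NumPy sketch (the helper name `check_fp16_safety` is hypothetical) that flags FP32 weights which would trip either warning:

```python
import numpy as np

def check_fp16_safety(weights: np.ndarray) -> dict:
    """Count FP32 values that would overflow or go subnormal when cast to FP16."""
    fp16 = np.finfo(np.float16)
    mags = np.abs(weights[weights != 0])  # zeros cast exactly, skip them
    return {
        # TRT clamps these to the closest finite FP16 value (the "1 weights" warning)
        "overflow": int(np.count_nonzero(mags > fp16.max)),
        # below fp16.tiny (~6.1e-5) values lose precision (the "256 weights" warning)
        "subnormal": int(np.count_nonzero(mags < fp16.tiny)),
    }

w = np.array([70000.0, 1e-7, 0.5, 0.0], dtype=np.float32)
report = check_fp16_safety(w)  # 70000 overflows FP16; 1e-7 is subnormal
```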
The model is just an embedding model from Hugging Face.