trt accelerator #7238
Description
I have converted a PyTorch model to ONNX with FP16 precision.
Triton Information
24.03
Are you using the Triton container or did you build it yourself?
container
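For context, the TensorRT accelerator from the title is enabled through the model's `config.pbtxt`. A minimal sketch of such a config (the parameter values here are illustrative assumptions, not the reporter's exact settings); per the onnxruntime backend docs, enabling the TRT engine cache may reduce the repeated engine builds that Model Analyzer triggers for each config it profiles:

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        parameters { key: "max_workspace_size_bytes" value: "1073741824" }
        parameters { key: "trt_engine_cache_enable" value: "true" }
      }
    ]
  }
}
```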
To Reproduce
I am using Model Analyzer to generate reports for different configs, but it prints the warnings below and gets stuck there forever.
```
I0517 17:00:33.419397 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_0 (GPU device 0)
I0517 17:00:33.419416 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_1 (GPU device 0)
I0517 17:00:33.419458 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_2 (GPU device 0)
I0517 17:00:33.419473 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_3 (GPU device 0)
I0517 17:00:33.419514 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_4 (GPU device 0)
I0517 17:00:33.419526 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_5 (GPU device 0)
I0517 17:00:33.419580 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_6 (GPU device 0)
I0517 17:00:33.419601 1 onnxruntime.cc:2965] TRITONBACKEND_ModelInstanceInitialize: bge_reranker_v2_onnx_0_7 (GPU device 0)
2024-05-17 17:00:41.008249791 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:00:41.008285569 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:00:41.008290949 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:00:41.008297602 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:00:41.008343438 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:41 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:00:42.421015291 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:42 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:00:43.797671162 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:00:43 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:03:19.817791037 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:03:19.817834209 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:03:19.817839830 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:03:19.817845841 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:03:19.817889874 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:19 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:03:21.216239987 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:21 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:03:22.564165174 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:03:22 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:06:00.961948435 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:06:00.961992879 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:06:00.961998901 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:06:00.962005373 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:06:00.962053173 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:00 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:06:02.351065690 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:02 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:06:03.729500906 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:06:03 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:08:41.788991283 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:08:41.789050785 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:08:41.789056105 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:08:41.789062357 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:08:41.789107623 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:41 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:08:43.198704153 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:43 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:08:44.570473931 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:08:44 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:11:28.251322741 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] TensorRT encountered issues when converting weights between types and that could affect accuracy.
2024-05-17 17:11:28.251357717 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
2024-05-17 17:11:28.251363007 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] Check verbose logs for the list of affected weights.
2024-05-17 17:11:28.251369129 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] - 256 weights are affected by this issue: Detected subnormal FP16 values.
2024-05-17 17:11:28.251412792 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:28 WARNING] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-05-17 17:11:29.643603875 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:29 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-05-17 17:11:31.028186788 [W:onnxruntime:log, tensorrt_execution_provider.h:83 log] [2024-05-17 17:11:31 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
```
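For what it's worth, the weight-conversion warnings above are about FP16's numeric range: the largest finite FP16 value is 65504, and nonzero magnitudes below about 6.1e-5 become subnormal. A minimal NumPy sketch (the helper name `check_fp16_safety` is hypothetical) that flags FP32 weights which would trip either warning:

```python
import numpy as np

def check_fp16_safety(weights: np.ndarray) -> dict:
    """Count FP32 values that would overflow or go subnormal when cast to FP16."""
    fp16 = np.finfo(np.float16)
    mags = np.abs(weights[weights != 0])  # zeros cast exactly, skip them
    return {
        # TRT clamps these to the closest finite FP16 value (the "1 weights" warning)
        "overflow": int(np.count_nonzero(mags > fp16.max)),
        # below fp16.tiny (~6.1e-5) values lose precision (the "256 weights" warning)
        "subnormal": int(np.count_nonzero(mags < fp16.tiny)),
    }

w = np.array([70000.0, 1e-7, 0.5, 0.0], dtype=np.float32)
report = check_fp16_safety(w)  # 70000 overflows FP16; 1e-7 is subnormal
```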
The model is just an embedding model from Hugging Face.