Update Linux docker images #23244
@tianleiwu, there is a strange error from the CUDNN frontend, which was caused by upgrading CUDNN from 9.5 to 9.6. Could you please help me take a look?
I tried upgrading both cudnn-frontend and cudnn, and submitted a test build: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1579205&view=results In the worst case, we may add a fallback that calls the cudnn backend directly, as before, for the cases that the cudnn frontend cannot handle.
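The fallback idea mentioned above could be sketched roughly as follows. This is only an illustration of the control flow, not ONNX Runtime code: `Status` and `RunWithFallback` are hypothetical names, and the two callables stand in for the cudnn frontend path and the direct cudnn backend path.

```cpp
#include <functional>
#include <iostream>
#include <string>

// Hypothetical status type for this sketch; it is NOT onnxruntime::common::Status.
struct Status {
  bool ok;
  std::string message;
};

// Runs `primary` (standing in for the cudnn frontend path). If it fails,
// logs the reason and runs `fallback` (standing in for direct cudnn backend
// calls). `used_fallback`, when non-null, records which path was taken.
inline Status RunWithFallback(const std::function<Status()>& primary,
                              const std::function<Status()>& fallback,
                              bool* used_fallback = nullptr) {
  Status s = primary();
  if (used_fallback != nullptr) {
    *used_fallback = !s.ok;
  }
  if (s.ok) {
    return s;
  }
  std::cerr << "primary path failed (" << s.message << "); falling back\n";
  return fallback();
}
```

With this shape, a failure such as HEURISTIC_QUERY_FAILED from the primary path would be logged and the node would still run through the fallback, rather than the whole session failing.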
The error was:
[E:onnxruntime:yolov3, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'conv2d_2_0' Status Message: Failed to initialize CUDNN Frontend
/onnxruntime_src/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, common::Status> = void]
/onnxruntime_src/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, common::Status> = void]
CUDNN_FE failure 8: HEURISTIC_QUERY_FAILED ; GPU=0 ; hostname=98d137446008 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=225 ; expr=s_.cudnn_fe_graph->create_execution_plans({heur_mode});
@gedoensmax, @JTischbein, is it a known issue that cudnn 9.6 has a regression in convolution support for YOLOv3? Here is the cudnn 9.6 debug log:
c9c52e1 to 5944339
I downgraded the CUDA 12 image's CUDNN version back to 9.5, and then the test passed.
This means we cannot use the same cudnn version for both CUDA 11 and 12, but that's OK.
69ddb90 to 494982c
The new images contain the following updates:
1. Added Git, Ninja and VCPKG to all docker images
2. Updated CPU containers' GCC version from 12 to 14
3. Pinned CUDA 12 images' CUDNN version to 9.5 (the latest one is 9.6)
4. Addressed container supply chain warnings by building CUDA 12 images from scratch (avoid using Nvidia's prebuilt images)
5. Updated manylinux commit id to 75aeda9d18eafb323b00620537c8b4097d4bef48

Also, this PR updated some source code to make the CPU EP's source code compatible with GCC 14.