CUDA error: initialization error. Compile with `TORCH_USE_CUDA_DSA` #91

Open
gregbugaj opened this issue Nov 27, 2023 · 0 comments
Describe the bug

While migrating to torch-2.2.0.dev20231126+cu118, I ran into this issue. It could be due to this being a dev release, but I am adding it for tracking purposes.

Describe how you solve it

Use the stable version, torch 2.1.1.
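The pin can be applied with pip. The cu118 wheel index and the torchvision pairing below are assumptions based on the CUDA 11.8 environment shown in this report, not something stated in it:

```shell
# Pin the stable releases; torchvision 0.16.1 is the release paired with torch 2.1.1
pip install "torch==2.1.1" "torchvision==0.16.1" \
    --index-url https://download.pytorch.org/whl/cu118
```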

Environment

(marie) greg@xpredator:~/dev/marieai/marie-ai$ python -m detectron2.utils.collect_env
-------------------------------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
numpy                            1.24.1
detectron2                       0.6 @/home/greg/dev/3rdparty/detectron2/detectron2
detectron2._C                    not built correctly: /home/greg/dev/3rdparty/detectron2/detectron2/_C.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNR5torch7Library4_defEOSt7variantIJN3c1012OperatorNameENS2_14FunctionSchemaEEEONS_11CppFunctionE
Compiler ($CXX)                  c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CUDA compiler                    Build cuda_11.8.r11.8/compiler.31833905_0
detectron2 arch flags            8.9
DETECTRON2_ENV_MODULE            <not set>
PyTorch                          2.2.0.dev20231126+cu118 @/home/greg/environment/marie/lib/python3.10/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0                            NVIDIA GeForce RTX 4090 (arch=8.9)
Driver version                   545.23.06
CUDA_HOME                        /usr/local/cuda
Pillow                           9.3.0
torchvision                      0.17.0.dev20231126+cu118 @/home/greg/environment/marie/lib/python3.10/site-packages/torchvision
torchvision arch flags           3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                           0.1.5
iopath                           0.1.9
cv2                              4.8.1
-------------------------------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.2.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 
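Separately, the collect_env output above flags `detectron2._C` as "not built correctly" (undefined symbol), which usually means the extension was compiled against a different torch ABI than the one now installed. A rebuild sketch, assuming the editable checkout at the path shown above; the exact flags are an assumption, not taken from this report:

```shell
# Rebuild the detectron2 C extension against the currently installed torch
cd /home/greg/dev/3rdparty/detectron2
rm -rf build/
pip install --no-build-isolation --force-reinstall -e .
```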


Screenshots

ERROR main : extract_t/rep-0@1330319
    RuntimeError('CUDA error: initialization error\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n') during 'WorkerRuntime' initialization
    add "--quiet-error" to suppress the exception details
Traceback (most recent call last):
  File "/home/greg/dev/marieai/marie-ai/marie/serve/executo…", line 143, in run
    runtime = AsyncNewLoopRuntime(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 93, in __init__
    self._loop.run_until_complete(self.async_setup())
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 310, in async_setup
    self.server = self._get_server()
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 215, in _get_server
    return GRPCServer(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 34, in __init__
    super().__init__(**kwargs)
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 70, in __init__
    ] = (req_handler or self._get_request_handler())
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 95, in _get_request_handler
    return self.req_handler_cls(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 140, in __init__
    self._load_executor(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 379, in _load_executor
    self._executor: BaseExecutor = BaseExecutor.load_config(
  File "/home/greg/dev/marieai/marie-ai/marie/jaml/__init__…", line 792, in load_config
    obj = JAML.load(tag_yml, substitute=False, runtime_args=runtime_args)
  File "/home/greg/dev/marieai/marie-ai/marie/jaml/__init__…", line 174, in load
    r = yaml.load(stream, Loader=get_jina_loader_with_runtime(runtime_args))
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 81, in load
    return loader.get_single_data()
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 51, in get_single_data
    return self.construct_document(node)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 55, in construct_document
    data = self.construct_object(node)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 100, in construct_object
    data = constructor(self, node)
  File "/home/greg/dev/marieai/marie-ai/marie/jaml/__init__…", line 582, in _from_yaml
    return get_parser(cls, version=data.get('version', None)).parse(
  File "/home/greg/dev/marieai/marie-ai/marie/jaml/parsers/…", line 46, in parse
    obj = cls(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/executo…", line 58, in arg_wrapper
    f = func(self, *args, **kwargs)
  File "/home/greg/dev/marieai/marie-ai/marie/serve/helper.…", line 74, in arg_wrapper
    f = func(self, *args, **kwargs)
  File "/home/greg/dev/marieai/marie-ai/marie/executor/text…", line 98, in __init__
    self.pipeline = ExtractPipeline(pipeline_config=pipeline, cuda=use_cuda)
  File "/home/greg/dev/marieai/marie-ai/marie/pipe/extract_…", line 94, in __init__
    self.overlay_processor = OverlayProcessor(
  File "/home/greg/dev/marieai/marie-ai/marie/overlay/overl…", line 44, in __init__
    self.opt, self.model = self.__setup(cuda, checkpoint_dir)
  File "/home/greg/dev/marieai/marie-ai/marie/overlay/overl…", line 109, in __setup
    model = create_model(opt)
  File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pi…", line 75, in create_model
    instance = model(opt)
  File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pi…", line 45, in __init__
    self.netG = networks.define_G(opt.input_nc, opt.output_nc, opt.ngf, opt.netG,
  File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pi…", line 271, in define_G
    return init_net(net, init_type, init_gain, gpu_ids)
  File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pi…", line 151, in init_net
    net.to("cuda")
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 1152, in to
    return self._apply(convert)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 802, in _apply
    module._apply(fn)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 802, in _apply
    module._apply(fn)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 825, in _apply
    param_applied = fn(param)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 302, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA error: initialization error
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
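After pinning back to the stable build, a quick sanity check along these lines can confirm the interpreter sees a working CUDA runtime before restarting the worker. This is a minimal sketch that only reports status; it assumes nothing about this repo:

```shell
python - <<'EOF'
# Report the installed torch build and whether CUDA initializes.
try:
    import torch
    print("torch", torch.__version__, "cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
EOF
```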