Atomic Exchange CUDA Error #16037

Open
GaryHuan9 opened this issue Nov 10, 2024 · 0 comments
Labels: bug (Something isn't working), cuda (CUDA back-end)

Describe the bug

Hey! I am learning to use SYCL, but I ran into an issue when using sycl::atomic_ref::exchange. Things work fine on the CPU, but when I switch to the GPU, even a very simple test (see below) crashes with a CUDA error. Other atomic primitives such as store and load work fine.

To reproduce

  1. Include code snippet as short as possible
#include <sycl.hpp>

int main()
{
    sycl::queue queue(sycl::gpu_selector_v);
    std::cout << "Device: " << queue.get_device().get_info<sycl::info::device::name>() << std::endl;

    queue.submit([&](sycl::handler& diana)
    {
        sycl::stream out(1024, 256, diana);

        diana.parallel_for(1, [=](sycl::id<> id)
        {
            int memory = 3;
            sycl::atomic_ref<int,
                sycl::memory_order::relaxed,
                sycl::memory_scope::work_item> at(memory);

            int load = at.exchange(123);
            out << "id " << id << " load " << load << sycl::endl;
        });
    });

    queue.wait_and_throw();
}
  2. Specify the command which should be used to compile the program
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda main.cpp
  3. Specify the command which should be used to launch the program
./a.out
  4. Indicate what is wrong and what was expected

This is my output. The program crashes with a CUDA launch failure, whereas I expected the kernel to print the previous value (3) returned by exchange.

Device: NVIDIA GeForce RTX 4090
<CUDA>[ERROR]: 
UR CUDA ERROR:
        Value:           719
        Name:            CUDA_ERROR_LAUNCH_FAILED
        Description:     unspecified launch failure
        Function:        urEnqueueMemBufferRead
        Source Location: /tmp/tmp.nlKu2FwFq5/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/enqueue.cpp:1777

terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Aborted (core dumped)
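
For reference, the expected result can be sketched on the host alone. This is a minimal sketch, not part of the repro: it uses std::atomic<int> as a stand-in for sycl::atomic_ref (an assumption that the two agree on what exchange returns), and the names ExchangeResult and expected_exchange are made up for illustration.

```cpp
#include <atomic>

// Host-only reference for the values the kernel is expected to observe.
// std::atomic<int> stands in for sycl::atomic_ref here (assumption).
struct ExchangeResult {
    int returned; // value handed back by exchange()
    int stored;   // value left in memory afterwards
};

// Hypothetical helper: mirrors the kernel body from the repro on the host.
ExchangeResult expected_exchange(int initial, int desired)
{
    std::atomic<int> memory(initial);

    // exchange() writes `desired` and returns the value held before the write.
    int load = memory.exchange(desired, std::memory_order_relaxed);

    return {load, memory.load(std::memory_order_relaxed)};
}
```

Calling expected_exchange(3, 123) corresponds to the kernel above: exchange should hand back the initial 3 and leave 123 in memory, so the GPU run should report a load of 3 rather than failing.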

Environment

  • OS: Fedora Linux 40 x86_64 6.11.6-200.fc40.x86_64
  • Target device and vendor: NVIDIA GeForce RTX 4090
  • DPC++ version: icpx --version output:
Intel(R) oneAPI DPC++/C++ Compiler 2025.0.0 (2025.0.0.20241008)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2025.0/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2025.0/bin/compiler/../icpx.cfg
  • Dependencies version: Header of nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+

And output of sycl-ls --verbose:

[opencl:cpu][opencl:0] Intel(R) OpenCL, AMD Ryzen 9 3900X 12-Core Processor             OpenCL 3.0 (Build 0) [2024.18.10.0.08_160000]
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4090 8.9 [CUDA 12.6]

Platforms: 2
Platform [#1]:
    Version  : OpenCL 3.0 LINUX
    Name     : Intel(R) OpenCL
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type              : cpu
        Version           : OpenCL 3.0 (Build 0)
        Name              : AMD Ryzen 9 3900X 12-Core Processor            
        Vendor            : Intel(R) Corporation
        Driver            : 2024.18.10.0.08_160000
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_oneapi_native_assert ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_private_alloca
        info::device::sub_group_sizes: 4 8 16 32 64
        Architecture: x86_64
Platform [#2]:
    Version  : CUDA 12.6
    Name     : NVIDIA CUDA BACKEND
    Vendor   : NVIDIA Corporation
    Devices  : 1
        Device [#0]:
        Type              : gpu
        Version           : 8.9
        Name              : NVIDIA GeForce RTX 4090
        Vendor            : NVIDIA Corporation
        Driver            : CUDA 12.6
        UUID              : 1367131105491041301142711512019110415220878
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_external_memory_import ext_oneapi_external_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering ext_oneapi_bindless_sampled_image_fetch_1d_usm ext_oneapi_bindless_sampled_image_fetch_2d_usm ext_oneapi_bindless_sampled_image_fetch_2d ext_oneapi_bindless_sampled_image_fetch_3d ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_image_array ext_oneapi_unique_addressing_per_dim ext_oneapi_bindless_images_sample_1d_usm ext_oneapi_bindless_images_sample_2d_usm
Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
        info::device::sub_group_sizes: 32
        Architecture: nvidia_gpu_sm_89
default_selector()      : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4090 8.9 [CUDA 12.6]
accelerator_selector()  : No device of requested type available. Please chec...
cpu_selector()          : cpu, Intel(R) OpenCL, AMD Ryzen 9 3900X 12-Core Processor             OpenCL 3.0 (Build 0) [2024.18.10.0.08_160000]
gpu_selector()          : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4090 8.9 [CUDA 12.6]
custom_selector(gpu)    : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4090 8.9 [CUDA 12.6]
custom_selector(cpu)    : cpu, Intel(R) OpenCL, AMD Ryzen 9 3900X 12-Core Processor             OpenCL 3.0 (Build 0) [2024.18.10.0.08_160000]
custom_selector(acc)    : No device of requested type available. Please chec...

Additional context

No response

GaryHuan9 added the bug and cuda labels Nov 10, 2024