localmem does not work for cpu backend? #544

Open
ww1g11 opened this issue Nov 15, 2024 · 1 comment

ww1g11 commented Nov 15, 2024

Hi, I am trying to write a matvec kernel that uses shared memory. It works with the CUDA backend, but switching to the CPU backend raises an error:

ERROR: LoadError: UndefVarError: `j` not defined in `Main`
Stacktrace:
 [1] cpu_matvec_kernel!

How can I fix it? Many thanks.

The Julia script is shown below:

using KernelAbstractions
using CUDA
using Test

@kernel function matvec_kernel!(output, @Const(A), @Const(b))
    I = @index(Global, Linear)
    I = div(I-1, 32) + 1
    idx = @index(Local, Linear)
    i = (idx - 1) % 32 + 1  # local index within the warp

    cache_size = @uniform @groupsize()
    cache = @localmem eltype(output) cache_size

    N = size(A, 2)
    sum = zero(eltype(output))
    @inbounds begin
        for J = i:32:N
            sum += A[I, J] * b[J]
        end
        cache[idx] = sum
    end
    @synchronize

    j::Int = 16
    while j > 0
        if i <= j
            @inbounds cache[idx] += cache[idx + j]  # j cannot be found on the CPU backend
        end
        @synchronize
        j = j ÷ 2
    end

    if i == 1
        @inbounds output[I] = cache[idx]
    end
    
end

function matvec!(output, A, b)
    backend = KernelAbstractions.get_backend(A)
    kernel! = matvec_kernel!(backend, 256)
    kernel!(output, A, b; ndrange=32*size(A, 1))
end


m, n = 2^10, 2^10
A = CUDA.rand(Float32, m, n)
b = CUDA.rand(Float32, n)
output = CUDA.rand(Float32, m)

matvec!(output, A, b)
@test isapprox(output, A * b)

matvec!(Array(output), Array(A), Array(b))

The versioninfo() gives:

julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12a (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 24 × 13th Gen Intel(R) Core(TM) i7-13700F
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.6
NVIDIA driver 561.3.0

CUDA libraries:
- CUBLAS: 12.6.3
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+561.3

Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.3+0
- CUDA_Runtime_jll: 0.15.3+0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

1 device:
  0: NVIDIA GeForce RTX 3060 Ti (sm_86, 5.048 GiB / 8.000 GiB available)

vchuravy (Member) commented

This is #262: @synchronize does not work within while loops on the CPU backend.

You can use OpenCL.jl + POCL_jll to execute this code on the CPU. We are working towards making that the default CPU backend for KernelAbstractions (KA), which would fix bugs like this one.
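
Until then, one possible workaround is to unroll the reduction by hand so that every @synchronize sits at the top level of the kernel body instead of inside the while loop. The sketch below is not a confirmed fix from the maintainers. It assumes a workgroup size of 256 and 32 lanes per row, as in the script above, and the helpers `lane_of` and `row_of` are hypothetical names introduced here. To be safe, it recomputes the lane and row after each barrier instead of relying on ordinary locals surviving across @synchronize on the CPU backend.

```julia
using KernelAbstractions

# Hypothetical helpers (not part of KernelAbstractions): map a linear index
# to its lane within a 32-wide group and to the matrix row that group handles.
lane_of(idx) = (idx - 1) % 32 + 1
row_of(g) = div(g - 1, 32) + 1

@kernel function matvec_unrolled_kernel!(output, @Const(A), @Const(b))
    g = @index(Global, Linear)
    idx = @index(Local, Linear)

    # Assumes a workgroup size of 256, as in the original script.
    cache = @localmem eltype(output) 256

    # Each lane accumulates a strided partial dot product for its row.
    acc = zero(eltype(output))
    for J in lane_of(idx):32:size(A, 2)
        acc += A[row_of(g), J] * b[J]
    end
    cache[idx] = acc
    @synchronize

    # Tree reduction, unrolled so no @synchronize appears inside a loop.
    if lane_of(idx) <= 16
        cache[idx] += cache[idx + 16]
    end
    @synchronize
    if lane_of(idx) <= 8
        cache[idx] += cache[idx + 8]
    end
    @synchronize
    if lane_of(idx) <= 4
        cache[idx] += cache[idx + 4]
    end
    @synchronize
    if lane_of(idx) <= 2
        cache[idx] += cache[idx + 2]
    end
    @synchronize

    # Final step: lane 1 combines the last two partial sums and writes the row.
    if lane_of(idx) == 1
        output[row_of(g)] = cache[idx] + cache[idx + 1]
    end
end

# Usage with plain Arrays, which select the CPU backend.
A = rand(Float32, 16, 100)
b = rand(Float32, 100)
output = zeros(Float32, 16)

backend = KernelAbstractions.get_backend(A)
kernel! = matvec_unrolled_kernel!(backend, 256)
kernel!(output, A, b; ndrange = 32 * size(A, 1))
KernelAbstractions.synchronize(backend)
```

The result can be compared against `A * b` with `isapprox`, as in the original script. The unrolling fixes the number of reduction steps at compile time, so changing the lane count from 32 means editing the stages by hand.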
