Support new cuFFT callbacks #8242

leofang · 2024-03-14T14:37:42Z

cuFFT now supports a new callback mechanism that is both more performant and friendlier to dynamic languages like Python and Julia. It is currently under public preview and will be released in a future CUDA version.

The new approach does not require any ugly compilation of host & device code. Instead, we just need to use NVRTC to compile a user-provided device function into LTO IR and pass it to the new cuFFT API cufftXtSetJITCallback, which is responsible for linking the LTO IR with the cuFFT kernel. Currently the device function must be given as a raw CUDA C++ string; once we support user-provided types in @cupyx.jit.rawkernel (#6663), we can also support pure Python functions as device functions.

The existing callback manager (_CallbackManager) is renamed to _LegacyCallbackManager to distinguish it from the new _JITCallbackManager. My suggestion is that once this feature is officially released (generally available), we add a deprecation warning to _LegacyCallbackManager. For now, users can pick either callback approach via the new option cp.fft.config.set_cufft_callbacks(..., cb_ver=...).
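For illustration, here is a minimal sketch of what selecting a callback version could look like. The load callback follows the shape used by CuPy's existing (legacy) callback support. The value "legacy" for cb_ver and the helper name fft_with_load_callback are assumptions made for this sketch, not something this description confirms; the source accepted by the JIT path may need a different form. Running the helper needs a GPU and a CuPy build that includes this PR.

```python
# Load callback in the form used by CuPy's existing (legacy) callback support:
# a __device__ function plus a d_loadCallbackPtr symbol that exposes it.
LOAD_CALLBACK = r'''
__device__ cufftComplex CB_ConvertInputC(
    void *dataIn, size_t offset, void *callerInfo, void *sharedPtr)
{
    // Ignore the stored input and feed a constant 1+0j into the transform.
    cufftComplex x;
    x.x = 1.0f;
    x.y = 0.0f;
    return x;
}
__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ConvertInputC;
'''


def fft_with_load_callback(shape=(64, 128, 128), cb_ver="legacy"):
    """Run a batched 2D FFT with the load callback attached.

    Hypothetical helper: cb_ver="legacy" is an assumed value for the new
    option; check the PR for the values it actually accepts.
    """
    import cupy as cp  # imported lazily so this module loads without a GPU

    a = cp.random.random(shape).astype(cp.complex64)
    # The callback is only active inside this context manager.
    with cp.fft.config.set_cufft_callbacks(cb_load=LOAD_CALLBACK, cb_ver=cb_ver):
        return cp.fft.fftn(a, axes=(1, 2))
```

Because the callback replaces every input element with 1+0j, the result should match cp.fft.fftn applied to an array of ones of the same shape, which gives a simple way to check that the callback was picked up.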

Currently, this feature requires users to load the EA version of libcufft: LD_PRELOAD=/path/to/EA/libcufft.so python ..., so that the libcufft copy from the CUDA Toolkit is not used. This also applies when running the test suite or the sample code. I'd like to have this capability included in CuPy sooner, so that users can give it a try and share feedback for final adjustments.
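If setting LD_PRELOAD by hand is inconvenient (e.g. in a test driver), the same thing can be sketched from Python by launching the child interpreter with a modified environment. The helper names env_with_preload and run_with_ea_cufft are hypothetical, and the library path is whatever location the EA libcufft.so was unpacked to.

```python
import os
import subprocess
import sys


def env_with_preload(lib_path, base_env=None):
    """Return a copy of the environment with lib_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # Put the EA library first so it wins over the CUDA Toolkit copy.
    env["LD_PRELOAD"] = lib_path if not existing else f"{lib_path}:{existing}"
    return env


def run_with_ea_cufft(lib_path, argv):
    """Run `python *argv` in a child process with the EA libcufft preloaded."""
    completed = subprocess.run(
        [sys.executable, *argv],
        env=env_with_preload(lib_path),
        check=False,
    )
    return completed.returncode
```

LD_PRELOAD has to be set before the process starts, which is why this spawns a child interpreter instead of changing os.environ in the current one.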

leofang added 21 commits September 29, 2023 18:44

Merge branch 'cccl' into jit_nvrtc

ef674d6

Merge branch 'main' into jit_nvrtc

7130dae

get LTO IR for device funcs

c82c765

dlopen cufftXtSetJITCallback

3f8ad41

Merge branch 'main' into jit_dev_lto

01f58d5

refactor and add jit callback mgr

72219eb

fix & add samples

de1b8bd

Merge branch 'jit_nvrtc' into jit_dev_lto

454d1de

Merge branch 'main' into jit_nvrtc

c3fc11f

Merge branch 'jit_nvrtc' into jit_dev_lto

3f6a90f

fix bug

45be396

Merge branch 'main' into jit_dev_lto

c5b351a

fix

656cac2

translate C++ sample to cupy

ad819b5

support passing LTO IR as callbacks

5f51ca6

add jit callback tests

a4899db

fix bug hidden; refactor test requirement

b30b927

support caching LTO IR (in-memory/disk)

5ad0c57

Merge branch 'main' into jit_dev_lto

ae0b8bd

Merge branch 'main' into jit_dev_lto

7549c56

remove Numba example for the time being

7b964eb

leofang marked this pull request as draft March 14, 2024 14:41

takagi self-assigned this Mar 15, 2024

takagi added the cat:enhancement and prio:medium labels Mar 15, 2024

leofang mentioned this pull request Mar 15, 2024

Support for CUFFT callbacks JuliaGPU/CUDA.jl#75

Open

make linters happy

6bc3fba
