
[REFACTOR] cuda.parallel: Don't require passing input/output arrays to reduce_into and similar algorithms #3008

Open
shwina opened this issue Dec 2, 2024 · 2 comments

shwina commented Dec 2, 2024

Currently, reduce_into usage looks like:

# construct the reducer:
reducer = cudax.reduce_into(d_in, d_out, op, h_init)

# query the required temp storage size by passing None, then allocate it
temp_storage_bytes = reducer.reduce_into(None, d_in, d_out, op, h_init)
d_temp = cuda.device_array(temp_storage_bytes)

# run the reduction with the allocated temp storage
result = reducer.reduce_into(d_temp, d_in, d_out, op, h_init)

Note that the initial construction of reducer shouldn't strictly need the arguments d_in, d_out, h_init. Placeholder arrays of the same data types would serve equally well, since only the types (not the values) matter at construction time.

We should refactor such that only the required information is passed into the constructor.
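To make the proposal concrete, here is a minimal pure-Python sketch of the shape such a constructor could take. This is not the actual cuda.parallel API; the class name, parameters, and call signature are all hypothetical, and the sequential loop merely stands in for the device reduction.

```python
import operator

class Reducer:
    """Hypothetical reducer constructed from type information only."""

    def __init__(self, dtype, op):
        # Only the problem definition (value type and binary operator)
        # is captured up front; no placeholder arrays are required.
        self.dtype = dtype
        self.op = op

    def __call__(self, values, init):
        # The actual input and initial value arrive at invocation time.
        result = init
        for v in values:
            result = self.op(result, self.dtype(v))
        return result

# Usage: construct once from types, invoke with concrete data.
reducer = Reducer(int, operator.add)
total = reducer([1, 2, 3, 4], 0)
```

The design point is that construction needs only enough information to generate the kernel (types and the operator), while the arrays bind at call time.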

Additional Context

#3001

leofang (Member) commented Dec 3, 2024

I was under the impression that d_in/d_out could be iterators too, in which case reduce_into would need to know them (as part of the problem definition) for later codegen?
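The point about iterators is compatible with a types-only constructor: an iterator descriptor can carry the value type needed for codegen without materializing an array. A hypothetical sketch (CountingIterator and value_type are illustrative names, not the cuda.parallel API):

```python
class CountingIterator:
    """Hypothetical iterator descriptor: no data, but a known value type."""

    def __init__(self, start, dtype):
        self.start = start
        self.dtype = dtype  # type info is available without any allocation

def value_type(obj):
    # Works uniformly for arrays (via .dtype) and iterator descriptors,
    # so a constructor could accept either as its "problem definition".
    return obj.dtype

it = CountingIterator(0, int)
```

So the constructor could still receive iterator descriptors as part of the problem definition while dropping the requirement for concrete placeholder arrays.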

rwgk (Contributor) commented Dec 3, 2024

I was under the impression that d_in/d_out could be iterators too,

Yes. The API I have right now (in #2788) is hack-ish, just enough for full testing. I want to discuss with @shwina (today) what the iterator API should look like.

3 participants