
[REFACTOR] cuda.parallel: Don't require passing input/output arrays to reduce_into and similar algorithms #3008

Open
shwina opened this issue Dec 2, 2024 · 2 comments

shwina commented Dec 2, 2024

Currently, reduce_into usage looks like:

# construct the reducer:
reducer = cudax.reduce_into(d_in, d_out, op, h_init)

# query the required temp storage size by passing None, then allocate it
temp_storage_bytes = reducer.reduce_into(None, d_in, d_out, op, h_init)
d_temp = cuda.device_array(temp_storage_bytes)

# run the reduction with the allocated temp storage
result = reducer.reduce_into(d_temp, d_in, d_out, op, h_init)

Note that the initial construction of reducer shouldn't strictly need the arguments d_in, d_out, h_init. Placeholder arrays of the same data types would serve equally well, since only the types (not the values) matter at construction time.

We should refactor such that only the required information is passed into the constructor.
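To make the proposal concrete, here is a minimal pure-Python sketch of the shape such a constructor could take. This is not the actual cuda.parallel API; the class name, parameters, and call signature are all hypothetical, and the sequential loop merely stands in for the device reduction.

```python
import operator

class Reducer:
    """Hypothetical reducer constructed from type information only."""

    def __init__(self, dtype, op):
        # Only the problem definition (value type and binary operator)
        # is captured up front; no placeholder arrays are required.
        self.dtype = dtype
        self.op = op

    def __call__(self, values, init):
        # The actual input and initial value arrive at invocation time.
        result = init
        for v in values:
            result = self.op(result, self.dtype(v))
        return result

# Usage: construct once from types, invoke with concrete data.
reducer = Reducer(int, operator.add)
total = reducer([1, 2, 3, 4], 0)
```

The design point is that construction needs only enough information to generate the kernel (types and the operator), while the arrays bind at call time.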

Additional Context

#3001

leofang (Member) commented Dec 3, 2024

I was under the impression that d_in/d_out could be iterators too, in which case reduce_into would need to know them (as part of the problem definition) for later codegen?
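The point about iterators is compatible with a types-only constructor: an iterator descriptor can carry the value type needed for codegen without materializing an array. A hypothetical sketch (CountingIterator and value_type are illustrative names, not the cuda.parallel API):

```python
class CountingIterator:
    """Hypothetical iterator descriptor: no data, but a known value type."""

    def __init__(self, start, dtype):
        self.start = start
        self.dtype = dtype  # type info is available without any allocation

def value_type(obj):
    # Works uniformly for arrays (via .dtype) and iterator descriptors,
    # so a constructor could accept either as its "problem definition".
    return obj.dtype

it = CountingIterator(0, int)
```

So the constructor could still receive iterator descriptors as part of the problem definition while dropping the requirement for concrete placeholder arrays.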

rwgk (Contributor) commented Dec 3, 2024

I was under the impression that d_in/d_out could be iterators too,

Yes. The API I have right now (in #2788) is hack-ish, just enough for full testing. I want to discuss with @shwina (today) what the iterator API should look like.

3 participants