Describe the bug
In GpuAggregateExec we can re-partition data if it is too large to fit on the GPU. But if we get unlucky and the hashes skew into too few buckets, we might need to partition the data again. Currently this is done by updating the hash seed and trying again.
Some recent changes (https://github.com/NVIDIA/spark-rapids/pull/11792/files) removed the limit on the number of repartitions that we can do. But the warning is printed out by some cryptic code: `if (hashSeed + 7 > 200)`.
We should have the hash seed only be a hash seed and not need to carry information about how many times a repartition has happened. We should also have a limit on the number of repartitions that we do, just so if something bad happens we don't get into a live-lock situation. That limit can be huge, like 20, and we can have a separate, lower limit for logging a warning, hopefully with more human-readable code.
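A minimal sketch of the kind of separation I mean, in plain Scala. Everything here (`RepartitionSketch`, `fitsOnGpu`, `repartition`, `logWarning`, the base seed of 42, and the specific limit values) is hypothetical and not the actual GpuAggregateExec code; it only illustrates an attempt counter that is separate from the hash seed, a large hard limit, and a lower warning limit:

```scala
// Hypothetical sketch, not the spark-rapids implementation.
object RepartitionSketch {
  // Hard stop so a pathological hash skew cannot live-lock the task.
  val HardLimit: Int = 20
  // Warn well before the hard limit so skew problems show up in the logs.
  val WarnLimit: Int = 5

  def repartitionUntilFits[T](
      batch: T,
      fitsOnGpu: T => Boolean,          // stand-in for the "too large for the GPU" check
      repartition: (T, Int) => T,       // stand-in for the repartition-by-seed call
      logWarning: String => Unit): T = {
    var current = batch
    var attempt = 0
    // The attempt counter, not the seed, carries the "how many times" information.
    while (!fitsOnGpu(current)) {
      attempt += 1
      if (attempt > HardLimit) {
        throw new IllegalStateException(
          s"Giving up after $attempt repartition attempts; the data is too skewed to fit on the GPU")
      }
      if (attempt == WarnLimit) {
        logWarning(s"Repartitioned $attempt times and the data still does not fit on the GPU; " +
          "the hash distribution may be heavily skewed")
      }
      // Derive a fresh seed per attempt; the seed itself is only a seed.
      val seed = 42 + attempt
      current = repartition(current, seed)
    }
    current
  }
}
```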