[BUG] Rework RapidsShuffleManager initialization for Apache Spark 4.0.0 #11107
Comments
Thanks for filing this. I do not know why we got an NPE here. I didn't get one when I tested the Apache issue, so I am worried now that there's a bug somewhere.
Our plugin init code currently assumes that the lazy shuffle manager instance
This issue affects Databricks 14.3 as well.
With apache/spark#43627 we eliminated the need to add the plugin jar via spark.executor.extraClassPath and paved the way to the simplified Boolean switch useRSM=true/false. Now would be a good time to do this work. At a minimum, we need to fix the NullPointerException issue resulting from the initialization order change.

Steps/Code to reproduce bug
Start a local-cluster with RSM
JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64 \
  ~/dist/spark-4.0.0-preview1-bin-hadoop3/bin/spark-shell \
  --jars scala2.13/dist/target/rapids-4-spark_2.13-24.08.0-SNAPSHOT-cuda11.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.explain=ALL \
  --conf spark.rapids.memory.gpu.allocSize=1536m \
  --conf spark.shuffle.manager=com.nvidia.spark.rapids.spark400.RapidsShuffleManager \
  --master local-cluster[2,2,1024]
Note:
--conf spark.executor.extraClassPath=$PWD/scala2.13/dist/target/rapids-4-spark_2.13-24.08.0-SNAPSHOT-cuda11.jar
Run
Check the executor log
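One way to find the failing executor is to search its stderr files for the exception name. This is a minimal sketch: `find_npe_logs` is a hypothetical helper, and the default path assumes the executors write their logs under the `work` directory of the Spark distribution used above, as standalone-style workers usually do. Pass a different directory if your logs live elsewhere.

```shell
# Hypothetical helper (not part of Spark or the plugin): print the path of every
# executor stderr file that mentions a NullPointerException.
# Assumed layout: <work_dir>/app-*/<executor-id>/stderr
find_npe_logs() {
  # Default is an assumption based on the spark-shell path in the reproduction steps.
  work_dir="${1:-$HOME/dist/spark-4.0.0-preview1-bin-hadoop3/work}"
  # grep -l prints only the names of files that contain a match.
  find "$work_dir" -type f -name stderr \
    -exec grep -l 'NullPointerException' {} + 2>/dev/null
}
```

Calling `find_npe_logs` with no argument searches the assumed default location; `find_npe_logs /path/to/work` searches another directory. The files it lists can then be opened to read the full stack trace.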
Additional context
[SPARK-45762][CORE] Support shuffle managers defined in user jars by changing startup order
razajafri#3