
Deploy mode client ? #1406

Closed
lboudard opened this issue Nov 23, 2021 · 4 comments

lboudard commented Nov 23, 2021

I've seen in multiple issues (see here) that the spark operator is not supposed to support client mode yet, even though the field is available in the spec definitions.

Trying to set "mode: client" in a SparkApplication job definition results in the following error at the spark-operator level:

/opt/spark/bin/spark-submit --master k8s://https://10.100.176.1:443 --deploy-mode client ...
21/11/23 14:03:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: Cannot run program "python3": error=2, No such file or directory
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
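
For reference, a minimal SparkApplication manifest with this setting might look like the following sketch (the name, namespace, image tag, and Spark version here are illustrative assumptions, not copied from the actual job):

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: test-spark               # illustrative name
  namespace: default             # assumed namespace
spec:
  type: Python
  pythonVersion: "3"
  mode: client                   # switching this from the default "cluster" triggers the error above
  image: "my-registry/spark-py:3.1.1"   # assumed image
  mainApplicationFile: "local:///home/myapp/app/myapp/__main__.py"
  sparkVersion: "3.1.1"
```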

However, setting "mode: cluster" (the default) invokes the default Spark Docker image entrypoint, which itself appears to submit with client mode by default:
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh

And indeed, at driver startup:

[spark-kubernetes-driver] + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
[spark-kubernetes-driver] + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.100.160.103 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///home/myapp/app/myapp/__main__.py test-spark

So I'm not really sure whether a Spark job submitted this way actually runs in cluster mode or in client mode.

Version info:
chart-info: spark-operator-1.1.10
spark-operator version: v1beta2-1.2.3-3.1.1

Thanks!

@AlexNavara

@lboudard Behind the scenes, "spark-submit" is eventually called with the "--deploy-mode client" argument. This behavior is not specific to the spark operator; it comes from Spark itself.
At the user level, you use that option to control where the driver process runs relative to the place where "spark-submit" is invoked. If you choose "client", the driver process runs in the same place where "spark-submit" is called. If you pick "cluster", the driver runs in a container managed by the resource manager (e.g. a k8s pod or a YARN container). But even in cluster mode, if you take a look at the driver startup logs in the container, you'll see "spark-submit --deploy-mode client ...".
Now let's see what this means in the spark operator world. Here, "spark-submit" is called inside the spark-operator container, so by setting "deploy-mode=client" you actually force the driver process to run inside the operator container, not in a separate pod.

In conclusion, the answer to your question is: your app runs in cluster mode.
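
One way to check this empirically is to look for a dedicated driver pod: in cluster mode, Spark on Kubernetes labels the driver pod with "spark-role=driver". A rough sketch (the output depends on your cluster and namespace):

```shell
# In cluster mode, the submission creates a separate driver pod:
kubectl get pods -l spark-role=driver

# In client mode, no separate driver pod appears; the driver process
# runs wherever spark-submit was invoked (here, the operator container).
```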

@Wh1isper

As far as I know, the Spark operator lets users perform spark-submit submissions via YAML and takes care of the complex configuration issues.

As I answered in #1652 (comment), if you want a Spark application on k8s, you can use client mode and configure it (spark.master, etc.) inside the application itself. PySpark users can use the sparglim package to build client-mode applications quickly.
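
To illustrate the client-mode approach described above, a PySpark application can configure the Kubernetes master itself rather than relying on the operator. This is only a sketch: the API server address, image tag, driver host, and port are placeholder assumptions, and it needs a reachable cluster plus a service that lets executor pods connect back to the driver:

```python
from pyspark.sql import SparkSession

# Sketch of a client-mode PySpark app on Kubernetes (all values are placeholders).
spark = (
    SparkSession.builder
    .appName("client-mode-example")
    .master("k8s://https://kubernetes.default.svc:443")            # assumed in-cluster API endpoint
    .config("spark.kubernetes.container.image", "spark-py:3.1.1")  # assumed executor image
    .config("spark.driver.host", "my-driver-svc.default.svc")      # service pointing at the driver pod
    .config("spark.driver.port", "7078")                           # fixed port so executors can connect back
    .getOrCreate()
)

spark.range(10).count()  # executors run as pods; the driver stays in this process
spark.stop()
```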

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Oct 13, 2024