Custom Spark 3.5.3 Kernel #1392

Open
fatihmete opened this issue Oct 25, 2024 · 2 comments

@fatihmete

Hello everyone. I am using Jupyter Enterprise Gateway with PySpark sessions on Kubernetes. The elyra/kernel-spark-py:3.2.3 image works as expected.

I modified the image and rebuilt it to upgrade Spark to 3.5.3. When I start this kernel through JEG, the Spark driver and executor pods are created and run as expected. However, inside the notebook, the spark variable stays stuck at the WaitingForSparkSessionToBeInitialized placeholder. If I redefine it with spark = SparkSession.builder.getOrCreate(), no error is raised and it works.
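
The workaround described above can be wrapped in a small guard. This is only a sketch under assumptions: `ensure_spark_session` and its `factory` argument are hypothetical names, and the placeholder is detected purely by the class name reported in this issue.

```python
def ensure_spark_session(current, factory):
    """Return `current` unless it is missing or still the placeholder.

    `factory` is any zero-argument callable that builds a session.
    In a notebook with pyspark installed it could be:
        lambda: SparkSession.builder.getOrCreate()
    """
    # Class name taken from the symptom reported above (assumption:
    # the placeholder object is an instance of a class with this name).
    placeholder_name = "WaitingForSparkSessionToBeInitialized"
    if current is None or type(current).__name__ == placeholder_name:
        return factory()
    return current


# Hypothetical notebook usage (requires pyspark, so left as a comment):
#   from pyspark.sql import SparkSession
#   spark = ensure_spark_session(globals().get("spark"),
#                                lambda: SparkSession.builder.getOrCreate())
```

A session that is already initialized is returned unchanged, so the cell is safe to re-run.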

@fatihmete fatihmete added the bug label Oct 25, 2024

fatihmete commented Nov 1, 2024

I made some changes to the launch_ipykernel.py file: I removed all the code related to the background thread in the initialize_spark_session function. Now it works, but the kernel waits about 2 minutes in an unknown status after the driver and executor pods have started.
This issue also exists with Spark 3.4.4.
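
To illustrate the difference between the two approaches, here is a generic sketch of a background-thread placeholder next to a plain synchronous call. It is not the code from launch_ipykernel.py; `LazySession` and `build` are hypothetical names used only for this illustration.

```python
import threading


class LazySession:
    """Stand-in object that builds the real value on a background thread."""

    def __init__(self, build):
        self._value = None
        self._thread = threading.Thread(target=self._run, args=(build,), daemon=True)
        self._thread.start()

    def _run(self, build):
        # Runs on the background thread; stores the finished object.
        self._value = build()

    def __getattr__(self, attr):
        # Only called for attributes not found on the placeholder itself.
        # Wait for the background build, then delegate to the real object.
        self._thread.join()
        return getattr(self._value, attr)


def build():
    # Hypothetical stand-in for an expensive session constructor.
    return "session-object"


lazy = LazySession(build)   # returns immediately; first attribute access blocks
eager = build()             # synchronous variant: blocks until the value exists
```

The synchronous variant is simpler but delays kernel startup until the session is ready, which fits the longer wait reported above.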

image


fatihmete commented Dec 15, 2024

The latency in starting the Spark notebook is related to a websocket timeout. I found the warnings shown below in the EnterpriseGateway logs:

[W 2024-12-13 .... EnterpriseGatewayApp] Websocket ping timeout after 90537ms. 
[W 2024-12-13 .... EnterpriseGatewayApp] Websocket closed <kernel_id>
...

The warning doesn't affect the Spark notebook, which works as expected. So I looked for a way to reduce the timeout, and found this setting (with its environment variable) described in the documentation:

--EnterpriseGatewayApp.ws_ping_interval=<Int>
    Specifies the ping interval(in seconds) that should be used by zmq port
     associated withspawned kernels.Set this variable to 0 to disable ping mechanism.
    (EG_WS_PING_INTERVAL_SECS env var)
    Default: 30

I set EG_WS_PING_INTERVAL_SECS to 1, and the wait dropped to 30 seconds. When I tried 0, it waited 90 seconds again. The variable configures Tornado web application parameters and doesn't behave as described in the documentation. In the end I changed the related parameters directly in the enterprisegatewayapp.py file and managed to disable the ping mechanism. Spark 3.5.3 notebooks now take about 20 seconds to start.

Parameters related to websocket timeout:

  • websocket_ping_interval
  • websocket_ping_timeout

Related line in enterprisegatewayapp.py:

ws_ping_interval=self.ws_ping_interval * 1000,
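
The 90-second and 30-second waits reported above fit one possible rule, sketched here. The rule is an assumption, not something the logs confirm: that the ping timeout defaults to the larger of three ping intervals and a 30-second floor. `assumed_ping_timeout_ms` and `WS_PING_FLOOR_MS` are hypothetical names for this illustration.

```python
WS_PING_FLOOR_MS = 30_000  # assumed 30-second floor, in milliseconds


def assumed_ping_timeout_ms(interval_secs):
    """Timeout under the assumed rule: max(3 * interval, 30 s)."""
    # Seconds to milliseconds, mirroring ws_ping_interval * 1000 above.
    interval_ms = interval_secs * 1000
    return max(3 * interval_ms, WS_PING_FLOOR_MS)


print(assumed_ping_timeout_ms(30))  # default interval -> 90000
print(assumed_ping_timeout_ms(1))   # EG_WS_PING_INTERVAL_SECS=1 -> 30000
```

Under this assumption the default interval of 30 gives roughly the 90537 ms in the log, and an interval of 1 gives the 30-second wait. It does not explain the 90-second wait observed with 0, so that case needs a separate look.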
