enterprise-gateway does not connect to k8s kernel when istio is configured #1168
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗
I should also point out that I tracked the error down to this line in the jupyter-server project: https://github.com/jupyter-server/jupyter_server/blob/7d2154a1e243f80ed5fc4c067fd022e32f3fc8f0/jupyter_server/gateway/managers.py#L70
Hi @tahesse - thanks for opening this issue and the great details! I will try to take a look into this next week, but if anyone else wants to look into it, that would be great! The kernel logs look okay, and given that EG never appears to receive the kernel connection information, it implies there's something amiss between the kernel pod and the EG pod. I'm assuming they're running within the same network - correct?
Since it appears it's finding the public local IP, you might try setting the `EG_PROHIBITED_LOCAL_IPS` env. I will try to deploy your helm chart in my environment upon my return next week, although I suspect this issue stems from the cluster's configuration more than EG and probably won't reproduce the issue (but we'll see).
@kevin-bates thank you for the reply!
I'm 99% sure, because they run in the same Kubernetes cluster in different namespaces (it does work for other services though). I also connected via ssh to the remote kernel pod and was able to communicate with the REST API.
I will give it a try and let you know about the outcome.
Yes, internal is preferred unless the kernel is running in an external network.
@kevin-bates I started my local jupyterlab with:

```
EG_PROHIBITED_LOCAL_IPS='10.100.*.*' python3 -m jupyterlab --debug \
    --gateway-url=http://enterprise-gateway.ns-jupyter:8888 \
    --GatewayClient.http_user=guest \
    --GatewayClient.http_pwd=guest-password \
    --GatewayClient.request_timeout=240.0 \
    --GatewayClient.connect_timeout=240.0
```

My remote kernel logs:

```
/usr/local/bin/bootstrap-kernel.sh env: SHELL=/bin/bash KUBERNETES_SERVICE_PORT_HTTPS=443 KUBERNETES_SERVICE_PORT=443 KERNEL_NAME=python_kubernetes HOSTNAME=guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3 LANGUAGE=en_US.UTF-8 KERNEL_SPARK_CONTEXT_INIT_MODE=none KERNEL_ID=51985e3e-f3e8-4a34-a4e2-69d44c201ce3 NB_UID=1000 PWD=/home/jovyan RESPONSE_ADDRESS=10.100.44.42:8877 MINICONDA_MD5=87e77f097f6ebb5127c77662dfc3165e HOME=/home/jovyan LANG=en_US.UTF-8 KUBERNETES_PORT_443_TCP=tcp://172.20.77.1:443 NB_GID=100 XDG_CACHE_HOME=/home/jovyan/.cache/ SHLVL=0 CONDA_DIR=/opt/conda MINICONDA_VERSION=4.8.2 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_PORT_443_TCP_ADDR=172.20.77.1 PORT_RANGE=0..0 KERNEL_USERNAME=guest KERNEL_LANGUAGE=python CONDA_VERSION=4.8.2 NB_USER=jovyan KUBERNETES_SERVICE_HOST=172.20.77.1 LC_ALL=en_US.UTF-8 KUBERNETES_PORT=tcp://172.20.77.1:443 KUBERNETES_PORT_443_TCP_PORT=443 PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/conda/bin PUBLIC_KEY=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDEfiWkzCCMl/VFI8J2042RvWh13bSihVo+xp6HQnnQ8YWO5MsyW/nelzcMa2eBJWB+Yg/IQ/0q6BRog7oqDpUNbUxwGSzU3TyBYeRQCtXynR/EjFNyswE6gQrg15GbFxwmz4nfMkKXtlpItLrslcUqVY+wlUd+sdbJe9YMLp3REwIDAQAB DEBIAN_FRONTEND=noninteractive KERNEL_NAMESPACE=guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3 _=/usr/bin/env
+ python /usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py --kernel-id 51985e3e-f3e8-4a34-a4e2-69d44c201ce3 --port-range 0..0 --response-address 10.100.44.42:8877 --public-key MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDEfiWkzCCMl/VFI8J2042RvWh13bSihVo+xp6HQnnQ8YWO5MsyW/nelzcMa2eBJWB+Yg/IQ/0q6BRog7oqDpUNbUxwGSzU3TyBYeRQCtXynR/EjFNyswE6gQrg15GbFxwmz4nfMkKXtlpItLrslcUqVY+wlUd+sdbJe9YMLp3REwIDAQAB --spark-context-initialization-mode none
[D 2022-10-10 06:45:21,298.298 launch_ipykernel] Using connection file '/tmp/kernel-51985e3e-f3e8-4a34-a4e2-69d44c201ce3_kpviwi08.json'.
[I 2022-10-10 06:45:21,300.300 launch_ipykernel] Signal socket bound to host: 0.0.0.0, port: 59223
[D 2022-10-10 06:45:21,301.301 launch_ipykernel] JSON Payload 'b'{"shell_port": 44921, "iopub_port": 46351, "stdin_port": 38143, "control_port": 32833, "hb_port": 43811, "ip": "0.0.0.0", "key": "b6f18ebe-f585-4a45-9897-6f347c3f6ae3", "transport": "tcp", "signature_scheme": "hmac-sha256", "kernel_name": "", "pid": 9, "pgid": 7, "comm_port": 59223, "kernel_id": "51985e3e-f3e8-4a34-a4e2-69d44c201ce3"}'
[D 2022-10-10 06:45:21,348.348 launch_ipykernel] Encrypted Payload 'b'eyJ2ZXJzaW9uIjogMSwgImtleSI6ICJYd1ZyU2MwUWFoSlIwZHJ3YWNmaThaYTYzTWtQM3ZkTGtEMnl2b0NJc0I5SUMyOTlSU3A4c2w2N1d3VGxXSzBtME1ERXpjU3VVdzJVMjltL0R3aWhTNVVpaDFmZk9JaU5RRGxwcDhkKzdDSHM2c3ZXZnE0S29TOWYrMjlxYjl0WDlGdmVXRXNXbXlCc1hWeTZVTDZRZG90QUJXc29SUGE2YzI4UVc2SGlGUXM9IiwgImNvbm5faW5mbyI6ICJaaXFmeFl3UUxobW5HdUxGY0N4S285SFZVSEcrcFYzS3RWaTg5UDdkQnF0bi9EeWMvclo2eUVLaHhkSWpRSXR1Sm9URDZuTzFEN3FDN2pCVFhWTmZ4akRGNjlCYXBnUWVQVzFrOXN5dnRWK0lBTDM1MnpzWFhKeWgxZFE4ZUFyM2F1Mm1tWUFRMVExRzZvbG5kSTlBS2hrSk5KRWo4SC9QVE5zWU9lMFpPZUtpMlF1YTk4QmNRZ3dSaGgzSGpXTE92ZmJBejdBelYvREdzN0hZYjVZSERDUmVuNk1iaElBV21Za1ZQV21mMjB1VlorK2kwSVg5eUFBVDZ2YUR6UWkvcnNEWVNHQ2dUOVhDQ0o5Uk9venBDTXp4NkJuNXJ3Ly9qWGgzNGZqTjRkSGdOa0RMWEtWMWx1QUpDaTJDZ2pUWUxJL2loTkVWeTFwVWl5cnlneUJadG0vdys0eHlJd0F3ay9nK0ZVTjlQaC9sUG92MDNoVnJZTUdOa1JrdjJoVm9vNTA2dVFMbE1kR04zY3dIT294TFdYTk5qZWxUVXBsUkNsUGV4UHFTRS9XMEZ0U0dyYWRNTDVMSExzUEI3TmFmNi9uSjloYmNKN2hvUEtNRWZUSHJ3QT09In0='
```

and my enterprise-gateway logs:

```
[D 2022-10-10 06:49:08.493 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:09.022 EnterpriseGatewayApp] 417: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:09.048 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:09.578 EnterpriseGatewayApp] 418: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:09.606 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:10.131 EnterpriseGatewayApp] 419: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:10.158 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:10.689 EnterpriseGatewayApp] 420: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:10.712 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:11.234 EnterpriseGatewayApp] 421: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:11.255 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:11.779 EnterpriseGatewayApp] 422: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:11.807 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:12.336 EnterpriseGatewayApp] 423: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:12.363 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:12.896 EnterpriseGatewayApp] 424: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:12.925 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:13.448 EnterpriseGatewayApp] 425: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:13.475 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:13.999 EnterpriseGatewayApp] 426: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:14.027 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:14.558 EnterpriseGatewayApp] 427: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:14.586 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:15.121 EnterpriseGatewayApp] 428: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:15.151 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:15.687 EnterpriseGatewayApp] 429: Waiting to connect to k8s pod in namespace 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3'. Name: 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3', Status: 'Running', Pod IP: '10.100.18.194', KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3'
[D 2022-10-10 06:49:15.715 EnterpriseGatewayApp] Waiting for KernelID '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' to send connection info from host 'guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3' - retrying...
[D 2022-10-10 06:49:16.268 EnterpriseGatewayApp] KubernetesProcessProxy.terminate_container_resources, pod: guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3.guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3, kernel ID: 51985e3e-f3e8-4a34-a4e2-69d44c201ce3 has been terminated.
[E 2022-10-10 06:49:16.275 EnterpriseGatewayApp] KernelID: '51985e3e-f3e8-4a34-a4e2-69d44c201ce3' launch timeout due to: Waited too long (238.0s) to get connection file
[E 221010 06:49:16 web:2239] 500 POST /api/kernels (127.0.0.1) 238663.10ms
```

I don't quite understand why enterprise-gateway still tries to work with the remote IP address in this setup.
Thank you for the information and help so far!
Just to follow up, I am currently stuck at this point. Is there any way that I can potentially start a kernel locally to debug it?
This is setting the env into your local JupyterLab process rather than into the Enterprise Gateway process running in the cluster. (I'm not certain whether the quotes are necessary or not.) You can run a kernel locally to debug, but your primary issue wrt this last exercise is that you're not setting the env into the appropriate process. I recommend sticking with the K8s env for a bit longer since the symptoms are somewhat specific to that env.
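For illustration only (not from this thread), "setting the env into the appropriate process" on Kubernetes means placing it on the enterprise-gateway container spec rather than exporting it in the local JupyterLab shell; the variable and value below are simply the ones discussed above:

```yaml
# Hypothetical excerpt of the enterprise-gateway Deployment; the structure is
# standard Kubernetes, the variable/value are just the ones from this thread.
spec:
  template:
    spec:
      containers:
        - name: enterprise-gateway
          env:
            - name: EG_PROHIBITED_LOCAL_IPS
              value: "10.100.*.*"
```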
Thanks @kevin-bates. Setting the env locally only affects the client process then - understood. I have enterprise-gateway deployed with istio, which seems to prevent all connections to/from enterprise-gateway, so I will remove istio and retry. If that doesn't help, I will set the env on the EG deployment itself. Meanwhile, would an experiment with a locally started kernel be worth the effort?
Hmm - this should not have any bearing on the accessibility of EG from applications. Could you clarify what you mean by "prevents all connections to/from enterprise-gateway"?
Hmm, might istio be preventing the response to port 8877 (the response address port)?
I'm not sure how useful this experiment will be; it may not be worth the effort.
Could you please clarify what you mean by this as well? Typically k8s deployments are performed via helm or some other form of yaml, and you're free to add whatever you want - so some details would be helpful.
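One generic way to test the port-8877 suspicion (a sketch, not something suggested in the thread) is to probe the response port from a throwaway pod in the kernel's namespace, using the EG pod IP and port seen in the logs above:

```
# Illustrative only: the namespace and IP come from the earlier logs and differ per launch.
kubectl -n guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3 run probe --rm -it --image=busybox -- \
  nc -vz -w 3 10.100.44.42 8877
```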
I cannot communicate with the REST API from within the cluster:

```
curl -vvv http://enterprise-gateway.ns-jupyter:8888/api/kernelspecs
* Trying 127.1.41.1:8888...
* connect to 127.1.41.1 port 8888 failed: Connection refused
* Failed to connect to enterprise-gateway.ns-jupyter port 8888 after 10 ms: Connection refused
* Closing connection 0
curl: (7) Failed to connect to enterprise-gateway.ns-jupyter port 8888 after 10 ms: Connection refused
```
I have no ingress running for enterprise-gateway; is an ingress mandatory?
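As a generic sanity check (not from the thread), it can help to confirm that the Service exists and actually has endpoints, and to retry the request from a pod inside the cluster, bypassing kubefwd; the names below are the ones used in this thread:

```
kubectl -n ns-jupyter get svc enterprise-gateway
kubectl -n ns-jupyter get endpoints enterprise-gateway
# In-cluster request, bypassing kubefwd:
kubectl -n ns-jupyter run eg-test --rm -it --image=busybox -- \
  wget -qO- http://enterprise-gateway.ns-jupyter:8888/api/kernelspecs
```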
Sure! For the JupyterHub k8s deployment (via helm), there is an option for extra environment variables in the values.yaml (https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/main/jupyterhub/values.yaml#L76), which is templated into the hub deployment. The advantage is that an operator can directly add the env vars in the values-override.yaml instead of modifying the deployment.yaml (especially nice when extending the helm chart).
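A minimal sketch of what such an option could look like in the EG chart (hypothetical - the `extraEnv` key and the templating below are assumptions, not existing chart features):

```yaml
# values-override.yaml (hypothetical key)
extraEnv:
  EG_PROHIBITED_LOCAL_IPS: "10.100.*.*"
```

which the deployment template would then render into the container's env:

```yaml
# templates/deployment.yaml excerpt (illustrative Helm templating)
          env:
            {{- range $key, $value := .Values.extraEnv }}
            - name: {{ $key }}
              value: {{ $value | quote }}
            {{- end }}
```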
I see, yes, that is helpful. A PR would be great!
No, ingress is not mandatory. Just some form of a reverse proxy is recommended, and it looks like you're using Hub. I'm not familiar with kubefwd.
Sorry, I hope that I can clear up the confusion.
That is my ultimate goal, but I'm first trying to make it work with jupyterlab because the development/fix cycle is faster and there are fewer moving parts.
We use kubefwd (forwarding all namespaces) so that local processes can reach in-cluster services such as enterprise-gateway by their service names.
I will craft one once I've got it running. :)

Update on my side: I have now added `EG_RESPONSE_ADDRESS` and `EG_RESPONSE_PORT` to the EG deployment.yaml, and it seems that the response address is not propagated to the kernel as I'd expect from looking at the kernel launch arguments, whereas the environment variables do exist in the enterprise-gateway pod:

```
jovyan@enterprise-gateway-6bc565d956-gb4s2:/usr/local/bin$ printenv | grep EG_RESPONSE
EG_RESPONSE_ADDRESS=172.20.71.6:8877
EG_RESPONSE_PORT=8877
```

Kernel pod logs:

```
+ python /usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py --kernel-id 4b6d95f4-241d-4644-b622-f6ff4b54814a --port-range 0..0 --response-address 10.100.35.193:8877 --public-key MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDSQP9YFtzoY1v+VwYXd09x/fNEDSFIASwjoAoNA5jiOAKQujgw/xxBge1SnovvlGDjOFkkuK1bfRvECYnHafM98hRGlRVGXzbbw5d6hDHUXQMdXgh1JQJFAV8vMI6o3Sqm3ZJRodYuUDvPbbJRNhSbQEEVuzZN5R5p382gxUUFTQIDAQAB --spark-context-initialization-mode none
/usr/local/bin/bootstrap-kernel.sh env: SHELL=/bin/bash KUBERNETES_SERVICE_PORT_HTTPS=443 KUBERNETES_SERVICE_PORT=443 KERNEL_NAME=python_kubernetes HOSTNAME=guest-4b6d95f4-241d-4644-b622-f6ff4b54814a LANGUAGE=en_US.UTF-8 KERNEL_SPARK_CONTEXT_INIT_MODE=none GUEST_1F146A87_8304_41D7_9193_8584C02CF412_UI_SVC_SERVICE_PORT_SPARK_DRIVER_UI_PORT=4040 KERNEL_ID=4b6d95f4-241d-4644-b622-f6ff4b54814a NB_UID=1000 GUEST_1F146A87_8304_41D7_9193_8584C02CF412_UI_SVC_SERVICE_PORT=4040 GUEST_1F146A87_8304_41D7_9193_8584C02CF412_UI_SVC_PORT_4040_TCP_PROTO=tcp PWD=/home/jovyan RESPONSE_ADDRESS=10.100.35.193:8877 GUEST_1F146A87_8304_41D7_9193_8584C02CF412_UI_SVC_PORT_4040_TCP=tcp://172.20.18.60:4040 MINICONDA_MD5=87e77f097f6ebb5127c77662dfc3165e HOME=/home/jovyan LANG=en_US.UTF-8 KUBERNETES_PORT_443_TCP=tcp://172.20.77.1:443 NB_GID=100 GUEST_1F146A87_8304_41D7_9193_8584C02CF412_UI_SVC_PORT_4040_TCP_PORT=4040 XDG_CACHE_HOME=/home/jovyan/.cache/ SHLVL=0 CONDA_DIR=/opt/conda MINICONDA_VERSION=4.8.2 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_PORT_443_TCP_ADDR=172.20.77.1 PORT_RANGE=0..0 KERNEL_USERNAME=guest KERNEL_LANGUAGE=python CONDA_VERSION=4.8.2 NB_USER=jovyan KUBERNETES_SERVICE_HOST=172.20.77.1 LC_ALL=en_US.UTF-8 KUBERNETES_PORT=tcp://172.20.77.1:443 KUBERNETES_PORT_443_TCP_PORT=443 PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/conda/bin GUEST_1F146A87_8304_41D7_9193_8584C02CF412_UI_SVC_PORT=tcp://172.20.18.60:4040 GUEST_1F146A87_8304_41D7_9193_8584C02CF412_UI_SVC_PORT_4040_TCP_ADDR=172.20.18.60 PUBLIC_KEY=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDSQP9YFtzoY1v+VwYXd09x/fNEDSFIASwjoAoNA5jiOAKQujgw/xxBge1SnovvlGDjOFkkuK1bfRvECYnHafM98hRGlRVGXzbbw5d6hDHUXQMdXgh1JQJFAV8vMI6o3Sqm3ZJRodYuUDvPbbJRNhSbQEEVuzZN5R5p382gxUUFTQIDAQAB GUEST_1F146A87_8304_41D7_9193_8584C02CF412_UI_SVC_SERVICE_HOST=172.20.18.60 DEBIAN_FRONTEND=noninteractive KERNEL_NAMESPACE=ns-spark-apps _=/usr/bin/env
[D 2022-10-12 08:39:46,466.466 launch_ipykernel] Using connection file '/tmp/kernel-4b6d95f4-241d-4644-b622-f6ff4b54814a_v5j3dkkk.json'.
[I 2022-10-12 08:39:46,470.470 launch_ipykernel] Signal socket bound to host: 0.0.0.0, port: 39427
Traceback (most recent call last):
  File "/usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py", line 616, in <module>
    connection_file, response_addr, lower_port, upper_port, kernel_id, public_key
  File "/usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py", line 269, in return_connection_info
    s.connect((response_ip, response_port))
ConnectionRefusedError: [Errno 111] Connection refused
```

I will test if dropping the envoy proxies helps.
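For context on why the envoy sidecar matters here: the kernel launcher's response step is essentially a plain TCP client that connects to `--response-address` and writes the encrypted connection info back to EG. A simplified sketch of that behavior (illustrative only, not the actual launch_ipykernel.py code):

```python
import json
import socket


def send_connection_info(response_address: str, payload: dict) -> None:
    """Send the kernel's connection info back to the address EG passed in."""
    # response_address looks like "10.100.35.193:8877" (EG pod IP + response port).
    ip, port = response_address.split(":")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # This is the step that fails with ConnectionRefusedError above when the
        # sidecar does not allow a direct pod-IP connection on that port.
        s.connect((ip, int(port)))
        s.sendall(json.dumps(payload).encode("utf-8"))
```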
I can confirm that the issue is due to istio, i.e. the envoy proxy sidecars. I guess that the hotfix for now is to not deploy it with istio. @kevin-bates Is there interest from the enterprise-gateway maintainers' side to support istio?
Great news.
We are always interested in supporting configurations our users need. That said, I don't think any of the current maintainers have the bandwidth and/or resources to take this on - so istio support would need to come in the form of a contribution.

Regarding your previous troubleshooting, setting the env `EG_RESPONSE_ADDRESS` isn't something EG consumes directly; EG formulates the response address itself, using the pod's local IP as the fallback together with `EG_RESPONSE_PORT`. When you got this working by removing Istio from the equation, what kind of response address was computed?
Maybe https://github.com/splunk/jupyterhub-istio-proxy can serve as a blueprint for the implementation. I have a tight schedule, but maybe I can pour some time into it. Do you want to keep this issue open or start a separate issue for the istio service mesh extension?

I also noticed that if enterprise-gateway runs outside of the istio service mesh, it won't even start Python-based spark-operator kernels (when the spark-operator is running in the service mesh) - it won't start a spark driver.
That fallback is really nice to know! Thank you! :)
It is an IP within the same /16 subnet, despite the `EG_RESPONSE_ADDRESS` env still being set:

```
+ python /usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py --kernel-id 443cbb3d-f70f-44a5-8670-3ca57938ccd6 --port-range 0..0 --response-address 10.100.21.198:8877 --public-key MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCrzyqh/7jryyFFvLJ20XDI1rsGdatlROT7in70oJCfR2F6FEhwdexv1cVleM6OTTN8NLbvZnUPk+lOKuYxrfNqjJO9wqEd27hM/MtbYPvL5e5v92LH5xiaagWdI7KQfWQfH1t3vnZ4PtoJsxb45ZQvIiDg0vSMjw8NxWhDZpeOxwIDAQAB --spark-context-initialization-mode none
```
I don't know anything about Istio, but this implies it gets involved in intra-cluster communications (between pods). Is that correct? I was hoping this could be something that is configured (either via helm) or within the kernel-pod.yaml used to launch the kernel pods and not require "source code" changes.
Yes, istio basically spawns a sidecar when instructed and handles communication between pods through envoy proxies (thus allowing for secure and traceable transmission).
I promise that I'll look for the least invasive solution that makes enterprise-gateway work with istio. I'm also puzzled why it doesn't work in the first place, because the communication between pods is merely tunneled through the proxies AFAIU. I'm currently looking into solutions that target the helm charts/declarations.
Let's keep this issue open. I've gone ahead and amended the title to include the istio context. Thank you for your help.
@kevin-bates I tried to set the annotations

```yaml
proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
traffic.sidecar.istio.io/excludeOutboundPorts: "8877"
```

on the kernel pod, but without success. I will try to debug the network connection from enterprise-gateway through the envoy proxies to the kernel pod (and vice versa) and get back with the results.
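For reference, on the kernel pod spec those annotations would sit under `metadata.annotations`; the sketch below is illustrative (not a verified fix) and notes why outbound exclusion alone may not be enough:

```yaml
# kernel-pod.yaml excerpt (illustrative; not a confirmed fix for this issue)
metadata:
  annotations:
    proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
    # Outbound exclusion targets the launcher -> EG response handshake on 8877.
    traffic.sidecar.istio.io/excludeOutboundPorts: "8877"
    # Note: the ZMQ ports EG dials back on are chosen dynamically by the launcher,
    # so a static excludeInboundPorts list cannot easily cover them.
```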
@kevin-bates I did some tests on a clean cluster with EG-unrelated pods/services and monitored the istio traffic. AFAIU istio routes pod-to-pod communication via Kubernetes services, which makes sense given that the kernel pods aren't associated with any service. Hence, my proposal for now is to add another k8s resource (a headless service for the kernel pod - see the sketch at the end of this comment). What do you think? I'll test it meanwhile in my cluster and post some code for my ideas asap.

EDIT: https://istio.io/latest/docs/ops/deployment/requirements/#pod-requirements specifically states that pods must belong to a Kubernetes service in order for istio to route their traffic.
Is there any way to nicely debug enterprise-gateway during development? I've worked through https://jupyter-enterprise-gateway.readthedocs.io/en/latest/contributors/system-architecture.html and https://jupyter-enterprise-gateway.readthedocs.io/en/latest/contributors/devinstall.html, but the development process is still kind of slow.
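A minimal sketch of the proposed per-kernel headless Service (hypothetical; the names mirror the kernel IDs seen in the logs above, the selector label is an assumption, and everything would be generated per launch):

```yaml
# Hypothetical headless Service created alongside each kernel pod so that the pod
# "belongs to" a Kubernetes service, as istio's pod requirements state.
apiVersion: v1
kind: Service
metadata:
  name: guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3
  namespace: guest-51985e3e-f3e8-4a34-a4e2-69d44c201ce3
spec:
  clusterIP: None              # headless: DNS resolves directly to the pod IP
  selector:
    kernel_id: 51985e3e-f3e8-4a34-a4e2-69d44c201ce3   # assumed label on the kernel pod
  ports:
    - name: response
      port: 8877               # illustrative; the kernel's ZMQ ports are assigned dynamically
```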
Hi @tahesse.
I think a configurable approach is ideal, one that we can easily document and enable via helm deployments - thank you.
What does it mean to "belong to a Kubernetes service"? I'm assuming this implies the Service and the (kernel) pod must reside in the same namespace. Since kernel pods are primarily run in namespaces outside of EG's, does this imply that each launch of a kernel will result in the creation of its own service? And, if folks are specifying their own kernel namespace (via `KERNEL_NAMESPACE`), would the service need to be created there as well?
Sorry for the hassles here. I use a Mac and run Rancher Desktop for my k8s development. My typical iteration is to rebuild the images, redeploy via helm, and tail the logs.

I create aliases for the helm deployments...

```
alias eg_deploy='helm upgrade --install enterprise-gateway etc/kubernetes/helm/enterprise-gateway -n enterprise-gateway'
alias eg_remove='helm delete enterprise-gateway -n enterprise-gateway'
```

and another to tail the EG logs...

```
alias eg_logs='kubectl logs -f deployment.apps/enterprise-gateway -n enterprise-gateway'
```

If others have an easier workflow, I'd love to hear from you.
I think they refer to the association through label and selector between pod/deployment and service. Thanks for posting your workflow, that helps a lot for sure! I'm running my stuff on an Apple Silicon (M1) Mac, which is quite the hassle with ARM vs. AMD64 arch. My Kubernetes (EKS) was partitioned for multi-tenant use using Loft - roughly the same as Rancher I think, that is, k8s in docker (kind). TYSM!

So far, I had to modify a few python files to get an "istio_enabled" pivot which can be leveraged to spawn an additional headless k8s service. I'll do some testing tomorrow with the workflow steps you posted. Thanks again!
What do you think about having the kubernetes Service created before the corresponding pods? Read the kubernetes best practices for more information: https://kubernetes.io/docs/concepts/configuration/overview/#services

So far, my changes do not seem to be working with the headless service. I don't quite understand whether the socket connection might be an issue; moreover, it looks like there are socket connections on both sides, which is quite confusing. I am not sure whether that is due to the missing ports (https://istio.io/latest/docs/ops/configuration/traffic-management/traffic-routing/#headless-services), which are generated within the kernel launcher at runtime. Needs more investigation...
You might be interested in #1181. This enables the ability to add your own customizations.

Regarding the inability of the server to receive the connection information from the kernel pod, that's odd. However, one of the intentions of the single response address PR was to expose the response port outside of the EG service but, assuming your kernel pods are within the same cluster, that shouldn't be necessary. I'm sorry I'm not familiar with Istio.
RE. #1181 - thanks for the pointer.

Fair enough, the introduced change regarding the single response address sounds good, and I see the problem with istio here. I will now try to replace all IP-based communication with service-based communication (i.e. going through DNS rather than IPs). Somehow enterprise-gateway is picking up on the envoy proxies, but the envoy reverse proxies block the communication, it seems. Normally, istio should allow that communication to happen if it bypasses the envoy reverse proxies.
Hi, any update here on how to get enterprise-gateway working with istio? If anyone has solved it, can you please provide details?
Hey, maybe kubeflow/spark-operator#1652 provides some help to you?
Hello enterprise-gateway team!
I checked all other issues in the jupyter-server organization and googled a lot without luck yet.
Description
I am trying to run enterprise-gateway in my kubernetes cluster to be able to run remote kernels. I installed enterprise-gateway via helm with the chart from this repository. I extended the helm chart with my own helm chart, which installs a namespace with istio labels before installing enterprise-gateway. My values-override.yaml looks as follows:
For testing purposes, I ran kubefwd for all namespaces to be able to communicate with the enterprise-gateway service. I can successfully call the enterprise-gateway REST endpoints from the CLI, e.g.

```
curl http://enterprise-gateway.ns-jupyter:8888/api/kernelspecs | jq
```

yields this JSON response.
However, when I try to spawn a remote kernel via the CLI or via jupyter lab, it spawns the remote kernel (see the remote kernel logs), but enterprise-gateway is unable to connect to the remote kernel, as can be seen from the enterprise-gateway pod logs.
Reproduce

1. Install enterprise-gateway via helm with the values-override.yaml from the previous section.
2. Start a 'python_kubernetes' kernel.

Expected behavior
I would expect that enterprise-gateway receives the connection information from the remote kernel so that enterprise-gateway does not time out.
Context
Command Line Output
If I can provide any more information, please let me know!
Thank you for your work on enterprise-gateway, and thanks for any help / pointers in the right direction! :)