GESIS server is unstable #3056
Related to jupyterhub#2995. Related to jupyterhub#3056.
What @arnim and I know from the logs (copy at the end of this message) is that the server is taking a very long time to pull some images.
GESIS server pulled
But GESIS server pulled
The very long time needed to pull some images is creating a queue on Kubernetes.
Kubernetes Event Log
The GESIS server continues to be very unstable.

Grafana reports the following for the first 12 hours of last Monday (19 August 2024): The high number of Pending pods (yellow line) has been constant since this issue was opened.

Grafana reports the following for the first 12 hours of this Monday (26 August 2024): Something happened so that the number of Pending pods (yellow line) was small on Monday morning. The inflection point was on Saturday, 24 August 2024, around 11:00 AM. My hypothesis is that something changed in the GESIS network, but I'm waiting for confirmation.

Unfortunately, around 8:15 today (27 August 2024), the number of Pending pods started to climb very high again.
The number of pending pods grows very fast, even after stopping the server for a while. I looked at the event log, and the waiting time to pull images keeps increasing:
The GESIS server has reached a point where the number of pending / terminating pods is locking up the Kubernetes cluster. 😭 @arnim and I will have to reduce the number of users that we can serve.
@arnim and @rgaiacs noticed that the GESIS server has been unstable since July 31, 2024 at 6:29:45 PM GMT+2.
@arnim and @rgaiacs are in contact with GESIS IT to resolve the network issue that is causing the instability. Unfortunately, part of GESIS IT is on summer holiday, so resolving the network issue will take longer.