-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
集群加入gpu后服务报错 #153
Comments
就是单机的集群配置都配好了之后,过一段时间就会在rancher中的许多服务里显示 Deployment does not have minimum availability. 然后看日志的话就是Readiness probe failed Liveness probe failed这样的问题 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
我想问一下,就是在集群中加入了GPU之后,刚开始没报错,过了几个小时之后就是 cattle-cluster-agent canal coredns
kubeflow-prometheus-adapter 这些服务不停的重启更新,我看了一下大概是这三种,这种问题该怎么解决呢?
Readiness probe failed: Get http://10.42.0.14:9090/-/ready: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Liveness probe failed: Get http://10.42.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 210.28.18.30 210.28.16.26 210.28.18.26
The text was updated successfully, but these errors were encountered: