Resource limitation for the sidecar container on Autopilot #35
Comments
Please also consider that Autopilot has officially been the default and recommended GKE mode since April.
@songjiaxun Would you prefer to have this on https://issuetracker.google.com?
Thanks for the question. I admit that the PyTorch example may not work in Autopilot clusters. I am actively working on the AI/ML application tests and will update the example YAML soon. @bhack, are you a Googler by any chance? Could you DM me with more context?
I've sent you a DM. It is not only PyTorch: it will not work in any real DL scenario, as the CPU limit for the sidecar on large nodes will be MAX:
I think we have regressed a bit here. Autopilot now accepts unlimited/burstable resources on the sidecar (#61), but it "secretly" overrides them with minimal resources. Manually scaling the sidecar CPU resources prevents the pod from being scheduled on Autopilot (E.g. >
What's the status here as it relates to Autopilot?
Looking at the default PyTorch example in this repository, I see some performance incompatibilities with the minimum Autopilot resource requests [1].
I think we will have many problems allocating sidecar resources if Autopilot keeps these high minimum limits.
gcs-fuse-csi-driver/examples/pytorch/train-job-pytorch.yaml
Lines 35 to 39 in 8a8d871
[1] https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests