I have been trying to run ARC Runner Scale Set on EKS using Bottlerocket Linux and my experience has not been good so far.
Below are some of the issues I have encountered (some not related to ARC). For now I will keep using ARC Legacy with DinD; although it's not great, it has been stable in our systems. Feel free to post your issues and solutions in the comments below. Hopefully, with shared experience, this process can become a bit easier.
containerMode: kubernetes
Problem: Node OutOfCPU.
Cause: If ACTIONS_RUNNER_USE_KUBE_SCHEDULER is set to false, the GitHub container hook creates the runner and worker pods on the same node. Because the worker pod is pinned to the runner's node, at some point that node no longer has room for the combined container requests, and Kubernetes rejects the worker pod because the node is out of resources.
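To make the capacity problem concrete, here is a rough sketch of a request-based fit check. It assumes a simplified model that only counts CPU and memory requests; it is not the hook's or Kubernetes' actual code. The `Resources` type, the `fits_on_node` helper, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resources:
    """Resource requests or capacity (simplified: CPU and memory only)."""
    cpu_m: int       # CPU in millicores
    memory_mib: int  # memory in MiB

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_m + other.cpu_m, self.memory_mib + other.memory_mib)


def fits_on_node(allocatable: Resources, already_requested: Resources,
                 new_pods: List[Resources]) -> bool:
    """Return True if the new pods' requests fit in the node's remaining allocatable capacity."""
    total = already_requested
    for pod in new_pods:
        total = total + pod
    return total.cpu_m <= allocatable.cpu_m and total.memory_mib <= allocatable.memory_mib


# Hypothetical numbers: a 4-vCPU node that already has 1.5 vCPU requested by other pods.
node = Resources(cpu_m=4000, memory_mib=15000)
in_use = Resources(cpu_m=1500, memory_mib=4000)
runner = Resources(cpu_m=1000, memory_mib=2048)
worker = Resources(cpu_m=2000, memory_mib=4096)

print(fits_on_node(node, in_use, [runner]))          # True: the runner pod alone fits
print(fits_on_node(node, in_use, [runner, worker]))  # False: CPU requests total 4500m > 4000m
```

With the scheduler free to choose, the worker could go to a different node with spare capacity; pinned to the runner's node, it cannot.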
Problem: Runner and worker pods are created on different nodes and both try to attach the same EBS volume.
Cause: When ACTIONS_RUNNER_USE_KUBE_SCHEDULER is set to true, the kube scheduler may place the runner and worker on different nodes, but they need to share the same volume. An EBS-backed volume is ReadWriteOnce (read-write from a single node), so this option requires storage that supports ReadWriteMany, such as EFS.
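A minimal sanity check for this combination, assuming simplified capability sets for EBS and EFS. `work_volume_ok` and the mode sets are illustrative, not part of ARC; the rule itself comes from Kubernetes access modes: ReadWriteOnce allows read-write mounts from one node (including several pods on that node), ReadWriteMany from many nodes.

```python
from typing import FrozenSet

# Assumed, simplified capability sets for the two storage backends discussed above.
EBS_MODES: FrozenSet[str] = frozenset({"ReadWriteOnce"})
EFS_MODES: FrozenSet[str] = frozenset({"ReadWriteOnce", "ReadWriteMany"})


def work_volume_ok(use_kube_scheduler: bool, supported_modes: FrozenSet[str]) -> bool:
    """Check whether storage with the given access modes can back the shared work volume."""
    if use_kube_scheduler:
        # Runner and worker may land on different nodes, so the volume must be
        # mountable read-write from more than one node at the same time.
        return "ReadWriteMany" in supported_modes
    # Both pods are pinned to the same node: ReadWriteOnce permits several pods
    # on one node, and ReadWriteMany also works.
    return bool(supported_modes & {"ReadWriteOnce", "ReadWriteMany"})


for scheduler in (False, True):
    for name, modes in (("EBS", EBS_MODES), ("EFS", EFS_MODES)):
        verdict = "ok" if work_volume_ok(scheduler, modes) else "incompatible"
        print(f"ACTIONS_RUNNER_USE_KUBE_SCHEDULER={str(scheduler).lower()} {name}: {verdict}")
```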
Problem: Actions runner container hook fails:
[WORKER 2024-03-25 19:14:14Z ERR StepsRunner] Caught exception from step: System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
[WORKER 2024-03-25 19:14:14Z ERR StepsRunner] ---> System.Exception: The hook script at '/home/runner/k8s/index.js' running command 'PrepareJob' did not execute successfully
TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received null
Cause: Still to be investigated, but once the volumes were removed from the containers in the job definition, the jobs succeeded.
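The workaround can be sketched as a small transformation, assuming the workflow job has already been parsed into a dict (the parsing step is not shown). `container.volumes` and `services.<id>.volumes` are standard workflow keys; `strip_container_volumes`, the `runs-on` label, and the sample job are hypothetical. This only removes the volumes; it does not explain the null "path" error.

```python
import copy
from typing import Any, Dict


def strip_container_volumes(job: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed job definition without container/service `volumes`."""
    cleaned = copy.deepcopy(job)  # leave the caller's definition untouched
    container = cleaned.get("container")
    # `container` may also be a plain image string, which has no volumes to remove.
    if isinstance(container, dict):
        container.pop("volumes", None)
    for service in (cleaned.get("services") or {}).values():
        if isinstance(service, dict):
            service.pop("volumes", None)
    return cleaned


# Hypothetical job definition, as it might look after parsing the workflow YAML.
job = {
    "runs-on": "arc-runner-set",
    "container": {"image": "node:20", "volumes": ["/data:/data"]},
    "services": {"db": {"image": "postgres:16", "volumes": ["/pg:/var/lib/postgresql/data"]}},
}
print(strip_container_volumes(job))
```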
Problem: Jobs running on EFS take ages to complete.
Cause: In our workflows we have jobs that handle many small files (e.g. node_modules). EFS is not suitable for that type of workload: the per-file open and close operations add a lot of time. Jobs that used to take 3 minutes took 58 minutes to complete. Switching to provisioned throughput to match the 125 MB/s we get from EBS would make it expensive.
EFS setup:
Performance Mode: General Purpose
Throughput Mode: Bursting (50 KB/s per GB)
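The Bursting numbers can be turned into a rough calculation. This sketch assumes the "50 KB/s per GB" figure means KiB/s per GiB, treats the EBS figure as 125 MiB/s, and ignores burst credits. It also leaves out per-operation latency, which is what hurts small-file workloads like node_modules most, so it understates the problem. The function names are hypothetical.

```python
KIB_PER_MIB = 1024
BURSTING_KIB_S_PER_GIB = 50  # baseline from the EFS setup above, read as KiB/s per GiB


def bursting_baseline_mib_s(storage_gib: float) -> float:
    """Baseline throughput (MiB/s) of EFS Bursting mode for a given amount of stored data."""
    return storage_gib * BURSTING_KIB_S_PER_GIB / KIB_PER_MIB


def storage_for_baseline_gib(target_mib_s: float) -> float:
    """Stored data (GiB) needed for the Bursting baseline to reach target_mib_s."""
    return target_mib_s * KIB_PER_MIB / BURSTING_KIB_S_PER_GIB


print(bursting_baseline_mib_s(100))   # 4.8828125 (MiB/s for 100 GiB stored)
print(storage_for_baseline_gib(125))  # 2560.0 (GiB stored to reach 125 MiB/s)
print(round(58 / 3, 1))               # 19.3 (slowdown factor reported for the same jobs)
```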
Problem: tar extraction fails with a permissions error when using EFS: "Cannot change ownership to uid xxxx, gid xxxx: Operation not permitted".
Cause: The container security context fsGroup needs to match the runner user (1001). On EFS, I set the uid and gid to 0, and that solved the problem.
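For reference, fsGroup is a pod-level field of the Kubernetes PodSecurityContext (`spec.securityContext.fsGroup`). A hypothetical helper that builds that fragment, assuming it is then merged into the runner pod template, could look like this; the value 1001 is the runner ID from the post, and the helper name is illustrative.

```python
import json
from typing import Any, Dict

RUNNER_ID = 1001  # runner user ID from the post; used here as the fsGroup


def pod_security_context_patch(fs_group: int = RUNNER_ID) -> Dict[str, Any]:
    """Pod-spec fragment that sets the pod-level securityContext.fsGroup (hypothetical helper)."""
    return {"spec": {"securityContext": {"fsGroup": fs_group}}}


# Kubernetes manifests can be written as JSON as well as YAML.
print(json.dumps(pod_security_context_patch(), indent=2))
```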
Problem: "Access to the path is denied" errors on Bottlerocket Linux.
Cause: SELinux configuration. See actions/runner issue #981, "Downloading runner update fails: 'An error occurred: Access to the path is denied'".