PV build-up with Reclaim policy set to Delete #2266

Open
7 tasks done
harshaisgud opened this issue Feb 8, 2023 · 3 comments · May be fixed by #3326
Labels
bug (Something isn't working) · needs triage (Requires review from the maintainers)

Comments


harshaisgud commented Feb 8, 2023


Controller Version

0.27.0

Helm Chart Version

0.22.0

CertManager Version

1.10.1

Deployment Method

Helm

cert-manager installation

Yes, I have installed cert-manager following the steps mentioned in the documentation.

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is critical and you therefore need priority support.)
  • I've read the release notes before submitting this issue and I'm sure it's not due to any recently introduced backward-incompatible changes
  • My actions-runner-controller version (v0.x.y) does support the feature
  • I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
  • I've migrated to the workflow job webhook event (if you are using webhook-driven scaling)

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example-1
spec:
  replicas: 1
  organization: xyz
  labels: 
    - arc-1
    - linux
  selector:
    matchLabels:
      app: example
  serviceName: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: docker
        volumeMounts:
        - name: var-lib-docker
          mountPath: /var/lib/docker
  volumeClaimTemplates:
  - metadata:
      name: var-lib-docker
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 7Gi
      storageClassName: gh-ebs
      dataSource:
        name: ebs-volume-snapshot
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gh-ebs
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete

To Reproduce

1. Install ARC.
2. Start a RunnerSet with a volumeClaimTemplate.
3. Run a couple of workflows.
4. Observe that PVs with ReclaimPolicy Delete build up despite the PVCs being deleted.

Describe the bug

Dynamically provisioned Persistent Volumes that are in an Available state cannot be cleaned up by the EBS CSI driver; deletion fails with an error that the volume is still attached to the node.
Example log : delete "pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a": volume deletion failed: persistentvolume pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a is still attached to node ip-10-10-2-152.eu-central-1.compute.internal.
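
For context, with the EBS CSI driver the "still attached" condition is typically reflected by a VolumeAttachment object that still references the node. A minimal sketch of what such an object might look like, reusing the PV and node names from the log above (the attachment name itself is hypothetical):

# Illustrative sketch only; the attachment name is made up, the PV and node names come from the error log above.
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-0123456789abcdef            # hypothetical name
spec:
  attacher: ebs.csi.aws.com
  nodeName: ip-10-10-2-152.eu-central-1.compute.internal
  source:
    persistentVolumeName: pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a
status:
  attached: true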

Describe the expected behavior

Dynamically provisioned Persistent Volumes with ReclaimPolicy set to Delete should be deleted when the PVC is deleted.

Whole Controller Logs

2023-02-06T12:33:19Z	DEBUG	runnerpersistentvolume	Retrying sync until pvc gets released	{"pv": "/pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a", "requeueAfter": "10s"}
2023-02-06T12:33:19Z	ERROR	Reconciler error	{"controller": "runnerpersistentvolumeclaim-controller", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim", "PersistentVolumeClaim": {"name":"var-lib-docker-nitro-1-5d5sx-0","namespace":"actions-runner-system"}, "namespace": "actions-runner-system", "name": "var-lib-docker-example-1-5d5sx-0", "reconcileID": "7aeac10f-6998-430e-8c2a-adc94b385299", "error": "Operation cannot be fulfilled on persistentvolumes \"pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a\": the object has been modified; please apply your changes to the latest version and try again"}
2023-02-06T12:33:19Z	INFO	runnerpersistentvolume	PV should be Available now	{"pv": "/pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a"}
2023-02-06T14:29:08Z	DEBUG	runnerpersistentvolume	Retrying sync until pvc gets released	{"pv": "/pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a", "requeueAfter": "10s"}
2023-02-06T14:29:08Z	INFO	runnerpersistentvolume	PV should be Available now	{"pv": "/pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a"}
2023-02-06T14:32:22Z	DEBUG	runnerpersistentvolume	Retrying sync until pvc gets released	{"pv": "/pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a", "requeueAfter": "10s"}
2023-02-06T14:32:22Z	INFO	runnerpersistentvolume	PV should be Available now	{"pv": "/pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a"}

Whole Runner Pod Logs

Not applicable; this issue is not related to the runner pod logs.

Additional Context

I suspect the issue is caused by the pending finalizer [kubernetes.io/pv-protection] on the PV. Deleting the Persistent Volumes in Kubernetes does not delete the underlying AWS EBS volumes.
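
A minimal, abbreviated sketch of what one of the leftover PVs looks like in this state, based on the names in the report above (the EBS volume ID is a placeholder):

# Abbreviated, illustrative sketch of a leftover PV stuck with the pv-protection finalizer.
# The volumeHandle is a placeholder EBS volume ID; other fields follow the report above.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-df682ae3-3b7b-4599-bdce-e9b17dda2a7a
  finalizers:
    - kubernetes.io/pv-protection
spec:
  persistentVolumeReclaimPolicy: Delete
  storageClassName: gh-ebs
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # placeholder
status:
  phase: Available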

@harshaisgud added the bug (Something isn't working) and needs triage (Requires review from the maintainers) labels on Feb 8, 2023

github-actions bot commented Feb 8, 2023

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.


irasnyd commented Dec 20, 2023

I am hitting the same bug. It began after my transition from the built-in EBS provisioner to the EBS CSI provisioner.

For example, a dynamically allocated PV/PVC with a StorageClass that looks like this works correctly (PVs don't build up forever):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
parameters:
  fsType: ext4
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: false

However, a dynamically allocated PV/PVC with a StorageClass that looks like this builds up PVs:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
parameters:
  csi.storage.k8s.io/fstype: xfs
  encrypted: "true"
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: false


midnattsol commented Feb 27, 2024

We are hitting the same bug.

We're currently testing a solution. If it keeps working well after a couple of days, I will make a PR.

For those who want to test it as well, I have a custom image for version v0.26.7 on Docker Hub (currently under testing).

@mumoshu linked a pull request (#3326) on Mar 26, 2024 that will close this issue