Resources that don't match the label selector are backed up #7750

Open
ywk253100 opened this issue Apr 28, 2024 · 11 comments

@ywk253100
Contributor

The workload:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: etcd0-pv-claim
spec:
  storageClassName: "default"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: etcd
    etcd_node: etcd0
  name: etcd0
  annotations:
    "pre.hook.backup.velero.io/container": "etcd0"
    "pre.hook.backup.velero.io/command": "[\"/usr/local/bin/etcd\", \"-h\"]"
spec:
  volumes:
    - name: etcd0-storage
      persistentVolumeClaim:
        claimName: etcd0-pv-claim
  containers:
    - command:
        - /usr/local/bin/etcd
        - --name
        - etcd0
        - --initial-advertise-peer-urls
        - http://etcd0:2380
        - --listen-peer-urls
        - http://0.0.0.0:2380
        - --listen-client-urls
        - http://0.0.0.0:2379
        - --advertise-client-urls
        - http://etcd0:2379
        - --initial-cluster
        - etcd0=http://etcd0:2380,etcd1=http://etcd1:2380,etcd2=http://etcd2:2380
        - --initial-cluster-state
        - new
      image: quay.io/coreos/etcd:latest
      volumeMounts:
        - mountPath: "/etcd0.etcd"
          name: etcd0-storage
      name: etcd0
      ports:
        - containerPort: 2379
          name: client
          protocol: TCP
        - containerPort: 2380
          name: server
          protocol: TCP
  restartPolicy: Always

The backup command:

velero backup create test12 --include-namespaces default --snapshot-volumes --wait --selector app=etcd

The backup result:

velero backup describe test12 --details
Name:         test12
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.28.7+vmware.1
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=28

Phase:  Completed


Namespaces:
  Included:  default
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=etcd

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  true
Snapshot Move Data:          true
Data Mover:                  velero

TTL:  720h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2024-04-28 08:37:32 +0000 UTC
Completed:  2024-04-28 08:38:11 +0000 UTC

Expiration:  2024-05-28 08:37:32 +0000 UTC

Total items to be backed up:  4
Items backed up:              4

Backup Item Operations:
  Operation for persistentvolumeclaims default/etcd0-pv-claim:
    Backup Item Action Plugin:  velero.io/csi-pvc-backupper
    Operation ID:               du-e1522ce3-597d-40b9-96b4-5bbe26061542.480573e7-aa17-4900e451d
    Items to Update:
                           datauploads.velero.io velero/test12-wdxsx
    Phase:                 Completed
    Progress:              128020480 of 128020480 complete (Bytes)
    Progress description:  Completed
    Created:               2024-04-28 08:37:43 +0000 UTC
    Started:               2024-04-28 08:38:02 +0000 UTC
    Updated:               2024-04-28 08:38:03 +0000 UTC
Resource List:
  v1/Namespace:
    - default
  v1/PersistentVolume:
    - pvc-480573e7-aa17-4907-a0ee-b829d03aaebf
  v1/PersistentVolumeClaim:
    - default/etcd0-pv-claim
  v1/Pod:
    - default/etcd0

Backup Volumes:
  Velero-Native Snapshots: <none included>

  CSI Snapshots:
    default/etcd0-pv-claim:
      Data Movement:
        Operation ID: du-e1522ce3-597d-40b9-96b4-5bbe26061542.480573e7-aa17-4900e451d
        Data Mover: velero
        Uploader Type: kopia
        Moved data Size (bytes): 128020480

  Pod Volume Backups: <none included>

HooksAttempted:  1
HooksFailed:     0

The PVC and PV aren't labeled with app=etcd, but they are still included in the backup.
The same issue also occurs during restore.

@blackpiglet
Contributor

This is related to a BackupItemAction (BIA) called PodAction.

// Execute scans the pod's spec.volumes for persistentVolumeClaim volumes and returns a
// ResourceIdentifier list containing references to all of the persistentVolumeClaim volumes used by
// the pod. This ensures that when a pod is backed up, all referenced PVCs are backed up too.
func (a *PodAction) Execute(item runtime.Unstructured, backup *v1.Backup) (runtime.Unstructured, []velero.ResourceIdentifier, error) {

It lists all of the pod's mounted PVCs and returns them as additional items.
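
For context, a minimal sketch of what such an action looks like (the Execute signature above is quoted from the Velero source; the body and imports below are an approximation, not the exact upstream code):

import (
	corev1api "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"

	v1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	"github.com/vmware-tanzu/velero/pkg/plugin/velero"
)

// Approximate sketch of PodAction.Execute: convert the unstructured item into
// a Pod, then return every PVC referenced in spec.volumes as an additional item.
func (a *PodAction) Execute(item runtime.Unstructured, backup *v1.Backup) (runtime.Unstructured, []velero.ResourceIdentifier, error) {
	pod := new(corev1api.Pod)
	if err := runtime.DefaultUnstructuredConverter.FromUnstructured(item.UnstructuredContent(), pod); err != nil {
		return nil, nil, err
	}

	var additionalItems []velero.ResourceIdentifier
	for _, volume := range pod.Spec.Volumes {
		if volume.PersistentVolumeClaim != nil && volume.PersistentVolumeClaim.ClaimName != "" {
			additionalItems = append(additionalItems, velero.ResourceIdentifier{
				GroupResource: schema.GroupResource{Group: "", Resource: "persistentvolumeclaims"},
				Namespace:     pod.Namespace,
				Name:          volume.PersistentVolumeClaim.ClaimName,
			})
		}
	}

	// The pod itself is returned unmodified; the mounted PVCs come back as
	// additional items, which is how an unlabeled PVC can end up in a
	// label-selector-filtered backup.
	return item, additionalItems, nil
}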

We may need to discuss whether this is the expected behavior.

@sseago
Collaborator

sseago commented Apr 29, 2024

Generally speaking, the intent of the label selector on the backup is to back up only those resources that match the label. If the pod matches and the PVC doesn't, then only the pod is backed up, not the volume. If you want both backed up, you should label both the pod and the PVC (and the PV). If we automatically included every additional item from a plugin's response, even if it didn't match the backup spec (label selector, includes/excludes, etc.), then we'd break the use case where a user wants to back up only Pods.
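
For the workload in this report, that would mean explicitly labeling the PVC and its bound PV before creating the backup, for example (illustrative commands using the names from the output above):

kubectl label pvc etcd0-pv-claim app=etcd -n default
kubectl label pv pvc-480573e7-aa17-4907-a0ee-b829d03aaebf app=etcd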

@kaovilai
Contributor

However, if the namespace has the label selector, anything under that namespace would be backed up as well per #7697, IIUC.

@blackpiglet
Contributor

@kaovilai
It doesn't work like that. The namespace Include/Exclude filter has the effect you mentioned, but not the label selector on the namespace.

@sseago
I agree your proposal is valid. To achieve that, we would need to skip the additional items returned by BIAs. There is no mechanism available to do that yet, and it seems like a significant change to the Velero backup process. Is there any scenario that definitely needs it?

@blackpiglet
Contributor

/assign @blackpiglet

@sseago
Collaborator

sseago commented Apr 30, 2024

@blackpiglet Actually, it should already work this way. When we call backupItem on the returned additional item, one of the first things we do is to check for exclusion. The only exception to this is if the mustInclude annotation is set -- if I'm remembering correctly, the CSI plugin sets this on the newly-created VolumeSnapshot resources since they're resources that Velero creates and the user may not know about (and thus may not include in backup).

@sseago
Collaborator

sseago commented Apr 30, 2024

@blackpiglet but maybe we only check include/exclude resource types and not labels? I guess we check label selector matches in a different place, so maybe that doesn't happen on additional item return. So if it's not currently working this way, we may want to think about whether we want it to or not -- since any change would be a change for users who rely on current behavior.

@sseago
Collaborator

sseago commented May 3, 2024

@blackpiglet thinking about this again, the pod_action returns PVCs, and then the pvc_action returns PVs. At each of those points we do check includes/excludes and only add the additional items to the backup if the includes/excludes call for it. If PVCs had been excluded directly rather than filtered via a label selector, the PVC would have been excluded in the example above. I suspect that what we really need to do is make sure that when we check includes/excludes we also check the label selector.
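
As a rough illustration of that idea (a hypothetical sketch, not existing Velero code: the helper name and wiring are invented, only the proposed behavior comes from this comment), an additional item returned by a BIA could be matched against the backup's label selector before being backed up:

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"

	v1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
)

// Hypothetical helper: decide whether an additional item returned by a
// BackupItemAction should be kept, honoring the backup's label selector in
// addition to the include/exclude filters that are already checked today.
func additionalItemMatchesSelector(backup *v1.Backup, obj metav1.Object) (bool, error) {
	if backup.Spec.LabelSelector == nil {
		return true, nil // no selector on the backup, nothing to filter on
	}
	selector, err := metav1.LabelSelectorAsSelector(backup.Spec.LabelSelector)
	if err != nil {
		return false, err
	}
	return selector.Matches(labels.Set(obj.GetLabels())), nil
}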

Also note that there are cases where we bypass these checks for additional items, using the "backup.velero.io/must-include-additional-items" annotation -- we do this for CSI snapshots, since there's no way the user could have labeled the VolumeSnapshot properly and probably doesn't know to include it in the included items list. I think the case could be made that the pvc action which returns the bound PV should set this annotation, meaning including the PVC should always include the PV, regardless of whether the PV has the matching label. I don't think we want that behavior for Pod->PVC, as it should be possible for a user to back up a Pod without its volumes (and vice versa).
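
For illustration only, here is a generic sketch of setting that annotation on an object (the annotation name is taken from the comment above; where Velero reads it, and whether pvc_action should set it, is exactly what is being discussed here, not current behavior):

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// Hypothetical: mark an object so that the additional items returned for it
// bypass the normal include/exclude (and, potentially, label selector) checks.
func setMustIncludeAdditionalItems(obj *unstructured.Unstructured) {
	annotations := obj.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations["backup.velero.io/must-include-additional-items"] = "true"
	obj.SetAnnotations(annotations)
}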

@sseago
Collaborator

sseago commented May 3, 2024

Quicker summary -- I think the desired outcome is that, in general, the backup label selector only backs up labeled items, not every additional item returned by BIAs for labeled items, but we need certain exceptions using the mustInclude annotation. We already have exceptions in place using this annotation for CSI plugin additional items. We probably also need it for the PV additional items returned by pvc_action.go. The main reason is that, since PVs are dynamically provisioned, even if a user creates their PVCs with labels already applied, the PVs won't have the labels, so remembering to label the PVs afterwards is an extra (and error-prone) step. Also, I can't really see any use case that calls for backing up a PVC without its PV, while the same is not true for other plugin additional item returns (Pod->PVC, for example, should require the label on the PVC for the PVC to be included).

@blackpiglet
Contributor

@sseago
There is indeed a mismatch between how the selector and the include/exclude rules are applied.
The selector and orLabelSelector are only used in the item collection phase.
It's reasonable to align them, but we may need to consider when the additional items should be skipped.
Take the pod_action for example: backing up the pod without its mounted volumes is unexpected in most cases. Users could find that the pod cannot start after a restore.

@blackpiglet blackpiglet self-assigned this May 8, 2024
@blackpiglet
Contributor

After thinking about it again, allowing users to pick only the labeled resources is reasonable.
The current behavior works by selecting the related resources together,
which guarantees that the restore works in most cases.

The suggested change is valuable, but it requires more knowledge from the user, and it's a breaking change.
I'm not sure whether we should support both behaviors.
