-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
skip invalid link path during gc #4343
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe what is it you are trying to achieve with this PR?
Specifically, can you give us an example of when a stored registry link contains an invalid digest? That'd be mildly worrying, tbh
hi @milosgajdos, we use Harbor as our image repository, and Harbor uses Registry to manage images underneath. I've noticed that after running for a while, Harbor tends to leave behind a lot of residual data, as described in the #4330. I tried running garbage collection (GC) to reclaim resources, but encountered errors. It turns out that there are many invalid link files in the Registry. These link files are either empty or have invalid digests, which directly cause the GC to interrupt. Actually, we can skip these invalid link files instead of directly interrupting the GC. ~/go/src/github.com/distribution/distribution (link ✗) bin/registry garbage-collect cmd/registry/config-dev.yml -m
.......
deeproute-all/docker-repo/data-ocean/nginx
deeproute-all/docker-repo/data-ocean/prod/data-ocean-front-end
deeproute-all/docker-repo/data-ocean/qa/data-ocean-front-end
deeproute-all/docker-repo/dataocean/adas-web-term
deeproute-all/docker-repo/dataocean/dataocean-account-management-service
deeproute-all/docker-repo/dataocean/dataocean-adas-monitor
deeproute-all/docker-repo/dataocean/dataocean-adas-tar
deeproute-all/docker-repo/dataocean/dataocean-api-docs
deeproute-all/docker-repo/dataocean/dataocean-api-gateway
deeproute-all/docker-repo/dataocean/dataocean-authentication-service
deeproute-all/docker-repo/dataocean/dataocean-availability-service
deeproute-all/docker-repo/dataocean/dataocean-bag-construction
deeproute-all/docker-repo/dataocean/dataocean-bag-constructor
deeproute-all/docker-repo/dataocean/dataocean-car-transmit
deeproute-all/docker-repo/dataocean/dataocean-cloud-vehicle-communication
deeproute-all/docker-repo/dataocean/dataocean-data-agent
deeproute-all/docker-repo/dataocean/dataocean-data-agent-overseer
deeproute-all/docker-repo/dataocean/dataocean-data-construction
deeproute-all/docker-repo/dataocean/dataocean-data-migration
deeproute-all/docker-repo/dataocean/dataocean-data-provider-service
deeproute-all/docker-repo/dataocean/dataocean-data-transformation-service
deeproute-all/docker-repo/dataocean/dataocean-data-tube-service
deeproute-all/docker-repo/dataocean/dataocean-data-tunnel
deeproute-all/docker-repo/dataocean/dataocean-database-migration
deeproute-all/docker-repo/dataocean/dataocean-database-migration-data-managment-v2
deeproute-all/docker-repo/dataocean/dataocean-download-test
deeproute-all/docker-repo/dataocean/dataocean-dr-pipeline
deeproute-all/docker-repo/dataocean/dataocean-drfile-archive-serivce
deeproute-all/docker-repo/dataocean/dataocean-drtag
deeproute-all/docker-repo/dataocean/dataocean-file-download-service
deeproute-all/docker-repo/dataocean/dataocean-file-galaxy
failed to garbage collect: failed to mark: filesystem: filesystem: invalid checksum digest format ~/go/src/github.com/distribution/distribution (link ✗) ll /mnt/cephfs/docker/registry/v2/repositories/infra-operation/image/build-env-image/_manifests/revisions/sha256/a68799c5a48d4cfe46ea1e122245097441f14720b0488d1d162eea637e3b96f7/link
-rw-r--r-- 1 10000 71 Apr 28 2022 /mnt/cephfs/docker/registry/v2/repositories/infra-operation/image/build-env-image/_manifests/revisions/sha256/a68799c5a48d4cfe46ea1e122245097441f14720b0488d1d162eea637e3b96f7/link
~/go/src/github.com/distribution/distribution (link ✗) cat /mnt/cephfs/docker/registry/v2/repositories/infra-operation/image/build-env-image/_manifests/revisions/sha256/a68799c5a48d4cfe46ea1e122245097441f14720b0488d1d162eea637e3b96f7/link
~/go/src/github.com/distribution/distribution (link ✗) ll /mnt/cephfs/docker/registry/v2/repositories/deeproute-simulation/single-frame-task-images/_manifests/revisions/sha256/20f5baba4c1fdd30c9e13984365ed873b06a945afb877569219daee8870da30b/link
-rw-r--r-- 1 10000 0 Jun 24 2022 /mnt/cephfs/docker/registry/v2/repositories/deeproute-simulation/single-frame-task-images/_manifests/revisions/sha256/20f5baba4c1fdd30c9e13984365ed873b06a945afb877569219daee8870da30b/link |
I'd prefer looking into the underlying issue rather than adding a bandaid on a symptom. |
Maybe this should be considered quite common. Taking the example of pushing an image, it involves several HTTP requests. For the push process, the entire operation is non-atomic, so it's very likely that failures or other anomalies midway could lead to data corrupt or incompleteness. This is when running GC would cause errors. I think we can skip the anomalous data. |
What happens to anomalous data if you leave them dangling in the storage and then a push with the same references is attempted? |
the link file will be overwritten if push the same references. |
if err != nil { | ||
if err == digest.ErrDigestInvalidFormat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering, how did you come across broken digest in your storage? How did it get there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run GC, it exits abnormally. To locate the issue, I added some logs and found that the error occurs when parsing the digest while reading some link paths. Then I checked the storage and discovered that these link paths are either empty or corrupt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok but that's worrisome right? Because we might be addressing a symptom here, not the root cause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but I don't have a good way to reproduce it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the link paths are empty/corrupt then wouldn't the better idea just to be to remove them and warn rather then leave them be? I worry we get into a situation with a bunch of these corrupt/empty link files sitting in someones blob store and the only way to remove them would be via manual intervention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@squizzi mind having a look at this?
Signed-off-by: Liang Zheng <[email protected]>
Honestly, both is probably the right way to go here. If GC is leaving a lot of cruft behind that's concerning because it's not actually garbage collecting. If every abnormal run of GC is stopping because it's hitting an error this could result in a lot of failed GC runs. I'm in support of band-aiding it but perhaps filing an issue or something to look into the underlying cause as a follow-up. |
So the question here for me is: is the GC leaving broken links behind or is it failed push. Because if it's the former I'm ok for merging this in, if it's the latter then we have a bigger problem on our hands 😓 Yeah maybe we can proceed with this and opening an issue to make note of it so it doesnt get forgotten. Mind actioning this @microyahoo |
Signed-off-by: Liang Zheng [email protected]