AWS S3 sync operations do not work with S3 directory buckets (S3 Express One Zone) #8470

Closed
jeffgardnerdev opened this issue Jan 8, 2024 · 8 comments
Labels
bug This issue is a bug. p0 This issue is the highest priority s3

Comments

@jeffgardnerdev

Describe the bug

If you run aws s3 sync with a standard bucket as the source and a directory bucket as the destination, the comparison does not work properly. Some files that exist in the source are not recognized, and some files that exist in the destination are also not recognized.

Expected Behavior

Only the source objects that changed since the last sync are copied to the destination.

Current Behavior

Some objects that did not change since the last sync are copied to the destination again. If the --delete flag is supplied, those objects are also deleted from the destination before being recopied from the source. This indicates that the CLI does see the object at the destination but doesn't consider it the same object as the one in the source.

Reproduction Steps

Test scenario: a source prefix in a standard bucket is successfully synced to the same prefix in a directory bucket. The prefix has 5 objects. A subsequent sync performed immediately afterward copies one of those objects again, even though nothing has changed. The debug output contains a "file does not exist at destination" message even though the file does exist at the destination. Interestingly, if the --delete flag is supplied, that file is deleted from the destination and then recopied.

Possible Solution

I believe this bug stems from an inconsistency in the response order in the list-objects-v2 API. For whatever reason the file that was considered missing in the test scenario is listed last in the response for the directory bucket, while it is listed first in the response for the standard bucket.
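To make the ordering theory concrete, here is a simplified model of a merge-style comparison that walks two key listings in lockstep and assumes both are sorted ascending. The function name merge_compare and the example keys are hypothetical, and the model ignores the size and timestamp checks a real sync performs; it is a sketch of the idea, not the CLI's code. When the destination lists a.txt last instead of first, the walk reports a.txt both as missing from the destination and as present only at the destination, which lines up with the copy-again and delete-then-recopy behavior described above.

```python
from typing import List, Tuple


def merge_compare(source: List[str], dest: List[str]) -> Tuple[List[str], List[str]]:
    """Compare two key listings in one pass, ASSUMING both are sorted ascending.

    Returns (to_copy, to_delete): keys judged missing at the destination, and
    keys judged to exist only at the destination.
    """
    to_copy: List[str] = []
    to_delete: List[str] = []
    i = j = 0
    while i < len(source) and j < len(dest):
        s, d = source[i], dest[j]
        if s == d:
            # Same key on both sides: treated as already in sync.
            i += 1
            j += 1
        elif s < d:
            # Under the sorted assumption, s cannot appear later in dest.
            to_copy.append(s)
            i += 1
        else:
            # Under the sorted assumption, d cannot appear later in source.
            to_delete.append(d)
            j += 1
    # Whatever remains on either side has no partner.
    to_copy.extend(source[i:])
    to_delete.extend(dest[j:])
    return to_copy, to_delete


source = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
print(merge_compare(source, sorted(source)))  # ([], [])
# Same five keys, but the destination listing returns a.txt last:
print(merge_compare(source, ["b.txt", "c.txt", "d.txt", "e.txt", "a.txt"]))
# (['a.txt'], ['a.txt'])
```

Nothing in the data is wrong in the second call; only the order differs, and a single-pass comparison cannot recover from that.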

Additional Information/Context

No response

CLI version used

2.14.4

Environment details (OS name and version, etc.)

macOS Ventura 13.4.1

@jeffgardnerdev jeffgardnerdev added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 8, 2024
@RyanFitzSimmonsAK RyanFitzSimmonsAK self-assigned this Jan 9, 2024
@RyanFitzSimmonsAK RyanFitzSimmonsAK added s3 p3 This is a minor priority issue investigating This issue is being investigated and/or work is in progress to resolve the issue. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. p3 This is a minor priority issue labels Jan 9, 2024
@RyanFitzSimmonsAK
Contributor

Hi @jeffgardnerdev, thanks for reaching out. Could you provide debug logs of this behavior? You can get debug logs by adding --debug to your command, and redacting any sensitive information. Logs with --delete and without --delete would both be appreciated. Thanks!

@RyanFitzSimmonsAK RyanFitzSimmonsAK added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Jan 9, 2024
@jeffgardnerdev
Author

Here are the two debug outputs, one without the --delete option and one with the --delete option. In this case the two prefixes are identical but the sync command copies/deletes 4 of the 5 files every time. Bucket names are redacted. ListObjectsV2 responses for the two prefixes return the objects in a different order.
s3-sync-standard-to-directory-with-delete-debug-output.txt
s3-sync-standard-to-directory-debug-output.txt

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jan 10, 2024
@RyanFitzSimmonsAK
Contributor

RyanFitzSimmonsAK commented Jan 22, 2024

@jeffgardnerdev, thanks for your patience. I was able to reproduce this behavior, and your theory that this is related to ListObjectsV2 not sorting directory buckets is likely correct. We've reached out to the service team, and I'll leave any updates in this issue.

If other people are experiencing this issue, providing details and any impact in this issue would be appreciated.

Ticket # for internal use: P114641353

@RyanFitzSimmonsAK RyanFitzSimmonsAK added p1 This is a high priority issue and removed p2 This is a standard priority issue labels Jan 22, 2024
@kellertk kellertk added p0 This issue is the highest priority and removed p1 This is a high priority issue labels Jan 22, 2024
@kellertk kellertk changed the title s3 sync general bucket --> directory bucket doesn't compare properly AWS S3 sync operations do not work with S3 directory buckets (S3 Express One Zone) Jan 22, 2024
@kellertk kellertk pinned this issue Jan 22, 2024
@jeffgardnerdev
Author

@RyanFitzSimmonsAK Could you share some more information about the decision to disable the sync command for directory buckets entirely? This is useful functionality that would be nice to keep, provided the ordering issue could be fixed.

@kellertk
Contributor

Hello,

Thank you for reporting this issue. We have released AWS CLI v1.32.25 and v2.15.13 and strongly recommend that you upgrade to address this issue. If you are unable to upgrade, do not run the aws s3 sync command with an S3 Express One Zone directory bucket.

@kellertk
Contributor

@jeffgardnerdev I saw your question asking for more information; I will post further details here if I am able.

@kellertk
Contributor

Here are some additional technical details on this issue if you are interested.

The reason we’ve removed the sync command for directory buckets is what @jeffgardnerdev already noticed: there is an incompatibility between the way the CLI compares the lists of objects in two buckets and the results of the ListObjectsV2 API call on directory buckets[1]. S3 operations in the CLI are threaded, and sync compares the source and destination object lists while those lists are still being populated from the S3 API. That approach requires both listings to arrive in the same order, so it isn't compatible with directory buckets, where there's no way to ensure the ordering of objects returned by S3. Because of this incompatibility, the sync command will not work properly on a directory bucket. There is no workaround for directory buckets and sync at this time, except to refrain from using sync and instead use cp or similar.

In the versions of the CLI I noted above, v1.32.25 and v2.15.13 and later, we removed sync support for directory bucket sources and destinations to prevent anyone from running this command and getting inconsistent or incorrect results.

[1] Specifically, "Sorting order of returned objects" on the linked documentation page.
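For contrast, an order-independent comparison avoids the problem by buffering the listings before deciding anything, at the cost of holding every key in memory and giving up the compare-while-listing approach described in the comment above. The sketch below, built around a hypothetical order_independent_compare function, only illustrates that trade-off under the simplifying assumption that keys alone decide whether an object matches; it is not what the CLI does, since v1.32.25 and v2.15.13 disable sync for directory buckets instead.

```python
from typing import Iterable, List, Tuple


def order_independent_compare(
    source: Iterable[str], dest: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Compare two key listings without relying on their order.

    Both listings are fully buffered into sets, so memory grows with the
    number of keys, and nothing can be decided until each listing is complete.
    """
    source_keys = set(source)
    dest_keys = set(dest)
    to_copy = sorted(source_keys - dest_keys)    # keys only in the source
    to_delete = sorted(dest_keys - source_keys)  # keys only in the destination
    return to_copy, to_delete


source = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
# The destination order no longer matters:
print(order_independent_compare(source, ["b.txt", "c.txt", "d.txt", "e.txt", "a.txt"]))
# ([], [])
```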
