List big buckets fix #1213

Open: wants to merge 2 commits into base: master


zakalibit

When listing a very large bucket (~1M objects), s3cmd runs out of memory.
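A minimal sketch of why this happens, assuming a hypothetical `fetch_page()` helper that stands in for one paginated S3 list request (S3 returns at most 1,000 keys per request) and uses an integer offset in place of a real continuation token. Accumulating every key before printing grows memory with the bucket size, whereas a generator only keeps one page alive at a time:

```python
from typing import Iterator, List, Optional, Tuple

PAGE_SIZE = 1000  # S3 returns at most 1000 keys per list request


def fetch_page(total: int, start: int) -> Tuple[List[str], Optional[int]]:
    """Hypothetical stand-in for one list request.

    Returns up to PAGE_SIZE keys starting at `start`, plus the next offset
    (None when the listing is complete), mimicking a continuation token.
    """
    end = min(start + PAGE_SIZE, total)
    keys = [f"obj-{i:07d}" for i in range(start, end)]
    return keys, (end if end < total else None)


def list_all(total: int) -> List[str]:
    """Accumulating approach: every key is held in memory before output."""
    result: List[str] = []
    token: Optional[int] = 0
    while token is not None:
        keys, token = fetch_page(total, token)
        result.extend(keys)
    return result


def list_streaming(total: int) -> Iterator[str]:
    """Streaming approach: only the current page is kept in memory."""
    token: Optional[int] = 0
    while token is not None:
        keys, token = fetch_page(total, token)
        yield from keys
```

For example, `sum(1 for _ in list_streaming(1_000_000))` walks a million simulated keys without ever holding more than one 1,000-key page, while `list_all(1_000_000)` builds the full list first.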

fviard (Contributor) commented Sep 26, 2021

Thank you for your contribution.
The problem that you are trying to fix is a real one, I agree, but there is a practical problem with your PR:
with your change, the ls command will alternately list "prefixes" (i.e. "folders") and "files".
For the existing command this would be a regression, and even for a new command the behavior would still be strange.
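The ordering problem can be illustrated with a simplified model, assuming each page of a delimited listing carries its own common prefixes and object keys. The `Page` type and the `DIR` marker below are hypothetical simplifications, not s3cmd's actual output format. Printing page by page interleaves the two kinds of entries, while buffering the whole listing groups all prefixes first:

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical page model: (common_prefixes, object_keys) as returned by
# one list request made with a "/" delimiter.
Page = Tuple[List[str], List[str]]


def stream_ls(pages: Iterable[Page]) -> Iterator[str]:
    """Print as you go: bounded memory, but prefixes and keys interleave."""
    for prefixes, keys in pages:
        for prefix in prefixes:
            yield f"DIR {prefix}"
        yield from keys


def buffered_ls(pages: Iterable[Page]) -> List[str]:
    """Collect everything first: all prefixes, then all keys, O(n) memory."""
    all_prefixes: List[str] = []
    all_keys: List[str] = []
    for prefixes, keys in pages:
        all_prefixes.extend(prefixes)
        all_keys.extend(keys)
    return [f"DIR {p}" for p in all_prefixes] + all_keys


# Two pages of a lexicographically ordered listing: a/, b.txt | c/, d.txt
pages = [(["a/"], ["b.txt"]), (["c/"], ["d.txt"])]
print(list(stream_ls(pages)))  # ['DIR a/', 'b.txt', 'DIR c/', 'd.txt']
print(buffered_ls(pages))      # ['DIR a/', 'DIR c/', 'b.txt', 'd.txt']
```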

That is probably why the ls command was not modified to use the "streaming list" directly.

zakalibit (Author) commented Sep 27, 2021

Thank you for looking. I would not call that behaviour strange, maybe unexpected. The output is still consumable, and it is far stranger to run out of memory and dump core :)
Maybe we can look at a few other options:

  • a streaming command-line option, or an automatic switch depending on the size of the bucket (and maybe the free memory on the system)
  • writing prefixes and file data to temporary files and printing from there instead of gathering everything in memory; this can still run out of disk space, but disk usage can probably be monitored so that it stops at the right time
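The temporary-file option could look roughly like this sketch, which uses a hypothetical `(prefixes, keys)` page model: each kind of entry is spooled to its own temporary file while the listing streams in, then replayed so that all prefixes still come before all objects. Memory stays bounded by one page, at the cost of disk space. It assumes keys contain no newline characters; real S3 keys may, so a production version would need to escape them.

```python
import tempfile
from typing import Iterable, Iterator, List, Tuple

# Hypothetical page model: (common_prefixes, object_keys) per list request.
Page = Tuple[List[str], List[str]]


def spooled_ls(pages: Iterable[Page]) -> Iterator[str]:
    """Keep ls ordering (all prefixes, then all objects) without holding
    the whole listing in memory, by spooling each kind to a temp file.

    Assumes no key or prefix contains a newline character.
    """
    with tempfile.TemporaryFile("w+", encoding="utf-8") as dirs, \
            tempfile.TemporaryFile("w+", encoding="utf-8") as files:
        # Pass 1: stream pages from the server, writing entries to disk.
        for prefixes, keys in pages:
            for prefix in prefixes:
                dirs.write(prefix + "\n")
            for key in keys:
                files.write(key + "\n")
        # Pass 2: replay prefixes first, then objects, one line at a time.
        for spool, fmt in ((dirs, "DIR {}"), (files, "{}")):
            spool.seek(0)
            for line in spool:
                yield fmt.format(line.rstrip("\n"))
```

Usage: `for line in spooled_ls(pages): print(line)` prints `DIR a/`, `DIR c/`, `b.txt`, `d.txt` for the pages `[(["a/"], ["b.txt"]), (["c/"], ["d.txt"])]`, matching the grouped order of the current ls output.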

Out of curiosity I also looked at the AWS CLI code, and it seems that it does stream and interleave folders with files in its output:
https://github.com/aws/aws-cli/blob/be17675af6b1e26fea2840dd7ac9006f30805d7a/awscli/customizations/s3/subcommands.py#L456
