Add support for specifying part sizes and stdin as input #3

Open
wants to merge 3 commits into
base: master

@imiric imiric commented Apr 24, 2019

Hi, thanks for this great tool.

I took a stab at addressing #1: when the input is specified as `-`, tarsplitter reads from stdin, so you can pipe to it directly from tar and avoid a large intermediate tar file. For example: `tar -cvf - . | tarsplitter -i - -s 1G -o /tmp/archive-`. This creates tar files of at most 1GB each, though individual sizes will vary depending on the input files and how they're sorted.
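For readers who want to see the general idea, here is a minimal Go sketch of size-limited splitting from a stream. It is not the code in this PR: `splitTar`, `entrySize`, `memParts`, and `sampleTar` are hypothetical names, the in-memory sink stands in for real output files, and the size accounting is approximate because it ignores extended (PAX/GNU long name) headers.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

const blockSize = 512 // tar stores headers and data in 512-byte blocks

// memParts is a hypothetical in-memory sink used only for illustration;
// a real tool would create files such as /tmp/archive-0.tar instead.
type memParts struct{ bufs []*bytes.Buffer }

type nopCloser struct{ io.Writer }

func (nopCloser) Close() error { return nil }

func (m *memParts) create(i int) (io.WriteCloser, error) {
	b := &bytes.Buffer{}
	m.bufs = append(m.bufs, b)
	return nopCloser{b}, nil
}

// entrySize estimates the bytes one entry occupies: one header block plus
// its data rounded up to a whole block. Extended headers are not counted.
func entrySize(hdr *tar.Header) int64 {
	return blockSize + (hdr.Size+blockSize-1)/blockSize*blockSize
}

// splitTar copies entries from r into successive parts, starting a new
// part whenever the next entry would push the current one past maxBytes.
// It returns the number of parts created. A single entry larger than
// maxBytes still gets a part of its own, which then exceeds the limit.
func splitTar(r io.Reader, maxBytes int64, newPart func(i int) (io.WriteCloser, error)) (int, error) {
	tr := tar.NewReader(r)
	var (
		out     io.WriteCloser
		tw      *tar.Writer
		written int64
		parts   int
	)
	closePart := func() error {
		if tw == nil {
			return nil
		}
		if err := tw.Close(); err != nil {
			return err
		}
		return out.Close()
	}
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break // end of the input archive
		}
		if err != nil {
			return parts, err
		}
		need := entrySize(hdr)
		if tw == nil || written+need > maxBytes {
			if err := closePart(); err != nil {
				return parts, err
			}
			if out, err = newPart(parts); err != nil {
				return parts, err
			}
			tw = tar.NewWriter(out)
			written = 2 * blockSize // reserve room for the end-of-archive marker
			parts++
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return parts, err
		}
		if _, err := io.Copy(tw, tr); err != nil {
			return parts, err
		}
		written += need
	}
	return parts, closePart()
}

// sampleTar builds a small in-memory archive with one zero-filled file
// per requested size. It exists only to exercise splitTar.
func sampleTar(sizes ...int) io.Reader {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for i, n := range sizes {
		hdr := &tar.Header{Name: fmt.Sprintf("f%d", i), Mode: 0o644, Size: int64(n)}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write(make([]byte, n)); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}
```

Because `splitTar` only needs an `io.Reader`, passing `os.Stdin` would cover the piped case without ever knowing the total input size up front.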

This comes at the cost of an external dependency (https://github.com/c2h5oh/datasize), but I hope you'll agree that it's best not to reinvent the wheel for parsing human-readable sizes.

The `-p` option doesn't make sense when the input is stdin, since we can't know the total input size in advance to calculate the part size. Similarly, if `-s` is not provided when the input is stdin, no splitting will occur. In both cases only a single tar file is created, which defeats the purpose of using tarsplitter, so the user should always specify `-s` with `-i -`. Maybe we should enforce this explicitly, but I didn't think it was necessary.
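If explicit enforcement were wanted, it could look roughly like the sketch below. This is an assumption about shape only: the `options` struct, its field names, and the `validate` function are hypothetical and do not come from tarsplitter, and the sketch treats a zero value as "flag not given".

```go
package main

import "errors"

// options mirrors the flags discussed above. The struct and its field
// names are hypothetical, not taken from tarsplitter.
type options struct {
	input    string // -i; "-" means stdin
	partSize int64  // -s in bytes; 0 means not given
	parts    int    // -p; 0 means not given
}

// validate rejects flag combinations that cannot split a stdin stream.
func validate(o options) error {
	if o.input != "-" {
		return nil // regular files can be stat'ed, so -p and -s both work
	}
	if o.parts > 0 {
		return errors.New("-p cannot be used with -i -: total input size is unknown")
	}
	if o.partSize <= 0 {
		return errors.New("-s is required with -i -")
	}
	return nil
}
```

Failing fast here would replace the silent single-file result with a clear error message.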

On a separate note, I didn't test this with -m archive, which I think should be removed from tarsplitter, leaving this functionality to tar itself now that stdin is supported. I would also consider deprecating -p since the user usually wants control over the part size and not the quantity of files produced.

Ideally we should have unit/functional tests for all this, but I'll leave that for another PR. :)

Cheers,

Ivan

This allows piping directly to tarsplitter from tar itself when input is
specified as '-', which avoids the need for a large intermediary tar file.
For example: `tar -cvf - . | tarsplitter -i - -s 1G -o /tmp/archive-`.
This will create tar files that are at most 1GB in size, though
individual sizes will vary depending on the input files and how they're
sorted.

The `-p` option doesn't make sense when input is stdin since we can't
know the total input size in advance to calculate the part size.
Similarly, if `-s` is not provided when input is stdin no splitting will
occur and in both cases only a single tar file will be created, which
defeats the purpose of using tarsplitter, so make sure to always specify
`-s` with `-i -`.
fatalIf(err, "Failed statting input", *input)
defer file.Close()
var file io.Reader
var partSizeBytes int64
Author

I would use uint64 here since sizes are never negative, but I didn't want to refactor the other int64 usages (bytesBeforeWrite, etc.).

@@ -11,13 +11,16 @@ import (
"path/filepath"
"strings"
"sync"
"github.com/c2h5oh/datasize"
Member

Thanks for the contribution!

How would you feel about making this megabytes and ditching the dependency?
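As a rough illustration of that alternative, a megabyte-valued flag needs only the standard library. The sketch below is an assumption, not code from this PR: `parsePartSize` and `bytesPerMB` are hypothetical names, and it interprets a megabyte as 1024*1024 bytes, which the thread does not specify.

```go
package main

import (
	"flag"
	"fmt"
)

// bytesPerMB assumes binary megabytes (MiB); 1000*1000 would be the
// decimal alternative.
const bytesPerMB = 1 << 20

// parsePartSize reads a hypothetical -s flag given in whole megabytes
// and returns the part size in bytes. Zero means no size-based splitting.
func parsePartSize(args []string) (int64, error) {
	fs := flag.NewFlagSet("tarsplitter", flag.ContinueOnError)
	mb := fs.Int64("s", 0, "maximum part size in megabytes")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *mb < 0 {
		return 0, fmt.Errorf("part size must not be negative, got %d", *mb)
	}
	return *mb * bytesPerMB, nil
}
```

With this shape, `-s 1024` would stand in for `-s 1G`, trading the human-readable suffixes for one less dependency.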
