Git is an awesome tool for versioning and storing your textual files, but it can fall short when you want to use it to store large (binary) files. git-bits is an extension built on top of Git that solves this problem in a simple and secure fashion through clever usage of Content Based Chunking (CBC), Content-Addressable Storage (CAS) and convergent encryption. It is written in pure and portable Go code and is structured as a set of streaming commands that follow the Unix philosophy.
Features:
- Normal Git workflow: it uses Git's smudge/clean filters with a pre-push hook to integrate seamlessly on top of your new or existing repository so you can continue to use your normal workflow.
- No Server Process: upon pushing your Git commits to a remote, your large files are also sent to a remote object store. Because it uses a content-addressable storage scheme, it doesn't require a coordinating server process that can become unavailable; it uploads directly to your own highly available AWS S3 bucket.
- Deduplication: Large files are stored in variable-sized blocks based on the file's content. Each block is only stored once, so it becomes economical to store many slightly different versions. This allows for massive savings on both bandwidth and storage costs when your large files only change partially between versions.
- Encryption-at-rest: Since large files are now stored at a third party, separate from your actual Git repository, it becomes important that the data is encrypted at rest. git-bits encrypts each chunk using the AES-256 encryption standard before uploading it.
- First, make sure Git itself is installed and available in your PATH.
- Then, choose one of the following methods to install the git-bits extension:
  - Pre-compiled binaries are available for 64-bit Windows, macOS and Linux on the release page. Simply download the binary for your platform and place it in your PATH.
  - Building from source is recommended for other platforms. This is made easy by the fact that git-bits is go-gettable. Simply install the Go SDK, make sure your $GOPATH is set up and $GOPATH/bin is added to your PATH. Then run the following to install or update:
go get -u github.com/nerdalize/git-bits
- Verify that the installation succeeded by invoking Git with the git-bits extension. It should show the git-bits subcommands. If it complains with "... is not a git command", make sure the above steps were executed correctly.
git bits
git-bits is built on top of Git, and this guide assumes you have basic knowledge of working with a Git repository. Also, large file chunks are stored directly on AWS S3, so you'll need an AWS account with an S3 bucket, plus an access_key_id and secret_access_key that allow git-bits to put, get and list bucket objects. The bucket needs to be completely reserved for git-bits file chunks.
Note: For Windows, the documentation assumes you're using Git through a bash-like CLI but nothing about the implementation prevents you from using another approach.
- Use your terminal to navigate to a repository with some large/binary files you would like to store and initialize git-bits:
cd ~/my-project
git bits install
NOTE: If your Git repository doesn't have any commits yet, a seemingly 'fatal' error appears. You can safely ignore it.
- Provide your AWS information when asked and git-bits will configure a pre-push hook and the correct Git filter.
- The 'bits' filter requires you to mark certain files for large-file storage using the .gitattributes file. The following marks all files ending with .bin for storage using git-bits:
echo '*.bin filter=bits' >> .gitattributes
- With the filter in place you can now add your large file to the staging area and commit changes as usual. Upon moving large files to the staging area, git-bits will split them into variable-sized chunks and write them to .git/chunks. The key of each chunk will be listed to inform you of the progress:
git add ./my-large-file.bin
git commit -m "added a large file"
- Finally, to store your large files on S3 you can simply push the changes as you're used to. git-bits will index which chunks are already present and only upload the new ones:
git push
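The add and push steps above can be pictured with a small Go sketch of content-based chunking followed by an "upload only what is missing" check. This is an illustration under stated assumptions, not the git-bits implementation: the rolling hash (a simple gear-style hash), the size limits, the SHA-256 keys, and the names split, key, newKeys and sample are all chosen for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

const (
	minSize  = 64   // never cut a chunk shorter than this
	maxSize  = 4096 // always cut once a chunk reaches this size
	maskBits = 8    // past minSize, expect a cut roughly every 2^8 bytes
)

// gear maps each byte value to a fixed pseudo-random 64-bit number.
var gear [256]uint64

func init() {
	for i := range gear {
		sum := sha256.Sum256([]byte{byte(i)})
		gear[i] = binary.BigEndian.Uint64(sum[:8])
	}
}

// split cuts data where a rolling hash of the most recent 64 bytes has its
// top maskBits bits equal to zero, so boundaries depend on content, not offsets.
func split(data []byte) [][]byte {
	var chunks [][]byte
	var h uint64
	start := 0
	for i, b := range data {
		h = (h << 1) + gear[b] // bytes older than 64 positions shift out
		n := i - start + 1
		if (n >= minSize && h>>(64-maskBits) == 0) || n >= maxSize {
			chunks = append(chunks, data[start:i+1])
			start, h = i+1, 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

// key returns the content address of a chunk as a hex string.
func key(chunk []byte) string {
	sum := sha256.Sum256(chunk)
	return hex.EncodeToString(sum[:])
}

// newKeys lists, once each, the keys of chunks that are not yet in present.
func newKeys(chunks [][]byte, present map[string]bool) []string {
	var out []string
	seen := map[string]bool{}
	for _, c := range chunks {
		k := key(c)
		if !present[k] && !seen[k] {
			seen[k] = true
			out = append(out, k)
		}
	}
	return out
}

// sample returns n deterministic pseudo-random bytes standing in for a binary file.
func sample(n int) []byte {
	out := make([]byte, n)
	x := uint32(1)
	for i := range out {
		x = x*1664525 + 1013904223
		out[i] = byte(x >> 24)
	}
	return out
}

func main() {
	v1 := sample(64 * 1024)
	v2 := append([]byte("edit"), v1...) // second version: a few bytes inserted up front

	remote := map[string]bool{} // keys already present in the object store
	for _, c := range split(v1) {
		remote[key(c)] = true
	}
	c2 := split(v2)
	fmt.Printf("v2 has %d chunks, %d of them need uploading\n", len(c2), len(newKeys(c2, remote)))
}
```

With fixed-size blocks, inserting a few bytes at the front would shift every block boundary and change every key. Cutting on content lets the boundaries line up again after the edit, which is what makes storing many slightly different versions cheap.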