My /data/ dir doubled in size one night #362

Open
d5ve opened this issue Dec 10, 2022 · 6 comments

@d5ve

d5ve commented Dec 10, 2022

Longtime and happy bupstash user here!

I use bupstash to take hourly backups during the working day from my macOS laptop to a repo on a Linux PC.

laptop$ BUPSTASH_REPOSITORY=ssh:... BUPSTASH_KEY=... bupstash put --exclude "a few dirs" /Users/d5ve

Then twice each night I rsync the whole bupstash repository from the PC to rsync.net.

pc$ /usr/bin/rsync -avH /backups/laptop ab-1234.rsync.net:bupstash

For some reason, the /data/ dir in the PC's bupstash repository recently doubled in size. I only noticed this due to my usage graph on rsync.net jumping from 380GB to 650GB on November 14th.

On the laptop, bupstash list shows a list of backups slowly growing from 240GB to 260GB over the past year.

On the PC, du -sh shows that the /data/ dir in the bupstash repo is now 610GB.

I ran bupstash gc on the PC, and it deleted only 4GB of chunks.

The cronjob on the PC that performs the rsync runs at 2AM and 3AM. According to the logs, the rsyncs on the previous night reported 389GB of remote data on rsync.net, and from Nov 14th onward they reported 658GB.

Nov 14th doesn't seem to match any daylight saving changeover or anything like that.

Is there a way to interrogate the repository to find out what changed, and what the extra 300GB of files are?
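
One rough way to see when the extra data arrived is to total the sizes of the files under the repository's data dir by modification date. This is only a sketch: it assumes chunks are stored as ordinary files under that directory and that their modification times reflect when they were written. The names `bytes_by_day` and `print_report` are made up for this example and are not part of bupstash.

```python
import os
from collections import defaultdict
from datetime import datetime, timezone


def bytes_by_day(data_dir):
    """Sum file sizes under data_dir, grouped by UTC modification date."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            st = os.stat(os.path.join(root, name))
            day = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
            totals[day.date().isoformat()] += st.st_size
    return dict(totals)


def print_report(data_dir):
    """Print one line per day: the date and the bytes written that day, in GB."""
    for day, total in sorted(bytes_by_day(data_dir).items()):
        print(f"{day}  {total / 1e9:10.2f} GB")
```

Calling `print_report` with the path of the repo's data dir on the PC should show whether a single day accounts for most of the growth. It only says when files were written, not which backup they belong to.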

@d5ve
Author

d5ve commented Dec 10, 2022

The laptop is running bupstash-0.12.0 from homebrew (though I may have updated it since Nov 14th).

The PC is running bupstash-0.12.0 (though I may have updated it since Nov 14th).

@d5ve
Author

d5ve commented Dec 10, 2022

bupstash diff id=some-id-on-the-10th-november :: id=some-id-from-today shows pretty much what I'd expect - some new photos and other documents. Maybe a couple of GBs of differences.

@andrewchambers
Owner

I think the root cause may be that the most recent bupstash update has tweaked the deduplication algorithm (to enable higher performance) - this is not likely to happen automatically again in the future, and my apologies for the inconvenience.

@d5ve
Author

d5ve commented Dec 12, 2022

Is there any way that I can "clean up" some of the extra data in the repo on the PC?

The total size of the data being backed up from the laptop is about 300GB, which zips down to about the 260GB reported by bupstash for each recent backup.

Most of the data is a photo library, so I wouldn't expect there to be another 300GB of "diffs" in the bupstash repo data dir on the PC.

@andrewchambers
Owner

@d5ve You would need to remove the snapshots taken before the version upgrade and then run bupstash gc to prune away the old data.

To further explain the repository growth: bupstash splits your photo library into pseudo-randomly sized chunks and only ever stores a single copy of each chunk, no matter how many backups it is present in. I made an update to the chunking algorithm, which means you now have two similar, but not identical, sets of chunks, and that has disrupted deduplication.
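
To illustrate the effect described above, here is a toy content-defined chunker with a content-addressed store. It is not bupstash's actual algorithm: the gear-style rolling hash, the table seeds, and all function names are assumptions chosen for the example. Backing up the same data twice with one chunker adds nothing to the store, while backing it up again after the chunker changes stores nearly everything a second time.

```python
import hashlib
import random


def gear_table(seed):
    """Hypothetical per-version table of 256 random 32-bit values."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def chunk(data, table, mask_bits=10, min_size=64):
    """Cut data wherever the low bits of a rolling hash are all zero."""
    mask = (1 << mask_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + table[byte]) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def store(repo, chunks):
    """Keep one copy per distinct chunk, keyed by its SHA-256 digest."""
    for c in chunks:
        repo.setdefault(hashlib.sha256(c).hexdigest(), len(c))


def stored_bytes(repo):
    return sum(repo.values())


def make_data(n, seed=0):
    """Deterministic pseudo-random bytes standing in for a photo library."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


data = make_data(100_000)
old_chunker, new_chunker = gear_table(1), gear_table(2)

repo = {}
store(repo, chunk(data, old_chunker))
first = stored_bytes(repo)
store(repo, chunk(data, old_chunker))   # same chunker: fully deduplicated
second = stored_bytes(repo)
store(repo, chunk(data, new_chunker))   # changed chunker: new chunk set
third = stored_bytes(repo)
print(first, second, third)
```

The same data cut at different boundaries hashes to different chunks, so the store cannot tell that it already holds the content.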

Currently the easiest way to clean up the repository is just to cycle out the old snapshots, though I think in the future I could try to think of a better solution if I ever need to change the repository format again.
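
As a sketch of what cycling out the old snapshots could look like, the helper below takes a list of snapshot ids with their timestamps and a cutoff, and builds the `bupstash rm id=...` and `bupstash gc` commands without running them. The function name, the snapshot ids, and the Nov 14th cutoff are placeholders for illustration; the real cutoff is whenever the upgrade actually happened, and removing snapshots discards that backup history.

```python
from datetime import datetime, timezone


def cleanup_commands(snapshots, cutoff):
    """Build argv lists that remove snapshots taken before cutoff, then gc.

    snapshots: iterable of (snapshot_id, datetime) pairs.
    Nothing is executed here; the caller decides what to run.
    """
    old_ids = [sid for sid, taken in snapshots if taken < cutoff]
    commands = [["bupstash", "rm", f"id={sid}"] for sid in old_ids]
    if commands:
        commands.append(["bupstash", "gc"])  # prune chunks no longer referenced
    return commands


# Placeholder ids and dates, not taken from any real repository.
example = [
    ("aaaa1111", datetime(2022, 11, 10, tzinfo=timezone.utc)),
    ("bbbb2222", datetime(2022, 12, 1, tzinfo=timezone.utc)),
]
for argv in cleanup_commands(example, datetime(2022, 11, 14, tzinfo=timezone.utc)):
    print(" ".join(argv))
```

Printing the commands first makes it possible to review exactly which snapshots would go before anything is deleted.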

@ptman

ptman commented Aug 3, 2023

There should probably be a way to rechunk old data, especially if chunking is something that can be tweaked by the user.

