Add a put option to exclude files and directories that have a particular xattr #366

Open
smferris opened this issue Jan 9, 2023 · 4 comments
Labels
enhancement New feature or request

Comments


smferris commented Jan 9, 2023

macOS uses an xattr com.apple.metadata:com_apple_backup_excludeItem to mark items to exclude from Time Machine backups. A variety of software packages use this to mark their temporary and cache files to get them automatically excluded from backups, both Apple software and 3rd party apps such as Chrome, Skype, etc.

It would be nice to have a bupstash option to exclude a file or directory that has a specified xattr. Manually building an exclude list that covers all the files would be tedious and error prone.

I'll propose --exclude-xattr or --exclude-xattrname (analogous to macOS find -xattrname).

Then people using bupstash on macOS could run something like:

bupstash put --exclude-xattr="com.apple.metadata:com_apple_backup_excludeItem" $HOME

and exclude everything that would be excluded by a Time Machine backup.

The value of the xattr can vary on macOS (the value is a binary plist from what I've read), but as far as I know any xattr value will get the file excluded from Time Machine backups, so I think bupstash can just check for the existence of the xattr without caring about the value.
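To make the "existence only" check concrete, here is a minimal Python sketch of a directory walk that skips anything carrying a given xattr name, without reading the value. It is not bupstash code; `walk_excluding`, `has_xattr`, and `default_lister` are hypothetical names. One caveat: Python's standard-library `os.listxattr` is only available on Linux, so the lister is injectable and a macOS version would need a third-party binding or similar (an assumption, not shown here).

```python
import os
from typing import Callable, Iterable, Iterator

# The attribute name from the issue text; any other name works the same way.
EXCLUDE_ATTR = "com.apple.metadata:com_apple_backup_excludeItem"


def default_lister(path: str) -> Iterable[str]:
    """List xattr names for path, or nothing if the platform lacks support."""
    lister = getattr(os, "listxattr", None)  # stdlib support is Linux-only
    if lister is None:
        return []
    try:
        return lister(path, follow_symlinks=False)
    except OSError:
        # Unreadable entry or filesystem without xattrs: treat as unmarked.
        return []


def has_xattr(path: str, name: str,
              list_xattrs: Callable[[str], Iterable[str]] = default_lister) -> bool:
    """True if the attribute exists on path; the value is never inspected."""
    return name in list_xattrs(path)


def walk_excluding(root: str, name: str = EXCLUDE_ATTR,
                   list_xattrs: Callable[[str], Iterable[str]] = default_lister
                   ) -> Iterator[str]:
    """Yield file paths under root, skipping marked files and directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune marked directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames
                       if not has_xattr(os.path.join(dirpath, d), name, list_xattrs)]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not has_xattr(path, name, list_xattrs):
                yield path
```

Pruning a marked directory before descending is what keeps this cheap: nothing below an excluded cache directory is ever visited.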

$ find $HOME -xattrname "com.apple.metadata:com_apple_backup_excludeItem" | wc
     508     730   46252
Owner

andrewchambers commented Jan 10, 2023

There are so many ways to deal with exclusions, and more keep being invented, that I am considering a simple exclusion script system with some default scripts or functions. That way users could use the script system to check for xattrs or specific files, dynamically generate file lists, or do other things, and share their scripts with each other.

Not sure if I will go down this path, but just something to think about - I would love to hear your opinion on it.

andrewchambers added the enhancement label Jan 10, 2023
@smferris
Author

I'm neutral on the idea of scripting. The flexibility might be nice, but I have to wonder what it would cost in terms of runtime performance, development effort, and maintenance effort. A fast backup is more important to me than maximum flexibility with excludes.

I suspect you can cover most people's exclude needs with a relatively small set of exclude capabilities, so I'm not yet convinced it would be worth the effort to make excludes scriptable, but maybe I'm just unaware of all the different approaches people want. Do you have a list of them collected somewhere? Off the top of my head, I'm only thinking of --exclude and --exclude-if-found, which you've already implemented, the --exclude-xattr I mention here, and something like --exclude-from (borg's name for it) to use patterns contained in a file rather than having to put them all on the command line.

borg-create has some interesting options to flip the problem around and specify which paths are desired, rather than which to exclude: --paths-from-command and --paths-from-stdin. Those let people generate the paths they want with another program and feed them in. That might cover some of the less common exclude needs better than adding a scripting language, since it doesn't limit people to one particular language. It does have the downside of requiring another program to do a filesystem traversal though, and the filesystem cache behavior might be unfortunate if that traversal and bupstash's are focusing on different areas of the filesystem.

It might be even better to have --include-from-{command,stdin} and --exclude-from-{command,stdin} rather than just --paths-from-{command,stdin}, so that you can generate both includes and excludes from another program.

I doubt I'd ever use -from-{command,stdin} functionality myself though, if you've got what I'd consider the core set of options: --exclude, --exclude-if-found, --exclude-xattr, and --exclude-from. Those 4 are all I'd really need in a backup program myself, as long as the patterns are reasonably expressive.

Author

smferris commented Jan 10, 2023

Another (possibly crazy) approach that avoids the duplicate traversal of the -from-{command,stdin} would be a --filter-command option, which specifies a filter program for bupstash to run with pipes for stdin and stdout. bupstash provides pathnames it has found (that aren't excluded by other options) to the filter's stdin, and the filter outputs only the pathnames that bupstash should actually use to stdout. That way there's just one filesystem traversal (bupstash's), and people can use any programming or scripting language to implement the filter. I think I like that idea even better than the -from-{command,stdin} borg has.

The context switching of using a separate program might hurt performance compared to doing filtering inside of bupstash itself though.
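For illustration, a filter program for such an option could be as small as the Python sketch below. Everything about the protocol is an assumption: `--filter-command` does not exist in bupstash, and the one-path-per-line framing, the `keep` rule, and the names `filter_paths` and `run` are hypothetical. A real design would probably want NUL-delimited paths, since filenames can contain newlines.

```python
import sys
from typing import Iterable, Iterator, TextIO

# Hypothetical rule for this sketch: drop temporary and object files.
SKIP_SUFFIXES = (".tmp", ".o")


def keep(path: str) -> bool:
    """Decide whether the backup tool should keep this path."""
    return not path.endswith(SKIP_SUFFIXES)


def filter_paths(lines: Iterable[str]) -> Iterator[str]:
    """Yield the paths to keep from an iterable of newline-terminated lines."""
    for line in lines:
        path = line.rstrip("\n")
        if path and keep(path):
            yield path


def run(stdin: TextIO, stdout: TextIO) -> None:
    """Read candidate paths from stdin and echo the accepted ones to stdout."""
    for path in filter_paths(stdin):
        stdout.write(path + "\n")
        # Flush per path so the parent process is not stalled by buffering.
        stdout.flush()
```

As a standalone script this would end with a call to `run(sys.stdin, sys.stdout)`. The per-path flush is where the context-switching cost mentioned above shows up; batching paths would trade latency for throughput.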

Owner

andrewchambers commented Jan 10, 2023

One example that many people want is git excludes, where users often keep extra git exclude information in their ~/.git configuration as well as in individual repositories. In that case I wondered if a good approach would be to invoke git as the designated user in order to gather the exclude list and somehow communicate it back to the process walking the filesystem.

I have an aversion to special-casing tools like git, even if they are common, and would prefer that a general mechanism be added instead. The problem there seems to be balancing generality against ease of use.
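As a rough sketch of the "invoke git to gather the exclude list" idea, the Python below asks git itself which untracked paths are ignored, so that the repository's .gitignore files, .git/info/exclude, and the user's global excludes are all applied by git rather than reimplemented. The function names are hypothetical; running git as a different user and handing the result back to the walker are left out.

```python
import os
import subprocess
from typing import List


def parse_nul_paths(output: bytes) -> List[str]:
    """Split NUL-delimited output (as produced by `git ls-files -z`) into paths."""
    return [os.fsdecode(part) for part in output.split(b"\0") if part]


def git_ignored_paths(repo: str) -> List[str]:
    """Return ignored, untracked paths in repo, relative to the repository root.

    --exclude-standard applies git's standard ignore sources; --directory
    reports a wholly ignored directory once instead of listing its contents.
    """
    result = subprocess.run(
        ["git", "-C", repo, "ls-files", "--others", "--ignored",
         "--exclude-standard", "--directory", "-z"],
        check=True,
        capture_output=True,
    )
    return parse_nul_paths(result.stdout)
```

The resulting list could be fed to an exclude option or to a more general mechanism such as the filter idea discussed earlier in the thread, which would avoid special-casing git inside the backup tool.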
