support for gocryptfs, ecryptfs, borg, duplicity, and rsync #374

Open · wants to merge 82 commits into master

Conversation

@bgemmill commented Aug 8, 2016

Edited summary follows since this PR thread has gotten long.

First, please direct any issues you have with this PR here so this thread doesn't get any more out of control:
https://github.com/bgemmill/acd_cli

This PR provides two primary features to allow rsyncing into a layered encrypted filesystem:

  • out of order rewriting
  • mtime support

And a few caches to make the above features performant:

  • node_id to node (in memory)
  • path to node_id (in memory)
  • the content of small files and symlinks (in nodes.db)

With those implemented, it was pretty simple to add a few other things to flesh out our filesystem support:

  • uid/gid/mode
  • symlinks
  • fs block size for du and stat

The rationale for out-of-order rewriting is that most encrypting file systems maintain a header near the beginning of the file that gets updated as the rest of the file is written. This means write patterns typically look like sets of [append to end, overwrite at beginning]. I'm solving this with a write-back cache that stores file writes in a SpooledTemporaryFile until all file handles are closed, and only then pushes to Amazon.
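For illustration, here's a minimal sketch of that write-back cache; the upload callable is a hypothetical stand-in for the actual ACD upload, not this PR's exact code:

import tempfile
import threading

class WriteBackBuffer:
    # Writes land in a SpooledTemporaryFile (RAM below max_size, disk above)
    # and are pushed upstream only once the last file handle is released.
    def __init__(self, path, upload, max_size=1 << 30):  # 1G spool limit
        self.path = path
        self.upload = upload  # hypothetical stand-in for the ACD upload call
        self.buf = tempfile.SpooledTemporaryFile(max_size=max_size)
        self.handles = 0
        self.lock = threading.Lock()

    def open(self):
        with self.lock:
            self.handles += 1

    def write(self, data, offset):
        with self.lock:
            self.buf.seek(offset)  # out-of-order offsets just seek in the spool
            self.buf.write(data)

    def release(self):
        with self.lock:
            self.handles -= 1
            if self.handles == 0:  # last handle closed: now push to Amazon
                self.buf.seek(0)
                self.upload(self.path, self.buf)
                self.buf.close()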

The rationale for mtime is that rsync uses it for file equality testing. I'm implementing this by using one of the 10 properties an Amazon app gets to store all file xattrs as a JSON object. Once mtime and xattrs were in place, it was straightforward to add the others.
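As a sketch, the packing might look like this (the property name and field set here are assumptions, not the PR's exact schema):

import json

XATTR_PROP = 'xattrs'  # hypothetical name for the one ACD property used

def pack_attrs(mtime, uid, gid, mode, xattrs):
    # Everything rsync and friends need, serialized into one property value.
    return json.dumps({'mtime': mtime, 'uid': uid, 'gid': gid,
                       'mode': mode, 'xattrs': xattrs})

def unpack_attrs(value):
    # Tolerate nodes that predate this scheme and have no property set.
    try:
        return json.loads(value)
    except (TypeError, ValueError):
        return {}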

Considerations:

  • The SpooledTemporaryFile will keep writes smaller than 1G in memory; opening, writing to, and not closing a lot of files below that limit will use up a lot of RAM.
  • Because pushing to amazon happens on file handle releases and not write calls, expect writes to appear very fast; the actual work happens later.
  • Amazon has been reducing the length of properties that it will allow. Setting many or long xattrs will yield errors.
  • Because the write back caching is triggered off file handle counts going to zero, mmap will not work as intended.
  • Due to how fusepy handles timestamps, there are some files that rsync will think are always changed. https://github.com/terencehonles/fusepy/issues/70

Please enjoy, and let me know if anything goes wrong!


Original post


ecryptfs has two behaviors that we need to work around in order to get it working with acd_cli.

Luckily, this PR addresses both :-)

  1. ecryptfs writes files 4096 bytes at a time, using a different file handle each time. This PR allows multiple file handles to share a write buffer if they all write sequentially. To make this performant for large files (large numbers of file descriptors), I've added some lookup caching to how nodes are obtained.

  2. ecryptfs wants to write a cryptographic checksum at the beginning of the file once it's done. We could either buffer everything before sending, which would be memory intensive for big files, or we could have ecryptfs store this checksum in the file's xattr instead. I've opted to go this route, which required implementing xattrs over ACD using one of our allowed properties.

Additionally, ecryptfs is extremely chatty about when it decides to write to this buffer. To deal with this, xattrs are marked as dirty and only sent over the wire when a file has all of its handles closed, or when fuse is unloaded.
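A rough sketch of that dirty-flag batching, with send_property as a hypothetical stand-in for the over-the-wire property update:

import threading

class DeferredXattrs:
    def __init__(self, send_property):
        self.send_property = send_property
        self.attrs = {}
        self.dirty = False
        self.lock = threading.Lock()

    def setxattr(self, name, value):
        with self.lock:
            self.attrs[name] = value
            self.dirty = True  # no network traffic yet, no matter how chatty

    def flush(self, node_id):
        # Called when the last handle closes, or when fuse is unloaded.
        with self.lock:
            if self.dirty:
                self.send_property(node_id, self.attrs)
                self.dirty = False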

With these changes, I can get about 80% of my unencrypted speed to ACD at home using an encrypted mount. If everything in this PR looks good, I have a few ideas of where to push that a bit more.

Please let me know if I grokked the fusepy threading model properly; that's the piece I was least sure about, especially how safe or unsafe some things are under the GIL.

…tiple file descriptors, as well as support for getting and setting xattrs on files.
@bgemmill (Author) commented Aug 8, 2016

Addresses issue:
#368

@jetbalsa commented Aug 8, 2016

Not bad; it would be nice to have a configurable LRU-style cache to help with reads/writes (with some kind of read-ahead).

@bgemmill (Author) commented Aug 8, 2016

@yadayada looks like the buildbot needs a new oauth token to test properly. I see this in the logs:
CRITICAL:acdcli.api.oauth:Invalid authentication token: Invalid JSON or missing key.Token:
{"refresh_token": "bar", "expires_in": 3600}
16-08-08 00:06:11.286 [CRITICAL] [acdcli.api.oauth] - Invalid authentication token: Invalid JSON or missing key.Token:

@bgemmill (Author)

I've implemented proper mtime handling in one of the xattrs so that rsync over acd_cli can work as expected. This addresses:
#58

@Thinkscape

Why not "backport" the write buffer feature as a general write-back cache for acd_cli? That'd fix problems with ecryptfs, encfs, and any other application where data is appended in small blocks (and overloads the acd_cli API, eating 100% CPU).

[email protected] and others added 2 commits August 11, 2016 18:32
…ructs leads to epsilon problems, causing rsync to think that mtime is different when it isn't.
… using xattrs for crypto headers, we have to allow re-writing the first bytes of a file to make ecryptfs happy. once they fix their bug, this can be removed and we can go back to xattrs.
@bgemmill (Author)

Turns out that ecryptfs has a subtle bug when it stores its crypto headers in xattrs; it reports the file size incorrectly the next time it's mounted:
https://bugs.launchpad.net/ecryptfs/+bug/1612492

That means rsync will behave properly only if your mount has perfect uptime! :-)

Until they fix that, I've allowed the acd fuse mount to overwrite the first few bytes of a file where the crypto header would go. Because we still need to write to amazon sequentially, I'm solving this by storing the header in xattr space, and splicing it back into the byte stream on read. This still seems better than requiring whole files to be kept in memory until fully written.
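The splice itself is simple; a minimal sketch, with read_remote and get_header as hypothetical stand-ins for the ACD read and the xattr lookup:

def spliced_read(path, size, offset, header_len, read_remote, get_header):
    # Read the stored bytes, then overlay whatever part of the crypto
    # header (kept in xattr space) falls inside this read window.
    data = bytearray(read_remote(path, size, offset))
    header = get_header(path)  # header_len bytes stored out-of-band
    if header and offset < header_len:
        end = min(header_len, offset + size)
        data[0:end - offset] = header[offset:end]
    return bytes(data)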

…there are some rsync flags that write multiple times to the same memory location, for reasons unknown. this keeps the whole file in a buffer until it's flushed to amazon on file handle closed. future work will be for super large files, we should use a temp file as backing.
…rtial) buffer in the middle of writing is hard enough that we bail on it. We don't care about pre-allocating files since we have infinite space, and shortening a file is only possible when it's being written to.... so we can only catch the rare use case of file overwrites and truncate back. Neither are worth it.
@bgemmill (Author)

I've finally gotten rsync, ecryptfs, and acd_fuse playing nice together. There were enough corner cases around rsync flags I can't control (thanks Synology!) and some older versions of the kernel that make ecryptfs call useless truncates before flushing (thanks Synology!) that the best way to make it all go is to build a write buffer in memory until all the interested file handles are closed. This allows multiple writes to the same offset, out of order writes as long as nothing leaps forward with a gap, and eliminates the hack of putting encrypted headers into xattr space.

Further work will be to use temp file backing rather than memory backing if individual files get too large.

@Thinkscape commented Aug 16, 2016

Further work will be to use temp file backing rather than memory backing if individual files get too large.

@bgemmill this is covered in #314 and is not ecryptfs-specific. It'd help with performance and with other apps that write file handles non-linearly. It'd be awesome if you could port the write buffer feature as a separate PR (separate flag/option) which this one can depend on.

hint: https://github.com/redbo/cloudfuse/blob/master/cloudfuse.c#L256-L289

@bgemmill (Author)

@Thinkscape I'm only going to pursue the file backing if the write memory backing is too memory intensive. At the moment this PR makes both ecryptfs and rsync work properly, uses memory for only the files being written at any given moment, and that seems like a good place to leave it.

The way I'm looking at it is that this PR is the one that the file backing PR should depend on.

File caching is going to require a bit of thought too, because unless we're smart about LRU like @jrwr pointed out, we'd end up doubling the on-disk space in the process of rsyncing to Amazon.

@bgemmill changed the title from "support for ecryptfs" to "support for ecryptfs and rsync" Aug 16, 2016
@Thinkscape

An LRU cache is different from what I meant.

The caching Swift FUSE does is per file handle: a process opens a file handle for writing, writes as much or as little as it likes, and closes the handle. That's what most rsync-like streamers and updaters will do.

Of course memory backing will be too memory intensive. If you attempt to rsync or random-write an 8GB file, it'll gladly consume 8+ GB of RAM.

@bgemmill (Author)

@Thinkscape Thanks for clarifying. @jrwr's point as I understood it was what do you do with that temporary file once you're done. Delete it immediately, keep it around for faster reading, LRU, something else?

As to memory backing, I'm in the middle of going through my wedding videos, and haven't seen a huge hiccup. I'd imagine that's virtual memory doing what you suggest with swapping; I'll have more info tomorrow when my rsync job finishes.

Looking at the job in the middle of today:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 4212 376 300 S 0.0 0.0 0:00.07 minit
22 root 20 0 902184 339612 4892 S 0.0 4.2 117:54.59 acd_cli
30 root 20 0 11128 1076 896 S 0.0 0.0 0:00.06 rsync
923 root 20 0 25956 2532 1216 S 0.0 0.0 0:01.09 rsync
924 root 20 0 26224 1756 260 S 0.0 0.0 26:21.43 rsync
2898 root 20 0 18228 1836 1436 S 0.0 0.0 0:00.04 bash
2914 root 20 0 36660 1716 1256 R 0.0 0.0 0:00.00 top

For me, the steady state usage seems to be about ~400M for this docker image on an 8G box, and a few big files passed through since virtual is around 900M now. Caveat: this is an instantaneous measure rather than peak, and I don't know what reserved was when the big file went through.

I can tell experimentally that this hasn't ground to a halt on swap or thrown Python MemoryErrors. We'll see how the rest of the day goes.

Once it finishes I'll look more.

If you want to give it a go before then, fire up a docker container with 6G ram limit and do:
dd if=/dev/urandom of=file.blob bs=1MB count=8000
rsync file.blob /amazon/

@Thinkscape

If you want to give it a go before then, fire up a docker container with 6G ram limit and do: dd if=/dev/urandom of=file.blob bs=1MB count=8000 rsync file.blob /amazon/

Yeah, but why? If it buffers in RAM, of course it'll die with a big file.
Furthermore, I do not expect or want my tools to eat up all my server's RAM depending on what they stumble upon in the directory tree. It must not do that, regardless of what I upload to ACD; it's just not the way to go...

@bgemmill (Author)

@Thinkscape It turns out that if you run that example you'd see what I did: no real performance hiccups, because the docker memory clamping forces the older bits of big buffers into swap. File backing, the old-school way.

To make this change set more palatable to non-docker users of fuse, I put in file backing if writing gets too large. At the moment the default is 1G.

On a different note, it looks like Synology's rsync directory querying fails when directories contain around 10k entries; that many calls to getattr take long enough to hit a timeout. I'm going to tackle that next, since everyone probably wants 'ls -al' to complete quickly.

@Thinkscape

To make this change set more palatable to non-docker users of fuse, I put in file backing if writing gets too large. At the moment the default is 1G.

Thanks. We cannot depend on any specific virtualization or OS feature to automagically manage memory for us. Rsync usually takes just a few megs of RAM regardless of the tree size or individual files' grandeur, and that's what I'd expect from a fuse driver as well. Even 1G seems excessive to me, but at least it's configurable.

@bgemmill mentioned this pull request Aug 18, 2016
@BabyDino commented Mar 17, 2017

One thing I am noticing is that something in the copy phase is very slow. I am using CouchPotato to move some stuff around, and the initial upload and transfer are fast. I can see the results within ACD no problem. However, it takes almost a day before CP reports that the copy process is complete. See timestamps:

03-16 13:36:50 INFO [tato.core.plugins.renamer] Copying "/storage/downloads/DirName/file.mkv" to "/storage/media/DirName/file.mkv"
03-17 08:41:54 INFO [tato.core.plugins.scanner] Found 1 files in the folder /storage/media/DirName

Does this have something to do with locking? Using encfs atm. My acd_cli.log doesn't go that far back; I will see if I can reproduce and get those logs.

@Hoppersusa

@bgemmill Thank you for the schema fix! That was throwing me for a loop. Unfortunately, I am hitting memory issues with the update. I started seeing this in the syslog: "Out of memory: Kill process 8398 (acdcli) score 850 or sacrifice child". So I doubled the RAM from 4 to 8GB but continue to see acdcli use more memory than available. This is running on Ubuntu 16.04, if that makes any difference. How much RAM are you assuming for the small syncs?

@bgemmill (Author)

@BabyDino: I'm not familiar with CP; do you see the same behavior with rsync?

@Hoppersusa: 1G on the small syncs, same as the write-back cache size. Above that it goes to disk, or at least it should, according to SpooledTemporaryFile:
https://docs.python.org/3/library/tempfile.html
That also happens only once per sync call. Are you seeing this during sync calls?
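For reference, the spooling behavior is easy to see in isolation; this is plain stdlib usage, with a tiny max_size so the rollover happens immediately:

import tempfile

buf = tempfile.SpooledTemporaryFile(max_size=1024)
buf.write(b'x' * 512)    # under max_size: stays in memory
buf.write(b'x' * 1024)   # crosses max_size: transparently rolls over to disk
buf.seek(0)
assert len(buf.read()) == 1536
buf.close()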

@BabyDino

@bgemmill I will test rsync. For CP, this is their source code:

log.info('Copying "%s" to "%s"', (old, dest))
shutil.copy(old, dest)

The shutil.copy() takes forever. I am not familiar with Python, so I am not sure how that is handled at the file system level.

@Hoppersusa

@bgemmill Thank you for the detail; it does not appear to be during sync calls. I don't believe it is the SpooledTemporaryFile change that is causing the problem. It appears that when writes are made, the system allocates memory, but the memory is not released once the write to Amazon is completed. I increased the memory to 16GB and acdcli consumed all of it again. I rolled back to commit 26325db and could not reproduce the issue (although the performance is not as good as your current commit), but I could reproduce it on the builds since. I hoped it was something on my system, but have not been able to find a cause. The commit that did not run out of memory may just take longer to process the files. Is there any way to see lower-level detail than the standard acdcli debug output?


r = self.BOReq.post(self.metadata_url + 'nodes', acc_codes=acc_codes, data=body_str)
if r.status_code in RETRY_CODES: continue # the fault lies not in our stars, but in amazon

This can theoretically lead to an endless loop; a max retry count could avoid that.
In conditions where the Amazon server always gives back a 500, this will never end and will only uselessly stress the client and the server.
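A minimal sketch of that suggestion; MAX_RETRIES and the backoff are illustrative values, and RETRY_CODES stands in for the set used in the snippet above:

import time

RETRY_CODES = {500, 503}  # stand-in for the actual retry set
MAX_RETRIES = 8           # hypothetical cap

def post_with_retry(do_post, backoff=1.0):
    for attempt in range(MAX_RETRIES):
        r = do_post()
        if r.status_code not in RETRY_CODES:
            return r
        time.sleep(backoff * 2 ** attempt)  # back off between attempts
    raise RuntimeError('gave up after %d retries' % MAX_RETRIES)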

@bgemmill (Author) commented Mar 20, 2017

In practice, Amazon kicks the client well before any infinite loop happens, usually with an error in the 400 range that we're deliberately not catching. I've seen errors in the family of requests/second, unable to process, and sometimes just unavailable when Amazon goes down for short periods and needs to kick clients. We're not catching those on purpose!

Otherwise, for file system stability the goal here is to reduce errors we'd need to propagate upwards to the user.

If you have an example of a non-ending loop, please send me the response codes involved and I'll change the set we retry over.


The example is that the Amazon server always returns a 500 for every retry. My comment was meant to change the code in a way that is prepared for the exceptional case. We all make mistakes, and therefore it is possible that the loop can happen in the future. A retry limit would allow handling such situations gracefully.

@bgemmill (Author)

Completely agreed there are things that could break. At the moment I'm focusing on the things that are breaking. If you have an acdcli log of an infinite loop happening in practice, please post it!

@bgemmill (Author)

@Hoppersusa are you sure that the set of operations you're using on files ultimately releases the file handles? I'd be interested if you can repeat this error with just dd and rsync or cp. Looking at proc for acdcli on my setup, even during an rsync of a huge tree my high water mark doesn't get too far over 1G, which fits with the 1G default for the spooled temporary files we use for write back caching.

Is it possible you're doing multithreaded writing and have many of these write back caches open at once?

@ro345 commented Mar 23, 2017

@bgemmill At some point your branch stopped working with the "subdir" option that I use. If I use the official branch from yadayada it seems to work. If I omit the subdir option I can mount on the root. Here is the mount command that I use:

/usr/local/bin/acd_cli -nl mount -i 60 --allow-other --modules="subdir,subdir=/acd_mount" /mnt/acd_mount_enc

Here is how it mounts (ls -al /mnt/acd_mount_enc):

drwxrwxr-x 1 xbox xbox    0 Mar 23  2017 .
drwxrwxrwx 7 root root 4096 Mar 21 12:32 ..
?????????? ? ?    ?       ?            ? acd_mount
?????????? ? ?    ?       ?            ? Documents
?????????? ? ?    ?       ?            ? Pictures
?????????? ? ?    ?       ?            ? Videos

It attempts to mount it on the root of ACD rather than the subdir, and even then it isn't mounted properly.

If someone can tell me how to roll back commits and install older versions of code, I could figure out which commit this occurred on. It seems like it has happened within the last two weeks.

BTW, thanks for all of the great work on this. Being able to use rsync has been a great help for me.

@bgemmill (Author)

@ro345 thanks for pointing it out; I'm pretty sure the failure comes from the work I was doing on memory caching to speed up operations. I probably won't have time to look at this until next week or so. My money is on this commit:
1d68ecf
Since it'll continue resolving upwards until it runs out of parents, which is probably not what subdir expects.

If you want to verify: check out my repo in git, run git checkout 1d68ecfc22e2e5b2aa6a310e7ece6fde800beda7 (where the hash is any commit you want to test), and call acd_cli.py mount from inside that git checkout.

@bgemmill (Author)

@ro345 turns out that the fuse subdir module was the culprit. Let me know if that happens again.

@ro345 commented Apr 1, 2017

Thank you for the fix, really appreciate it. I probably can't try it for a few days, but will post back if there are issues.

@SchnorcherSepp

Hi!

I have been using acd_cli from yadayada and have many files in the Amazon cloud.
Today I wanted to try the bgemmill customized version because it sounds really cool, and I have a few questions.

Installation

sudo apt install fuse python3-pip git-core python3-appdirs python3-colorama python3-dateutil python3-requests python3-sqlalchemy
sudo pip3 install --upgrade git+https://github.com/bgemmill/acd_cli.git

First steps

First I created the oauth_data file.

Then I ran acd_cli init, and this error happened:

Traceback (most recent call last):
  File "/usr/local/bin/acd_cli", line 9, in <module>
    load_entry_point('acdcli==0.3.2', 'console_scripts', 'acd_cli')()
  File "/usr/local/bin/acd_cli.py", line 1664, in main
    if not check_cache():
  File "/usr/local/bin/acd_cli.py", line 1295, in check_cache
    if not cache.resolve('/'):
  File "/usr/local/lib/python3.5/dist-packages/acdcli/cache/query.py", line 207, in resolve
    self.node_id_to_node_cache[r.id] = r
AttributeError: 'NoneType' object has no attribute 'id'

After acd_cli sync (which logged [ERROR] [acd_cli] - Root node not found. Sync may have been incomplete.) and then acd_cli psync /, all works fine.

Is this normal behavior?

Errors in log

Mounted with acd_cli mount --allow-other mount

If I use ls -lha, this error ([Errno 61] No data available) appears in the logfile:

17-04-17 15:25:37.349 [DEBUG] [acdcli.acd_fuse] - -> opendir /test ()
17-04-17 15:25:37.349 [DEBUG] [acdcli.acd_fuse] - <- opendir 0
17-04-17 15:25:37.349 [DEBUG] [acdcli.acd_fuse] - -> getattr /test (None,)
17-04-17 15:25:37.350 [DEBUG] [acdcli.acd_fuse] - <- getattr {'st_ctime': 1492428231.08, 'st_uid': 1000, 'st_mode': 16893, 'st_atime': 1492435537.3501751, 'st_gid': 1000, 'st_nlink': 1, 'st_mtime': 1492428231.08}
17-04-17 15:25:37.350 [DEBUG] [acdcli.acd_fuse] - -> readdir /test (0,)
17-04-17 15:25:37.352 [DEBUG] [acdcli.acd_fuse] - <- readdir ['.', '..', 'test.h']
17-04-17 15:25:37.352 [DEBUG] [acdcli.acd_fuse] - -> getattr /test (None,)
17-04-17 15:25:37.352 [DEBUG] [acdcli.acd_fuse] - <- getattr {'st_ctime': 1492428231.08, 'st_uid': 1000, 'st_mode': 16893, 'st_atime': 1492435537.3528886, 'st_gid': 1000, 'st_nlink': 1, 'st_mtime': 1492428231.08}
17-04-17 15:25:37.353 [DEBUG] [acdcli.acd_fuse] - -> getxattr /test ('security.selinux',)
17-04-17 15:25:37.353 [DEBUG] [acdcli.acd_fuse] - <- getxattr '[Errno 61] No data available'
17-04-17 15:25:37.353 [DEBUG] [acdcli.acd_fuse] - -> getxattr /test ('system.posix_acl_access',)
17-04-17 15:25:37.354 [DEBUG] [acdcli.acd_fuse] - <- getxattr '[Errno 61] No data available'
17-04-17 15:25:37.354 [DEBUG] [acdcli.acd_fuse] - -> getxattr /test ('system.posix_acl_default',)
17-04-17 15:25:37.354 [DEBUG] [acdcli.acd_fuse] - <- getxattr '[Errno 61] No data available'
17-04-17 15:25:37.355 [DEBUG] [acdcli.acd_fuse] - -> getattr / (None,)
17-04-17 15:25:37.355 [DEBUG] [acdcli.acd_fuse] - <- getattr {'st_ctime': 1461868358.03, 'st_uid': 1000, 'st_mode': 16893, 'st_atime': 1492435537.3557737, 'st_gid': 1000, 'st_nlink': 1, 'st_mtime': 1492426619.83}
17-04-17 15:25:37.356 [DEBUG] [acdcli.acd_fuse] - -> getxattr / ('security.selinux',)
17-04-17 15:25:37.356 [DEBUG] [acdcli.acd_fuse] - <- getxattr '[Errno 61] No data available'
17-04-17 15:25:37.356 [DEBUG] [acdcli.acd_fuse] - -> getxattr / ('system.posix_acl_access',)
17-04-17 15:25:37.357 [DEBUG] [acdcli.acd_fuse] - <- getxattr '[Errno 61] No data available'
17-04-17 15:25:37.357 [DEBUG] [acdcli.acd_fuse] - -> getxattr / ('system.posix_acl_default',)
17-04-17 15:25:37.357 [DEBUG] [acdcli.acd_fuse] - <- getxattr '[Errno 61] No data available'
17-04-17 15:25:37.357 [DEBUG] [acdcli.acd_fuse] - -> getattr /test/test.h (None,)
17-04-17 15:25:37.358 [DEBUG] [acdcli.acd_fuse] - <- getattr {'st_uid': 1000, 'st_gid': 1000, 'st_size': 18, 'st_blocks': 1, 'st_ctime': 1492428250.352, 'st_mode': 33204, 'st_blksize': 8192, 'st_atime': 1492435537.3582494, 'st_nlink': 1, 'st_mtime': 1492428252.616}
17-04-17 15:25:37.358 [DEBUG] [acdcli.acd_fuse] - -> getxattr /test/test.h ('security.selinux',)
17-04-17 15:25:37.358 [DEBUG] [acdcli.acd_fuse] - <- getxattr '[Errno 61] No data available'
17-04-17 15:25:37.359 [DEBUG] [acdcli.acd_fuse] - -> getxattr /test/test.h ('system.posix_acl_access',)
17-04-17 15:25:37.359 [DEBUG] [acdcli.acd_fuse] - <- getxattr '[Errno 61] No data available'
17-04-17 15:25:37.359 [DEBUG] [acdcli.acd_fuse] - -> releasedir /test (0,)
17-04-17 15:25:37.359 [DEBUG] [acdcli.acd_fuse] - <- releasedir 0

Is this normal behavior?

thx for your attention

@bgemmill (Author)

@SchnorcherSepp That error looks like acdcli can't find the root node in the cache, and you mention that sync gave you a root node not found error; you'll want to run sync until it completes normally.

Since your nodes.db may be in a funny state now, I'd recommend deleting it and syncing again from scratch.

# Conflicts:
#	acdcli/cache/schema.py
#	docs/contributors.rst
@bgemmill (Author)

For everyone still following this PR, we're back, with a caveat: Amazon deletes properties from its records for apps that have been banned. This means mtime/uid/gid/xattrs will be gone from your files, and naive rsync calls will attempt to re-transfer everything.

For future work I'm going to look at a few ways to automate setting properties on files with matching md5s.
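A rough sketch of what that automation could look like; nodes_by_md5 and set_properties are hypothetical stand-ins for the cache lookup and the property write, not existing acd_cli API:

import hashlib
import os

def file_md5(path, chunk=1 << 20):
    # Hash in chunks so big files don't need to fit in memory.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk), b''):
            h.update(block)
    return h.hexdigest()

def restore_properties(local_root, nodes_by_md5, set_properties):
    # Walk the local tree, match files to remote nodes by md5, and
    # re-apply the attributes that Amazon dropped.
    for dirpath, _, filenames in os.walk(local_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            node = nodes_by_md5.get(file_md5(path))
            if node is not None:
                st = os.stat(path)
                set_properties(node, {'mtime': st.st_mtime,
                                      'uid': st.st_uid,
                                      'gid': st.st_gid,
                                      'mode': st.st_mode})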
