do not copy files into archive #408
anarcat added a commit to anarcat/pywb that referenced this issue on Nov 13, 2018
This allows users to manage collections of large WARC files without duplicating space. Hardlinks are used instead of symlinks to reflect the original mechanism, where the file is copied (so it can be safely removed from the source). If we used symlinks, we would break that expectation which could lead to data loss. Inversely, hardlinks can lead to data loss as well. For example, pywb could somehow edit the file, which would modify the original as well. But we assume here pywb does not modify the file, and each side of the hardlink can have their own permissions to ensure this (or not) as well. Closes: webrecorder#408
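The link-instead-of-copy idea in the commit message above can be sketched as follows. This is a minimal illustration and not the code from the referenced commit: the function name `link_or_copy` and the fall-back-to-copy behaviour are assumptions made for the example. Only `os.link` and `shutil.copy2` from the Python standard library are used.

```python
import os
import shutil


def link_or_copy(src, dest_dir):
    """Place src in dest_dir as a hardlink, copying only if linking fails.

    Hypothetical helper for illustration; not part of pywb.
    Returns (destination path, "hardlink" or "copy").
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    try:
        # A hardlink shares the same data blocks, so no extra space is used.
        os.link(src, dest)
        return dest, "hardlink"
    except FileExistsError:
        # Never silently overwrite a file already in the collection.
        raise
    except OSError:
        # Hardlinks cannot cross filesystems (and some filesystems do not
        # support them), so fall back to a regular copy.
        shutil.copy2(src, dest)
        return dest, "copy"
```

Removing the source file afterwards leaves the hardlinked copy in the collection intact, which matches the expectation described in the commit message.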
This was referenced Nov 13, 2018
On copy-on-write filesystems like btrfs it would be nice if we could use btrfs's reflinks.
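For readers unfamiliar with reflinks, here is a rough sketch of what such a clone could look like from Python on Linux. It assumes the `FICLONE` ioctl (request number taken from `<linux/fs.h>`) and falls back to a plain copy when the filesystem does not support cloning. The helper name `reflink_or_copy` is hypothetical and nothing here is pywb code.

```python
import fcntl
import shutil

# FICLONE ioctl request number on Linux: _IOW(0x94, 9, int).
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Clone src to dst with a reflink if possible, else copy.

    Hypothetical helper for illustration. Returns "reflink" or "copy".
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share src's extents with dst.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # Not supported here (for example ext4 or tmpfs): copy below.
            pass
    shutil.copy2(src, dst)
    return "copy"
```

Unlike a hardlink, a reflinked file is an independent file whose blocks are only shared until one side is modified, so editing one copy does not change the other.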
Is your feature request related to a problem? Please describe.
I find it difficult to use pywb on large datasets because the files are copied into the collections instead of just "referenced" there.
Describe the solution you'd like
When I add a file to a collection, it should be just treated as if it's in the collection somehow, without having to copy gigantic files around.
Describe alternatives you've considered
I have looked for options in the `wb-manager` program to see if something could fit the bill, particularly a way to symlink or hardlink files around. Haven't found anything. I also looked at the auto-indexer but I'm not sure how that works. I also know about the `wb-manager index` command, but that only works for files already in the archive directory. I also suspect custom user-defined collections might fit the bill, but I haven't figured out how to use those just yet, plus they probably require restarting the wayback process every time since a special configuration needs to be made for every archive...