Requirements for OpenWRT self-hosting of wyng backup archives #195

Open
tlaurion opened this issue May 19, 2024 · 14 comments

@tlaurion
Contributor

tlaurion commented May 19, 2024

Four differences in ash/bash, tar, and rm between busybox and standard Unix systems cause failures on busybox with e7bc559:

  1. dedup on send/arch-deduplication fails on ash based shell with string concat error (not tested otherwise)
  2. tar --no-same-owner doesn't exist -> use -o
  3. tar -m (no keeping of modified timestamps) doesn't exist, which prevents pruning ops from succeeding with busybox's tar
  4. rm -d (remove empty dir) doesn't exist under busybox

1 and 2 are easy to fix, and changing them has no known side effects, whereas I guess 3 and 4 were optimizations?

The code change against e7bc559 looks like this:

diff --git a/src/wyng b/src/wyng
index a67eb21..a87f4f4 100755
--- a/src/wyng
+++ b/src/wyng
@@ -3384,7 +3384,7 @@ def send_volume(storage, vol, curtime, ses_tags, send_all=False, benchmark=False
 
         # Finalize on VM/remote
         catch_signals()
-        dest.run(["rm -df "+sdir+" && mv -T "+sdir+"-tmp "+sdir
+        dest.run(["rm -rf "+sdir+" && mv -T "+sdir+"-tmp "+sdir
                  +" && mv "+vol.vid+"/volinfo.tmp "+vol.vid+"/volinfo"
                  +" && mv archive.ini.tmp archive.ini"
                  +(" && ( nohup sync -f . 2&>/dev/null & )" if options.maxsync else "")
@@ -3503,7 +3503,7 @@ def dedup_existing(aset):
 
     print(" linking...", end="", flush=True)
     do_exec( [dest.run_args([
-               +"/bin/cat >"+dest.dtmp+"/dest.lst.gz"
+               "/bin/cat >"+dest.dtmp+"/dest.lst.gz"
                +" && /usr/bin/python3 "+dest.dtmp+"/dest_helper.py dedup"
                ], destcd=dest.path),
               [CP.cat,"-v"],  [CP.tail,"--bytes=2000"]
@@ -3960,7 +3960,7 @@ def merge_sessions(volume, sources, target, clear_sources=False):
         volume.sessions[target].save_info(".tmp")
         cmds += [CP.tar, "-cf", "-", "../archive.ini", "../archive.ini.tmp", "merge.lst.gz",
                             target+"/manifest.z.tmp", target+"/info.tmp", "volinfo.tmp"]
-        dest_cmds += " && tar --no-same-owner -xmf -"
+        dest_cmds += " && tar -o -xf -"
 
     # Start merge operation on dest
     catch_signals()
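
For the second hunk, a minimal Python sketch of why the stray leading + fails: as far as the diff shows, the expression applies unary + to a str, which Python rejects with a TypeError while the command string is still being built. The two helper functions and the dtmp placeholder path are illustrative assumptions, not wyng code.

```python
def build_remote_cmd(dtmp):
    """Join the remote command string the way the patched line does."""
    return ("/bin/cat >" + dtmp + "/dest.lst.gz"
            + " && /usr/bin/python3 " + dtmp + "/dest_helper.py dedup")


def build_remote_cmd_broken(dtmp):
    """Same expression with the stray leading '+' seen before the patch."""
    return (+"/bin/cat >" + dtmp + "/dest.lst.gz"
            + " && /usr/bin/python3 " + dtmp + "/dest_helper.py dedup")


if __name__ == "__main__":
    # "/tmp/wyng-demo" is a made-up placeholder for dest.dtmp.
    print(build_remote_cmd("/tmp/wyng-demo"))
    try:
        build_remote_cmd_broken("/tmp/wyng-demo")
    except TypeError as err:
        print("TypeError:", err)
```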

Is 3 really wanted/needed? Tested working on openwrt with qubes-ssh destination.

Can do PR but wanted to clarify 3, since awk output could be used to replicate tar -m behavior if really needed.

@tasket
Owner

tasket commented May 19, 2024

  1. I'm not sure how to go about testing with ash. Perhaps during testing I could prepend remote commands with busybox ash ?
  2. Sorry about that. I didn't realize there was more than one occurrence of --no-same-owner in there. Easy fix.
  3. IIRC -m was added to the commands almost as a cosmetic fix. The idea was to only have the dest system's timestamps on archive files. It certainly can work without -m (it's not critical) but I'd still prefer to use the remote system's time if possible.

@tlaurion
Contributor Author

>   1. I'm not sure how to go about testing with ash. Perhaps during testing I could prepend remote commands with busybox ash ?

I don't think it's your problem.
It's not just about ash; it depends on what is packed into busybox. I can do regression tests on that ecosystem when we move forward.
With this patch, everything seems to work as of now; more testing is needed, of course.

>   2. Sorry about that. I didn't realize there was more than one occurrence of --no-same-owner in there. Easy fix.

Didn't see it either.

>   3. IIRC -m was added to the commands almost as a cosmetic fix. The idea was to only have the dest system's timestamps on archive files. It certainly can work without -m (it's not critical) but I'd still prefer to use the remote system's time if possible.

Alternative, @tasket? I have none other than piping tar to awk, which would lower performance.
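
One possible alternative to awk, sketched under the assumption that python3 is available on the destination (the diff above already runs dest_helper.py there): unpack the stream with Python's tarfile module and write the members by hand, so the files get the destination's current time and owner, roughly like tar -xomf -. The function name extract_fresh_mtime and the demo member name are hypothetical, not part of wyng, and the sketch only handles regular files.

```python
import io
import os
import tarfile
import tempfile


def extract_fresh_mtime(tar_stream, dest_dir):
    """Write regular files from a tar stream without restoring mtime or owner."""
    written = []
    root = os.path.realpath(dest_dir)
    with tarfile.open(fileobj=tar_stream, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # sketch only: skip directories, links, devices
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError("unsafe path in archive: " + member.name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written


def demo_mtime():
    """Extract a member stamped in 2001 and return the mtime of the result."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"manifest"
        info = tarfile.TarInfo("vol/info.tmp")  # hypothetical member name
        info.size = len(data)
        info.mtime = 1_000_000_000  # September 2001
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    with tempfile.TemporaryDirectory() as tmp:
        (path,) = extract_fresh_mtime(buf, tmp)
        return os.path.getmtime(path)
```

Whether this is faster or slower than busybox tar piped to awk on a small router is untested here.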

@tlaurion
Contributor Author

Added 4 in OP: rm -df (-d: delete empty dir) doesn't exist under busybox

@tasket
Owner

tasket commented May 20, 2024

> Added 4 in OP: rm -df (-d: delete empty dir) doesn't exist under busybox

I'll review the need for it. When it was proposed as a fix, there was a mkdir creating a spurious dir and that no longer seems to be the case. Ref issue #175

@tasket
Owner

tasket commented May 20, 2024

Incidentally, the tar in Debian's busybox includes -m:

$ tar
BusyBox v1.35.0 (Debian 1:1.35.0-4+b3) multi-call binary.

Usage: tar c|x|t [-ZzJjahmvokO] [-f TARFILE] [-C DIR] [FILE]...

Create, extract, or list files from a tar file

        c       Create
        x       Extract
        t       List
        -f FILE Name of TARFILE ('-' for stdin/out)
        -C DIR  Change to DIR before operation
        -v      Verbose
        -O      Extract to stdout
        -m      Don't restore mtime
        -o      Don't restore user:group
        -k      Don't replace existing files
        -Z      (De)compress using compress
        -z      (De)compress using gzip
        -J      (De)compress using xz
        -j      (De)compress using bzip2
        --lzma  (De)compress using lzma
        -a      (De)compress based on extension
        -h      Follow symlinks
        --overwrite             Replace existing files
        --strip-components NUM  NUM of leading components to strip
        --no-recursion          Don't descend in directories
        --numeric-owner         Use numeric user:group
        --no-same-permissions   Don't restore access permissions
        --to-command COMMAND    Pipe files to COMMAND

tasket added a commit that referenced this issue May 20, 2024
Handle cleanup earlier in arch_check()

Track snapshot parents

Remove rm -df, now unneeded, issue #195

tar: remove -m and use -o option, issue #195
@tlaurion
Contributor Author

tlaurion commented May 20, 2024

@tasket Unfortunately, OpenWrt is more representative of how the embedded world configures and builds busybox:


BusyBox v1.36.1 (2024-05-17 09:51:14 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 23.05.3, r23809-234f1a2efa
 -----------------------------------------------------
root@Insurgo-LabRouter:~# tar --help
BusyBox v1.36.1 (2024-05-17 09:51:14 UTC) multi-call binary.

Usage: tar c|x|t [-zahvokO] [-f TARFILE] [-C DIR] [-T FILE] [-X FILE] [FILE]...

Create, extract, or list files from a tar file

	c	Create
	x	Extract
	t	List
	-f FILE	Name of TARFILE ('-' for stdin/out)
	-C DIR	Change to DIR before operation
	-v	Verbose
	-O	Extract to stdout
	-o	Don't restore user:group
	-k	Don't replace existing files
	-z	(De)compress using gzip
	-a	(De)compress based on extension
	-h	Follow symlinks
	-T FILE	File with names to include
	-X FILE	File with glob patterns to exclude

Note that it's possible to install coreutils on OpenWrt, but it would be better for compatibility to rely on busybox alone if possible.

Otherwise I can also poke OpenWrt downstream to add -m to their build recipe. I guess it would cost at most a kilobyte of additional binary size in busybox, but I'm not sure all busybox distributions (each configures these options at compile time) would follow.
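
Another option, sketched here purely as an illustration, would be to probe the destination's tar and only pass -m when the applet advertises it. The sketch parses the BusyBox "Usage:" line as it appears in the two help outputs in this thread; the function names are hypothetical, this is not what wyng does, and the parsing assumes the BusyBox help layout (GNU tar prints a different one).

```python
import re

# Usage lines copied from the Debian and OpenWrt outputs in this thread.
DEBIAN_USAGE = "Usage: tar c|x|t [-ZzJjahmvokO] [-f TARFILE] [-C DIR] [FILE]..."
OPENWRT_USAGE = ("Usage: tar c|x|t [-zahvokO] [-f TARFILE] [-C DIR] "
                 "[-T FILE] [-X FILE] [FILE]...")


def busybox_tar_flags(help_text):
    """Return the set of short options listed in a BusyBox tar usage line."""
    match = re.search(r"Usage: tar \S+ \[-([A-Za-z]+)\]", help_text)
    return set(match.group(1)) if match else set()


def extract_options(help_text):
    """Build the extract option cluster, dropping -o or -m when unsupported."""
    flags = busybox_tar_flags(help_text)
    opts = "-x"
    if "o" in flags:
        opts += "o"  # don't restore user:group
    if "m" in flags:
        opts += "m"  # don't restore mtime
    return opts + "f"


if __name__ == "__main__":
    print(extract_options(DEBIAN_USAGE))
    print(extract_options(OPENWRT_USAGE))
```

The cost is one extra round trip to the destination per session to fetch the help text.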

@tasket
Owner

tasket commented May 20, 2024

Yeah, this shows how using tar as a generic transport can be a problem: There is big pressure to cut down a 225K executable to 223K by excising options with very simple conditional logic. But then they include the full-fat gnu awk, django, numpy, etc. in their repo. Wild and crazy.

Taking the issue title into account "work on busybox + ash based systems", we see that even defining a target system is problematic because in this case it didn't provide a functional standard.

Please let me know if you've reached a functional state on your OpenWRT router with the changes I posted today. @tlaurion

@tlaurion
Contributor Author

tlaurion commented May 20, 2024

Did send with dedup, arch-deduplicate, and prune without issues, yes @tasket!

@tlaurion
Contributor Author

tlaurion commented May 20, 2024

> Yeah, this shows how using tar as a generic transport can be a problem: There is big pressure to cut down a 225K executable to 223K by excising options with very simple conditional logic. But then they include the full-fat gnu awk, django, numpy, etc. in their repo. Wild and crazy.
>
> Taking the issue title into account "work on busybox + ash based systems", we see that even defining a target system is problematic because in this case it didn't provide a functional standard.
>
> Please let me know if you've reached a functional state on your OpenWRT router with the changes I posted today. @tlaurion

"Their repo" being OpenWrt? Well, the logic is simply that OpenWrt by default targets low-end embedded devices like routers. Nowadays it also targets newer Raspberry Pi boards and x86 machines, to turn almost anything into a high-end router and switching device. Low-end devices, with everything in flash and little memory, default to busybox with its ash shell.

I still think OpenWrt's busybox (and its ash shell) is representative of what low-end devices in the embedded world deploy by default. A quick check of the coreutils packages and their descriptions reminds the end user that most of those tools are already provided by busybox. Each coreutils binary can be selected as needed, but a big fat warning advises against doing that.

Happily for the wyng-backup use case, there is a python3-light package which fits the need, without having to deploy the full-fledged python3 suite. busybox is considered pretty limited in features, though its support can be extended for compatibility, within limits. This is also why awk is available for those who rely on it extensively in their scripts. The embedded world dictates this, ranging from really limited storage to, recently, practically unlimited storage with NVMe support directly on the motherboard.

TL;DR: Devices that OpenWrt originally supported only as routers can now be NAS + VPN + Suricata + router if one wants. But busybox still targets the lowest common denominator, low-end router-only platforms, where people can choose with granularity which additional roles they want. The WRT3200ACM is still one of the most used routers for such use cases, judging by the attended-sysupgrade image builds across all deployed OpenWrt-supported platforms.

Edit: stats https://sysupgrade.openwrt.org/stats/public-dashboards/5f0750ebb59c4666a957dc4261f7b90e?orgId=1&refresh=1m

@tlaurion tlaurion changed the title Have wyng work on busybox + ash based systems Have wyng work on busybox + ash based systems (Openwrt routers that coukd be used for backup archives) May 20, 2024
@tlaurion tlaurion changed the title Have wyng work on busybox + ash based systems (Openwrt routers that coukd be used for backup archives) Have wyng work on busybox + ash based systems (Openwrt routers to self-host wyng backup archives) May 21, 2024
@tlaurion
Contributor Author

tlaurion commented Jun 3, 2024

Consider changing default sparse-write restore op (made for local archive receive/restore) to sparse and use-snapshot and compare backup restoration perf for networked archives

Ref tasket/wyng-util-qubes#30 (comment)

@tasket
Owner

tasket commented Jun 3, 2024

This is a tough call because as you pointed out some time ago, sparse mode doesn't perform well with some types of connections. As it is now, users can easily change the behavior with the -w options.

In the future, probably in Wyng v0.9, sparse-write will probably become the default any time the local volume already exists. That helps users or wrappers to clone related qube/volumes before a restore to attain a dedup effect. Maybe Wyng will also let you tell it what those related volumes are and do the cloning automatically (there's an issue for this).

@tlaurion tlaurion changed the title Have wyng work on busybox + ash based systems (Openwrt routers to self-host wyng backup archives) Requirements for OpenWRT self-hosting of wyng backup archives Jun 4, 2024
@tlaurion
Contributor Author

tlaurion commented Jun 4, 2024

Well: 300 MB of available memory won't be enough to self-host softraid5 + tor + dropbear + python3 in memory during send/backup operations.

The OOM killer kills wifi at some point, as well as mdadm.
512 MB+ with a quad core will be a hard requirement, making the WRT3200ACM an unfit candidate with its 2 cores and 500 MB.

[Screenshots: 2024-06-04-121114, 2024-06-04-121043]

@tlaurion
Contributor Author

tlaurion commented Jun 4, 2024

> This is a tough call because as you pointed out some time ago, sparse mode doesn't perform well with some types of connections. As it is now, users can easily change the behavior with the -w options.
>
> In the future, probably in Wyng v0.9, sparse-write will probably become the default any time the local volume already exists. That helps users or wrappers to clone related qube/volumes before a restore to attain a dedup effect. Maybe Wyng will also let you tell it what those related volumes are and do the cloning automatically (there's an issue for this).

@tasket a keyword was missing in my past comment:

> Consider changing default sparse-write restore op (made for local archive receive/restore) to sparse and use-snapshot and compare backup restoration perf for networked archives
>
> Ref tasket/wyng-util-qubes#30 (comment)

What I meant is sparse instead of sparse-write solely in the qubes-ssh scenario, where in my opinion it doesn't really make sense to have sparse-write as the default.

@tlaurion
Contributor Author

tlaurion commented Jun 4, 2024

> This is a tough call because as you pointed out some time ago, sparse mode doesn't perform well with some types of connections. As it is now, users can easily change the behavior with the -w options.
>
> In the future, probably in Wyng v0.9, sparse-write will probably become the default any time the local volume already exists. That helps users or wrappers to clone related qube/volumes before a restore to attain a dedup effect. Maybe Wyng will also let you tell it what those related volumes are and do the cloning automatically (there's an issue for this).

@tasket what do you mean by certain connection types? Do you mean link bandwidth? Once again, from memory, I understood that:

  • sparse-write: everything goes through the link. If the link is slow, compression with --ssh-opt -C helps, where "slow" means that the bottleneck is the link speed, not the resources of dom0, the HDD/SSD, or the back-end server.
  • sparse: the CPU does more calculation from metadata on the local host (dom0), and link throughput is consumed only for the delta bits being sent. Again, if link speed is the bottleneck, ssh compression helps. A reminder that most of the time the bottleneck here is in the co-dependencies of routing under QubesOS: link speed drops if the host becomes slower, because the qubes-ssh archive qube still needs to talk through sys-firewall + sys-net at a minimum, where a VPN might also get in the way, and then through the local router, the ISP link, and the archive hoster. The local tests here are conservative in the sense that we bypass the variation of the hoster being on the internet; the network bottlenecks are reduced to wifi/ethernet (there is a difference) and to the hoster being a little low-end for the task (though an RPi 4B would be totally up to the task, and the WRT3200ACM could be if raid5 were dropped as a requirement, but that is in scope of my PoC and I do not want to compromise on it).
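
To make the understanding stated in the two bullets concrete, here is a toy model of link time only. It encodes just the assumption above (sparse-write moves the whole volume over the link, sparse moves only the changed part) and ignores CPU, disk IO, latency, and compression. The function name and the example numbers are made up for illustration; nothing here is measured or taken from Wyng.

```python
def link_seconds(volume_bytes, changed_fraction, link_bytes_per_s, mode):
    """Estimate time spent on the link for a restore under the model above."""
    if mode == "sparse-write":
        sent = volume_bytes  # assumed: every chunk crosses the link
    elif mode == "sparse":
        sent = volume_bytes * changed_fraction  # assumed: only deltas cross
    else:
        raise ValueError("unknown mode: " + mode)
    return sent / link_bytes_per_s


if __name__ == "__main__":
    volume = 10 * 2**30  # hypothetical 10 GiB volume
    link = 10 * 2**20    # hypothetical 10 MiB/s link
    for mode in ("sparse-write", "sparse"):
        print(mode, link_seconds(volume, 0.25, link, mode), "seconds")
```

Under this model sparse only loses when the extra local computation costs more than the link time it saves, which is the case I would like to see confirmed or refuted.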

Otherwise, could you please refresh my memory, or point to the source of the conclusions held here, for normal/sparse/sparse-write and connection types?

I think the last time we talked about it was after a failed attempt on my side to find proper hosting, which unfortunately led to a PoC bash script that ran over an sshfs-mounted ssh endpoint. This was suboptimal for many reasons. My PoC went in many directions there as well, because IO was the bottleneck, along with link speed and hardware IO limits on older hardware. With local IO slowed down (SATA2 SSD + sshfs-mounted loop device, because rsync.net was a Unix server not offering python3), all the "testing" was useless.

Wyng is not at the same place as before, and other hosters could probably be tested now. But this is not the thread for that.

Here I'm simply trying to clear up any misunderstanding I could still have about the current state of normal/sparse/sparse-write when archives are on a slower "link" in qubes-ssh mode, as opposed to on a disk local to dom0 (sparse-write there is blazing fast, no doubt) or in "qubes" mode. But in the qubes-ssh case, I feel I'm missing something from your response: I don't get why qubes-ssh would benefit in any case from sparse-write, unless someone is lucky/rich enough to have a real local hoster whose link speed is not the bottleneck and where writing to the ssh server is on par with writing into a qube or dom0, which I just think is physically impossible. No?
