Help with using Bees.in setup script #254
Comments
Well, I'm using bees as a service without the wrapper script. You could use that as a starting point. It sets up some scheduling and resource parameters (which you probably don't need for your use case), and it statically adds the parameters which are otherwise dynamically created by the wrapper script:
# /etc/systemd/system/bees.service
[Unit]
Description=Bees
Documentation=https://github.com/Zygo/bees
After=local-fs.target
RequiresMountsFor=/mnt/btrfs-pool
[Service]
Type=simple
Environment=BEESSTATUS=%t/bees/bees.status
ExecStart=/usr/libexec/bees --no-timestamps --strip-paths --thread-count=6 --scan-mode=3 --loadavg-target=5 --verbose=5 /mnt/btrfs-pool
CPUSchedulingPolicy=idle
IOSchedulingClass=idle
IOSchedulingPriority=7
KillMode=control-group
KillSignal=SIGTERM
Nice=19
Restart=on-abnormal
ReadWritePaths=/mnt/btrfs-pool
RuntimeDirectory=bees
StartupCPUWeight=25
WorkingDirectory=/run/bees
# Runtime hardening
ProtectProc=invisible
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateNetwork=true
PrivateIPC=true
ProtectHostname=true
ProtectKernelTunables=true
ProtectControlGroups=true
AmbientCapabilities=CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_FOWNER CAP_SYS_ADMIN
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/bees.service.d/override.conf
[Service]
Slice=maintenance.slice
CPUSchedulingPolicy=batch
IOWeight=10
StartupIOWeight=25
MemoryLow=2G
So you could simply run the
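For a one-off manual run, the `ExecStart` line from the unit above can be lifted into a small script. This is only a sketch: the binary path, mount point and flags are copied from that unit, the `bees_cmdline` helper is a made-up name, and it assumes the filesystem is already mounted and the bees hash table has been set up as described in the bees documentation.

```shell
#!/bin/sh
# Sketch: rebuild the command line used by ExecStart in the unit above.
# BEES_BIN and BEES_ROOT default to the values from that unit; adjust as needed.
BEES_BIN="${BEES_BIN:-/usr/libexec/bees}"
BEES_ROOT="${BEES_ROOT:-/mnt/btrfs-pool}"

# Hypothetical helper: prints the command instead of running it,
# so it can be reviewed before anything touches the filesystem.
bees_cmdline() {
    printf '%s\n' "$BEES_BIN --no-timestamps --strip-paths --thread-count=6 --scan-mode=3 --loadavg-target=5 --verbose=5 $BEES_ROOT"
}

echo "Would run: $(bees_cmdline)"

# To actually start it (as root), something along these lines:
#   mkdir -p /run/bees
#   BEESSTATUS=/run/bees/bees.status $(bees_cmdline)
# Stop it with Ctrl-C (or SIGTERM) when you have seen enough progress.
```

Outside systemd none of the scheduling or hardening settings apply, so prefixing the command with `nice` and `ionice` is worth considering if the machine is used for anything else at the same time.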
Thanks for the info. I tried this out and got the following error when trying to start the service:
[root@HOSTNAME system]# systemctl status bees
Apr 04 17:45:47 HOSTNAME bees[8148]: 2023-04-04 17:45:47 8148.8148<6> bees: set_root_path /var/lib/bees/f56b314c-db00-47ab-b3bb-222ed702682d
Also, here is my service config file:
[Unit]
[Service]
# Runtime hardening
ProtectProc=invisible
[Install]
I'm mounting
Previously, the wrapper script would have tried to set up the mount. But if you're not using it, you'll have to mount it via fstab.
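As a rough illustration of the fstab route, the snippet below prints an entry that mounts the top-level btrfs subvolume. The UUID is the one from the log above and the mount point is the one from the unit file. The mount options (`subvolid=5`, `noatime`, `nofail`) and the `fstab_entry` helper are assumptions for the sketch, not something stated in this thread.

```shell
#!/bin/sh
# Sketch: print an fstab line that mounts the btrfs top-level subvolume.
# fstab_entry is a hypothetical helper; the mount options are assumptions.
fstab_entry() {
    uuid="$1"        # filesystem UUID, e.g. as shown by blkid
    mountpoint="$2"  # where the unit expects the btrfs root
    printf 'UUID=%s %s btrfs subvolid=5,noatime,nofail 0 0\n' "$uuid" "$mountpoint"
}

# Review the line before appending it to /etc/fstab by hand.
fstab_entry f56b314c-db00-47ab-b3bb-222ed702682d /mnt/btrfs-pool
```

After adding such a line, `mount /mnt/btrfs-pool` should pick it up. The mount point has to match both `RequiresMountsFor=` and the path passed to bees.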
Thanks for the info and help, I appreciate it. I decided to change my plan and run it as a service using the scripts. I created the directories /run/bees and /etc/bees, but I don't see where it created the service. Or do I just manually copy the beesd@.service files somewhere? I'm not sure where they should go.
Update: I copied the "beesd@.service" file to the /usr/lib/systemd/system folder, then I ran systemctl daemon-reload, but now I have this issue (getting closer, though):
[root@HOSTNAME scripts]# systemctl status beesd@f56b314c-db00-47ab-b3bb-222ed702682d
Apr 04 22:19:13 HOSTNAME bees[5225]: bees[5225]: setting root path to 'f56b314c-db00-47ab-b3bb-222ed702682d'
Don't copy these files, create a directory In this file, create (and only these lines):
[Service]
ExecStart=
ExecStart=<CORRECTEDCALLHERE>
The empty
This way, your own adjustments will always derive from the system-installed service file, even after updates, and updates won't override your local adjustments.
The problem you're facing is probably due to mount namespacing: the service is quite heavily locked down and sees only a few specific writable directories, maybe even in different locations than on your host system. It expects the btrfs root mounted within
You can use
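The drop-in approach described above could be scripted roughly as follows. This is a sketch under assumptions: the directory name follows the usual systemd `<unit>.d` convention (here `beesd@.service.d`, guessed from the unit name in the log), `write_dropin` is a hypothetical helper, and the demo writes into a temporary directory rather than `/etc/systemd/system`.

```shell
#!/bin/sh
# Sketch: write a drop-in that replaces ExecStart without touching the packaged unit.
# write_dropin is a hypothetical helper, not part of bees or systemd.
write_dropin() {
    dropin_dir="$1"   # on a real host: /etc/systemd/system/<unit>.d
    exec_start="$2"   # the corrected command line
    mkdir -p "$dropin_dir" &&
    # The empty ExecStart= line comes first, then the replacement command.
    printf '[Service]\nExecStart=\nExecStart=%s\n' "$exec_start" > "$dropin_dir/override.conf"
}

# Demo in a throwaway directory; the placeholder is kept as in the post above.
demo_root="$(mktemp -d)"
write_dropin "$demo_root/beesd@.service.d" '<CORRECTEDCALLHERE>'
cat "$demo_root/beesd@.service.d/override.conf"

# On a real system, follow up with:
#   systemctl daemon-reload
#   systemctl cat beesd@<UUID>.service   # shows the unit plus all drop-ins
```

`systemctl edit <unit>` creates the same kind of `override.conf` interactively and reloads systemd afterwards, which is usually the more convenient route.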
OK, I think I have most of this figured out. Thanks for the assist here. One final question: my system has very few new writes, so I am not too worried about write performance. The bees command line options are pretty straightforward... is it a percentage of CPU, so 100 would mean to keep all cores busy all the time? What "system load" is it looking at? What number would I use to max out my system and scan as aggressively as possible? Also, how do I know if scanning is completed, or how much has been scanned?
loadavg is a standard (and very classic) system metric in Linux and Unix-likes. Traditionally, on Unix, it counted the average number of processes waiting in the scheduler queue and ready to run (state "runnable"). I think (but I may be wrong here) Linux later (probably still in the 90s) added to that counter the number of processes waiting on resources other than just CPU (state "disk sleep"). OTOH, maybe it had that from the beginning of its lifetime, I don't know. Technically, today it's the average length of the scheduler run queue over a period of time, counting processes in the runnable and disk sleep states.
So if your system has 4 cores and your loadavg hovers around 4, it means you are using the resources optimally because (for CPU-bound tasks at least) there's always at least one process per core waiting to be run immediately. Go beyond that, and your system will run slower than it could because it's starved of resources. Stay below, and your system is able to run the next process immediately. Think of it as: higher loadavg = higher perceived latency of operation (although it doesn't measure latency but queue length, which can mean any amount of random latency).
The reality is a little different because loadavg does not count only a single resource (CPU, or IO, or memory), it's a mixed bag. But the general idea still holds true: if your system has 8 cores, run bees at a limit of 4 to make it hog a maximum fair share of 50% of the resources (whatever that mixture is).
In a more practical manner, look at your loadavg while you're doing the usual stuff on your machine, and then set the bees limit one or two numbers above that. This generally gives a more realistic value for bees to actually make progress without disturbing your workflow or performance. If you see your system blocking on IO, lower the bees limit a little bit, and if you feel like bees could do more without disturbing you, raise it a little bit.
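The "one or two above your usual load" rule of thumb can be turned into a quick check. A minimal sketch, assuming a Linux system with `/proc/loadavg`. The `suggest_target` helper and its rounding are illustrative choices, not anything bees does itself.

```shell
#!/bin/sh
# Sketch: suggest a --loadavg-target a bit above the load you normally see.
# suggest_target is a hypothetical helper; rounding to the nearest integer is an assumption.
suggest_target() {
    # $1: observed load average (e.g. a field from /proc/loadavg)
    # $2: headroom to add on top (the rule of thumb above says one or two)
    awk -v load="$1" -v extra="$2" 'BEGIN {
        t = int(load + 0.5) + extra
        if (t < 1) t = 1
        print t
    }'
}

# Run this while the machine is doing its usual work.
if [ -r /proc/loadavg ]; then
    read -r load1 load5 load15 rest < /proc/loadavg
    echo "load averages: 1min=$load1 5min=$load5 15min=$load15"
    echo "suggested --loadavg-target: $(suggest_target "$load5" 2)"
fi
```

The result is only a starting value. If the system visibly stalls on IO, lower it, and if bees seems to be idling too much, raise it.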
(see
The loadavg is measured by three values in this order: short average (last minute), medium average (last 5 min), long average (last 15 min). By these three values, you can determine if your load is trending up, down, or stays around the same. Modern kernels have some more values not of importance here:
Modern Linux has a better measurement for resource usage, called PSI: pressure stall information. It will measure, in percent, how much time the system has been partially blocking processes (some), or blocked all processes (full, not available for CPU pressure for obvious reasons) when waiting on a particular resource, again on average over a period of time (10s, 60s, 300s), plus the total number of microseconds spent waiting on a resource. If you read the values multiple times, you can use the total microseconds delta to compute the average wait time using the percentage values.
(see
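For reference, the PSI numbers can be read from `/proc/pressure/cpu`, `/proc/pressure/io` and `/proc/pressure/memory` on kernels built with PSI support. The following is a small sketch for pulling out a single value. `psi_field` is a hypothetical helper name, and the script simply prints nothing if the files are not available.

```shell
#!/bin/sh
# Sketch: extract one value from a PSI file, whose lines look like
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=12345
# psi_field is a hypothetical helper that reads PSI text on stdin.
psi_field() {
    # $1: line type (some or full); $2: key (avg10, avg60, avg300 or total)
    awk -v kind="$1" -v key="$2" '$1 == kind {
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == key) print kv[2]
        }
    }'
}

# Print the short-term IO pressure if the kernel exposes it.
if [ -r /proc/pressure/io ]; then
    echo "io some avg10:  $(psi_field some avg10 < /proc/pressure/io)%"
    echo "io full avg10:  $(psi_field full avg10 < /proc/pressure/io)%"
    echo "io some total:  $(psi_field some total < /proc/pressure/io) us"
fi
```

Sampling `total` twice and dividing the difference by the elapsed time gives the stall share over exactly the interval you care about, rather than over the fixed 10s/60s/300s windows.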
You really never know: the job as implemented in bees is designed as an endless job. But you can look at the generation number of your subvolumes before starting bees, and then run bees until its state file reaches this number. Technically, it has done a full pass then. But practically, bees has created a lot of new writes, and your system has created a lot of new writes, resulting in a higher current generation number of the subvolume, so bees hasn't really seen all the data it could have seen while it was running.
BTW: while any IO-intensive task is active, IO is the dominating contributor to loadavg, so it is a good value for bees to throttle itself on. In this case, you want to choose a value that accounts for the number of spindles/disks rather than the number of CPU cores, so maybe choose something in between those two values (this pretends that all your data is optimally and equally distributed across all spindles for the access patterns of bees, which is practically never the case). Or choose something like half of your spindles. You may need to experiment a little bit, but I think you get the idea.
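To note down the generation number before starting bees, the `Generation:` field of `btrfs subvolume show` can be captured. A sketch, assuming that output format; `subvol_generation` is a hypothetical helper, and comparing the number against the transaction IDs in the bees state file is left as a manual step.

```shell
#!/bin/sh
# Sketch: pull the current generation out of `btrfs subvolume show` output.
# subvol_generation is a hypothetical helper that reads that output on stdin.
subvol_generation() {
    # Matches the "Generation:" line only, not "Gen at creation:".
    awk '$1 == "Generation:" { print $2 }'
}

# Usage (as root, before starting bees):
#   btrfs subvolume show /mnt/btrfs-pool | subvol_generation
# Write the number down, run bees, and stop it once the state file has
# caught up to that value.
```

On an archive filesystem with almost no writes, the generation should barely move while bees runs, so the comparison is more meaningful there than on a busy system.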
I have bees successfully installed/built from source.
I have been reading the documentation, and I see info on how to configure bees without the use of the bees.in script,
but I don't see any information on how to actually use the script.
--
Do I even need to run it as a service?
What I would really like to do is just run bees manually as needed, as my btrfs file system has almost no changes happening.
It's just an archive dump of old data, so even running it once a month or less to find duplicate blocks is more than enough.
Any thoughts or guidance would be great.