Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make-disk-image: improve xargs invocation #910

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

danjujan
Copy link
Contributor

@danjujan danjujan commented Dec 4, 2024

When the disk image is too small, cp fails because it is out of space. However, xargs does not abort immediately but still invokes all remaining cp commands, which will of course also fail. That can lead to long waiting times until the make-disk-image script exits with a failure. This PR decreases the time to failure by letting xargs abort directly by causing cp to fail with exti code 255.

https://www.man7.org/linux/man-pages/man1/xargs.1.html#DESCRIPTION

       If any invocation of the command exits with a status of 255,
       xargs will stop immediately without reading any further input.
       An error message is issued on stderr when this happens.

Additionally allow xargs to invoke cp nproc times in parallel with -P 0 as argument. Since by default the vm uses only one core, this does not change anything. However, for custom imageBuilder.qemu options like -smp 4 this can cause a major speedup. For example, I observed a time reduction from 22 to 12 minutes when building an image with aarch64 binfmt emulation.

Copy link
Member

@Enzime Enzime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

the flags -P and -I both exist on macOS as well 👍

@Enzime
Copy link
Member

Enzime commented Dec 4, 2024

@mergify queue

Copy link
Contributor

mergify bot commented Dec 4, 2024

queue

🛑 The pull request has been removed from the queue default

Pull request #910 has been dequeued by a dequeue command.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@@ -94,7 +94,7 @@ let

# We copy files with cp because `nix copy` seems to have a large memory leak
mkdir -p ${systemToInstall.config.disko.rootMountPoint}/nix/store
xargs cp --recursive --target ${systemToInstall.config.disko.rootMountPoint}/nix/store < ${closureInfo}/store-paths
xargs -P 0 -I {} sh -c 'cp --recursive --target ${systemToInstall.config.disko.rootMountPoint}/nix/store "$1" || exit 255' sh {} < ${closureInfo}/store-paths
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not run than one cp for every store path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently yes. We could wrap the cp invocation in a short script with $@ to avoid that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that to reduce the overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed by e6c2ca8

@Mic92
Copy link
Member

Mic92 commented Dec 4, 2024

@mergify dequeue

Copy link
Contributor

mergify bot commented Dec 4, 2024

dequeue

✅ The pull request has been removed from the queue default

@Mic92
Copy link
Member

Mic92 commented Dec 4, 2024

Btw we also have parallel copy implemented here: #788

@danjujan
Copy link
Contributor Author

danjujan commented Dec 5, 2024

Btw we also have parallel copy implemented here: #788

I have seen that. However, this does not do anything about aborting early in out of space situations which was my main motivation for this PR.

@Mic92
Copy link
Member

Mic92 commented Dec 5, 2024

I also tried xargs -P 0 but so far I wasn't able to get any speed up for it. It only got slower. Do you have an example?

@Mic92
Copy link
Member

Mic92 commented Dec 5, 2024

Also if the disk is full, how long does it actually take for cp to finish? Should this not rather quickly?

@danjujan
Copy link
Contributor Author

danjujan commented Dec 5, 2024

I also tried xargs -P 0 but so far I wasn't able to get any speed up for it. It only got slower. Do you have an example?

No public example code but I build a 2GB sized aarch64-linux image on my x86_64-linux host. Just tested it again.
12 mins with xargs -P 0 and 8 cores.
18 mins with just xargs and 8 cores.
22 mins with just xargs and 1 core.

Also if the disk is full, how long does it actually take for cp to finish? Should this not rather quickly?

cp exits instantly. The problem is that xargs keeps invoking cp with the remaining arguments. I've observed 1-5 mins delay because of that depending at which point the disk gets full.

@danjujan
Copy link
Contributor Author

danjujan commented Dec 6, 2024

Made the xargs flags configurable. Since I had to use -L 100 after dropping -I to get to the 12 minutes. Just -P 0 without a limit took 16 mins for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants