Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

guest/net: New implementation of network setup with SLAAC and own DHC… #111

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sbrivio-rh
Copy link
Contributor

…P client

The existing implementation has a couple of issues:

  • it doesn't support IPv6 or SLAAC

  • it relies on either dhclient(8) or dhcpcd(8), which need a significant amount of time to configure the network as they are rather generic DHCP clients

  • on top of this, dhcpcd, by default, unless --noarp is given, will spend five seconds ARP-probing the address it just received before configuring it

Replace the IPv4 part with a minimalistic, 73-line DHCP client that just does what we need, using option 80 (Rapid Commit) to speed up the whole exchange.

Add IPv6 support (including IPv4-only, and IPv6-only modes) relying on the kernel to perform SLAAC. Safely avoid DAD (we're the only node on the link) by disabling router solicitations, starting SLAAC, and re-enabling them once addresses are configured.

Instead of merely triggering the network setup and proceeding, wait until everything is configured, so that connectivity is guaranteed to be ready before any further process runs in the guest, say:

$ ./target/debug/muvm -- ping -c1 2a01:4f8:222:904::2
PING 2a01:4f8:222:904::2 (2a01:4f8:222:904::2) 56 data bytes
64 bytes from 2a01:4f8:222:904::2: icmp_seq=1 ttl=255 time=0.256 ms

--- 2a01:4f8:222:904::2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.256/0.256/0.256/0.000 ms

The whole procedure now takes approximately 1.5 to 2 ms (for both IPv4 and IPv6), with the DHCP exchange and configuration taking somewhere around 300-500 µs out of that, instead of hundreds of milliseconds to seconds.

Matching support in passt for option 80 (RFC 4039) and for the DHCP "broadcast" flag (RFC 2131) needs this series:

https://archives.passt.top/passt-dev/[email protected]/

[I'll update this commit message once we have an upstream release with it]

@sbrivio-rh
Copy link
Contributor Author

Supersedes #64. Test passt static builds, along with RPMs and Debian packages (x86 only, sorry) at https://passt.top/builds/latest/x86_64/

crates/muvm/Cargo.toml Outdated Show resolved Hide resolved
@sbrivio-rh sbrivio-rh force-pushed the main branch 2 times, most recently from ca7ea9c to e1eb9df Compare November 25, 2024 18:42
@sbrivio-rh
Copy link
Contributor Author

Hmm, I can't test this anymore after rebasing to latest upstream. I'm getting one of these two errors ("Failed to create the microVM" about 30% of the times, Failed to execute muvm-server as child process about 70%). These are attempts in a relatively tight loop:

$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)
$ ./target/debug/muvm -- /bin/false
Error: Failed to create the microVM

Caused by:
    Invalid argument (os error 22)
$ ./target/debug/muvm -- /bin/false
Error: Failed to create the microVM

Caused by:
    Invalid argument (os error 22)
$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)
$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)

With RUST_LOG=debug:

[2024-11-25T19:05:58Z ERROR krun] Building the microVM failed: GuestMemoryMmap(UnsortedMemoryRegions)

or:

[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] do_lookup: "muvm-server"
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 3
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::worker] Fs: queue event: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] do_lookup: "muvm-server"
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::worker] Fs: queue event: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] do_lookup: "muvm-server"
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::worker] Fs: queue event: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 15
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] read: 56
Error: [2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 25
Failed to execute `muvm-server` as child process

I reverted a few commits but I can't seem to get this to work anymore.

@sbrivio-rh
Copy link
Contributor Author

Oh, okay, I didn't pull for a while. It works if I do:

284b520 (HEAD) Revert "Share /dev/shm as a separate mount with DAX"
44ef0b6 Revert "Protect muvm-server with a cookie"
cdf7027 Revert "Start a privileged muvm-server"
df6c549 Revert "Implement a host memory monitor"

...posting another version of that commit with two functions taken out of configure_network() now that I can test things. Those other issues I'm facing, I have no idea where to start debugging them...

@sbrivio-rh
Copy link
Contributor Author

$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)

Fixed by #112

$ ./target/debug/muvm -- /bin/false
Error: Failed to create the microVM

Caused by:
    Invalid argument (os error 22)

Fixed by updating libkrun.

@sbrivio-rh
Copy link
Contributor Author

This is now supported by passt 2024_11_27.c0fbc7e, matching Fedora updates passt-0^20241127.gc0fbc7e-1.fc40, passt-0^20241127.gc0fbc7e-1.fc41, passt-0^20241127.gc0fbc7e-1.fc42, as well as Debian's passt-0.0~git20241127.c0fbc7e-1.

@slp
Copy link
Collaborator

slp commented Nov 28, 2024

Thanks a lot @sbrivio-rh , I really like this approach. A couple questions:

  • What should we do with resolv.conf? Perhaps I'm missing it, but doesn't seem like it's doing nameserver resolution.
  • This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

@sbrivio-rh
Copy link
Contributor Author

  • What should we do with resolv.conf? Perhaps I'm missing it, but doesn't seem like it's doing nameserver resolution.

Ah, right, I thought resolv.conf could simply be the one from the host, but with systemd-resolved it's common to have a loopback address as resolver, and that wouldn't work.

So, yes, I should also configure resolv.conf here from DHCP options (6 and 119) and NDP (RDNSS, option 25), passt already sends the right ones. Let me fix this.

  • This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

Oops, I didn't check. I'm not exactly the right person as I barely understand what a crate is (do I?) and I don't even use Fedora regularly, but yes, I can probably do that, it should be low effort.

I found this abandoned Copr by the way, https://copr.fedorainfracloud.org/coprs/zurdo/i3status-rs-update/package/rust-neli/, it looks pretty easy. Let me give it a try, but if you know somebody else who could be interested...

@sbrivio-rh
Copy link
Contributor Author

This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

Naive question: if it's statically linked, does it really become a dependency? Or is it just a build dependency? Does that matter also if the crate is downloaded as needed...?

@slp
Copy link
Collaborator

slp commented Nov 28, 2024

This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

Naive question: if it's statically linked, does it really become a dependency? Or is it just a build dependency? Does that matter also if the crate is downloaded as needed...?

It's just a build dependency. In Fedora, every crate you depend on must be independently packaged, and builds are done offline. Luckily, rust2rpm helps a lot with this.

@sbrivio-rh
Copy link
Contributor Author

It's just a build dependency. In Fedora, every crate you depend on must be independently packaged, and builds are done offline. Luckily, rust2rpm helps a lot with this.

Oh, oops, I just had a look at https://src.fedoraproject.org/user/slp/projects... let me package that. :)

@sbrivio-rh
Copy link
Contributor Author

So, yes, I should also configure resolv.conf here from DHCP options (6 and 119) and NDP (RDNSS, option 25), passt already sends the right ones. Let me fix this.

I just added support for nameservers over DHCP (option 6), omitting for the moment:

  • handling of the domain search list (option 119) also implemented by passt. It's at least 10-20 LoCs due to domain compression encoding from RFC 1035, and I doubt it's of any use for muvm... we can add it later for completeness anyway but I'd rather fix the current situation first
  • handling of NDP option 25 (RDNSS), implemented by passt as well, for DNS resolvers over SLAAC. That's also some extra bit of complexity. I think this one makes sense no matter what, for IPv6-only setups, but I'd rather avoid to make this patch explode at the moment

@sbrivio-rh
Copy link
Contributor Author

Oops, I just noticed the cargo fmt -- --check warnings, I thought cargo clippy would be enough. Fixing those as well...

@sbrivio-rh
Copy link
Contributor Author

Oops, I just noticed the cargo fmt -- --check warnings, I thought cargo clippy would be enough. Fixing those as well...

Gosh, the reformatted version looks horrible, with 100 columns that don't fit pretty much anywhere and things wildly misaligned. Is cargo fmt enforced?

@sbrivio-rh
Copy link
Contributor Author

It's just a build dependency. In Fedora, every crate you depend on must be independently packaged, and builds are done offline. Luckily, rust2rpm helps a lot with this.

Oh, oops, I just had a look at https://src.fedoraproject.org/user/slp/projects... let me package that. :)

Package reviews:
https://bugzilla.redhat.com/show_bug.cgi?id=2329411
https://bugzilla.redhat.com/show_bug.cgi?id=2329412

@sbrivio-rh sbrivio-rh force-pushed the main branch 2 times, most recently from 0030cc1 to af9afc7 Compare November 29, 2024 16:49
…P client

The existing implementation has a couple of issues:

- it doesn't support IPv6 or SLAAC

- it relies on either dhclient(8) or dhcpcd(8), which need a
  significant amount of time to configure the network as they are
  rather generic DHCP clients

- on top of this, dhcpcd, by default, unless --noarp is given, will
  spend five seconds ARP-probing the address it just received before
  configuring it

Replace the IPv4 part with a minimalistic, 90-line DHCP client that
just does what we need, using option 80 (Rapid Commit) to speed up
the whole exchange.

Add IPv6 support (including IPv4-only, and IPv6-only modes) relying
on the kernel to perform SLAAC. Safely avoid DAD (we're the only
node on the link) by disabling router solicitations, starting SLAAC,
and re-enabling them once addresses are configured.

Instead of merely triggering the network setup and proceeding, wait
until everything is configured, so that connectivity is guaranteed to
be ready before any further process runs in the guest, say:

  $ ./target/debug/muvm -- ping -c1 2a01:4f8:222:904::2
  PING 2a01:4f8:222:904::2 (2a01:4f8:222:904::2) 56 data bytes
  64 bytes from 2a01:4f8:222:904::2: icmp_seq=1 ttl=255 time=0.256 ms

  --- 2a01:4f8:222:904::2 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.256/0.256/0.256/0.000 ms

The whole procedure now takes approximately 1.5 to 2 ms (for both
IPv4 and IPv6), with the DHCP exchange and configuration taking
somewhere around 300-500 µs out of that, instead of hundreds of
milliseconds to seconds.

Configure nameservers received via DHCP option 6 as well: passt
already takes care care of translating DNS traffic directed to
loopback addresses read from resolv.conf, so we can just write those
to resolv.conf in the guest.

At least for the moment being, for simplicity, omit handling of
option 119 (domain search list), as I doubt it's going to be of much
use for muvm.

I'm not adding handling of the NDP RDNSS option (25, RFC 8106) either,
for the moment, as it involves a second netlink socket subscribing to
the RTNLGRP_ND_USEROPT group and listening to events while we receive
the first router advertisement. The equivalent userspace tool would be
rdnssd(8), which is not called before this change anyway. I would
rather add it at a later time instead of making this patch explode.

Matching support in passt for option 80 (RFC 4039) and for the DHCP
"broadcast" flag (RFC 2131) needs at least passt 2024_11_27.c0fbc7e:

  https://archives.passt.top/passt-user/20241127142126.3c53066e@elisabeth/

Signed-off-by: Stefano Brivio <[email protected]>
Co-authored-by: Teoh Han Hui <[email protected]>
@slp
Copy link
Collaborator

slp commented Dec 2, 2024

LGTM, but please let's wait to merge this one until the packaging stuff is solved, so we aren't blocked on making releases.

@teohhanhui
Copy link
Collaborator

packaging stuff

Now we have one more crate to package: const_str 😆

@sbrivio-rh
Copy link
Contributor Author

packaging stuff

Now we have one more crate to package: const_str 😆

Gosh. I can try to do it as well, but that will take even longer... is it really worth it? Should we consider temporarily going back to my original version that didn't use it?

@teohhanhui
Copy link
Collaborator

Gosh. I can try to do it as well, but that will take even longer... is it really worth it? Should we consider temporarily going back to my original version that didn't use it?

There's another way. We could create a muvm_macros crate and put concat_bytes in there?

@sbrivio-rh
Copy link
Contributor Author

There's another way. We could create a muvm_macros crate and put concat_bytes in there?

I don't know what the difference would be. I'm trying to build a spec file for const_str meanwhile, currently struggling with the license file.

@teohhanhui
Copy link
Collaborator

I don't know what the difference would be.

I mean copying the code into our own crate, if the license permits... I think that means we don't have to go through package review because it won't be a separate package at all?

@sbrivio-rh
Copy link
Contributor Author

I think that means we don't have to go through package review because it won't be a separate package at all?

True but...

I mean copying the code into our own crate

from a maintainability perspective, I'd say that screen-scraping the output of passt (#64) is actually preferable to that.

@sbrivio-rh
Copy link
Contributor Author

Well, current spec file and corresponding error:

https://hastebin.skyra.pw/hudaqicari.prolog

I'm stopping here for a while.

@slp
Copy link
Collaborator

slp commented Dec 3, 2024

Well, current spec file and corresponding error:

https://hastebin.skyra.pw/hudaqicari.prolog

We had this problem with some of the rust-vmm packages. If the LICENSE file is in the root of the workspace, it doesn't get included in the crate. The trick here is adding the file as a source as done here:

https://src.fedoraproject.org/rpms/rust-virtio-queue/blob/rawhide/f/rust-virtio-queue.spec

@sbrivio-rh
Copy link
Contributor Author

Well, current spec file and corresponding error:
https://hastebin.skyra.pw/hudaqicari.prolog

We had this problem with some of the rust-vmm packages. If the LICENSE file is in the root of the workspace, it doesn't get included in the crate. The trick here is adding the file as a source as done here:

https://src.fedoraproject.org/rpms/rust-virtio-queue/blob/rawhide/f/rust-virtio-queue.spec

That's what I did there (I think?):

Source1:        https://raw.githubusercontent.com/Nugine/const-str/00ce0d143164e1e393a5dcdda6ab2f0dbae9a243/LICENSE

[...]

%license %{crate_instdir}/LICENSE

...there must be some subtle detail I'm missing.

@sbrivio-rh
Copy link
Contributor Author

That's what I did there (I think?):

Source1:        https://raw.githubusercontent.com/Nugine/const-str/00ce0d143164e1e393a5dcdda6ab2f0dbae9a243/LICENSE

[...]

%license %{crate_instdir}/LICENSE

...there must be some subtle detail I'm missing.

Grr, of course. From the rust-virtio-queue spec file:

%prep
%autosetup -n %{crate}-%{version} -p1
%cargo_prep
cp %{SOURCE1} .
cp %{SOURCE2} .

Thanks @slp for the example.

@sbrivio-rh
Copy link
Contributor Author

Fedora package review request for rust-const-str at: https://bugzilla.redhat.com/show_bug.cgi?id=2330150

@teohhanhui
Copy link
Collaborator

Looks like the merge conflicts have not been resolved... 🙈

@sbrivio-rh
Copy link
Contributor Author

sbrivio-rh commented Jan 2, 2025

Looks like the merge conflicts have not been resolved... 🙈

I just clicked around randomly, I'm anyway not able to test it anymore because of #117 (comment) and a new Error: Address already in use (os error 98) coming out of nowhere (no EADDRINUSE from strace, and extra difficult to bisect because of that commit I need to keep reverted plus Cargo.lock committed in the repository... now writing some scripts to handle that at least for my git sanity).

I'll push fixed conflicts once I can test this again.

@sbrivio-rh
Copy link
Contributor Author

and a new Error: Address already in use (os error 98) coming out of nowhere

Oh, that's because my usual trick of commenting out setgid(gid).context("Failed to setgid")?; and setuid(uid).context("Failed to setuid")?; in setup_user() to get a root shell conveniently (I need to do stuff with iproute2 in the guest) doesn't work anymore. The second instance of muvm-server fails to start.

On the other hand, if I skip starting it, then everything hangs...

@WhatAmISupposedToPutHere
Copy link
Collaborator

WhatAmISupposedToPutHere commented Jan 2, 2025

Looks like the merge conflicts have not been resolved... 🙈

I just clicked around randomly, I'm anyway not able to test it anymore because of #17 (comment) and a new Error: Address already in use (os error 98) coming out of nowhere (no EADDRINUSE from strace, and extra difficult to bisect because of that commit I need to keep reverted plus Cargo.lock committed in the repository... now writing some scripts to handle that at least for my git sanity).

I'll push fixed conflicts once I can test this again.

Try using --sommelier, it might on a headless machine that way (sorry for close/reopen, literally a misclick)

@sbrivio-rh
Copy link
Contributor Author

sbrivio-rh commented Jan 2, 2025

Try using --sommelier, it might on a headless machine that way

...thanks, yes, that also works because it reverts the effect of commit b40bf03d2ce4 ("Default to using x11bridge instead of sommelier"), and I guess that the sommelier path has a proper fallback path (I didn't really check): I don't have sommelier installed.

On the other hand I guess we should have the same fallback with no options at all (at least as long as muvm is supposed to be a generic tool).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants