DNS resolver woes #17

Closed
teohhanhui opened this issue May 30, 2024 · 55 comments

Comments

@teohhanhui
Collaborator

teohhanhui commented May 30, 2024

In the VM, my /etc/resolv.conf looks like this:

; generated by /usr/sbin/dhclient-script
nameserver 192.168.0.1

Which is bad, because that's my router's IP address... (And not what's sent as DNS servers via DHCP.)

Others have reported no Internet access in the VM, probably related to this.

If I understand correctly, I think that's why krunvm sets nameserver 1.1.1.1 as the default:
https://github.com/containers/krunvm/blob/5494d84a66bee3b802a0392cf8d662158ac7287d/src/main.rs#L51

But that's also not a good solution as it'd break local domains and search domains among other things...

@slp
Collaborator

slp commented May 30, 2024

Which is bad, because that's my router's IP address... (And not what's sent as DNS servers via DHCP.)

That IP address points to passt's DNS server, which will in turn use the host's DNS (as configured in the host's view of /etc/resolv.conf). This is how it's expected to work.

Which kind of problems are you having with DNS resolution? We would need to diagnose them first to be able to look for a solution.

@teohhanhui
Collaborator Author

That IP address points to passt's DNS server, which will in turn use the host's DNS (as configured in the host's view of /etc/resolv.conf). This is how it's expected to work.

Yeah, I realized that after looking around. It works for me anyway.

Which kind of problems are you having with DNS resolution? We would need to diagnose them first to be able to look for a solution.

@aqrln Could you share what's in your /etc/resolv.conf both on the host and inside the VM?

@dylanchapell is somehow left only with the search domain inside the VM... Probably a passt bug? https://gist.github.com/teohhanhui/042a395010d9946ceee14768736e3780?permalink_comment_id=5073025#gistcomment-5073025

@aqrln

aqrln commented May 30, 2024

@teohhanhui

@aqrln Could you share what's in your /etc/resolv.conf both on the host and inside the VM?

Sorry, I haven't had a chance to check yet (I've needed to be on macOS since the beginning of the week). I saw your response to me in the gist as well. I'll get back to you ASAP next time I boot into Linux.

@aqrln

aqrln commented May 30, 2024

I'm also not sure whether the problem was with DNS in my case. If I remember correctly, I also tried pinging some IPs from within the VM and that didn't work either (although I could reach them from the host).

@teohhanhui
Collaborator Author

@aqrln ping currently doesn't work, that's as expected

@aqrln

aqrln commented May 30, 2024

Ah, okay, good to know!

Regarding my /etc/resolv.conf, I can post it in about 8-ish hours or tomorrow.

@RossComputerGuy

This is my /etc/resolv.conf.

# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search localdomain tailde5a8.ts.net

@teohhanhui
Collaborator Author

@RossComputerGuy Please also share /etc/resolv.conf from within the VM.

@RossComputerGuy

Cannot due to #19 preventing me from running anything.

@aqrln

aqrln commented May 31, 2024

@teohhanhui apologies for the delay, here's my /etc/resolv.conf from the host system:

# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search tail4df9f.ts.net localdomain

and inside the krun VM:

; generated by /usr/sbin/dhclient-script
search tail4df9f.ts.net. localdomain.

There are also the following warnings in krun output:

No IPv4 nameserver available for DHCP
No IPv6 nameserver available for NDP/DHCPv6

@aqrln

aqrln commented May 31, 2024

A detail that stands out, looking at @RossComputerGuy's resolv.conf above too, is that we are both using Tailscale and have our tailnets in the search domains in /etc/resolv.conf.

@aqrln

aqrln commented May 31, 2024

With this hack:

diff --git a/crates/krun-guest/src/net.rs b/crates/krun-guest/src/net.rs
index 3710bd5..27e4d37 100644
--- a/crates/krun-guest/src/net.rs
+++ b/crates/krun-guest/src/net.rs
@@ -1,4 +1,4 @@
-use std::{fs, os::unix::process::ExitStatusExt, process::Command};
+use std::{fs, io::Write, os::unix::process::ExitStatusExt, process::Command};

 use anyhow::{anyhow, Context, Result};
 use log::debug;
@@ -35,5 +35,10 @@ pub fn configure_network() -> Result<()> {
         Err(err)?;
     }

+    fs::File::options()
+        .append(true)
+        .open("/etc/resolv.conf")?
+        .write_all("nameserver 1.1.1.1\n".as_bytes())?;
+
     Ok(())
 }

DNS works inside the VM:

$ krun curl https://google.com
No IPv4 nameserver available for DHCP
No IPv6 nameserver available for NDP/DHCPv6
_XSERVTransmkdir: Owner of /tmp/.X11-unix should be set to root
The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Could not resolve keysym XF86CameraAccessEnable
> Warning:          Could not resolve keysym XF86CameraAccessDisable
> Warning:          Could not resolve keysym XF86CameraAccessToggle
> Warning:          Could not resolve keysym XF86NextElement
> Warning:          Could not resolve keysym XF86PreviousElement
> Warning:          Could not resolve keysym XF86AutopilotEngageToggle
> Warning:          Could not resolve keysym XF86MarkWaypoint
> Warning:          Could not resolve keysym XF86Sos
> Warning:          Could not resolve keysym XF86NavChart
> Warning:          Could not resolve keysym XF86FishingChart
> Warning:          Could not resolve keysym XF86SingleRangeRadar
> Warning:          Could not resolve keysym XF86DualRangeRadar
> Warning:          Could not resolve keysym XF86RadarOverlay
> Warning:          Could not resolve keysym XF86TraditionalSonar
> Warning:          Could not resolve keysym XF86ClearvuSonar
> Warning:          Could not resolve keysym XF86SidevuSonar
> Warning:          Could not resolve keysym XF86NavInfo
Errors from xkbcomp are not fatal to the X server
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
Got error or hangup (mask 5) on X connection, exiting

@aqrln

aqrln commented May 31, 2024

I also tried hardcoding 192.168.0.1 to mimic what @teohhanhui has in their resolv.conf but that just makes curl hang for like half a minute before printing

curl: (6) Could not resolve host: google.com

@teohhanhui
Collaborator Author

@aqrln 192.168.0.1 is the default gateway on my current network, you'd need to change it accordingly.

@teohhanhui
Collaborator Author

Did some digging...

No IPv4 nameserver available for DHCP

The error comes from passt:
https://passt.top/passt/tree/conf.c?h=2024_05_10.7288448#n476

I don't really understand why it fails to do the 127.0.0.1 -> default gateway mapping when Tailscale is up, given that /etc/resolv.conf on host looks essentially unchanged, save for the addition of the search domain from Tailscale ("MagicDNS").

@dylanchapell

With this hack:

@aqrln 's hack worked for me. I also use tailscale.

@sbrivio-rh
Contributor

Which is bad, because that's my router's IP address... (And not what's sent as DNS servers via DHCP.)

That IP address points to passt's DNS server, which will in turn use the host's DNS (as configured in the host's view of /etc/resolv.conf).

Minor correction to this statement: passt doesn't actually implement a DNS server, not even a relay or a responder: it will just do NAT for UDP packets to/from port 53 if it sees a loopback address configured as resolver on the host (or if explicitly configured with --dns-forward).
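To make the loopback condition concrete, here is a minimal sketch that scans resolv.conf-style text and reports how many nameserver entries exist and how many of them are loopback addresses. It is only an illustration: the function names `is_loopback_resolver` and `count_nameservers` are made up for this example, and passt's own configuration code is not reproduced here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if addr is an IPv4 (127.0.0.0/8) or IPv6 (::1) loopback address. */
int is_loopback_resolver(const char *addr)
{
    struct in_addr a4;
    struct in6_addr a6;

    if (inet_pton(AF_INET, addr, &a4) == 1)
        return (ntohl(a4.s_addr) >> 24) == 127;
    if (inet_pton(AF_INET6, addr, &a6) == 1)
        return IN6_IS_ADDR_LOOPBACK(&a6) ? 1 : 0;
    return 0;
}

/*
 * Scan resolv.conf-style text line by line. Returns the number of
 * "nameserver" entries and stores in *loopback how many are loopback.
 * Hypothetical helper for illustration; lines over 255 bytes are skipped.
 */
int count_nameservers(const char *text, int *loopback)
{
    int total = 0;

    *loopback = 0;
    while (*text) {
        size_t len = strcspn(text, "\n");
        char line[256], key[16], addr[64];

        if (len < sizeof(line)) {
            memcpy(line, text, len);
            line[len] = '\0';
            /* Comment lines start with '#' or ';' and never match the keyword. */
            if (sscanf(line, "%15s %63s", key, addr) == 2 &&
                strcmp(key, "nameserver") == 0) {
                total++;
                *loopback += is_loopback_resolver(addr);
            }
        }
        text += len;
        if (*text == '\n')
            text++;
    }
    return total;
}
```

Fed the host file quoted earlier in this thread (nameserver 127.0.0.53), this reports one entry, which is loopback. Fed the guest file that contains only a search line, it reports zero entries, which is the broken state described above.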

@dylanchapell is somehow left only with the search domain inside the VM... Probably a passt bug? https://gist.github.com/teohhanhui/042a395010d9946ceee14768736e3780?permalink_comment_id=5073025#gistcomment-5073025

Weird, that's unexpected. Can you have a look at which command line options (say, from ps) passt is running with? Would it be possible for you to ask passt to take a packet capture (passt's --pcap / -p FILE) and have a look at DHCP/NDP/DHCPv6 exchanges?

@teohhanhui
Collaborator Author

teohhanhui commented Jun 30, 2024

This is working with passt 0^20240624.g1ee2eca-1.fc40.aarch64 + dhcpcd 10.0.6, with Tailscale service running and network connected.

$ cat /etc/resolv.conf
# Generated by dhcpcd from eth0.dhcp
# /etc/resolv.conf.head can replace this line
search lan taild21a6.ts.net
nameserver 192.168.1.1
# /etc/resolv.conf.tail can replace this line

I think we can close this?

@sbrivio-rh
Contributor

This is working with passt 0^20240624.g1ee2eca-1.fc40.aarch64 + dhcpcd 10.0.6, with Tailscale service running and network connected.

We recently fixed a few issues with UDP forwarding, possibly affecting DNS responses as well, which might have caused this, even though I'm not sure exactly which fix applies here.

Anyway, glad to hear it now works for you, and thanks for re-testing with the latest version!

@slp
Collaborator

slp commented Jul 10, 2024

@teohhanhui @sbrivio-rh thank you both!

@teohhanhui
Collaborator Author

teohhanhui commented Aug 17, 2024

@sbrivio-rh I regret to inform you that a recent passt update in Fedora 40 (probably https://bodhi.fedoraproject.org/updates/FEDORA-2024-6f376dadf7 https://bodhi.fedoraproject.org/updates/FEDORA-2024-d127500a41) has broken this again...

EDIT: Actually this seems like a different issue. I just realized I don't have tailscaled running... Yet /etc/resolv.conf inside the krun VM does not contain any entries...

@teohhanhui reopened this Aug 17, 2024
@teohhanhui
Collaborator Author

dhcpcd seems to be doing its thing just like before?

dhcpcd-10.0.6 starting
eth0: waiting for carrier
eth0: carrier acquired
duid_get: cannot write duid: Permission denied
DUID 00:03:00:01:5a:94:ef:e4:0c:ee
eth0: IAID ef:e4:0c:ee
eth0: soliciting a DHCP lease
eth0: probing for an IPv4LL address
eth0: using IPv4LL address 169.254.26.58
eth0: adding route to 169.254.0.0/16
eth0: adding default route

ip addr from within krun VM:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 5a:94:ef:e4:0c:ee brd ff:ff:ff:ff:ff:ff
    inet 169.254.26.58/16 brd 169.254.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever

ip addr from host (actually, container):

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: wlp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 12:b7:c8:38:29:e3 brd ff:ff:ff:ff:ff:ff permaddr ac:c9:06:23:40:a4
    inet 192.168.1.105/24 brd 192.168.1.255 scope global dynamic noprefixroute wlp1s0f0
       valid_lft 42779sec preferred_lft 42779sec
    inet6 fd13:b3ad:92bd::fcc/128 scope global dynamic noprefixroute 
       valid_lft 42779sec preferred_lft 42779sec
    inet6 2001:e68:5451:1f4::fcc/128 scope global dynamic noprefixroute 
       valid_lft 42779sec preferred_lft 42779sec
    inet6 2001:e68:5451:1f4:eb38:ae0f:964a:f4b6/64 scope global dynamic noprefixroute 
       valid_lft 176747sec preferred_lft 90347sec
    inet6 fd13:b3ad:92bd:0:69a0:bd52:18fa:2f2/64 scope global noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::854c:3eb3:7289:7d46/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

@teohhanhui
Collaborator Author

Doesn't seem to be a DNS issue... There's no connectivity from within the krun VM.

$ curl -v https://1.1.1.1
*   Trying 1.1.1.1:443...
* connect to 1.1.1.1 port 443 from 169.254.26.58 port 43216 failed: No route to host
* Failed to connect to 1.1.1.1 port 443 after 3081 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 1.1.1.1 port 443 after 3081 ms: Couldn't connect to server

@sbrivio-rh
Contributor

@sbrivio-rh I regret to inform you that a recent passt update in Fedora 40 (probably https://bodhi.fedoraproject.org/updates/FEDORA-2024-6f376dadf7 https://bodhi.fedoraproject.org/updates/FEDORA-2024-d127500a41) has broken this again...

Ouch, strange, that one shouldn't have any related change. Do you remember what the "good" version was?

dhcpcd seems to be doing its thing just like before?

Not really, look:

dhcpcd-10.0.6 starting
eth0: waiting for carrier
eth0: carrier acquired
duid_get: cannot write duid: Permission denied
DUID 00:03:00:01:5a:94:ef:e4:0c:ee
eth0: IAID ef:e4:0c:ee
eth0: soliciting a DHCP lease
eth0: probing for an IPv4LL address
eth0: using IPv4LL address 169.254.26.58

Here it gives up and picks an IPv4 link-local address (169.254.0.0/16) as fallback, which you see here:

ip addr from within krun VM:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 5a:94:ef:e4:0c:ee brd ff:ff:ff:ff:ff:ff
    inet 169.254.26.58/16 brd 169.254.255.255 scope global noprefixroute eth0

But it should be the same as the host, 192.168.1.105/24. For some reason, it's not getting any address via DHCP.
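A quick way to spot this failure mode programmatically is to test whether the guest address falls in the IPv4 link-local range. The following is a small sketch; `is_ipv4_link_local` is a name invented for this example and is not part of krun or passt.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Return 1 if addr parses as IPv4 and falls in 169.254.0.0/16, else 0. */
int is_ipv4_link_local(const char *addr)
{
    struct in_addr a;

    if (inet_pton(AF_INET, addr, &a) != 1)
        return 0;
    /* 169 = 0xa9, 254 = 0xfe: compare the top 16 bits in host byte order. */
    return (ntohl(a.s_addr) & 0xffff0000u) == 0xa9fe0000u;
}
```

With the addresses from this thread, 169.254.26.58 (the guest) is link-local while 192.168.1.105 (the host) is not, so a link-local result inside the VM indicates that no DHCP lease was obtained.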

I'll try to reproduce this (not right now, but soon). I don't need Asahi, right? I guess I can just use krun on Fedora 40 x86_64?

@teohhanhui
Collaborator Author

teohhanhui commented Aug 17, 2024

@sbrivio-rh

Do you remember what the "good" version was?

I need to actually try a downgrade (wonder if that might break many things lol), but I'm sure https://bodhi.fedoraproject.org/updates/FEDORA-2024-2dbdafb9b3 was working fine. Actually no, it's also broken...

I don't need Asahi, right? I guess I can just use krun on Fedora 40 x86_64?

Yes.

@teohhanhui
Collaborator Author

Tried with https://bodhi.fedoraproject.org/updates/FEDORA-2024-4632bfa865 - it's also broken. So I don't think it's passt. We would have noticed if it's been broken for that long...

@sbrivio-rh
Contributor

Well, not elegant, but I just commented out most of the matching part in setup_directories() of user.rs, and it works. Connectivity is fine here, but this is Fedora rawhide (krun at HEAD and passt-0:0^20240814.g61c0b0d-1.fc42.x86_64):

[sbrivio@ ~]$ nslookup example.com 1.1.1.1
Server:		1.1.1.1
Address:	1.1.1.1#53

Non-authoritative answer:
Name:	example.com
Address: 93.184.215.14
Name:	example.com
Address: 2606:2800:21f:cb07:6820:80da:af6b:8b2c

[sbrivio@ ~]$ telnet 93.184.215.14 80
Trying 93.184.215.14...
Connected to 93.184.215.14.
Escape character is '^]'.
GET /

HTTP/1.0 404 Not Found
Content-Type: text/html
Date: Sun, 18 Aug 2024 20:35:35 GMT
Server: ECAcc (nyd/D125)
Content-Length: 345
Connection: close

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
	<head>
		<title>404 - Not Found</title>
	</head>
	<body>
		<h1>404 - Not Found</h1>
	</body>
</html>
Connection closed by foreign host.

I'll try on Fedora 40 next. If you want to help debugging this, you could pass some options to start_passt() in src/net.rs: I would start with --debug, --log-file /tmp/krun.log, and --pcap /tmp/krun.pcap.

@teohhanhui
Collaborator Author

@sbrivio-rh
Contributor

That looks fine, even though there are no connections shown. Any luck with the traffic capture (--pcap)?

@teohhanhui
Collaborator Author

teohhanhui commented Aug 18, 2024

The pcap is attached in the gist, and it's only 24 bytes... So it's just... empty? 😆

Capture tap-facing (that is, guest-side or namespace-side) network packets

Seems like nothing is reaching passt then... The pcap file stays at 24 bytes no matter what I do.

@sbrivio-rh
Contributor

The pcap is attached in the gist, and it's only 24 bytes... So it's just... empty? 😆

Oops, I didn't even notice... yes, that's just the header, no packets.
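For reference, the 24 bytes correspond to the file header of the classic pcap format, which is written once before any packet records. The sketch below lays out that header and adds a size check; the struct and function names are invented for this example, and it assumes the classic format rather than pcapng.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic pcap file header: seven fields, 24 bytes, no padding. */
struct pcap_file_header_sketch {
    uint32_t magic;          /* 0xa1b2c3d4 for microsecond timestamps */
    uint16_t version_major;
    uint16_t version_minor;
    int32_t  thiszone;
    uint32_t sigfigs;
    uint32_t snaplen;
    uint32_t linktype;
};

/* Each captured packet adds a 16-byte record header plus its data. */
enum { PCAP_FILE_HEADER_LEN = 24, PCAP_RECORD_HEADER_LEN = 16 };

/* Return 1 if a capture of this size can hold at least one packet record. */
int pcap_may_contain_packets(size_t file_size)
{
    return file_size >= (size_t)(PCAP_FILE_HEADER_LEN + PCAP_RECORD_HEADER_LEN);
}
```

A file that stays at exactly 24 bytes therefore holds the header and nothing else, which is consistent with no traffic reaching passt on the tap side.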

Capture tap-facing (that is, guest-side or namespace-side) network packets

Seems like nothing is reaching passt then... The pcap file stays at 24 bytes no matter what I do.

Right. I wonder if it's something with SELinux. Could you try running this in permissive mode? I haven't checked with Fedora 40 yet but I think I'll do it tomorrow.

@teohhanhui
Collaborator Author

Could you try running this in permissive mode?

Just tried. No change.

@waterdragon78

I wish I could help but I did want to pipe in and say I'm having this issue as well. I have no idea how all this works but I did try to compile using the first version of crates/krun-guest/src/net.rs that exists in this repo by going back to the first commit, using that file and applying the patch, and then compiling. No dice. I'm aware that this probably isn't too helpful because I have no clue what I'm doing but I did just want to show what I have done. Thought process was maybe it would work with dhclient.

With this hack:

diff --git a/crates/krun-guest/src/net.rs b/crates/krun-guest/src/net.rs
index 3710bd5..27e4d37 100644
--- a/crates/krun-guest/src/net.rs
+++ b/crates/krun-guest/src/net.rs
@@ -1,4 +1,4 @@
-use std::{fs, os::unix::process::ExitStatusExt, process::Command};
+use std::{fs, io::Write, os::unix::process::ExitStatusExt, process::Command};

 use anyhow::{anyhow, Context, Result};
 use log::debug;
@@ -35,5 +35,10 @@ pub fn configure_network() -> Result<()> {
         Err(err)?;
     }

+    fs::File::options()
+        .append(true)
+        .open("/etc/resolv.conf")?
+        .write_all("nameserver 1.1.1.1\n".as_bytes())?;
+
     Ok(())
 }

DNS works inside the VM:

$ krun curl https://google.com
No IPv4 nameserver available for DHCP
No IPv6 nameserver available for NDP/DHCPv6
_XSERVTransmkdir: Owner of /tmp/.X11-unix should be set to root
The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Could not resolve keysym XF86CameraAccessEnable
> Warning:          Could not resolve keysym XF86CameraAccessDisable
> Warning:          Could not resolve keysym XF86CameraAccessToggle
> Warning:          Could not resolve keysym XF86NextElement
> Warning:          Could not resolve keysym XF86PreviousElement
> Warning:          Could not resolve keysym XF86AutopilotEngageToggle
> Warning:          Could not resolve keysym XF86MarkWaypoint
> Warning:          Could not resolve keysym XF86Sos
> Warning:          Could not resolve keysym XF86NavChart
> Warning:          Could not resolve keysym XF86FishingChart
> Warning:          Could not resolve keysym XF86SingleRangeRadar
> Warning:          Could not resolve keysym XF86DualRangeRadar
> Warning:          Could not resolve keysym XF86RadarOverlay
> Warning:          Could not resolve keysym XF86TraditionalSonar
> Warning:          Could not resolve keysym XF86ClearvuSonar
> Warning:          Could not resolve keysym XF86SidevuSonar
> Warning:          Could not resolve keysym XF86NavInfo
Errors from xkbcomp are not fatal to the X server
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
Got error or hangup (mask 5) on X connection, exiting

@sbrivio-rh
Contributor

Hah, I reproduced it on Fedora 40, with passt 0^20240814.g61c0b0d-1.fc40.x86_64, which is the same version as I was using on Fedora rawhide, so I doubt it has anything to do with passt. Anyway, I can finally debug it now.

@sbrivio-rh
Contributor

It works, also on Fedora 40, if I start passt separately:

$ passt -f
UNIX domain socket bound at /tmp/passt_1.socket
...

and then make krun connect to it, say:

$ ./krun --passt-socket=/tmp/passt_1.socket nslookup passt.top
Server:		88.198.0.161
Address:	88.198.0.161#53

Non-authoritative answer:
Name:	passt.top
Address: 88.198.0.164
Name:	passt.top
Address: 2a01:4f8:222:904::2

@sbrivio-rh
Contributor

@teohhanhui at this point, I would like to trace reads and writes to child_fd in krun, and compare them between Fedora 40 and Fedora rawhide. Is there a convenient way to do this? Any tips?

@teohhanhui
Collaborator Author

You can use something like:

RUST_LOG='krun=debug,krun_guest=debug,krun_server=debug' RUST_BACKTRACE=1 krun -e=RUST_BACKTRACE bash

(change debug to trace as appropriate)

And add the trace! calls...

@sbrivio-rh
Contributor

Without adding any additional print, I happened to reproduce this, just once, also on Fedora rawhide, with this output:

$ RUST_LOG='krun=trace,krun_guest=trace,krun_server=trace' RUST_BACKTRACE=1 ./krun -e=RUST_BACKTRACE bash
[...]
[2024-08-20T15:23:02Z DEBUG krun::net] passing fd to passt fd=6
[2024-08-20T15:23:02Z DEBUG krun::env] env vars env={"PATH": "/home/sbrivio/.local/bin:/home/sbrivio/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin", "RUST_BACKTRACE": "1", "RUST_LOG": "krun=trace,krun_guest=trace,krun_server=trace"}
No IPv6 nameserver available for NDP/DHCPv6
[2024-08-20T15:23:15Z DEBUG krun_guest::net] dhcpcd output output=Output { status: ExitStatus(unix_wait_status(0)), stdout: "", stderr: "dhcpcd-10.0.8 starting\nDropped protocol specifier '.link' from 'eth0.link'. Using 'eth0' (ifindex=2).\nsd_bus_open_system: No such file or directory\neth0: waiting for carrier\neth0: carrier acquired\nduid_get: cannot write duid: Permission denied\nDUID 00:03:00:01:5a:94:ef:e4:0c:ee\neth0: IAID ef:e4:0c:ee\neth0: soliciting a DHCP lease\neth0: probing for an IPv4LL address\neth0: using IPv4LL address 169.254.26.58\neth0: adding route to 169.254.0.0/16\neth0: adding default route\nDropped protocol specifier '.ipv4ll' from 'eth0.ipv4ll'. Using 'eth0' (ifindex=2).\nsd_bus_open_system: No such file or directory\n" }
[2024-08-20T15:23:15Z DEBUG krun_guest] exec command="/home/sbrivio/krun/target/debug/krun-server" command_args=["bash"]

On Fedora 40, instead, it's like this:

$ RUST_LOG='krun=trace,krun_guest=trace,krun_server=trace' RUST_BACKTRACE=1 ./krun -e=RUST_BACKTRACE bash
[...]
[2024-08-20T15:20:39Z DEBUG krun::net] passing fd to passt fd=6
No IPv6 nameserver available for NDP/DHCPv6
[2024-08-20T15:20:40Z DEBUG krun::env] env vars env={"RUST_BACKTRACE": "1", "RUST_LOG": "krun=trace,krun_guest=trace,krun_server=trace", "PATH": "/home/sbrivio/.local/bin:/home/sbrivio/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"}
[2024-08-20T15:21:43Z DEBUG krun_guest::net] dhclient output output=Output { status: ExitStatus(unix_wait_status(0)), stdout: "", stderr: "grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory\ngrep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory\ngrep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory\ngrep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory\ngrep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory\ngrep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory\n" }
[2024-08-20T15:21:43Z DEBUG krun_guest] exec command="/home/sbrivio/krun/target/debug/krun-server" command_args=["bash"]

I guess Fedora rawhide comes with dhcpcd by default, while Fedora 40 doesn't.

By the way, krun leaks the passt process. It should be started with --one-off / -1, and I'm not sure if there's a valid reason to use -F / --fd instead of letting passt create its own UNIX domain socket.

Anyway, I looked a bit into the code, but I can't quite understand where reads and writes onto this file descriptor are. I guess they're implemented in libkrun instead... but I would have no idea how to get output from libkrun onto a standard stream, or even just how to change libkrun and then make krun use the new version with changes.

So I'm afraid I can't very productively help with this. Let me know if there's anything I can debug on passt side, though.

@teohhanhui
Collaborator Author

teohhanhui commented Aug 20, 2024

I guess they're implemented in libkrun instead...

how to change libkrun and then make krun use the new version with changes

Probably here: https://github.com/containers/libkrun/blob/main/src/devices/src/virtio/net/passt.rs

When you build krun, it should automatically pick up the system's libkrun (dynamically linked):

but I would have no idea how to get output from libkrun onto a standard stream

Something like:

RUST_LOG='krun=debug,krun_guest=debug,krun_server=debug,devices::virtio::net=debug' RUST_BACKTRACE=1 krun -e=RUST_BACKTRACE bash

@sbrivio-rh
Contributor

Thanks for the pointers. Despite being thick I'm quite confident I could eventually grasp how it all works one day, given enough time... which I don't have, though. ;)

Just one thing occurred to me with your latest command line:

[2024-08-20T16:15:51Z DEBUG krun::net] passing fd to passt fd=6
[2024-08-20T16:15:51Z DEBUG krun::env] env vars env={"RUST_BACKTRACE": "1", "PATH": "/home/sbrivio/.local/bin:/home/sbrivio/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin", "RUST_LOG": "krun=debug,krun_guest=debug,krun_server=debug,devices::virtio::net=trace"}
No IPv6 nameserver available for NDP/DHCPv6
[2024-08-20T16:15:52Z DEBUG devices::virtio::net::passt] passt socket (fd 4) buffer sizes: SndBuf=Ok(425984) RcvBuf=Ok(212992)

Why is the file descriptor 6, then 4?

@teohhanhui
Collaborator Author

Why is the file descriptor 6, then 4?

From your log, 6 is the fd we pass to passt (child_socket / child_fd):

https://github.com/slp/krun/blob/d6e7e17e5bbccdfd1021cd552b92a5c1774e044e/crates/krun/src/net.rs#L35

while 4 is the fd we pass to libkrun (parent_socket / passt_fd):

https://github.com/slp/krun/blob/d6e7e17e5bbccdfd1021cd552b92a5c1774e044e/crates/krun/src/bin/krun.rs#L148

@sbrivio-rh
Contributor

...and they are two endpoints of a connected socket pair, I guess?

@teohhanhui
Collaborator Author

Yes.

@sbrivio-rh
Contributor

I was confused by the fact that usually Linux gives back contiguous numbers (even though sure, it might be that 4 was just closed and 5 is still in use)... but now I see 6 is duplicated from 5:

$ RUST_LOG='krun=trace,krun_guest=trace,krun_server=trace' RUST_BACKTRACE=1 strace -e socketpair,dup,close ./krun -e=RUST_BACKTRACE bash
[...]
socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [4, 5]) = 0
dup(5)                                  = 6
close(5)                                = 0
[2024-08-20T16:41:51Z DEBUG krun::net] passing fd to passt fd=6
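The syscall sequence in that trace (socketpair with SOCK_CLOEXEC, then dup, then close of the original) is a common way to end up with one descriptor that survives execve() and one that does not, because dup() never copies the close-on-exec flag. Here is a minimal C sketch of the same pattern. The function names are hypothetical, and krun itself is written in Rust, so this only mirrors the syscalls seen in the trace.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a connected AF_UNIX pair with close-on-exec set on both ends, then
 * replace the child's end with a dup() of it. The duplicate has FD_CLOEXEC
 * clear, so it is inherited across execve(); the parent's end is not.
 * Returns 0 on success, -1 on error.
 */
int make_inheritable_pair(int *parent_fd, int *child_fd)
{
    int sv[2], dupfd;

    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) < 0)
        return -1;

    dupfd = dup(sv[1]);
    if (dupfd < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[1]);               /* drop the close-on-exec original */

    *parent_fd = sv[0];
    *child_fd = dupfd;
    return 0;
}

/* Return 1 if FD_CLOEXEC is set on fd, 0 if clear, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}
```

After a successful call, `has_cloexec()` reports the flag as set on the parent end and clear on the child end, which is why the duplicated descriptor (6 in the trace) is the one handed to passt.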

@sbrivio-rh
Contributor

Okay, this is weird:

$ RUST_LOG='krun=trace,krun_guest=trace,krun_server=trace' RUST_BACKTRACE=1 strace -f -e trace-fds=4,6 ./krun -e=RUST_BACKTRACE bash
[...]

[pid  9573] setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid  9573] bind(6, {sa_family=AF_INET6, sin6_port=htons(3334), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::", &sin6_addr), sin6_scope_id=0}, 28) = 0
[pid  9573] listen(6, 128)              = 0
[pid  9573] epoll_ctl(3, EPOLL_CTL_ADD, 6, {events=EPOLLIN, data={u32=1539, u64=295794397677059}}) = 0

[...]

[pid  9573] epoll_ctl(3, EPOLL_CTL_ADD, 6, {events=EPOLLIN|EPOLLRDHUP, data={u32=1547, u64=1547}}) = -1 EEXIST (File exists)

PID 9573 is passt. 6 is the file descriptor returned by socket(), used to forward port 3334... but it's also the file descriptor passed via --fd, so passt fails to add it to the set of file descriptors to monitor. Are we closing 6 for some reason...? Still checking.
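The EEXIST in that trace is standard epoll behaviour: a descriptor number can be registered with a given epoll instance only once. A minimal sketch that reproduces it in isolation follows; the function name is invented for this example and none of this is passt code.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Register the same descriptor with an epoll instance twice and return the
 * errno from the second attempt (0 if it unexpectedly succeeds, -1 if the
 * setup or the first registration fails).
 */
int second_epoll_add_errno(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int sv[2], epfd, err = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        err = -1;               /* the first add is expected to succeed */
    else if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        err = errno;            /* the second add fails with EEXIST */

    close(epfd);
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

So seeing EEXIST for fd 6 means that number was already registered for something else, here the listening socket for port 3334.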

@sbrivio-rh
Contributor

sbrivio-rh commented Aug 20, 2024

I think krun fails to actually drop SOCK_CLOEXEC before running passt:

$ RUST_LOG='krun=trace,krun_guest=trace,krun_server=trace' RUST_BACKTRACE=1 strace -f -e socket,bind,open,close,execve,epoll_ctl ./krun -e=RUST_BACKTRACE bash
[...]
[2024-08-20T17:16:51Z DEBUG krun::net] passing fd to passt fd=6
[...]
[pid  9633] execve("/usr/local/bin/passt.avx2", ["passt", "-f", "-t", "3334:3334", "--trace", "-l", "/tmp/krun.log", "--fd", "6"], 0x7ffc62bb3098 /* 33 vars */) = 0
[pid  9633] close(3)                    = 0
[pid  9633] close(3)                    = 0
[pid  9633] close(3)                    = 0
[pid  9633] socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
[pid  9633] close(4)                    = 0
[pid  9633] socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, NETLINK_ROUTE) = 5
[pid  9633] bind(5, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 0
[pid  9633] socket(AF_INET6, SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 6

For passt, 6 is now a TCP socket, because the old "6" was closed on exec. There's no explicit close between socket() and execve().

@sbrivio-rh
Contributor

Ouch, no:

[pid  9651] close_range(3, 4294967295, CLOSE_RANGE_UNSHARE) = 0

So I introduced some issue in https://passt.top/passt/commit/?id=09603cab28f9883baf1d7b48bdc102d6641dc300, I guess.

@teohhanhui
Collaborator Author

teohhanhui commented Aug 20, 2024

And I've just realized why my previous downgrade of passt didn't do anything. Because I was downgrading the packages on host, while I'm actually running krun from within a container (lmao)

@sbrivio-rh
Contributor

On the other hand

$ strace -e close_range passt -f --fd 6
close_range(3, 5, CLOSE_RANGE_UNSHARE)  = 0
close_range(7, 4294967295, CLOSE_RANGE_UNSHARE) = 0

...so we're closing 6 only if started by krun. Still checking.
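The two close_range() calls in that trace show the intended behaviour: close everything above the standard streams except the descriptor given with --fd. Below is a sketch of just the range arithmetic, without calling close_range() itself. The struct and function names are hypothetical, this is not passt's implementation, and it assumes a 32-bit unsigned int so that UINT_MAX matches the 4294967295 seen in the trace.

```c
#include <limits.h>

struct fd_range {
    unsigned int first, last;
    int valid;                  /* 0 if the range would be empty */
};

/*
 * Compute the descriptor ranges to close so that everything from 3 upwards
 * is closed except keep_fd. Pass keep_fd < 3 (for example -1) when no
 * descriptor should be preserved.
 */
void ranges_around_fd(int keep_fd, struct fd_range *below,
                      struct fd_range *above)
{
    const unsigned int first_fd = 3;    /* first descriptor after stderr */

    if (keep_fd < (int)first_fd) {
        *below = (struct fd_range){ first_fd, UINT_MAX, 1 };
        *above = (struct fd_range){ 0, 0, 0 };
        return;
    }

    *below = (struct fd_range){ first_fd, (unsigned int)keep_fd - 1,
                                (unsigned int)keep_fd > first_fd };
    *above = (struct fd_range){ (unsigned int)keep_fd + 1, UINT_MAX,
                                keep_fd < INT_MAX };
}
```

With keep_fd = 6 this yields 3..5 and 7..UINT_MAX, matching the standalone run. With keep_fd = -1 it yields the single range 3..UINT_MAX, matching what passt did when started by krun, where --fd was not detected.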

@sbrivio-rh
Contributor

sbrivio-rh commented Aug 20, 2024

Fix (in passt):

diff --git a/util.c b/util.c
index 0b41404..3fce3c2 100644
--- a/util.c
+++ b/util.c
@@ -710,7 +710,7 @@ void close_open_files(int argc, char **argv)
 	int name, rc;
 
 	do {
-		name = getopt_long(argc, argv, "+:F", optfd, NULL);
+		name = getopt_long(argc, argv, "-:F:", optfd, NULL);
 
 		if (name == 'F') {
 			errno = 0;

I'll submit it for review in a bit.

The problem is that if there are non-option arguments (including values for other options) before --fd n, we ignore --fd in close_open_files() (because I got the getopt_long() flags wrong, so we would stop at the first non-option argument), and close all the file descriptors after the standard streams (starting from 3).
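The difference between the two optstring prefixes can be reproduced in isolation. The sketch below is modelled on the loop in the diff above but is not passt's code: `scan_for_fd` is a name invented here, error handling is reduced to a NULL check, and it relies on GNU getopt_long() behaviour (the `+` and `-` prefixes, and resetting optind to 0).

```c
#include <getopt.h>
#include <stdlib.h>

/*
 * Walk argv looking only for --fd / -F and ignore everything else.
 * The first character of optstring selects the scanning mode:
 *   '+' stops at the first non-option argument,
 *   '-' returns each non-option argument as option code 1 and keeps going.
 * Returns the parsed descriptor number, or -1 if --fd was never seen.
 */
long scan_for_fd(int argc, char **argv, const char *optstring)
{
    const struct option optfd[] = {
        { "fd", required_argument, NULL, 'F' },
        { 0 },
    };
    long fd = -1;
    int name;

    optind = 0;     /* GNU extension: re-initialise so the prefix is re-read */
    do {
        name = getopt_long(argc, argv, optstring, optfd, NULL);
        if (name == 'F' && optarg)
            fd = strtol(optarg, NULL, 0);
    } while (name != -1);   /* '?' and 1 are simply skipped */

    return fd;
}
```

With the command line from the execve() trace earlier in this thread (`passt -f -t 3334:3334 --trace -l /tmp/krun.log --fd 6`), the old optstring "+:F" stops at `3334:3334` and returns -1, while the fixed "-:F:" skips past it and returns 6. With `passt -f --fd 6` both find the descriptor, which explains why the bug showed up only when passt was started by krun.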

@sbrivio-rh
Contributor

By the way, krun leaks the passt process. It should be started with --one-off / -1

I forgot: --fd implies --one-off anyway. The reason why we were leaking the process is that passt itself wouldn't quit as it lost that file descriptor (by accidentally closing it), so there's nothing to fix on krun side in this regard.
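As a rough illustration of why losing the descriptor matters: a process can only notice that its peer went away through the socket it still holds. The sketch below checks for end of stream on a connected stream socket. It is an assumption-laden stand-in, with `peer_has_closed` as an invented name; passt's actual event loop is epoll-based (the trace above shows EPOLLRDHUP) and is not reproduced here.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Return 1 if the peer of a connected stream socket has closed its end,
 * 0 if the connection is still open (data pending or nothing to read yet),
 * -1 on any other error. MSG_PEEK leaves pending data in the queue.
 */
int peer_has_closed(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 0)
        return 1;               /* orderly shutdown: end of stream */
    if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}
```

If the descriptor has been closed and its number reused for an unrelated socket, a check like this never sees the hangup, so the process has no trigger to exit.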

@sbrivio-rh
Contributor

The fix is included in passt version 2024_08_21.1d6142f, and its matching Fedora 40 update.

hswong3i referenced this issue in alvistack/passt-top-passt Aug 22, 2024
…pen_files()

Seen with krun: we get a file descriptor via --fd, but we close it and
happily use the same number for TCP files.

The issue is that if we also get other options before --fd, with
arguments, getopt_long() stops parsing them because it sees them as
non-option values.

Use the - modifier at the beginning of optstring (before :, which is
needed to avoid printing errors) instead of +, which means we'll
continue parsing after finding unrelated option values, but
getopt_long() won't reorder them anyway: they'll be passed with option
value '1', which we can ignore.

By the way, we also need to add : after F in the optstring, so that
we're able to parse the option when given as short name as well.

Now that we change the parsing mode between close_open_files() and
conf(), we need to reset optind to 0, not to 1, whenever we call
getopt_long() again in conf(), so that the internal initialisation
of getopt_long() evaluating GNU extensions is re-triggered.

Link: https://github.com/slp/krun/issues/17#issuecomment-2294943828
Fixes: baccfb9 ("conf: Stop parsing options at first non-option argument")
Fixes: 09603ca ("passt, util: Close any open file that the parent might have leaked")
Signed-off-by: Stefano Brivio <[email protected]>
Reviewed-by: David Gibson <[email protected]>