
Spurious name conflicts #117

Open
callegar opened this issue Apr 26, 2017 · 224 comments
Labels
bug important High priority

@callegar

Hi, I hope this is the right place for reporting issues with avahi-daemon, as the README on my system points to the bug tracker on freedesktop.org, which does not list avahi as a bug report target.

I am experiencing spurious name conflicts on various systems, all of which share a common trait: two interfaces, one on the local LAN with a static IP address and the other getting a DHCP address from somewhere (typically an ADSL router).

What happens is the following. Suppose that the host is called "foo". Initially, it is correctly advertised as foo.local. After some time a name conflict occurs and the host starts being advertised as foo-2.local, foo-3.local, etc., even though it is certainly the sole host named foo on the network. In practice there is a spurious name conflict with the host itself, probably due to some race in avahi. The unfortunate result is that no other system can find "foo" on the network anymore, since they look for foo.local.

I see the issue on a couple of Debian jessie systems (Avahi version 0.6.31), on a Raspbian jessie system (same), and on an OpenWrt Chaos Calmer system (Avahi version 0.6.31 again).

I see a lot of reports of this same issue (or possibly something similar) on many distro bug trackers, application bug trackers, and question sites.

I wonder if there is something misconfigured on my systems (and in this case some hint at diagnosing it would be appreciated) or if this is an issue (possibly a race) with the avahi daemon.

Even if this cannot be fixed rapidly, I'd like to suggest an interim point release of avahi with an option to disable the name-conflict analysis, for users who are absolutely sure it won't be needed on their network.

@lathiat
Contributor

lathiat commented Apr 27, 2017

I agree that I have seen this from time to time; unfortunately I am not currently sure what causes it. I think in some cases it might be related to the reflector, but if that is not in use, I am not sure.

How often is this happening? I wonder if we can set up a long-term pcap capture to try and figure out what happens.

@callegar
Author

Rather frequently; I see it almost every other day.

It seems to be associated with a lease expiring on the interface that gets its address from DHCP, and probably has to do with the fact that the interface has both an IPv4 and an IPv6 address configured...

Apr 27 07:30:57 xyz dhcpcd[365]: eth0: soliciting a DHCPv6 lease
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: fe80::6a7f:74ff:fe15:6a2e router available
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: ADV fd57:81fe:80da::218/128 from fe80::6a7f:74ff:fe15:6a2e
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: REPLY6 received from fe80::6a7f:74ff:fe15:6a2e
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: adding address fd57:81fe:80da::218/128
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: renew in 21600 seconds, rebind in 34560 seconds
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: using IPv4LL address 169.254.139.60
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: adding route to 169.254.0.0/16
Apr 27 07:30:57 xyz avahi-daemon[24276]: Joining mDNS multicast group on interface eth0.IPv4 with address 169.254.139.60.
Apr 27 07:30:57 xyz avahi-daemon[24276]: New relevant interface eth0.IPv4 for mDNS.
Apr 27 07:30:57 xyz avahi-daemon[24276]: Registering new address record for 169.254.139.60 on eth0.IPv4.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::600c:b99e:4f17:ce61.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Joining mDNS multicast group on interface eth0.IPv6 with address fd57:81fe:80da:0:c99b:6cc1:2a7c:c139.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Registering new address record for fd57:81fe:80da:0:c99b:6cc1:2a7c:c139 on eth0.*.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for fe80::600c:b99e:4f17:ce61 on eth0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for fe80::a92:e068:3cb:7ae2 on wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for 169.254.208.59 on wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for 192.168.32.1 on wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing workstation service for wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for 169.254.139.60 on eth0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing workstation service for eth0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing workstation service for lo.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Host name conflict, retrying with xyz-2

@callegar
Author

I don't think the reflector is on anywhere; it should be disabled by default, shouldn't it?

@callegar
Author

callegar commented May 1, 2017

Preventing avahi-daemon from using the interface that receives its address from a DHCP server makes the issue disappear, but obviously that is not a solution.

@midicase

midicase commented May 1, 2017

Avahi can't handle inter-connected multi-homed systems. We have to use the option to disable one of the interfaces to keep the daemon from seeing multiple name registration requests (one from each network).

Best I can tell, there isn't a better solution, since this is really an issue with the design of the protocol.

@callegar
Author

callegar commented May 1, 2017

Still I wonder...

  1. why do I not see just a -1, but also a -2 and every now and then a -3 too?
  2. wouldn't it be possible to have both interfaces managed by avahi-daemon with a reproducible name assignment? Like getting, from the very start, hostname.local for the IP on one of the two NICs and hostname-2.local for the name on the other NIC, rather than having things one way at boot and then getting the -x suffix when the DHCP lease is renewed?
    I am asking because the issue is not the -2, but not knowing in advance how a host will be reachable.

@lathiat
Contributor

lathiat commented May 18, 2017

I just had this happen on one of my own systems, with a log looking very similar to yours, after downing and upping a bunch of interfaces rapidly. There is definitely a bug there; I'll have to try and figure out if I can make it reproducible.

Some kind of race to do with new interfaces appearing while probing, perhaps. There is a related issue for services that get stuck registering, so maybe the logic for interfaces coming and going needs to be reviewed.

@lathiat
Contributor

lathiat commented May 20, 2017

OK I think I figured it out. What's happening is an address is withdrawn before it finishes probing, but we receive a copy of our own probe immediately after and thus assume a conflict (our own multicast probes are mirrored back to us by the kernel). A bit of a race condition.

This happens a lot with IPv6, where we withdraw the fe80 link-local address once we receive a global address, and it can happen very rapidly on boot. Notably, you are using IPv6 on your site, as am I on mine where I am seeing this. On IPv4, address withdrawals while probing are quite uncommon.

So we'll need to identify those in some way, either with a ghost list or otherwise determining that the probe looped back. I'll look at that.
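A sketch of what "determining that the probe looped back" could mean in practice: RFC 6762's probe tiebreaking treats a record whose rdata is byte-identical to our own as no conflict at all, so an echoed copy of our own probe can be filtered by comparison. The following is a hypothetical simplification (the `Record` struct and `is_real_conflict` are illustrative, not Avahi's actual `AvahiRecord` API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record: the real AvahiRecord carries a key
 * (name, class, type) plus rdata. */
typedef struct {
    char name[64];            /* e.g. "foo.local" */
    int rr_type;              /* e.g. 28 for AAAA */
    unsigned char rdata[16];
    size_t rdata_len;
} Record;

/* Per RFC 6762 tiebreaking, a "conflicting" record whose rdata is
 * byte-identical to ours is not a conflict; a looped-back copy of our
 * own probe therefore must not trigger renaming. */
static int is_real_conflict(const Record *ours, const Record *incoming)
{
    if (strcmp(ours->name, incoming->name) != 0 ||
        ours->rr_type != incoming->rr_type)
        return 0;  /* different key: unrelated record */
    if (ours->rdata_len == incoming->rdata_len &&
        memcmp(ours->rdata, incoming->rdata, ours->rdata_len) == 0)
        return 0;  /* identical data: our own echo, ignore */
    return 1;      /* same name, different data: genuine conflict */
}
```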

@lathiat lathiat added the bug label May 20, 2017
@lathiat lathiat added this to the v0.6.33 milestone May 20, 2017
@lathiat
Contributor

lathiat commented Jun 21, 2017

Confirmed the issue as I suspected: we withdraw our address record but only then receive a copy of our own probe and decide it is a conflict:

Jun 20 15:40:58 hyper avahi-daemon[6567]: Joining mDNS multicast group on interface vsw3.IPv6 with address fe80::9cf5:4ff:fef6:ec81.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Registering new address record for fe80::9cf5:4ff:fef6:ec81 on vsw3.*.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Leaving mDNS multicast group on interface vsw3.IPv6 with address fe80::9cf5:4ff:fef6:ec81.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Withdrawing address record for fe80::9cf5:4ff:fef6:ec81 on vsw3.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Received conflicting probe [hyper.local#011IN#011AAAA fe80::9cf5:4ff:fef6:ec81 ; ttl=120]. Local host lost. Withdrawing.

This happens because we revoke the link-local address from being advertised once we receive a global address.

Hope to have a fix for this shortly

@Fonta

Fonta commented Aug 4, 2017

Would a workaround be to disable IPv6 in the config when you're not using it?

@dmosberger

Any updates on this?

@strayer

strayer commented Jun 22, 2018

Hey @lathiat, sorry for bugging you.

I think this just happened to me as well. I have a usual IPv4/6 dual stack network at home and run avahi-daemon in a Docker container with network_mode host.

This is the log:

daemon_1  | 2018-06-21T11:54:39.577225354Z Found user 'avahi' (UID 102) and group 'avahi' (GID 102).
daemon_1  | 2018-06-21T11:54:39.577682651Z Successfully dropped root privileges.
daemon_1  | 2018-06-21T11:54:39.578293096Z avahi-daemon 0.6.32 starting up.
daemon_1  | 2018-06-21T11:54:39.579337101Z WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
daemon_1  | 2018-06-21T11:54:39.579720148Z Successfully called chroot().
daemon_1  | 2018-06-21T11:54:39.580163570Z Successfully dropped remaining capabilities.
daemon_1  | 2018-06-21T11:54:39.580505155Z Loading service file /services/smbd.service.
daemon_1  | 2018-06-21T11:54:39.583626444Z Joining mDNS multicast group on interface enp5s0.IPv6 with address 2003:e5:d70e:bc00:265e:beff:fe06:ed43.
daemon_1  | 2018-06-21T11:54:39.583822517Z New relevant interface enp5s0.IPv6 for mDNS.
daemon_1  | 2018-06-21T11:54:39.583868742Z Joining mDNS multicast group on interface enp5s0.IPv4 with address 192.168.178.58.
daemon_1  | 2018-06-21T11:54:39.583883829Z New relevant interface enp5s0.IPv4 for mDNS.
daemon_1  | 2018-06-21T11:54:39.584342326Z Network interface enumeration completed.
daemon_1  | 2018-06-21T11:54:39.585046921Z Registering new address record for 2003:e5:d70e:bc00:265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-21T11:54:39.585065258Z Registering new address record for 192.168.178.58 on enp5s0.IPv4.
daemon_1  | 2018-06-21T11:54:40.492960211Z Server startup complete. Host name is nibelungenhort.local. Local service cookie is 3353354288.
daemon_1  | 2018-06-21T11:54:41.400508244Z Service "nibelungenhort" (/services/smbd.service) successfully established.
daemon_1  | 2018-06-22T02:48:49.975073115Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:49.984858578Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:49.984962940Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:50.983388121Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:50.983547607Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:50.983631244Z Registering new address record for fd00::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:51.269257637Z Withdrawing address record for fd00::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:51.269418273Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:51.269500173Z Registering new address record for fd00::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:51.269576160Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:52.686728887Z Registering new address record for 2003:e5:d70b:3400:265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:52.690760136Z Withdrawing address record for fd00::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.690858948Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.690937072Z Withdrawing address record for 2003:e5:d70e:bc00:265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.691012722Z Withdrawing address record for 192.168.178.58 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.696120164Z Host name conflict, retrying with nibelungenhort-2
daemon_1  | 2018-06-22T02:48:52.697784540Z Registering new address record for 2003:e5:d70b:3400:265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:52.697876765Z Registering new address record for 192.168.178.58 on enp5s0.IPv4.
daemon_1  | 2018-06-22T02:48:54.587279952Z Server startup complete. Host name is nibelungenhort-2.local. Local service cookie is 3353354288.
daemon_1  | 2018-06-22T02:48:55.489180837Z Service "nibelungenhort-2" (/services/smbd.service) successfully established.

@midicase

two interfaces
dual stack

Make sure you are not hitting a shortcoming in the protocol in which the daemon sees its own announcements through multiple paths: the system announces itself through one interface and is then rejected when announcing itself on subsequent interfaces because it appears to be a duplicate.

I always use allow-interfaces/deny-interfaces to force avahi to use only a single interface (in my industry this is typically the management interface). After that I have not had this issue.
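For reference, the workaround described above is a one-line setting in avahi-daemon.conf (eth0 here is a placeholder for whatever single interface you want Avahi to use):

```ini
[server]
allow-interfaces=eth0
```

deny-interfaces works the same way but as an exclusion list.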

@strayer

strayer commented Jun 22, 2018

There are two interfaces on my system, although only one is actually up and connected to the network. As far as I can see avahi-daemon only works on the connected interface (enp5s0), but I'll try manually allowing it.

@ondrej1024

Is anyone working on a fix for this? @lathiat said

Hope to have a fix for this shortly

exactly a year ago. Any progress? Thanks

@lathiat
Contributor

lathiat commented Jun 29, 2018

allow-interfaces will work around the issue, as it's a bug in handling interfaces that rapidly add and remove addresses (particularly noticeable if you have globally routable IPv6 addresses, as we add and then remove the link-local address)

Still planning a fix

@strayer

strayer commented Jul 13, 2018

So, this is still happening to me. I've set allow-interfaces=enp5s0 in avahi-daemon.conf as suggested here, but that didn't help. Still the same log messages as posted above.

@knro

knro commented Aug 17, 2018

I tried the allow-interfaces method as well but it's not working. Is there another work-around for this? How about a fix?

@lathiat Any ETA? Many distros are reporting the same bug.

@strayer

strayer commented Aug 18, 2018

The only workaround for me is a daily restart of avahi-daemon. I'll soon replace this with an automatic restart whenever the daemon logs the error message, but for now this works for me. Not ideal, but eh… it's just my homelab and nothing critical.

@gramels

gramels commented Nov 27, 2018

It seems a working workaround is

cache-entries-max=0
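If I recall the option correctly, it goes in the [server] section of avahi-daemon.conf:

```ini
[server]
cache-entries-max=0
```

Note that this disables Avahi's response cache entirely, so treat it as a diagnostic workaround rather than a recommended setting.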

@pemensik
Member

pemensik commented Feb 8, 2024

Could you please test my attempt to log conflicts better in PR #554? That should help find what exactly triggers the error. Is some reflection service responsible for the conflict?

I am sorry, but I am quite confident conflict resolution is needed and is the right thing to do. What is not right is conflicting with our own announcements. But we need to understand the primary cause of the issue before correcting it properly. That is not yet clear, at least to me.

@marc-h38

marc-h38 commented Feb 10, 2024

the root cause is a bug in the default DNS thing that's been here for at least 6-7 years.

This has been a "Heisenbug" of the highest order; that's the simple reason why. There have also been very misguided attempts to make reproduction LESS frequent, which is the exact opposite of how you should deal with Heisenbugs.

Just never show this Github issue to aliens, humanity would instantly fail any test if this is seen.

What I find much more amazing than those 6-7 years is this:

... my attempt to log conflicts better in PR #554

After 6-7 years, why is it still required to patch and rebuild from source just to GET BETTER LOGS? This should really be as easy as editing a verbosity level in a configuration file. Is this software still maintained by anyone?

Sure, verbosity levels often affect the reproduction rate of Heisenbugs. But that's not a reason not to at least TRY. Also, the influence (if any) of the verbosity level on the reproduction rate is better data than no data at all.

This could have easily been worked around with a simple ignore-collisions=1 while a real fix is in the works.

+1

@gpshead

gpshead commented Feb 11, 2024

I've been experiencing this race condition on most of my single interface Ubuntu and Debian machines and VMs on my home network for so many years now I cannot even remember when it started.

I recently had it happen on a host immediately after booting. Very frustrating to wake-on-LAN a host, see it light up, yet find it doesn't appear to be on the network by name. Typically I see it happen after a host has been up for days, during some routinely triggered network interface event that causes Avahi to withdraw all interface records and immediately re-register them. Exactly the thing that will eventually trigger the race condition and wind up with the daemon falsely thinking its own advertisement is a conflict. I don't know what causes these frequent withdrawals and re-registrations (the v4 and v6 addresses and network interface names remain the same); the default logs emitted by the system daemons that might do it are insufficient to tell.

It is a really rotten experience and entirely sours the supposed utility of mDNS for small "just a collection of hosts" LANs.

@Walter-o

I'm coming back here because I found the root cause of my problem.

In my case it was a Docker container running an mDNS responder inside it that conflicted with the Avahi mDNS on my host.
(I even ran multiple instances of the container.)

This caused the conflicts; however, I still think there should be a toggle in Avahi to ignore collisions.

I hope this helps some people, as I saw a few other Docker users here.

Also, a good warning: never use Docker's host networking mode or any similar mode where you don't explicitly specify the ports.

@kelvie

kelvie commented May 26, 2024

So I ran into this for a few months and could have sworn something was amiss; posting in case anyone else runs into this.

Just look for things listening on port 5353. In my case it was KDE Connect (along with having Docker interfaces) that caused this almost every time after a user was logged in.

To check, use:

sudo lsof -i -P -n | grep 5353

And it should just be avahi.

@happyme531

The conflict check can be disabled by adding a return 1 at the beginning of static int handle_conflict(AvahiServer *s, AvahiInterface *i, AvahiRecord *record, int unique) inside avahi-core/server.c.
Then recompile, install, and the issue is solved.
Tested for 2 days and no name change has happened!
You can also download libavahi-core.so.7.1.0 compiled by myself, replace the corresponding .so library on your system (don't forget to make a backup first), then restart the avahi-daemon service.
For ARM64 and x64 arch: libavahi-core_noconflict.zip
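For concreteness, the change described above amounts to something like the following (hunk context approximate; note that it abandons conflict handling entirely, which the mDNS RFC mandates):

```diff
--- a/avahi-core/server.c
+++ b/avahi-core/server.c
 static int handle_conflict(AvahiServer *s, AvahiInterface *i, AvahiRecord *record, int unique) {
+    /* WARNING: reports "no conflict" unconditionally; violates RFC 6762 */
+    return 1;
```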

@pemensik
Member

pemensik commented Sep 3, 2024

The conflict check can be disabled by adding a return 1 at the beginning of static int handle_conflict(AvahiServer *s, AvahiInterface *i, AvahiRecord *record, int unique) inside avahi-core/server.c. Then recompile, install, and the issue is solved. Tested for 2 days and no name change has happened! You can also download libavahi-core.so.7.1.0 compiled by myself, replace the corresponding .so library on your system (don't forget to make a backup first), then restart the avahi-daemon service. For ARM64 and x64 arch: libavahi-core_noconflict.zip

Disabling conflict checks completely makes Avahi non-compliant with the mDNS RFC. Please do not advise this to anyone. It must conflict if there is a different device using the same name, and one of them must choose a different name. This problem happens because Avahi fails to identify that it conflicts only with itself and that there is no other device with the same name on any network it is connected to.

@happyme531

The conflict check can be disabled by adding a return 1 at the beginning of static int handle_conflict(AvahiServer *s, AvahiInterface *i, AvahiRecord *record, int unique) inside avahi-core/server.c. Then recompile, install, and the issue is solved. Tested for 2 days and no name change has happened! You can also download libavahi-core.so.7.1.0 compiled by myself, replace the corresponding .so library on your system (don't forget to make a backup first), then restart the avahi-daemon service. For ARM64 and x64 arch: libavahi-core_noconflict.zip

Disabling conflict checks completely makes Avahi non-compliant with the mDNS RFC. Please do not advise this to anyone. It must conflict if there is a different device using the same name, and one of them must choose a different name. This problem happens because Avahi fails to identify that it conflicts only with itself and that there is no other device with the same name on any network it is connected to.

Anyway, in my use case I know there won't be any different device using the same name, and this fixed my issue perfectly. It just works!

@pemensik
Member

pemensik commented Sep 3, 2024

Do we have any proof this can happen without another device doing mDNS reflection? That is, there would have to be another device sending my machine's announced record from a different interface (and source address) than I am sending it from, probably with a different link address too, since we should recognize our own link address announcing when we are connected to the same network by a bridge/switch.

The best candidate for a solvable description seems to be #117 (comment). On the other hand, #117 (comment) suggests the issue is forgetting our own addresses too fast, when some device creates a duplicate that is sent a bit later.

A solution for that might be storing removed local addresses, marked with a special flag, for at least the same time we wait for non-responding (non-existent) names. If we receive a query for one shortly after removal, just consider it still our own and ignore the conflict.
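The flagged-removal idea could be sketched as a small grace-period list; everything below (names, the 5-second window) is a hypothetical illustration, not Avahi code:

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define GHOST_MAX  16
#define GRACE_SECS 5   /* assumption: roughly the probe/announce window */

/* A recently withdrawn local address, kept for a grace period so that a
 * late echo of it is not mistaken for a foreign host. */
typedef struct {
    char addr[64];       /* textual address, e.g. "fe80::1" */
    time_t withdrawn_at;
} Ghost;

static Ghost ghosts[GHOST_MAX];
static int n_ghosts = 0;

/* Record an address at withdrawal time. */
static void remember_withdrawn(const char *addr, time_t now) {
    if (n_ghosts < GHOST_MAX) {
        strncpy(ghosts[n_ghosts].addr, addr, sizeof ghosts[n_ghosts].addr - 1);
        ghosts[n_ghosts].addr[sizeof ghosts[n_ghosts].addr - 1] = '\0';
        ghosts[n_ghosts].withdrawn_at = now;
        n_ghosts++;
    }
}

/* Returns 1 if addr was one of ours within the grace period, i.e. the
 * apparent conflict should be ignored rather than renaming the host. */
static int is_own_ghost(const char *addr, time_t now) {
    for (int k = 0; k < n_ghosts; k++)
        if (strcmp(ghosts[k].addr, addr) == 0 &&
            now - ghosts[k].withdrawn_at <= GRACE_SECS)
            return 1;
    return 0;
}
```

A real implementation would key entries by full record (name, type, rdata) and tie the grace period to the probe/announce timing, but the shape is the same.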

@marc-h38

marc-h38 commented Sep 3, 2024

Disabling conflict checks completely makes Avahi not compliant with mDNS RFC. Please do not advice that to anyone

This was only sharing a well-tested one-line change; there was no pull request submitted, and in fact not even a plain diff was shared. This was a great data point, so please keep stuff like this coming; it's not like Linux distributions are going to excavate random experiments buried very deep down this bug and ship them tomorrow.

Also, this bug has been open for 7 years now, so at this stage I think everyone should be free to share pretty much whatever they like!

Do we have any proof this can happen without other device doing mdns reflection?

Interesting question, @happyme531 can you answer?

@pemensik
Member

pemensik commented Sep 3, 2024

The main problem we have with this is unclear reproduction steps. Yes, we have seen it happen sometimes. But for creating and testing a fix, we need to identify the exact cases in which it happens and rule out wrong network configurations. Ideally logs combined with mDNS traffic recordings on all used interfaces. Someone who sees this issue often could especially help with that.

@marc-h38

marc-h38 commented Sep 4, 2024

You can also download libavahi-core.so.7.1.0 compiled by myself, ...

Never, ever run binaries provided by random people on the Internet. No offence, @happyme531, but no one has any idea who you are. The same goes for binaries from me, from @pemensik, or from anyone else.

(sorry I missed this earlier)

@mvduin

mvduin commented Sep 4, 2024

Do we have any proof this can happen without other device doing mdns reflection?

I have no trouble reproducing this on an isolated VLAN containing three devices:

  • the test device 8c-131 (debian buster, avahi-daemon 0.7-4+b1)
  • the router (debian bullseye, avahi-daemon 0.8-5+deb11u2)
  • a development server (debian bullseye, avahi-daemon 0.8-5+deb11u2)

None of the systems have mDNS reflection enabled. avahi-daemon.conf of the test device (with comments removed):

[server]
use-ipv4=yes
use-ipv6=yes
disallow-other-stacks=yes
ratelimit-interval-usec=1000000
ratelimit-burst=1000

[wide-area]
enable-wide-area=yes

[publish]
publish-hinfo=no
publish-workstation=no
publish-resolv-conf-dns-servers=no
publish-aaaa-on-ipv4=yes
publish-a-on-ipv6=yes

I used SNMP to virtually unplug and replug the test device from the ethernet switch (disable port, wait 2 seconds, enable port, wait 10 seconds, repeat in an infinite loop).

Within a few minutes the following happened (IPv6 addresses partially censored):

Sep 04 16:58:08.410639 8c-131 kernel: cpsw 4a100000.ethernet eth0: Link is Down
Sep 04 16:58:08.410421 8c-131 systemd-networkd[114]: eth0: Lost carrier
Sep 04 16:58:08.410472 8c-131 systemd-networkd[114]: eth0: DHCP lease lost
Sep 04 16:58:08.455580 8c-131 avahi-daemon[226]: Withdrawing address record for 10.0.101.180 on eth0.
Sep 04 16:58:08.455793 8c-131 avahi-daemon[226]: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.0.101.180.
Sep 04 16:58:08.564574 8c-131 avahi-daemon[226]: Joining mDNS multicast group on interface eth0.IPv4 with address 169.254.182.72.
Sep 04 16:58:08.619584 8c-131 avahi-daemon[226]: Registering new address record for 169.254.182.72 on eth0.*.
Sep 04 16:58:08.622389 8c-131 avahi-daemon[226]: Withdrawing address record for 169.254.182.72 on eth0.
Sep 04 16:58:08.623453 8c-131 avahi-daemon[226]: Leaving mDNS multicast group on interface eth0.IPv4 with address 169.254.182.72.
Sep 04 16:58:08.664469 8c-131 avahi-daemon[226]: Interface eth0.IPv4 no longer relevant for mDNS.
Sep 04 16:58:08.684805 8c-131 avahi-daemon[226]: Withdrawing address record for 2001:470:XXXX:XXX:XXXX:16ff:fee2:XXXX on eth0.
Sep 04 16:58:08.687161 8c-131 avahi-daemon[226]: Leaving mDNS multicast group on interface eth0.IPv6 with address 2001:470:XXXX:XXX:XXXX:16ff:fee2:XXXX.
Sep 04 16:58:08.693577 8c-131 avahi-daemon[226]: Joining mDNS multicast group on interface eth0.IPv6 with address fe80::XXXX:16ff:fee2:XXXX.
Sep 04 16:58:08.713849 8c-131 avahi-daemon[226]: Registering new address record for fe80::XXXX:16ff:fee2:XXXX on eth0.*.
Sep 04 16:58:11.531776 8c-131 kernel: cpsw 4a100000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
Sep 04 16:58:11.530898 8c-131 systemd-networkd[114]: eth0: Gained carrier
Sep 04 16:58:15.623498 8c-131 systemd-networkd[114]: eth0: DHCPv4 address 10.0.101.180/24 via 10.0.101.1
Sep 04 16:58:15.641435 8c-131 avahi-daemon[226]: Joining mDNS multicast group on interface eth0.IPv4 with address 10.0.101.180.
Sep 04 16:58:15.703677 8c-131 avahi-daemon[226]: New relevant interface eth0.IPv4 for mDNS.
Sep 04 16:58:15.704202 8c-131 avahi-daemon[226]: Registering new address record for 10.0.101.180 on eth0.*.
Sep 04 16:58:16.046766 8c-131 avahi-daemon[226]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::XXXX:16ff:fee2:XXXX.
Sep 04 16:58:16.094519 8c-131 avahi-daemon[226]: Joining mDNS multicast group on interface eth0.IPv6 with address 2001:470:XXXX:XXX:XXXX:16ff:fee2:XXXX.
Sep 04 16:58:16.134501 8c-131 avahi-daemon[226]: Registering new address record for 2001:470:XXXX:XXX:XXXX:16ff:fee2:XXXX on eth0.*.
Sep 04 16:58:16.135857 8c-131 avahi-daemon[226]: Withdrawing address record for fe80::XXXX:16ff:fee2:XXXX on eth0.
Sep 04 16:58:16.138846 8c-131 avahi-daemon[226]: Withdrawing address record for 10.0.101.180 on eth0.
Sep 04 16:58:16.149680 8c-131 avahi-daemon[226]: Host name conflict, retrying with 8c-132
Sep 04 16:58:16.155897 8c-131 avahi-daemon[226]: Registering new address record for 2001:470:XXXX:XXX:XXXX:16ff:fee2:XXXX on eth0.*.
Sep 04 16:58:16.156973 8c-131 avahi-daemon[226]: Registering new address record for 10.0.101.180 on eth0.*.
Sep 04 16:58:17.799027 8c-131 systemd-networkd[114]: eth0: Configured
Sep 04 16:58:17.969851 8c-131 avahi-daemon[226]: Server startup complete. Host name is 8c-132.local. Local service cookie is 1726236826.

Also, I just checked my laptop's avahi hostname, it's chinchilla-7 currently (with its actual hostname being chinchilla), and I don't think any of the networks I connect to have an mDNS reflector.

@pemensik
Member

pemensik commented Sep 4, 2024

One of the reasons you see those issues might be the usage of systemd-networkd. If I understand it correctly, it swaps the address in use after a lost link from the normal DHCP lease to an IPv4 link-local address. That is not common behaviour with NetworkManager, at least not in its default configuration. I had hoped you could record this interaction with Wireshark or tcpdump, so we would see the timing of the packets and their source addresses too.

A solution might be what @evverx mentioned at #554 (comment): not giving up immediately, but retrying one second later. If there is a conflict indeed, we will get the same result again. If not, then it might have been a stray repetition of our own packet, delayed by whatever network elements. Remembering our own recently withdrawn addresses should help too.

Thank you for those steps. I guess we would need something similar done between VMs or containers, so it can be tested in an automated way and not depend on a specific network device.

@pemensik
Member

pemensik commented Sep 4, 2024

Though we need to check whether the recent development version behaves this way. I am not sure why you are testing the avahi-daemon 0.7 version. An ideal reproducer should run on a snapshot version from master. Related issues may have been fixed since the 0.7 release, which could make it less reproducible.

But yes, using publish-aaaa-on-ipv4=yes and publish-a-on-ipv6=yes makes it more likely to happen, as was already reported.

@pemensik
Member

pemensik commented Sep 4, 2024

Just to add to the distributions' bug lists, we had one on Fedora as well: https://bugzilla.redhat.com/show_bug.cgi?id=1657325
They all report frequent withdrawing of addresses, and that IPv6 usage makes things worse.

I think we need to log the source address of the conflict, like I started in #554. It would then be clear from the logs alone that it was our own address. The existing log does not help much.

@evverx
Member

evverx commented Sep 4, 2024

Solution might be what @evverx mentioned at #554 (comment).

I think most of the conflicts mentioned here come from #554 (comment)

conflicting Multicast
DNS responses received before the first probe packet is sent MUST
be silently ignored (see discussion of stale probe packets in Section
8.2, "Simultaneous Probe Tiebreaking", below).

so that part can somewhat mitigate the issue. Ideally #554 (comment) should be implemented too to fix it once and for all.
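The quoted MUST boils down to a guard on the registration state before honoring any conflict. An illustrative sketch, not Avahi's actual state machine:

```c
#include <assert.h>

/* Hypothetical registration states for a record being published. */
typedef enum { REG_WAITING, REG_PROBING, REG_ANNOUNCED } RegState;

/* RFC 6762 section 8.1: conflicting responses received before the first
 * probe packet is sent MUST be silently ignored -- they may simply be
 * stale copies of our own earlier traffic. */
static int should_honor_conflict(RegState state, int probes_sent) {
    if (state == REG_WAITING || probes_sent == 0)
        return 0;  /* nothing sent yet: ignore the apparent conflict */
    return 1;      /* probes on the wire: run the real conflict check */
}
```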

(I have a test suite that can trigger all those conflicts and none of the patches suggested here and in some other PRs fully fix them unfortunately)

I think we need to log source address of conflict, like I started at #554. It would be then clear that was our own address even from logs. Existing log does not help much.

I agree that it can be improved.

Never, ever run binaries provided by random people on the Internet.

In light of https://www.bleepingcomputer.com/news/security/github-comments-abused-to-push-password-stealing-malware-masked-as-fixes/ and things like that, I don't think it's even safe to quote comments with links, because GitHub's algorithms can decide that they push malware (or whatever) too, and their authors can get blocked as well, just in case.

@mvduin
Copy link

mvduin commented Sep 4, 2024

One of the reasons you see those issues might be the use of systemd-networkd. If I understand it correctly, after the link is lost it swaps the address in use from the normal DHCP lease to an IPv4 link-local address.

That is incorrect; all IPv4 addresses are removed when the link is lost.

Link-local IPv4 is disabled by default in systemd-networkd, although it is enabled on the test device. The older version of systemd-networkd running on the test device unconditionally acquires a link-local IPv4 address, independent of DHCP. We saw avahi's self-conflicts on these devices even before enabling link-local IPv4, so I don't think it's related. Link-local IPv4 is disabled on my laptop, which also has this problem.

I hoped you could record this interaction with wireshark or tcpdump, so we could see the timing of the packets and their source addresses too.

I'll see if I can do that quickly; I don't really have a ton of time to spend on this right now.

I am not sure why you are testing avahi-daemon 0.7 version.

Because that's what's running on this device (an embedded system), which is part of a test setup for which I already have scripting tools that let me "unplug" its ethernet link programmatically.

Note that my laptop runs avahi-daemon 0.8-10 (the latest package from debian bookworm) and also experiences the problem; I've checked the logs and found at least one occurrence on a network which definitely doesn't have an mDNS reflector.

Until the root cause is found, changes that make it less reproducible just make it harder to debug.

@evverx
Copy link
Member

evverx commented Sep 4, 2024

Until the root cause is found, changes that make it less reproducible just make it harder to debug.

The issue is that avahi doesn't follow the RFC. #554 (comment) (where "MUST" is violated) and #554 (comment) (where stale probes aren't handled) lead to spurious conflicts.

@mvduin
Copy link

mvduin commented Sep 4, 2024

I hoped you could record this interation with wireshark or tcpdump, so we would see timing of packets and their source addresses too.

Here's another capture (still avahi-daemon 0.7) with link-local IPv4 and publish-a-on-ipv6 disabled (publish-aaaa-on-ipv4 is enabled by default). It additionally includes address add/delete messages from rtnetlink and mDNS packets from tcpdump (using packet timestamps rather than the slightly later journal timestamps, and heavily cleaned up for readability).

All mDNS packets logged are coming from the local avahi-daemon; there are no incoming mDNS packets within this timeframe. IPv6 addresses have been abbreviated to fe80:... or 2001:....

21:50:06.171500 systemd-networkd: eth0: Lost carrier
21:50:06.171551 systemd-networkd: eth0: DHCP lease lost
21:50:06.171711 kernel: cpsw 4a100000.ethernet eth0: Link is Down
21:50:06.184841 avahi-daemon: Withdrawing address record for 10.0.101.180 on eth0.
21:50:06.185032 avahi-daemon: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.0.101.180.
21:50:06.204365 rtnetlink: inet addr delete  10.0.101.180/24
21:50:06.204365 rtnetlink: inet6 addr delete  2001:.../64
21:50:06.218332 avahi-daemon: Interface eth0.IPv4 no longer relevant for mDNS.
21:50:06.218878 avahi-daemon: Withdrawing address record for 2001:... on eth0.
21:50:06.219006 avahi-daemon: Leaving mDNS multicast group on interface eth0.IPv6 with address 2001:....
21:50:06.268069 avahi-daemon: Joining mDNS multicast group on interface eth0.IPv6 with address fe80:....
21:50:06.282608 avahi-daemon: Registering new address record for fe80:... on eth0.*.
21:50:08.237194 systemd-networkd: eth0: Gained carrier
21:50:08.252010 kernel: cpsw 4a100000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
21:50:10.318034 tcpdump: fe80:...:     announce: [fe80:...].arpa PTR 8c-132.local,  8c-132.local AAAA fe80:...
21:50:12.212405 systemd-networkd: eth0: DHCPv4 address 10.0.101.180/24 via 10.0.101.1
21:50:12.217326 rtnetlink: inet addr add  10.0.101.180/24
21:50:12.220208 avahi-daemon: Joining mDNS multicast group on interface eth0.IPv4 with address 10.0.101.180.
21:50:12.248126 avahi-daemon: New relevant interface eth0.IPv4 for mDNS.
21:50:12.249227 avahi-daemon: Registering new address record for 10.0.101.180 on eth0.*.
21:50:12.251915 systemd-networkd: eth0: Configured
21:50:12.392259 tcpdump: 10.0.101.180: query: [fe80:...].arpa,  8c-132.local,  [10.0.101.180].arpa
21:50:12.392583 tcpdump: fe80:...:     query: [10.0.101.180].arpa,  8c-132.local
21:50:12.392979 tcpdump: fe80:...:     announce: 8c-132.local AAAA fe80:...
21:50:12.642782 tcpdump: 10.0.101.180: query: [fe80:...].arpa,  8c-132.local,  [10.0.101.180].arpa
21:50:12.643606 tcpdump: fe80:...:     query: [10.0.101.180].arpa,  8c-132.local
21:50:12.893273 tcpdump: 10.0.101.180: query: [fe80:...].arpa,  8c-132.local,  [10.0.101.180].arpa
21:50:12.894079 tcpdump: fe80:...:     query: [10.0.101.180].arpa,  8c-132.local
21:50:12.894899 tcpdump: fe80:...:     announce: 8c-132.local AAAA fe80:...
21:50:13.094002 tcpdump: 10.0.101.180: announce: [fe80:...].arpa PTR 8c-132.local,  8c-132.local A 10.0.101.180,  [10.0.101.180].arpa PTR 8c-132.local,  8c-132.local AAAA fe80:...
21:50:13.098000 tcpdump: fe80:...:     announce: [10.0.101.180].arpa PTR 8c-132.local,  8c-132.local A 10.0.101.180
21:50:13.098675 avahi-daemon: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80:....
21:50:13.106774 rtnetlink: inet6 addr add  2001:.../64
21:50:13.109935 avahi-daemon: Joining mDNS multicast group on interface eth0.IPv6 with address 2001:....
21:50:13.119629 avahi-daemon: Registering new address record for 2001:... on eth0.*.
21:50:13.120595 avahi-daemon: Withdrawing address record for fe80:... on eth0.
21:50:13.121779 tcpdump: 2001:...:     announce: [fe80:...].arpa PTR 8c-132.local,  8c-132.local AAAA fe80:...
21:50:13.123522 tcpdump: 10.0.101.180: announce: [fe80:...].arpa PTR 8c-132.local,  8c-132.local AAAA fe80:...
21:50:13.126407 avahi-daemon: Withdrawing address record for 10.0.101.180 on eth0.
21:50:13.128151 avahi-daemon: Host name conflict, retrying with 8c-133
21:50:13.129259 avahi-daemon: Registering new address record for 2001:... on eth0.*.
21:50:13.130204 avahi-daemon: Registering new address record for 10.0.101.180 on eth0.*.
21:50:13.132022 tcpdump: 2001:...:     announce: [10.0.101.180].arpa PTR 8c-132.local,  8c-132.local A 10.0.101.180
21:50:13.133682 tcpdump: 10.0.101.180: announce: [10.0.101.180].arpa PTR 8c-132.local,  8c-132.local A 10.0.101.180
21:50:13.394035 tcpdump: 2001:...:     query: [10.0.101.180].arpa,  8c-133.local
21:50:13.397601 tcpdump: 10.0.101.180: query: [10.0.101.180].arpa,  8c-133.local
21:50:13.644955 tcpdump: 2001:...:     query: [10.0.101.180].arpa,  8c-133.local
21:50:13.645600 tcpdump: 10.0.101.180: query: [10.0.101.180].arpa,  8c-133.local
21:50:13.895132 tcpdump: 2001:...:     query: [10.0.101.180].arpa,  8c-133.local
21:50:13.895416 tcpdump: 10.0.101.180: query: [10.0.101.180].arpa,  8c-133.local
21:50:14.095971 tcpdump: 10.0.101.180: announce: [10.0.101.180].arpa PTR 8c-133.local,  8c-133.local A 10.0.101.180
21:50:14.096264 tcpdump: 2001:...:     announce: [10.0.101.180].arpa PTR 8c-133.local,  8c-133.local A 10.0.101.180
21:50:14.263993 tcpdump: 2001:...:     query: [2001:...].arpa,  8c-133.local
21:50:14.264779 tcpdump: 10.0.101.180: query: [2001:...].arpa,  8c-133.local
21:50:14.514425 tcpdump: 2001:...:     query: [2001:...].arpa,  8c-133.local
21:50:14.514793 tcpdump: 10.0.101.180: query: [2001:...].arpa,  8c-133.local
21:50:14.764677 tcpdump: 2001:...:     query: [2001:...].arpa,  8c-133.local
21:50:14.765480 tcpdump: 10.0.101.180: query: [2001:...].arpa,  8c-133.local
21:50:14.766290 tcpdump: 2001:...:     announce: 8c-133.local A 10.0.101.180
21:50:14.767037 tcpdump: 10.0.101.180: announce: 8c-133.local A 10.0.101.180
21:50:14.964763 avahi-daemon: Server startup complete. Host name is 8c-133.local. Local service cookie is 1726236826.
21:50:14.965456 tcpdump: 10.0.101.180: announce: [2001:...].arpa PTR 8c-133.local,  8c-133.local AAAA 2001:...
21:50:14.966730 tcpdump: 2001:...:     announce: [2001:...].arpa PTR 8c-133.local,  8c-133.local AAAA 2001:...
21:50:15.189474 tcpdump: 10.0.101.180: announce: [10.0.101.180].arpa PTR 8c-133.local
21:50:15.189762 tcpdump: 2001:...:     announce: [10.0.101.180].arpa PTR 8c-133.local
21:50:16.059088 tcpdump: 10.0.101.180: announce: [2001:...].arpa PTR 8c-133.local,  8c-133.local AAAA 2001:...
21:50:16.059465 tcpdump: 2001:...:     announce: [2001:...].arpa PTR 8c-133.local,  8c-133.local AAAA 2001:...
21:50:17.282986 tcpdump: 10.0.101.180: announce: [10.0.101.180].arpa PTR 8c-133.local,  8c-133.local A 10.0.101.180
21:50:17.283282 tcpdump: 2001:...:     announce: [10.0.101.180].arpa PTR 8c-133.local,  8c-133.local A 10.0.101.180
21:50:18.241489 tcpdump: 10.0.101.180: announce: [2001:...].arpa PTR 8c-133.local,  8c-133.local AAAA 2001:...
21:50:18.241880 tcpdump: 2001:...:     announce: [2001:...].arpa PTR 8c-133.local,  8c-133.local AAAA 2001:...

@evverx
Copy link
Member

evverx commented Sep 4, 2024

All mDNS packets logged are coming from the local avahi-daemon, there are no incoming mDNS packets within this timeframe

avahi itself receives those packets (including the probes) and when they match they are marked as ours in incoming_probe. Conflicts occur because that basic check isn't sufficient.
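
A minimal simulation of that failure mode (hypothetical data structures, not avahi's actual code): if a record is withdrawn between sending a probe and receiving it back, the looped-back probe no longer matches anything of ours and gets treated as a foreign conflict.

```python
# Hypothetical simulation of why a naive "is this record ours?" check
# can fail: the looped-back probe arrives after the record it carried
# was withdrawn (e.g. address churn on link re-up).

own_records = {("8c-132.local", "AAAA", "fe80::1")}  # currently published

def incoming_probe(record):
    # If the probed record matches something we publish, it is ours.
    if record in own_records:
        return "ours, ignore"
    return "conflict, rename"  # naive: anything else looks foreign

probe = ("8c-132.local", "AAAA", "fe80::1")  # we sent this probe...
own_records.discard(probe)                   # ...then withdrew fe80::1
print(incoming_probe(probe))                 # -> "conflict, rename"
```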

@evverx
Copy link
Member

evverx commented Sep 5, 2024

@mvduin could you run avahi with --debug? It should show what avahi thinks it conflicts with. Messages like

Received conflicting probe [C.local        IN        A 192.168.141.46 ; ttl=120].

should appear. It should make it easier to figure out what happens in places like

21:50:13.126407 avahi-daemon: Withdrawing address record for 10.0.101.180 on eth0.
21:50:13.128151 avahi-daemon: Host name conflict, retrying with 8c-133

@mvduin
Copy link

mvduin commented Sep 5, 2024

avahi itself receives those packets (including probes as well)

Yeah I know, I just meant that there were no mDNS packets being received from other hosts or stacks, hence the source of the conflict is necessarily Avahi's own probes. (I also explicitly mentioned it since I was abbreviating all IPv6 addresses in a way that would be ambiguous if more than one host were present in the logs.)

Here's a capture with --debug; as expected, the source of the conflict is simply a very recently transmitted probe:

13:13:15.531604 systemd-networkd: eth0: Lost carrier
13:13:15.531657 systemd-networkd: eth0: DHCP lease lost
13:13:15.532235 kernel: cpsw 4a100000.ethernet eth0: Link is Down
13:13:15.544817 avahi-daemon: Withdrawing address record for 10.0.101.180 on eth0.
13:13:15.545023 avahi-daemon: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.0.101.180.
13:13:15.551420 avahi-daemon: Interface eth0.IPv4 no longer relevant for mDNS.
13:13:15.551752 avahi-daemon: sendmsg() to 0:0:ff02:: failed: Invalid argument
13:13:15.551882 avahi-daemon: Withdrawing address record for 2001:... on eth0.
13:13:15.552014 avahi-daemon: Leaving mDNS multicast group on interface eth0.IPv6 with address 2001:....
13:13:15.558786 rtnetlink: inet addr delete  10.0.101.180/24
13:13:15.558786 rtnetlink: inet6 addr delete  2001:.../64
13:13:15.577213 avahi-daemon: Joining mDNS multicast group on interface eth0.IPv6 with address fe80:....
13:13:15.584630 avahi-daemon: Registering new address record for fe80:... on eth0.*.

13:13:17.599661 systemd-networkd: eth0: Gained carrier
13:13:17.606550 kernel: cpsw 4a100000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
13:13:17.683196 tcpdump: fe80:...:     announce: [fe80:...].arpa PTR 8c-131.local, 8c-131.local AAAA fe80:...
13:13:19.857481 tcpdump: fe80:...:     announce: [fe80:...].arpa PTR 8c-131.local, 8c-131.local AAAA fe80:...
13:13:21.985330 systemd-networkd: eth0: DHCPv4 address 10.0.101.180/24 via 10.0.101.1
13:13:21.990299 avahi-daemon: Joining mDNS multicast group on interface eth0.IPv4 with address 10.0.101.180.
13:13:22.004591 rtnetlink: inet addr add  10.0.101.180/24
13:13:22.020546 avahi-daemon: New relevant interface eth0.IPv4 for mDNS.
13:13:22.021783 avahi-daemon: Registering new address record for 10.0.101.180 on eth0.*.
13:13:22.101788 tcpdump: 10.0.101.180: query: [fe80:...].arpa,  8c-131.local,  [10.0.101.180].arpa
13:13:22.102109 tcpdump: fe80:...:     query: [10.0.101.180].arpa,  8c-131.local
13:13:22.102501 tcpdump: fe80:...:     announce: 8c-131.local AAAA fe80:...
13:13:22.325223 systemd-networkd: eth0: Configured
13:13:22.328013 avahi-daemon: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80:....
13:13:22.341674 rtnetlink: inet6 addr add  2001:.../64
13:13:22.344335 avahi-daemon: Joining mDNS multicast group on interface eth0.IPv6 with address 2001:....
13:13:22.353917 avahi-daemon: Registering new address record for 2001:... on eth0.*.
13:13:22.355080 avahi-daemon: Withdrawing address record for fe80:... on eth0.
13:13:22.356324 tcpdump: 10.0.101.180: query: [fe80:...].arpa,  8c-131.local,  [10.0.101.180].arpa
13:13:22.358572 tcpdump: 2001:...:     query: [10.0.101.180].arpa,  8c-131.local
13:13:22.360162 tcpdump: 2001:...:     announce: [fe80:...].arpa PTR 8c-131.local, 8c-131.local AAAA fe80:...
13:13:22.362219 avahi-daemon: Received conflicting probe [8c-131.local        IN        AAAA fe80:... ; ttl=120]. Local host lost. Withdrawing.
13:13:22.363248 avahi-daemon: Withdrawing address record for 10.0.101.180 on eth0.
13:13:22.365099 avahi-daemon: Host name conflict, retrying with 8c-132
13:13:22.365386 avahi-daemon: Registering new address record for 2001:... on eth0.*.
13:13:22.365595 avahi-daemon: Registering new address record for 10.0.101.180 on eth0.*.
13:13:23.103441 tcpdump: 2001:...:     query: [10.0.101.180].arpa,  8c-132.local
13:13:23.107121 tcpdump: 10.0.101.180: query: [10.0.101.180].arpa,  8c-132.local
13:13:23.354053 tcpdump: 2001:...:     query: [10.0.101.180].arpa,  8c-132.local
13:13:23.354381 tcpdump: 10.0.101.180: query: [10.0.101.180].arpa,  8c-132.local
13:13:23.435870 tcpdump: 2001:...:     query: [2001:...].arpa,  8c-132.local
13:13:23.436245 tcpdump: 10.0.101.180: query: [2001:...].arpa,  8c-132.local
13:13:23.605207 tcpdump: 2001:...:     query: [10.0.101.180].arpa,  8c-132.local
13:13:23.605879 tcpdump: 10.0.101.180: query: [10.0.101.180].arpa,  8c-132.local
13:13:23.686258 tcpdump: 2001:...:     query: [2001:...].arpa,  8c-132.local
13:13:23.686623 tcpdump: 10.0.101.180: query: [2001:...].arpa,  8c-132.local
13:13:23.805291 tcpdump: 10.0.101.180: announce: [10.0.101.180].arpa PTR 8c-132.local, 8c-132.local A 10.0.101.180
13:13:23.805576 tcpdump: 2001:...:     announce: [10.0.101.180].arpa PTR 8c-132.local, 8c-132.local A 10.0.101.180
13:13:23.936852 tcpdump: 2001:...:     query: [2001:...].arpa,  8c-132.local
13:13:23.937206 tcpdump: 10.0.101.180: query: [2001:...].arpa,  8c-132.local
13:13:24.137757 avahi-daemon: Server startup complete. Host name is 8c-132.local. Local service cookie is 925266343.
13:13:24.138403 tcpdump: 10.0.101.180: announce: [2001:...].arpa PTR 8c-132.local, 8c-132.local AAAA 2001:...
13:13:24.139636 tcpdump: 2001:...:     announce: [2001:...].arpa PTR 8c-132.local, 8c-132.local AAAA 2001:...
13:13:24.835190 tcpdump: 10.0.101.180: announce: [10.0.101.180].arpa PTR 8c-132.local, 8c-132.local A 10.0.101.180
13:13:24.835473 tcpdump: 2001:...:     announce: [10.0.101.180].arpa PTR 8c-132.local, 8c-132.local A 10.0.101.180
13:13:25.168880 tcpdump: 10.0.101.180: announce: [2001:...].arpa PTR 8c-132.local, 8c-132.local AAAA 2001:...
13:13:25.169249 tcpdump: 2001:...:     announce: [2001:...].arpa PTR 8c-132.local, 8c-132.local AAAA 2001:...
13:13:26.866422 tcpdump: 10.0.101.180: announce: [10.0.101.180].arpa PTR 8c-132.local, 8c-132.local A 10.0.101.180
13:13:26.866729 tcpdump: 2001:...:     announce: [10.0.101.180].arpa PTR 8c-132.local, 8c-132.local A 10.0.101.180

@ernstkl
Copy link

ernstkl commented Oct 15, 2024

All mDNS packets logged are coming from the local avahi-daemon, there are no incoming mDNS packets within this timeframe

avahi itself receives those packets (including the probes) and when they match they are marked as ours in incoming_probe. Conflicts occur because that basic check isn't sufficient.

To me your remark raises the question: in what way could a probe that avahi sent, and then received back, differ from every record in the list it compares against in the incoming_probe() function?

  • either the record that should be there and match the incoming probe (because avahi itself sent it earlier) really is in the list, but it carries too much detail or wrong info, making the match fail;
  • or the record that should be there is not in the list at the point in time when the comparison is done. I can think of two reasons for that:
    • the probe is sent before the record makes it to the list (solution: ...)
    • the probe is sent after the record has been added, as it should be, but the record is then quickly removed again for some reason (solution: keep those records for at least x seconds)

(I have been trying to collect data points to compare against this theory by logging more detail from inside incoming_probe(), but the "spurious name conflict" I had for weeks has just vanished for now.)
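
The second hypothesis's suggested fix ("keep those records for at least x seconds") could be sketched like this (hypothetical code, not avahi's; GRACE_SECONDS is an assumed value): withdrawn records stay matchable for a grace period, so a looped-back probe sent just before the withdrawal still counts as ours.

```python
# Hypothetical sketch of the proposed fix: keep withdrawn records
# around for a grace period so a looped-back probe that was sent just
# before the withdrawal still matches as "ours".

import time

GRACE_SECONDS = 1.0  # assumed value; the idea above says "x seconds"

recently_withdrawn = {}  # record -> withdrawal timestamp

def withdraw(record):
    recently_withdrawn[record] = time.monotonic()

def is_ours(record, published):
    if record in published:
        return True
    t = recently_withdrawn.get(record)
    # Still "ours" if it was withdrawn less than GRACE_SECONDS ago.
    return t is not None and time.monotonic() - t < GRACE_SECONDS

published = set()
rec = ("8c-132.local", "AAAA", "fe80::1")
withdraw(rec)
print(is_ours(rec, published))  # -> True (within the grace period)
```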

@ernstkl
Copy link

ernstkl commented Oct 16, 2024

On the systems where I have the problem, there is a way to reproducibly switch the issue discussed here (avahi appending '-2' to the host name because of a conflict) on and off:

Just have librespot running or not.

librespot integrated its own mDNS responder in commit b25585a41b7a3cf35776e20345e5718c3abf16b7 back in 2016, which could be the source of the problem.

Maybe this is the root cause for many people being affected, rather than possible shortcomings in the avahi code.

@mvduin
Copy link

mvduin commented Oct 20, 2024

On the systems where I have the problem, there is a way to reproducibly switch the issue discussed here (avahi appending '-2' to the host name because of a conflict) on and off:

Just have librespot running or not.

That just sounds like librespot publishing conflicting records, and the resulting rename is just Avahi behaving correctly. This thread is specifically about Avahi spuriously conflicting with itself.

librespot integrated their own mdns responder in commit b25585a41b7a3cf35776e20345e5718c3abf16b7 back in 2016 which could be the source of the problem.

It 100% is; you don't want multiple mDNS stacks running on the same machine. The correct way to publish an mDNS service is to use the system mDNS stack, i.e. Avahi on Linux. Using a custom mDNS stack should be avoided unless no system mDNS stack is available; if some software insists on using one anyway, it must publish its records under a randomly generated hostname, not the system hostname.
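
As a sketch of that last suggestion (hypothetical code; the prefix and name format are made up, not anything librespot actually does), a custom stack could derive a unique hostname instead of reusing the system one:

```python
# Hypothetical sketch: a custom mDNS stack publishing under a randomly
# generated hostname so it can never conflict with the system mDNS
# stack's records for the real system hostname.

import secrets

def random_mdns_hostname(prefix="librespot"):
    # e.g. "librespot-3f9a2c.local"; the random suffix keeps the name
    # distinct from the system hostname published by Avahi.
    return f"{prefix}-{secrets.token_hex(3)}.local"

print(random_mdns_hostname())
```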

@evverx
Copy link
Member

evverx commented Oct 20, 2024

That just sounds like librespot is publishing conflicting records

I'm not sure what librespot does, but if it does what the RFC says, it advertises its link-local and global addresses at the same time, and avahi should handle that properly with no conflicts. It doesn't currently work like that, though (#243). If that part of the RFC were implemented, the spurious conflicts (including the ones where avahi conflicts with its own link-local addresses) would be much less likely to pop up as well.
