Fix compatibility with some IoT devices using avahi 0.8-rc1 #27
This fixes browse, lookup and register not working properly with devices running avahi 0.8-rc1.
Hi everyone, thanks for continuing the work on zeroconf! I had an issue with IoT devices running avahi 0.8-rc1: they did not respond to requests and did not see a service I had created. I used tcpdump to compare this library with other devices on the network that successfully found the IoT device, and the only difference turned out to be the IP TTL, which was set to 1 before this patch. With the patch, everything works fine.
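For context, a TTL of 1 is the Linux kernel's default for outgoing IPv4 multicast (the IP_MULTICAST_TTL socket option), so a sender that never touches this option emits mDNS packets with TTL 1. Below is a minimal sketch, assuming Linux and using only Go's standard library syscall package, that reads that default on a fresh UDP socket and raises it to 255. It is not the diff from this PR: the helper names multicastTTL, setMulticastTTL and demoIPv4 are hypothetical, and the zeroconf library may configure its sockets differently (for example via golang.org/x/net/ipv4).

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// multicastTTL reads IP_MULTICAST_TTL, the IP TTL the kernel puts on
// outgoing IPv4 multicast datagrams (e.g. mDNS packets to 224.0.0.251).
// Assumes Linux: option constants and value sizes differ on other systems.
func multicastTTL(c *net.UDPConn) (int, error) {
	raw, err := c.SyscallConn()
	if err != nil {
		return 0, err
	}
	var ttl int
	var sockErr error
	if err := raw.Control(func(fd uintptr) {
		ttl, sockErr = syscall.GetsockoptInt(int(fd), syscall.IPPROTO_IP, syscall.IP_MULTICAST_TTL)
	}); err != nil {
		return 0, err
	}
	return ttl, sockErr
}

// setMulticastTTL overrides IP_MULTICAST_TTL on the socket.
func setMulticastTTL(c *net.UDPConn, ttl int) error {
	raw, err := c.SyscallConn()
	if err != nil {
		return err
	}
	var sockErr error
	if err := raw.Control(func(fd uintptr) {
		sockErr = syscall.SetsockoptInt(int(fd), syscall.IPPROTO_IP, syscall.IP_MULTICAST_TTL, ttl)
	}); err != nil {
		return err
	}
	return sockErr
}

// demoIPv4 opens an IPv4 UDP socket on an ephemeral port (a real mDNS
// socket would bind 5353), records the default multicast TTL, raises it
// to 255 and reads it back.
func demoIPv4() (before, after int, err error) {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4zero, Port: 0})
	if err != nil {
		return 0, 0, err
	}
	defer conn.Close()
	if before, err = multicastTTL(conn); err != nil {
		return 0, 0, err
	}
	if err = setMulticastTTL(conn, 255); err != nil {
		return 0, 0, err
	}
	if after, err = multicastTTL(conn); err != nil {
		return 0, 0, err
	}
	return before, after, nil
}

func main() {
	before, after, err := demoIPv4()
	if err != nil {
		panic(err)
	}
	fmt.Printf("IPv4 multicast TTL: default %d, now %d\n", before, after)
}
```

On a stock Linux kernel this should report a default of 1 and then 255. Other operating systems use different socket option constants, so the sketch would need adjusting there.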
Related PR: grandcat#108
@DerAndereAndi Thanks! A couple of questions:
Hi @MarcoPolo, thanks for your questions.
I am happy to help with anything I can, but so far I haven't gone that deep into mDNS and this implementation.
It looks like Wireshark is also complaining about the TTL (which appears to be 1 by default). I'm still not 100% convinced that 255 is the correct value, as our old mDNS implementation also uses a TTL of 1.
The Multicast DNS RFC has a section about IP TTLs: https://datatracker.ietf.org/doc/html/rfc6762#section-11
Note that this only applies to responses, and not to queries. I'd assume it's fair to say that we don't really care about interop with implementations of a draft version from 18 years ago.
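RFC 6762 section 11 explains that draft-era queriers check the IP TTL on reception and discard everything other than 255, using it as evidence that the packet originated on the local link: every router that forwards a packet decrements its TTL, so a packet can only still carry 255 if no router handled it. The toy model below (hypothetical function names, no networking) illustrates that rule as the RFC describes it; it is not code from avahi or from any device discussed here.

```go
package main

import "fmt"

// legacyTTL is the only IP TTL that draft-era mDNS queriers accept,
// per the backwards-compatibility note in RFC 6762 section 11.
const legacyTTL = 255

// arrivalTTL models what a receiver sees: each router that forwards
// the packet decrements the IP TTL by one.
func arrivalTTL(sentTTL, routerHops int) int {
	return sentTTL - routerHops
}

// legacyQuerierAccepts models the draft-era source check:
// any packet whose TTL is not exactly 255 is discarded.
func legacyQuerierAccepts(receivedTTL int) bool {
	return receivedTTL == legacyTTL
}

func main() {
	for _, sent := range []int{1, 254, 255} {
		got := arrivalTTL(sent, 0) // same link: no router in between
		fmt.Printf("sent TTL %3d on-link -> accepted: %v\n", sent, legacyQuerierAccepts(got))
	}
	// Even TTL 255 fails once a single router has forwarded the packet.
	fmt.Println("sent TTL 255 via one router -> accepted:", legacyQuerierAccepts(arrivalTTL(255, 1)))
}
```

Under this model a sender using TTL 254 is ignored just like one using the default of 1, because the check is an exact match rather than a threshold.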
On the other hand, all the mDNS traffic I'm observing on my computer / local network has the TTL set to 255, so this is probably fine.
@DerAndereAndi Do we also need to set the Hop Limit on IPv6?
I did some more testing using the resolv example client, changed to browse for _ship._tcp. Running without the TTL change results in:
> go run examples/resolv/client.go
2022/06/19 11:22:43 &{{EVCC_HEMS_01 _ship._tcp [] local _ship._tcp.local. EVCC_HEMS_01._ship._tcp.local. _services._dns-sd._udp.local.} primarypi.local. 4712 [txtvers=1 path=/ship/ id=EVCC-3ce84c490507212f ski=bc1e89fc545fdc09db1a0152f6e4c9cef2388fa1 brand=EVCC model=EVCC_HEMS_01 type=EnergyManagementSystem register=true] 3200 [192.168.1.8] [2003:d2:f726:7200:e65f:1ff:fe54:65e3]}
2022/06/19 11:22:43 &{{SEMPSHIPGW _ship._tcp [] local _ship._tcp.local. SEMPSHIPGW._ship._tcp.local. _services._dns-sd._udp.local.} SMA3002857654.local. 4712 [model=Sunny Home Manager 2.0 type=Energy Manager brand=SMA ski=501d74013e68ea2038613512d32963b4e9f5a836 register=false path=/ship/ id=SEMPSHIPGW txtvers=1] 120 [192.168.1.3] []}
2022/06/19 11:22:53 No more entries.
Now running with the TTL change:
> go run examples/resolv/client.go
2022/06/19 11:24:17 &{{EVCC_HEMS_01 _ship._tcp [] local _ship._tcp.local. EVCC_HEMS_01._ship._tcp.local. _services._dns-sd._udp.local.} primarypi.local. 4712 [txtvers=1 path=/ship/ id=EVCC-3ce84c490507212f ski=bc1e89fc545fdc09db1a0152f6e4c9cef2388fa1 brand=EVCC model=EVCC_HEMS_01 type=EnergyManagementSystem register=true] 3200 [192.168.1.8] [2003:d2:f726:7200:e65f:1ff:fe54:65e3]}
2022/06/19 11:24:17 &{{Elli-Wallbox-2019A0OV8H _ship._tcp [] local _ship._tcp.local. Elli-Wallbox-2019A0OV8H._ship._tcp.local. _services._dns-sd._udp.local.} wallbox-2019A0OV8H.local. 4712 [model=Wallbox type=Wallbox brand=Elli ski=46b9642e684fc5274187487aad35a0508e32de3e register=false path=/ship/ id=Elli-Wallbox-2019A0OV8H txtvers=1 org.freedesktop.Avahi.cookie=413265520] 120 [192.168.1.14] [fe80::40b2:61cb:4f44:4ae7]}
2022/06/19 11:24:17 &{{SEMPSHIPGW _ship._tcp [] local _ship._tcp.local. SEMPSHIPGW._ship._tcp.local. _services._dns-sd._udp.local.} SMA3002857654.local. 4712 [model=Sunny Home Manager 2.0 type=Energy Manager brand=SMA ski=501d74013e68ea2038613512d32963b4e9f5a836 register=false path=/ship/ id=SEMPSHIPGW txtvers=1] 120 [192.168.1.3] []}
2022/06/19 11:24:27 No more entries.
Now using avahi 0.8 on Ubuntu, running it via:
> avahi-publish-service --version
avahi-publish-service 0.8
> avahi-publish-service -s Demo10 _ship._tcp 4713 textvers=1 path=/ship/ id=Demo-1ce34ca905f7013a ski=1c1f59ac545fdcc9dc1a085bfee5c94ef1348da2 brand=Demo model=Demo_Hems type=EnergyManagementSystem register=true
Established under name 'Demo10'
Running the resolve client without the TTL change:
> go run examples/resolv/client.go
2022/06/19 11:26:54 &{{EVCC_HEMS_01 _ship._tcp [] local _ship._tcp.local. EVCC_HEMS_01._ship._tcp.local. _services._dns-sd._udp.local.} primarypi.local. 4712 [txtvers=1 path=/ship/ id=EVCC-3ce84c490507212f ski=bc1e89fc545fdc09db1a0152f6e4c9cef2388fa1 brand=EVCC model=EVCC_HEMS_01 type=EnergyManagementSystem register=true] 3200 [192.168.1.8] [2003:d2:f726:7200:e65f:1ff:fe54:65e3]}
2022/06/19 11:26:54 &{{Demo10 _ship._tcp [] local _ship._tcp.local. Demo10._ship._tcp.local. _services._dns-sd._udp.local.} primary.local. 4713 [textvers=1 path=/ship/ id=Demo-1ce34ca905f7013a ski=1c1f59ac545fdcc9dc1a085bfee5c94ef1348da2 brand=Demo model=Demo_Hems type=EnergyManagementSystem register=true] 120 [192.168.1.9] [2003:d2:f726:7200:e65f:1ff:fe54:65e3]}
2022/06/19 11:26:55 &{{SEMPSHIPGW _ship._tcp [] local _ship._tcp.local. SEMPSHIPGW._ship._tcp.local. _services._dns-sd._udp.local.} SMA3002857654.local. 4712 [model=Sunny Home Manager 2.0 type=Energy Manager brand=SMA ski=501d74013e68ea2038613512d32963b4e9f5a836 register=false path=/ship/ id=SEMPSHIPGW txtvers=1] 120 [192.168.1.3] []}
The Demo service appears. So, contrary to my earlier statement, this doesn't seem to be a general issue with avahi 0.8. Still, all these devices (and others I have) use a TTL of 255 when I check with Wireshark.
@marten-seemann I haven't seen any issues with that yet, and I didn't see anything in Wireshark that hints at this either. But I might have overlooked something.
Thank you @DerAndereAndi. As I've said in #27 (review), I think this change is fine. I'm just wondering if we need to make the equivalent change for IPv6. It would be good to fix this once and for all.
https://man7.org/linux/man-pages/man7/ip.7.html
According to the man page, we want this to be as small as possible?
I tried multiple IP TTLs with this device, and only 255 finds the device reliably. Even 254 does not work. They must be doing something locally on the device, with routing, that causes this. Since Avahi set the IP TTL to 255 18 years ago, when that code was introduced, and never changed it, they would never have run into this with their setup. The question for me is: if one wants the number to be as low as possible, what is the benefit of deviating from the "de-facto" standard implementation? Right now I only see a downside: the library doesn't work with this specific device, which is sold in the thousands.
I think I agree with you here. I'm just confused in general why the man page would recommend the exact opposite of what is seen in practice. Do you happen to know if this is an issue with IPv6 as well? My guess is that IPv6 implementations would be a bit newer and not have this issue.
I just disabled IPv4 and tested with IPv6 only. You are right, this device also requires raising the hop limit on IPv6, so I added that.
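On IPv6, the counterpart of the IPv4 multicast TTL is the multicast hop limit, controlled by the IPV6_MULTICAST_HOPS socket option. A minimal sketch, assuming Linux and Go's standard library syscall package, that raises it to 255 on a UDP socket; the helper names setMulticastHopLimit and demoIPv6 are hypothetical, and this is not the code merged in this PR.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// setMulticastHopLimit sets IPV6_MULTICAST_HOPS (the IPv6 equivalent of
// IP_MULTICAST_TTL) and reads the value back. Assumes Linux.
func setMulticastHopLimit(c *net.UDPConn, hops int) (int, error) {
	raw, err := c.SyscallConn()
	if err != nil {
		return 0, err
	}
	var got int
	var sockErr error
	if err := raw.Control(func(fd uintptr) {
		sockErr = syscall.SetsockoptInt(int(fd), syscall.IPPROTO_IPV6, syscall.IPV6_MULTICAST_HOPS, hops)
		if sockErr == nil {
			got, sockErr = syscall.GetsockoptInt(int(fd), syscall.IPPROTO_IPV6, syscall.IPV6_MULTICAST_HOPS)
		}
	}); err != nil {
		return 0, err
	}
	return got, sockErr
}

// demoIPv6 opens an IPv6 UDP socket on an ephemeral port (a real mDNS
// socket would bind 5353 and join ff02::fb) and raises its multicast
// hop limit to 255.
func demoIPv6() (int, error) {
	conn, err := net.ListenUDP("udp6", &net.UDPAddr{IP: net.IPv6unspecified, Port: 0})
	if err != nil {
		return 0, err // e.g. IPv6 disabled on this host
	}
	defer conn.Close()
	return setMulticastHopLimit(conn, 255)
}

func main() {
	hops, err := demoIPv6()
	if err != nil {
		fmt.Println("could not configure IPv6 socket:", err)
		return
	}
	fmt.Println("IPv6 multicast hop limit:", hops)
}
```

The IPv4 TTL and the IPv6 hop limit are separate per-socket options, which is why a fix applied only to the IPv4 socket would leave IPv6-only setups unaffected.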
As much as I dislike setting a TTL of 255, it seems like this is what all mDNS implementations do, and that it actually fixes a problem with certain devices.
Thanks for digging into this @DerAndereAndi!
Thanks for merging, @marten-seemann. What I can't wrap my head around is why this is necessary at all. This is the IP TTL, which IMHO is the number of hops, and inside the local network there would typically be only a single hop. Are we patching network stack issues here, or what is the likely root cause?
The root cause is most likely the one described in the RFC (see #27 (comment)): Terribly broken and horribly outdated mDNS implementations.
This fixes browse, lookup and register not working properly with some devices running avahi 0.8-rc1.
The problem came up with Elli Charger wallboxes, which use avahi 0.8-rc1: they couldn't be seen using this library, and the device also didn't see the service announced with this library. Running avahi 0.8 on a Linux (Ubuntu) machine does not show the same effect, so something within this device is causing it.
Raising the IPv4 TTL and the IPv6 hop limit to 255 solves this. Avahi has also used these values for almost 18 years without changing them.