Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(cargo): upgrade iroh to 0.27.0 #6133

Closed
wants to merge 1 commit into from
Closed

chore(cargo): upgrade iroh to 0.27.0 #6133

wants to merge 1 commit into from

Conversation

link2xt
Copy link
Collaborator

@link2xt link2xt commented Oct 31, 2024

No description provided.

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

I just successfully upgraded to 0.26.0 in #6132.

But iroh 0.27.0 fails all tests in CI.

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

Debug logs:

2024-10-31T14:20:00.270729Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor:probe: iroh_net::netcheck::reportgen: probe set aborted: no relay node addr: Not implemented probe=Https { delay: 300ms, node: RelayNode { url: RelayUrl("https://iroh.testrun.org.:4443/"), stun_only: false, stun_port: 3478 } }
2024-10-31T14:20:00.270751Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor:run_probe{probe=ICMPv4 after 300ms to https://iroh.testrun.org.:4443/}: iroh_net::netcheck::reportgen: Performing DNS A lookup for relay addr proto=IcmpV4 hostname=iroh.testrun.org.
2024-10-31T14:20:00.271923Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor:run_probe{probe=ICMPv4 after 300ms to https://iroh.testrun.org.:4443/}: iroh_net::netcheck::reportgen: ICMP Ping started dst=5.9.136.70:3478 len=15
2024-10-31T14:20:00.272072Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor:run_probe{probe=ICMPv4 after 300ms to https://iroh.testrun.org.:4443/}: iroh_net::ping: Creating pinger addr=5.9.136.70 ident=52209
2024-10-31T14:20:00.272149Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor:captive-portal: iroh_net::netcheck::reportgen: check_captive_portal url=http://iroh.testrun.org./generate_204 status_code=308 Permanent Redirect valid_response=false
2024-10-31T14:20:00.312787Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor:run_probe{probe=ICMPv4 after 300ms to https://iroh.testrun.org.:4443/}: iroh_net::ping: 23 bytes from 5.9.136.70: icmp_seq=0 ttl=None time=40.51ms
2024-10-31T14:20:00.312863Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor:run_probe{probe=ICMPv4 after 300ms to https://iroh.testrun.org.:4443/}: iroh_net::netcheck::reportgen: ICMP ping done dst=5.9.136.70:3478 len=15 latency=40.507195ms
2024-10-31T14:20:00.313002Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor: iroh_net::netcheck::reportgen: finished probe probe_report=ProbeReport { ipv4_can_send: true, ipv6_can_send: false, icmpv4: Some(true), icmpv6: None, latency: Some(40.507195ms), probe: IcmpV4 { delay: 300ms, node: RelayNode { url: RelayUrl("https://iroh.testrun.org.:4443/"), stun_only: false, stun_port: 3478 } }, addr: None }
2024-10-31T14:20:00.313091Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor: iroh_net::netcheck::reportgen: Have enough probe reports, aborting further probes soon reports=1 delay=81.01439ms
2024-10-31T14:20:00.471727Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:portmapper.service:portmapper.probe: iroh_net::portmapper::pcp: probe failed: deadline has elapsed
2024-10-31T14:20:00.473027Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:portmapper.service:portmapper.probe: iroh_net::portmapper::nat_pmp: probe failed: deadline has elapsed
2024-10-31T14:20:02.970233Z  WARN ep{me=hscwll6rwithecy4}:magicsock:netcheck.actor:reportgen.actor: iroh_net::netcheck::reportgen: tick: probes timed out
2024-10-31T14:20:04.969744Z DEBUG ep{me=hscwll6rwithecy4}:magicsock:portmapper.service: iroh_net::portmapper: failed to get a port mapping No response within timeout
2024-10-31T14:20:04.971251Z  WARN ep{me=hscwll6rwithecy4}:magicsock:actor: iroh_net::magicsock: failed to generate netcheck report for: periodic: report aborted

I will bisect to the iroh commit that broke it. Can also try with a newer relay, but don't see why it should not work with older relay especially if no network compatibility breaking changes are in the release notes.

iroh-relay running on iroh.tesrun.org is 0.25.0.

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

I have bisected the problem to commit n0-computer/iroh@5e40fe1
(PR n0-computer/iroh#2804)

git-bisect log:
0.26.0: good
0.27.0: bad
c349c435f82494cf640fdea0d2026588d26e7f15: good
ed903ae0b5b5496aa2793e256843fec84dab81d2: bad
14363891777e9857a7f6e0fd90a284f5a50d5fd9: bad
5e40fe138f9581a195d47c251992e3de8b1ec8c1: bad
2d04306a518f060bdaa7adfb42630ae01e04e093: good
5e40fe138f9581a195d47c251992e3de8b1ec8c1 is the first bad commit

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

And the following patch fixes the problem, so it is indeed this commit that broke it:

From 85c8b7ae93c0ca6fe687b058de8d26e6feb12c7e Mon Sep 17 00:00:00 2001
From: user <[email protected]>
Date: Thu, 31 Oct 2024 15:07:09 +0000
Subject: [PATCH] Revert "deps(iroh-net): Upgrade igd-next, remove hyper 0.14
 (#2804)"

This reverts commit 5e40fe138f9581a195d47c251992e3de8b1ec8c1.
---
 Cargo.lock          | 69 ++++++++++++++++++++++++++++++++++++---------
 iroh-net/Cargo.toml |  2 +-
 2 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/Cargo.lock b/Cargo.lock
index 6fb8e958..0f962859 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -282,7 +282,7 @@ dependencies = [
  "http 1.1.0",
  "http-body 1.0.1",
  "http-body-util",
- "hyper",
+ "hyper 1.4.1",
  "hyper-util",
  "itoa",
  "matchit",
@@ -348,7 +348,7 @@ dependencies = [
  "http 1.1.0",
  "http-body 1.0.1",
  "http-body-util",
- "hyper",
+ "hyper 1.4.1",
  "hyper-util",
  "pin-project-lite",
  "rustls",
@@ -1841,6 +1841,25 @@ dependencies = [
  "subtle",
 ]
 
+[[package]]
+name = "h2"
+version = "0.3.26"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "81fe527a889e1532da5c525686d96d4c2e74cdd345badf8dfef9f6b39dd5f5e8"
+dependencies = [
+ "bytes",
+ "fnv",
+ "futures-core",
+ "futures-sink",
+ "futures-util",
+ "http 0.2.12",
+ "indexmap 2.2.6",
+ "slab",
+ "tokio",
+ "tokio-util",
+ "tracing",
+]
+
 [[package]]
 name = "h2"
 version = "0.4.5"
@@ -2218,6 +2237,30 @@ dependencies = [
  "syn 1.0.109",
 ]
 
+[[package]]
+name = "hyper"
+version = "0.14.30"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a152ddd61dfaec7273fe8419ab357f33aee0d914c5f4efbf0d96fa749eea5ec9"
+dependencies = [
+ "bytes",
+ "futures-channel",
+ "futures-core",
+ "futures-util",
+ "h2 0.3.26",
+ "http 0.2.12",
+ "http-body 0.4.6",
+ "httparse",
+ "httpdate",
+ "itoa",
+ "pin-project-lite",
+ "socket2",
+ "tokio",
+ "tower-service",
+ "tracing",
+ "want",
+]
+
 [[package]]
 name = "hyper"
 version = "1.4.1"
@@ -2227,7 +2270,7 @@ dependencies = [
  "bytes",
  "futures-channel",
  "futures-util",
- "h2",
+ "h2 0.4.5",
  "http 1.1.0",
  "http-body 1.0.1",
  "httparse",
@@ -2247,7 +2290,7 @@ checksum = "5ee4be2c948921a1a5320b629c4193916ed787a7f7f293fd3f7f5a6c9de74155"
 dependencies = [
  "futures-util",
  "http 1.1.0",
- "hyper",
+ "hyper 1.4.1",
  "hyper-util",
  "rustls",
  "rustls-pki-types",
@@ -2268,7 +2311,7 @@ dependencies = [
  "futures-util",
  "http 1.1.0",
  "http-body 1.0.1",
- "hyper",
+ "hyper 1.4.1",
  "pin-project-lite",
  "socket2",
  "tokio",
@@ -2328,18 +2371,16 @@ dependencies = [
 
 [[package]]
 name = "igd-next"
-version = "0.15.1"
+version = "0.14.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "76b0d7d4541def58a37bf8efc559683f21edce7c82f0d866c93ac21f7e098f93"
+checksum = "064d90fec10d541084e7b39ead8875a5a80d9114a2b18791565253bae25f49e4"
 dependencies = [
  "async-trait",
  "attohttpc",
  "bytes",
  "futures",
- "http 1.1.0",
- "http-body-util",
- "hyper",
- "hyper-util",
+ "http 0.2.12",
+ "hyper 0.14.30",
  "log",
  "rand",
  "tokio",
@@ -2780,7 +2821,7 @@ dependencies = [
  "anyhow",
  "erased_set",
  "http-body-util",
- "hyper",
+ "hyper 1.4.1",
  "hyper-util",
  "once_cell",
  "prometheus-client",
@@ -2820,7 +2861,7 @@ dependencies = [
  "hostname",
  "http 1.1.0",
  "http-body-util",
- "hyper",
+ "hyper 1.4.1",
  "hyper-util",
  "igd-next",
  "iroh-base",
@@ -4525,7 +4566,7 @@ dependencies = [
  "http 1.1.0",
  "http-body 1.0.1",
  "http-body-util",
- "hyper",
+ "hyper 1.4.1",
  "hyper-rustls",
  "hyper-util",
  "ipnet",
diff --git a/iroh-net/Cargo.toml b/iroh-net/Cargo.toml
index e94fb778..a33a2555 100644
--- a/iroh-net/Cargo.toml
+++ b/iroh-net/Cargo.toml
@@ -37,7 +37,7 @@ http = "1"
 http-body-util = "0.1.0"
 hyper = { version = "1", features = ["server", "client", "http1"] }
 hyper-util = "0.1.1"
-igd-next = { version = "0.15.1", features = ["aio_tokio"] }
+igd-next = { version = "0.14.3", features = ["aio_tokio"] }
 iroh-base = { version = "0.27.0", path = "../iroh-base", features = ["key"] }
 libc = "0.2.139"
 num_enum = "0.7"
-- 
2.47.0

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

And now on a different connection it runs. Fails in GitHub Actions, on mobile hotspot connected to LTE and on one of the Wi-Fi networks, but works on another Wi-Fi network.

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

On a failing connection:

2024-10-31T16:52:12.648809Z DEBUG ep{me=reo5fqnlgre6x7br}:magicsock:portmapper.service:portmapper.probe: iroh_net::portmapper::upnp: upnp probe failed: No response within timeout
2024-10-31T16:52:12.648913Z TRACE ep{me=reo5fqnlgre6x7br}:magicsock:portmapper.service:portmapper.probe: iroh_net::portmapper: tick: upnp probe ready

On a successful connection:

2024-10-31T16:54:20.141869Z DEBUG ep{me=eeel4ei4p45ythpj}:magicsock:portmapper.service:portmapper.probe: iroh_net::portmapper::nat_pmp: probe failed: Response is malformed
2024-10-31T16:54:20.141884Z TRACE ep{me=eeel4ei4p45ythpj}:magicsock:portmapper.service:portmapper.probe: iroh_net::portmapper: tick: nat_pmp probe ready
2024-10-31T16:54:20.141991Z DEBUG ep{me=eeel4ei4p45ythpj}:magicsock:portmapper.service:portmapper.probe: iroh_net::portmapper::pcp: probe failed: operation not authorized
2024-10-31T16:54:20.141998Z TRACE ep{me=eeel4ei4p45ythpj}:magicsock:portmapper.service:portmapper.probe: iroh_net::portmapper: tick: pcp probe ready
DEBUG    root:rpc.py:180 account_id=2 got an event {'kind': 'Info', 'msg': 'src/imap/select_folder.rs:39: close/expunge succeeded'}
2024-10-31T16:54:20.192006Z TRACE ep{me=eeel4ei4p45ythpj}:magicsock:portmapper.service:portmapper.probe: iroh_net::portmapper: tick: upnp probe ready
2024-10-31T16:54:20.192057Z TRACE ep{me=eeel4ei4p45ythpj}:magicsock:portmapper.service: iroh_net::portmapper: tick: probe ready
2024-10-31T16:54:20.192092Z DEBUG ep{me=eeel4ei4p45ythpj}:magicsock:portmapper.service: iroh_net::portmapper: upnp gateway changed "None" -> "http://192.168.178.1:49000/igdupnp/control/WANIPConn1"
2024-10-31T16:54:20.192102Z TRACE ep{me=eeel4ei4p45ythpj}:magicsock:portmapper.service: iroh_net::portmapper: probe output output=ProbeOutput { upnp: true, pcp: false, nat_pmp: false }

So basically iroh now works only on a connection with working UPNP.

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

On a network without UPNP running CHATMAIL_DOMAIN=nine.testrun.org RUST_LOG=iroh_net=trace pytest tests/test_iroh_webxdc.py::test_realtime_sequentially -s (which uses https://iroh.testrun.org.:4443/ relay on version 0.25.0) gets stuck, running CHATMAIL_DOMAIN=bcc.chat RUST_LOG=iroh_net=trace pytest tests/test_iroh_webxdc.py::test_realtime_sequentially -s (which falls back to default iroh relay) passes.
On a network with UPNP and using nine.testrun.org it also passes.

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

Downgraded igd-next to 0.15.0, still fails. So the problem appeared somewhere between 0.14.3 and 0.15.0, and the only change was dariusc93/rust-igd#8 (commit dariusc93/rust-igd@56934bb) not counting dariusc93/rust-igd@a06b503

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

With igd_next=trace I get this on a network without UPNP with igd-next 0.15.0:

2024-10-31T18:06:34.895109Z DEBUG igd_next::aio::tokio: sending broadcast request to: 239.255.255.250:1900 on interface: Ok(0.0.0.0:42600)
2024-10-31T18:06:34.897189Z DEBUG igd_next::aio::tokio: sending broadcast request to: 239.255.255.250:1900 on interface: Ok(0.0.0.0:39924)
2024-10-31T18:06:34.897905Z DEBUG igd_next::aio::tokio: sending broadcast request to: 239.255.255.250:1900 on interface: Ok(0.0.0.0:58116)
...
2024-10-31T18:06:39.898443Z DEBUG igd_next::aio::tokio: timeout while receiving broadcast response
2024-10-31T18:06:39.899833Z DEBUG igd_next::aio::tokio: timeout while receiving broadcast response
2024-10-31T18:07:05.633502Z DEBUG igd_next::aio::tokio: sending broadcast request to: 239.255.255.250:1900 on interface: Ok(0.0.0.0:44960)
2024-10-31T18:07:05.634421Z DEBUG igd_next::aio::tokio: sending broadcast request to: 239.255.255.250:1900 on interface: Ok(0.0.0.0:49078)
2024-10-31T18:07:10.634701Z DEBUG igd_next::aio::tokio: timeout while receiving broadcast response
2024-10-31T18:07:10.635880Z DEBUG igd_next::aio::tokio: timeout while receiving broadcast response
2024-10-31T18:07:34.759455Z DEBUG igd_next::aio::tokio: sending broadcast request to: 239.255.255.250:1900 on interface: Ok(0.0.0.0:51324)
2024-10-31T18:07:34.761698Z DEBUG igd_next::aio::tokio: sending broadcast request to: 239.255.255.250:1900 on interface: Ok(0.0.0.0:35569)
2024-10-31T18:07:39.760546Z DEBUG igd_next::aio::tokio: timeout while receiving broadcast response
2024-10-31T18:07:39.762000Z DEBUG igd_next::aio::tokio: timeout while receiving broadcast response
...

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

If I short-circuit search_gateway in rust-igd/src/aio/tokio.rs to make it return SearchError::NoResponseWithinTimeout immediately, tests pass. Will now debug rust-igd.

@divagant-martian
Copy link

@link2xt the commit you linked dariusc93/rust-igd@56934bb at a minimum makes igd-next not respect the timeouts we set for it, however if I replace all port-mapping code with a long sleep, things still work, which is very confusing.

Can you describe what the scenario and test does so that I can try to reproduce? Even with the replaced code, for example, a doctor accept and doctor connect pair still works just fine

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

Scenario I am testing is

def test_realtime_sequentially(acfactory, path_to_webxdc):
"""Test two peers trying to establish connection sequentially."""
ac1, ac2 = acfactory.get_online_accounts(2)
ac1.set_config("webxdc_realtime_enabled", "1")
ac2.set_config("webxdc_realtime_enabled", "1")
ac1.create_chat(ac2)
ac2.create_chat(ac1)
# share a webxdc app between ac1 and ac2
ac1_webxdc_msg = acfactory.send_message(from_account=ac1, to_account=ac2, text="play", file=path_to_webxdc)
ac2_webxdc_msg = ac2.get_message_by_id(ac2.wait_for_incoming_msg_event().msg_id)
snapshot = ac2_webxdc_msg.get_snapshot()
assert snapshot.text == "play"
# send iroh announcements sequentially
log("sending ac1 -> ac2 realtime advertisement and additional message")
ac1_webxdc_msg.send_webxdc_realtime_advertisement()
acfactory.send_message(from_account=ac1, to_account=ac2, text="ping1")
log("waiting for incoming message on ac2")
snapshot = ac2.get_message_by_id(ac2.wait_for_incoming_msg_event().msg_id).get_snapshot()
assert snapshot.text == "ping1"
log("sending ac2 -> ac1 realtime advertisement and additional message")
ac2_webxdc_msg.send_webxdc_realtime_advertisement()
acfactory.send_message(from_account=ac2, to_account=ac1, text="ping2")
log("waiting for incoming message on ac1")
snapshot = ac1.get_message_by_id(ac1.wait_for_incoming_msg_event().msg_id).get_snapshot()
assert snapshot.text == "ping2"
log("sending realtime data ac1 -> ac2")
# Test that 128 KB of data can be sent in a single message.
data = os.urandom(128000)
ac1_webxdc_msg.send_webxdc_realtime_data(data)
log("ac2: waiting for realtime data")
while 1:
event = ac2.wait_for_event()
if event.kind == EventType.WEBXDC_REALTIME_DATA:
assert event.data == list(data)
break

It is two iroh nodes joining a single gossip topic and then sending a message.

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

however if I replace all port-mapping code with a long sleep, things still work, which is very confusing

Maybe try to replace search_gateway with infinite loop of sleeps.

I don't get why search_gateway is started 3 times, but returns only 2 times (after 5s timeout waiting for response). Seems third search gets cancelled? Why there are 3 searches started if I only have two nodes is another question.

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

@divagant-martian I managed to reproduce it with just doctor accept.
Created this config file iroh.toml:

[[relay_nodes]]
url = "https://iroh.testrun.org.:4443"
stun_only = false
stun_port = 3478

(STUN port copy-pasted from https://www.iroh.computer/docs/reference/config, only changed the URL)

Then run inside iroh-cli in iroh checkout: cargo run -- --config ./iroh.toml doctor accept.
On a network with UPNP:

$ cargo run -- --config ./iroh.toml doctor accept
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.28s
     Running `/home/user/src/iroh/target/debug/iroh --config ./iroh.toml doctor accept`
Connect to this node using one of the following commands:

	Using the relay url and direct connections:
...

On a network without UPNP:

$ cargo run -- --config ./iroh.toml doctor accept
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.29s
     Running `/home/user/src/iroh/target/debug/iroh --config ./iroh.toml doctor accept`
Error: wait for relay connection

Caused by:
    deadline has elapsed

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

I have created an issue in iroh repo: n0-computer/iroh#2876

@link2xt
Copy link
Collaborator Author

link2xt commented Oct 31, 2024

Let's close this for now and upgrade to a version that has the problem fixed once n0-computer/iroh#2876 is closed.

@link2xt link2xt closed this Oct 31, 2024
@link2xt link2xt deleted the link2xt/iroh-27 branch October 31, 2024 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants