Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use timeout context for NewStream call #994

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

ligustah
Copy link
Contributor

Hey,

I'm running an IPFS node with the Accelerated DHT Client, but I'm seeing issues with the reprovider system showing reprovider system not ready. I've been debugging the fullrt code a little bit and it seems it's because the initial crawl never completes, because this loop never exits. Specifically, it seems that some of the worker goroutines get stuck trying to query some quic peers. See the below goroutine stacks to follow along.

crawler stuck in queryPeer, trying to open a stream to a peer

goroutine 579 [select, 10 minutes]:
github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).NewStream(0x14001660480, {0x10676c6d8, 0x1400dbe6ff0}, {0x1401648def0, 0x26}, {0x1400200a8a0, 0x1, 0x1})
	/go/pkg/mod/github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:737 +0xb34
github.com/libp2p/go-libp2p/p2p/host/routed.(*RoutedHost).NewStream(0x14000a2b860, {0x10676c6d8, 0x1400dbe6ff0}, {0x1401648def0, 0x26}, {0x1400200a8a0, 0x1, 0x1})
	/go/pkg/mod/github.com/libp2p/[email protected]/p2p/host/routed/routed.go:214 +0x144
github.com/libp2p/go-libp2p-kad-dht/crawler.(*messageSender).SendRequest(0x1400233d590, {0x10676c6d8, 0x1400dbe6ff0}, {0x1401648def0, 0x26}, 0x1401d97c800)
	/go/pkg/mod/github.com/libp2p/[email protected]/crawler/crawler.go:78 +0x9c
github.com/libp2p/go-libp2p-kad-dht/pb.(*ProtocolMessenger).GetClosestPeers(0x1400200a8b0, {0x10676c6d8, 0x1400dbe6ff0}, {0x1401648def0, 0x26}, {0x1401729bad0, 0x22})
	/go/pkg/mod/github.com/libp2p/[email protected]/pb/protocol_messenger.go:164 +0x514
github.com/libp2p/go-libp2p-kad-dht/crawler.(*DefaultCrawler).queryPeer(0x1400233d5c0, {0x10676c710, 0x14001ac6320}, {0x1401648def0, 0x26})
	/go/pkg/mod/github.com/libp2p/[email protected]/crawler/crawler.go:248 +0x6f4
github.com/libp2p/go-libp2p-kad-dht/crawler.(*DefaultCrawler).Run.func1()
	/go/pkg/mod/github.com/libp2p/[email protected]/crawler/crawler.go:148 +0x124
created by github.com/libp2p/go-libp2p-kad-dht/crawler.(*DefaultCrawler).Run in goroutine 302
	/go/pkg/mod/github.com/libp2p/[email protected]/crawler/crawler.go:145 +0x200

BasicHost stuck trying to negotiate a protocol?

goroutine 295264 [chan receive, 14 minutes]:
github.com/quic-go/quic-go.(*receiveStream).readImpl(0x14016495500, {0x14002800595, 0x1, 0x1})
	/go/pkg/mod/github.com/quic-go/[email protected]/receive_stream.go:180 +0xb50
github.com/quic-go/quic-go.(*receiveStream).Read(0x14016495500, {0x14002800595, 0x1, 0x1})
	/go/pkg/mod/github.com/quic-go/[email protected]/receive_stream.go:94 +0xe4
github.com/libp2p/go-libp2p/p2p/transport/quic.(*stream).Read(0x14017093400, {0x14002800595, 0x1, 0x1})
	/go/pkg/mod/github.com/libp2p/[email protected]/p2p/transport/quic/stream.go:22 +0x60
github.com/libp2p/go-libp2p/p2p/net/swarm.(*Stream).Read(0x1401770c000, {0x14002800595, 0x1, 0x1})
	/go/pkg/mod/github.com/libp2p/[email protected]/p2p/net/swarm/swarm_stream.go:58 +0x60
github.com/multiformats/go-multistream.(*byteReader).ReadByte(0x140060e0080)
	/go/pkg/mod/github.com/multiformats/[email protected]/multistream.go:322 +0x84
github.com/multiformats/go-varint.ReadUvarint({0x10674f1e0, 0x140060e0080})
	/go/pkg/mod/github.com/multiformats/[email protected]/varint.go:80 +0x58
github.com/multiformats/go-multistream.lpReadBuf({0x1301c4f78, 0x1401770c000})
	/go/pkg/mod/github.com/multiformats/[email protected]/multistream.go:286 +0x130
github.com/multiformats/go-multistream.ReadNextTokenBytes({0x1301c4f78, 0x1401770c000})
	/go/pkg/mod/github.com/multiformats/[email protected]/multistream.go:269 +0x48
github.com/multiformats/go-multistream.ReadNextToken[...]({0x1301c4f78, 0x1401770c000})
	/go/pkg/mod/github.com/multiformats/[email protected]/multistream.go:258 +0x48
github.com/multiformats/go-multistream.readMultistreamHeader({0x1301c4f78, 0x1401770c000})
	/go/pkg/mod/github.com/multiformats/[email protected]/client.go:118 +0x44
github.com/multiformats/go-multistream.SelectProtoOrFail[...]({0x105597f50, 0xf}, {0x1301c4f48, 0x1401770c000})
	/go/pkg/mod/github.com/multiformats/[email protected]/client.go:56 +0x250
github.com/multiformats/go-multistream.SelectOneOf[...]({0x1400200a8a0, 0x1, 0x1}, {0x1301c4f48, 0x1401770c000})
	/go/pkg/mod/github.com/multiformats/[email protected]/client.go:89 +0x184
github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).NewStream.func2()
	/go/pkg/mod/github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:734 +0xc4
created by github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).NewStream in goroutine 622
	/go/pkg/mod/github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:733 +0xab8

I feel like this has only been happening recently, so maybe some default timeout somewhere in libp2p was removed. Looking through the code, it seems that no timeout is set on the call to NewStream, which already involves network communication and will not always time out on its own. The timeout is only set later in the function when reading from the stream.

Most NewStream calls complete in less than 1ms, so it seems sensible to just include the call in the same timeout context. I confirmed locally that this makes the initial crawl complete successfully.

@ligustah ligustah requested review from guillaumemichel and a team as code owners October 20, 2024 17:09
@gammazero gammazero added the need/analysis Needs further analysis before proceeding label Oct 22, 2024
@lidel lidel self-assigned this Oct 22, 2024
@gammazero gammazero removed the need/analysis Needs further analysis before proceeding label Oct 22, 2024
@lidel lidel removed their assignment Oct 22, 2024
@lidel
Copy link
Member

lidel commented Oct 22, 2024

cc @sukunrt @MarcoPolo for visibility, perhaps this class of quic issue was reported before?

@MarcoPolo
Copy link
Contributor

I added an accompanying change to scope the query context down as well: #996

Copy link
Contributor

@guillaumemichel guillaumemichel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @ligustah

@guillaumemichel guillaumemichel merged commit 50b4e99 into libp2p:master Oct 23, 2024
9 checks passed
lidel added a commit to ipfs/kubo that referenced this pull request Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants