Use timeout context for NewStream call #994
Merged
Hey,
I'm running an IPFS node with the Accelerated DHT Client, and the reprovider system keeps reporting `reprovider system not ready`. I've been debugging the fullrt code a little, and the cause seems to be that the initial crawl never completes because this loop never exits. Specifically, some of the worker goroutines get stuck trying to query some QUIC peers. See the goroutine stacks below to follow along.

crawler stuck in queryPeer, trying to open a stream to a peer

BasicHost stuck trying to negotiate a protocol?
I feel like this has only been happening recently, so maybe a default timeout somewhere in libp2p was removed. Looking through the code, no timeout is set on the call to `NewStream`, which already involves network communication and will not always time out on its own. The timeout is only set later in the function, when reading from the stream.

Most `NewStream` calls complete in less than 1ms, so it seems sensible to include the call in the same timeout context. I confirmed locally that this makes the initial crawl complete successfully.