-
Notifications
You must be signed in to change notification settings - Fork 37
return all dial errors if dial has failed #115
Conversation
swarm_dial.go
Outdated
// continue | ||
// } | ||
// dialbackoff.Clear(p) | ||
// if ok, wait := dialsync.Lock(p); !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. Editor has strict extra spaces policy. Will also fix.
e499cfc
to
60d63a7
Compare
Ok, moved to "hashicorp/go-multierror" and also removed the trashy diff. |
9381b22
to
a5efcbd
Compare
Can I make a test that calls directly "func (s *Swarm) dialAddrs" in the main swarm package? Because some code paths are not very testable. Is it ok that go.mod and gx (package.json) links to different repos ("github.com/hashicorp/go-multierror" and "gxed")? |
What value is there in collecting more than one error? Are they actually actionable? I believe the idea of multi-errors is a bit of an anti-pattern. If you wish to perform some action when a specific dial fails, you should be performing the dialing routine yourself so you have more information and control at hand, or expose more appropriate tracing/logging inside the existing dialing routine for observability (noting that you don't actually act on this immediately, apart from manually noting the reason for failures, and maybe correcting your network situation OOB). Reading ipfs/kubo#4355 I think better logging is what's called for. |
Logs won't help much here. We need to expose these dial errors to the user. Really, the ideal way to do this is to feed some kind of dial status channel through with a context so the swarm can return information like "dialing peer X on address Y" and "failed to dial address Y: error". However, returning all the relevant errors instead of just one of them is much better than the current code. At the moment, the user might be told something useless like "no route to host" when that's just one error among many other more relevant errors (e.g., "expected peer Y, found peer X"). |
That’s similar to the idea proposed here: libp2p/go-libp2p-kad-dht#300. Also ties in with an eventual eventbus at the host level. I agree multierror is appropriate now. |
Ok, so what is about the testing policy? Can I add direct test to the package method (dialAddrs). Because idk how to cover diff with only public methods. And is it ok that go.mod and gx (package.json) links to different repos ("github.com/hashicorp/go-multierror" and "gxed")? |
You should be able to add two addresses for an undailable peer using the peerstore (
Yes. The package.json file is deprecated. |
I have added silent addresses and the tests are fine. But I do not know how to cover "context.Cancel()" and empty addresses cases only with public methods. It must be done to meet the "diff cover" requirement (codecov/patch — 60% of diff hit (target 77.43%)). |
dial_test.go
Outdated
t.Fatal(err) | ||
} | ||
|
||
t.Logf("correctly get a combined error: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should test that we're getting the right errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it. But was not sure about check. The only way I see is to check err.Error() content itself. Is it ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Unfortunately, that's usually how error checking works in go. Ideally, there'd be two different errors to test for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the additional checks for Error() content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getting code coverage to pass isn't really that important
License: MIT Signed-off-by: Georgij Tolstov <[email protected]>
a5efcbd
to
3719137
Compare
Thanks! |
Long-running queries can build up large error sets that we never actually use. This is exacerbated by libp2p/go-libp2p-swarm#115. fixes libp2p/go-libp2p-swarm#119
Long-running queries can build up large error sets that we never actually use. This is exacerbated by libp2p/go-libp2p-swarm#115. fixes libp2p/go-libp2p-swarm#119
Long-running queries can build up large error sets that we never actually use. This is exacerbated by libp2p/go-libp2p-swarm#115. fixes libp2p/go-libp2p-swarm#119
Have tried to solve this issue [1] ("Print dial status messages to stderr"). I have not changed "Info" level because it seems that dial errors happen all the time (see comments around). Instead dial collects dial errors and if all attempts fail returns combined error. Printing them would be client concern.
But, idk, about context. Maybe in case of total dial fail it would be better just printing collected errors to stderr like it was proposed in the ticket.
Also the test seems useless, but at least it allows play around with the problem.
1 - ipfs/kubo#4355