return all dial errors if dial has failed #115

reinerRubin · 2019-03-26T18:52:28Z

I have tried to solve this issue [1] ("Print dial status messages to stderr"). I have not changed the "Info" log level because dial errors seem to happen all the time (see the comments around it). Instead, the dial collects the dial errors and, if all attempts fail, returns a combined error. Printing them would be the client's concern.

But I am not sure about the context. Maybe, when the dial fails completely, it would be better to just print the collected errors to stderr, as was proposed in the ticket.

Also, the test seems of little use, but at least it makes it possible to play around with the problem.

[1] ipfs/kubo#4355

Stebalien · 2019-03-27T10:25:37Z

swarm_dial.go

-//  		continue
-//  	}
-//  	dialbackoff.Clear(p)
+//	if ok, wait := dialsync.Lock(p); !ok {


reinerRubin · 2019-03-27

My bad. My editor has a strict policy on extra whitespace. I will fix that too.

reinerRubin · 2019-03-27T18:12:22Z

OK, I switched to "hashicorp/go-multierror" and also removed the noisy whitespace diff.

reinerRubin · 2019-03-27T22:48:35Z

Can I write a test that directly calls "func (s *Swarm) dialAddrs" in the main swarm package? Some code paths are hard to test otherwise.

Is it OK that go.mod and gx (package.json) link to different repos ("github.com/hashicorp/go-multierror" and "gxed")?

anacrolix · 2019-03-27T23:58:48Z

What value is there in collecting more than one error? Are they actually actionable? I believe the idea of multi-errors is a bit of an anti-pattern. If you wish to perform some action when a specific dial fails, you should be performing the dialing routine yourself so you have more information and control at hand, or expose more appropriate tracing/logging inside the existing dialing routine for observability (noting that you don't actually act on this immediately, apart from manually noting the reason for failures, and maybe correcting your network situation OOB).

Reading ipfs/kubo#4355 I think better logging is what's called for.

Stebalien · 2019-03-28T02:29:17Z

Logs won't help much here. We need to expose these dial errors to the user.

Really, the ideal way to do this is to feed some kind of dial status channel through with a context so the swarm can return information like "dialing peer X on address Y" and "failed to dial address Y: error". However, returning all the relevant errors instead of just one of them is much better than the current code. At the moment, the user might be told something useless like "no route to host" when that's just one error among many other more relevant errors (e.g., "expected peer Y, found peer X").

raulk · 2019-03-28T09:06:37Z

That’s similar to the idea proposed here: libp2p/go-libp2p-kad-dht#300. Also ties in with an eventual eventbus at the host level.

I agree multierror is appropriate now.

reinerRubin · 2019-03-28T17:27:50Z

OK, so what about the testing policy? Can I add a direct test for the package-private method (dialAddrs)? I don't know how to cover the diff using only public methods.

And is it OK that go.mod and gx (package.json) link to different repos ("github.com/hashicorp/go-multierror" and "gxed")?

Stebalien · 2019-04-04T09:13:43Z

> OK, so what about the testing policy? Can I add a direct test for the package-private method (dialAddrs)? I don't know how to cover the diff using only public methods.

You should be able to add two addresses for an undialable peer using the peerstore (swarm.Peerstore().AddAddrs(...)). You can then dial the peer normally and check the error.

> And is it OK that go.mod and gx (package.json) link to different repos ("github.com/hashicorp/go-multierror" and "gxed")?

Yes. The package.json file is deprecated.

reinerRubin · 2019-04-04T09:20:26Z

I have added silent addresses and the tests are fine. But I do not know how to cover the "context.Cancel()" and empty-address cases using only public methods. This is needed to meet the "diff cover" requirement (codecov/patch: 60% of diff hit, target 77.43%).

Stebalien · 2019-04-04T09:39:00Z

dial_test.go

+		t.Fatal(err)
+	}
+
+	t.Logf("correctly get a combined error: %s", err)


This should test that we're getting the right errors.

reinerRubin · 2019-04-04

I thought about it, but was not sure how to check. The only way I see is to check the content of err.Error() itself. Is that OK?

Stebalien · 2019-04-04

Sounds good. Unfortunately, that's usually how error checking works in Go. Ideally, there'd be two different errors to test for.

reinerRubin · 2019-04-05

I have added the additional checks for Error() content.

Stebalien

Getting code coverage to pass isn't really that important.

License: MIT Signed-off-by: Georgij Tolstov <[email protected]>

Stebalien · 2019-04-05T16:10:53Z

Thanks!

Long-running queries can build up large error sets that we never actually use. This is exacerbated by libp2p/go-libp2p-swarm#115. Fixes libp2p/go-libp2p-swarm#119.

Stebalien reviewed Mar 27, 2019

reinerRubin force-pushed the bug/4355-dial-errors branch from e499cfc to 60d63a7 Compare March 27, 2019 18:09

reinerRubin force-pushed the bug/4355-dial-errors branch 2 times, most recently from 9381b22 to a5efcbd Compare March 27, 2019 22:04

Stebalien reviewed Apr 4, 2019

return all dial errors if dial has failed

3719137

reinerRubin force-pushed the bug/4355-dial-errors branch from a5efcbd to 3719137 Compare April 5, 2019 15:18

Stebalien requested a review from raulk April 5, 2019 16:10

Stebalien approved these changes Apr 5, 2019

Stebalien merged commit 7269da4 into libp2p:master Apr 5, 2019

vyzo mentioned this pull request Apr 21, 2019

Leaking dial errors #119

Closed

Stebalien mentioned this pull request Apr 24, 2019

query: fix error "leak" libp2p/go-libp2p-kad-dht#328

Merged
