Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

return all dial errors if dial has failed #115

Merged
merged 1 commit into from
Apr 5, 2019

Conversation

reinerRubin
Copy link
Contributor

Have tried to solve this issue [1] ("Print dial status messages to stderr"). I have not changed "Info" level because it seems that dial errors happen all the time (see comments around). Instead dial collects dial errors and if all attempts fail returns combined error. Printing them would be client concern.

But, idk, about context. Maybe in case of total dial fail it would be better just printing collected errors to stderr like it was proposed in the ticket.

Also the test seems useless, but at least it allows play around with the problem.

1 - ipfs/kubo#4355

swarm_dial.go Outdated Show resolved Hide resolved
swarm_dial.go Outdated
// continue
// }
// dialbackoff.Clear(p)
// if ok, wait := dialsync.Lock(p); !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Editor has strict extra spaces policy. Will also fix.

@reinerRubin
Copy link
Contributor Author

Ok, moved to "hashicorp/go-multierror" and also removed the trashy diff.

@reinerRubin reinerRubin force-pushed the bug/4355-dial-errors branch 2 times, most recently from 9381b22 to a5efcbd Compare March 27, 2019 22:04
@reinerRubin
Copy link
Contributor Author

Can I make a test that calls directly "func (s *Swarm) dialAddrs" in the main swarm package? Because some code paths are not very testable.

Is it ok that go.mod and gx (package.json) links to different repos ("github.com/hashicorp/go-multierror" and "gxed")?

@anacrolix
Copy link
Contributor

What value is there in collecting more than one error? Are they actually actionable? I believe the idea of multi-errors is a bit of an anti-pattern. If you wish to perform some action when a specific dial fails, you should be performing the dialing routine yourself so you have more information and control at hand, or expose more appropriate tracing/logging inside the existing dialing routine for observability (noting that you don't actually act on this immediately, apart from manually noting the reason for failures, and maybe correcting your network situation OOB).

Reading ipfs/kubo#4355 I think better logging is what's called for.

@Stebalien
Copy link
Member

Logs won't help much here. We need to expose these dial errors to the user.

Really, the ideal way to do this is to feed some kind of dial status channel through with a context so the swarm can return information like "dialing peer X on address Y" and "failed to dial address Y: error". However, returning all the relevant errors instead of just one of them is much better than the current code. At the moment, the user might be told something useless like "no route to host" when that's just one error among many other more relevant errors (e.g., "expected peer Y, found peer X").

@raulk
Copy link
Member

raulk commented Mar 28, 2019

That’s similar to the idea proposed here: libp2p/go-libp2p-kad-dht#300. Also ties in with an eventual eventbus at the host level.

I agree multierror is appropriate now.

@reinerRubin
Copy link
Contributor Author

reinerRubin commented Mar 28, 2019

Ok, so what is about the testing policy? Can I add direct test to the package method (dialAddrs). Because idk how to cover diff with only public methods.

And is it ok that go.mod and gx (package.json) links to different repos ("github.com/hashicorp/go-multierror" and "gxed")?

@Stebalien
Copy link
Member

Ok, so what is about the testing policy? Can I add direct test to the package method (dialAddrs). Because idk how to cover diff with only public methods.

You should be able to add two addresses for an undailable peer using the peerstore (swarm.Peerstore().AddAddrs(...)). You can then dial the peer normally and check the error.

And is it ok that go.mod and gx (package.json) links to different repos ("github.com/hashicorp/go-multierror" and "gxed")?

Yes. The package.json file is deprecated.

@reinerRubin
Copy link
Contributor Author

I have added silent addresses and the tests are fine. But I do not know how to cover "context.Cancel()" and empty addresses cases only with public methods. It must be done to meet the "diff cover" requirement (codecov/patch — 60% of diff hit (target 77.43%)).

dial_test.go Outdated
t.Fatal(err)
}

t.Logf("correctly get a combined error: %s", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should test that we're getting the right errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it. But was not sure about check. The only way I see is to check err.Error() content itself. Is it ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Unfortunately, that's usually how error checking works in go. Ideally, there'd be two different errors to test for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the additional checks for Error() content.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting code coverage to pass isn't really that important

License: MIT
Signed-off-by: Georgij Tolstov <[email protected]>
@Stebalien Stebalien merged commit 7269da4 into libp2p:master Apr 5, 2019
@Stebalien
Copy link
Member

Thanks!

@vyzo vyzo mentioned this pull request Apr 21, 2019
Stebalien added a commit to libp2p/go-libp2p-kad-dht that referenced this pull request Apr 24, 2019
Long-running queries can build up large error sets that we never actually use.
This is exacerbated by libp2p/go-libp2p-swarm#115.

fixes libp2p/go-libp2p-swarm#119
Stebalien added a commit to libp2p/go-libp2p-kad-dht that referenced this pull request Apr 24, 2019
Long-running queries can build up large error sets that we never actually use.
This is exacerbated by libp2p/go-libp2p-swarm#115.

fixes libp2p/go-libp2p-swarm#119
aarshkshah1992 pushed a commit to aarshkshah1992/go-libp2p-kad-dht that referenced this pull request Aug 11, 2019
Long-running queries can build up large error sets that we never actually use.
This is exacerbated by libp2p/go-libp2p-swarm#115.

fixes libp2p/go-libp2p-swarm#119
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants