Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failing ServerInUseHostPort test on MacOS #2477

Merged
merged 5 commits into from
Sep 18, 2020

Conversation

albertteoh
Copy link
Contributor

Signed-off-by: albertteoh [email protected]

Which problem is this PR solving?

  • Master branch unit tests are failing on TestServerInUseHostPort, seemingly only for MacOS (okay for Linux builds).
  • IIUC, the reason for the test is to have test coverage of error handling code when listening on GRPC and HTTP ports. To achieve this, the test first listens on those same ports, then attempts to bind to those same ports again via the QueryService Start(); and expecting an error to be returned.
  • When running this test, no error is returned when binding to 127.0.0.1:8080 while :8080 is bound, causing the assertion to fail. Interestingly, this behaviour differs between MacOS and Linux (at least on my Centos 8 VM).
  • From this, it appears that the Linux OS libraries are stricter than MacOS; the former considering the 0.0.0.0 address to be bindable to the loopback network interface.
  • Confirmed with the following simple program which I ran on both OSes:
package main

import (
     "fmt"
     "net"
 )

 const (
     hostname0 = ":8080"
     hostname1 = "127.0.0.1:8080"
 )

 func listen(hostname string) {
     fmt.Println("Listening to", hostname)
     if _, err := net.Listen("tcp", hostname); err != nil {
         fmt.Println(err.Error())
     }
 }

 func main() {
     listen(hostname0)
     listen(hostname1)
 }

...and here are the results:

MacOS

$ go run main.go
Listening to :8080
Listening to 127.0.0.1:8080

Centos 8

$ go run main.go
Listening to :8080
Listening to 127.0.0.1:8080
listen tcp 127.0.0.1:8080: bind: address already in use

Short description of the changes

  • The fix involves explicitly providing the host IP (loopback in this case) in the host port string.
  • Add a stronger assertion on the error.
  • Have tried adding a MacOS job in Travis checks to prevent this from happening in future (there've been 2 MacOS specific unit test breakages in the last few weeks already), however, had some difficulties getting the Sarama unit tests to pass, and not sure why they're failing.

@albertteoh albertteoh requested a review from a team as a code owner September 14, 2020 12:28
@albertteoh albertteoh requested a review from vprithvi September 14, 2020 12:28
@codecov
Copy link

codecov bot commented Sep 14, 2020

Codecov Report

Merging #2477 into master will decrease coverage by 0.34%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2477      +/-   ##
==========================================
- Coverage   95.52%   95.18%   -0.35%     
==========================================
  Files         208      208              
  Lines       10759     9158    -1601     
==========================================
- Hits        10278     8717    -1561     
+ Misses        405      365      -40     
  Partials       76       76              
Impacted Files Coverage Δ
cmd/query/app/server.go 88.52% <100.00%> (-0.63%) ⬇️
storage/spanstore/token_propagation.go 60.00% <0.00%> (-11.43%) ⬇️
cmd/agent/app/servers/thriftudp/socket_buffer.go 50.00% <0.00%> (-10.00%) ⬇️
plugin/storage/integration/trace_compare.go 32.35% <0.00%> (-4.24%) ⬇️
plugin/storage/integration/integration.go 77.34% <0.00%> (-4.05%) ⬇️
cmd/agent/app/reporter/flags.go 75.00% <0.00%> (-3.58%) ⬇️
plugin/storage/grpc/shared/grpc_server.go 72.97% <0.00%> (-3.50%) ⬇️
cmd/agent/app/agent.go 84.61% <0.00%> (-2.89%) ⬇️
cmd/collector/app/server/grpc.go 62.50% <0.00%> (-2.89%) ⬇️
cmd/collector/app/handler/grpc_handler.go 86.66% <0.00%> (-2.23%) ⬇️
... and 197 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b9598d...51eb5df. Read the comment docs.

@jpkrohling
Copy link
Contributor

cc @rjs211

@@ -26,6 +27,7 @@ import (
"github.com/stretchr/testify/mock"
"go.uber.org/zap"
"go.uber.org/zap/zaptest/observer"
"golang.org/x/sys/unix"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it wise to introduce OS dependency without a build tag? How critical is it that we check for specific OS error code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair question, it's not critical to check the specific OS error code. However, given we are trying to trigger an error that stems from a fairly deep call stack and originating from outside this project, I was also conscious there could be a potentially larger pool of error types returned and so wanted to be as specific as possible to avoid false positives.

Perhaps we can continue to perform type assertions (and assert their success) on the error interfaces until we reach the most specific error, but remove the check on the specific error requiring the unix dependency; so something like the following instead?:

err = server.Start()
assert.Error(t, err)
_, ok := err.(*net.OpError).Err.(*os.SyscallError)
assert.True(t, ok)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that it just makes the test dependent on some implementation details of the OS, which can change over time, or Go may change its implementation.

Stepping back to what the test is even about - why would we care about port reuse? I think the objective of the test was to check the code path when listening to a port causes an error. It doesn't matter what the error is, just that our code handles it. So the test may do whatever it needs to introduce the error condition. For example, is there any difference in our code behavior (different path) if the port is already take or is completely invalid?

@@ -74,7 +76,7 @@ func TestServerBadHostPort(t *testing.T) {

func TestServerInUseHostPort(t *testing.T) {

for _, hostPort := range [2]string{":8080", ":8081"} {
for _, hostPort := range [2]string{"127.0.0.1:8080", "127.0.0.1:8081"} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a cleaner way to write tests is to listen on port 0 and let the OS assign the unused port. Can we do it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the goal of this test is to intentionally have two listeners listening on the same host:port to cause a "Bind address already in use" error, and so there's a need for the port numbers to be deterministic rather than dynamically allocated by the OS.

Please let me know if I've misunderstood your question.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't read the test carefully, but from it's name it's trying to test for port conflict, which does not require listening on a specific port, it just requires trying to bind twice to the same port. The first time it can bind to a random port and use it for the second attempt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's neat! I didn't think it was possible to retrieve the OS allocated port. Thanks!

@albertteoh albertteoh force-pushed the fix-hostport-in-use-test branch from 8669b26 to 7aa9daa Compare September 17, 2020 13:58
@albertteoh
Copy link
Contributor Author

@yurishkuro are there any outstanding concerns that need addressing in this PR?

@albertteoh albertteoh force-pushed the fix-hostport-in-use-test branch from 7aa9daa to b1d0e13 Compare September 17, 2020 22:28
httpHostPort string
grpcHostPort string
}{
{"HTTP host port clash on " + conn.Addr().String(), conn.Addr().String(), availableHostPort},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"HTTP host port clash on " + conn.Addr().String() - I think it's better if the test names are stable, this one will include random port number on every run, which is not informative anyway

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, aside from minor formatting

@albertteoh albertteoh force-pushed the fix-hostport-in-use-test branch from ff5e0fe to 32e5ab6 Compare September 18, 2020 05:01
Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: albertteoh <[email protected]>
@albertteoh albertteoh force-pushed the fix-hostport-in-use-test branch from 4af8db0 to 62a4971 Compare September 18, 2020 05:25
Signed-off-by: albertteoh <[email protected]>
@yurishkuro yurishkuro merged commit 899e93a into jaegertracing:master Sep 18, 2020
@albertteoh albertteoh deleted the fix-hostport-in-use-test branch September 18, 2020 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants