Bazel CI: tests tagged as "block-network" fail on macOS #10305

Closed
laszlocsomor opened this issue Nov 26, 2019 · 13 comments
Labels
P1 I'll work on this now. (Assignee required) team-Local-Exec Issues and PRs for the Execution (Local) team untriaged

Comments

@laszlocsomor
Contributor

laszlocsomor commented Nov 26, 2019

Description of the problem / feature request:

Bazel's own tests that are tagged as "block-network" now fail on macOS when testing with Bazel built from HEAD, but work when testing with Bazel 1.2.1.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Edit //src/test/shell/bazel/workspace_resolved_test.sh and remove every test method. Add a test method that just runs bazel info, then run the test. The log shows this error:

Server crashed during startup. Now printing /private/var/tmp/_bazel_buildkite/0f6cf1b347f184bdf59fc3dbd70d52bf/sandbox/darwin-sandbox/4257/execroot/io_bazel/_tmp/81731060611f1135fb37d82b4c16c3ce/root/0776a7d52f698afa6331a0f743d059a6/server/jvm.out
I/O Error: Failed to bind
-- Test log: -----------------------------------------------------------
------------------------------------------------------------------------

Update: culprit is 8e7c349

What operating system are you running Bazel on?

macOS Catalina (10.15.1)

What's the output of bazel info release?

Using Bazel built at 5be24a8

@irengrig irengrig added P1 I'll work on this now. (Assignee required) team-Local-Exec Issues and PRs for the Execution (Local) team untriaged labels Nov 26, 2019
@philwo philwo assigned jmmv and unassigned philwo Nov 26, 2019
@philwo
Member

philwo commented Nov 26, 2019

@jmmv Possibly related to your recent improvements in this area?

@jmmv
Contributor

jmmv commented Nov 26, 2019

Maybe... but why wouldn't these have been caught by presubmit? The change to fix network sandboxing was submitted cleanly.

@philwo
Member

philwo commented Nov 26, 2019

but why wouldn't these have been caught by presubmit?

Because the "outer Bazel" that actually runs the tests marked as "block-network" and applies the network blocking logic is the latest released Bazel and not the one that is built from the change you're testing. 😀

@philwo
Member

philwo commented Nov 26, 2019

What is failing now is apparently "running Bazel inside a sandbox that blocks network access". I don't think that our bazel_sandboxing_test tests that this works.

@jmmv
Contributor

jmmv commented Nov 26, 2019

I see... so I'm assuming it's legitimate for these tests to be marked "block network", but the network sandboxing is not working properly now, right? Like it may not allow local networking, or whatever Bazel needs. Let me try something quickly.

@jmmv
Contributor

jmmv commented Nov 26, 2019

Confirmed that running Bazel inside a new Bazel with my "fixes" doesn't work. Trying to figure out why.

@jmmv
Contributor

jmmv commented Nov 26, 2019

The crash happens here:

java.io.IOException: Failed to bind
        at io.grpc.netty.NettyServer.start(NettyServer.java:251)
        at io.grpc.internal.ServerImpl.start(ServerImpl.java:177)
        at io.grpc.internal.ServerImpl.start(ServerImpl.java:85)
        at com.google.devtools.build.lib.server.GrpcServerImpl.serve(GrpcServerImpl.java:441)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.serverMain(BlazeRuntime.java:1124)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:826)
        at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:75)
Caused by: java.net.SocketException: Operation not permitted
        at java.base/sun.nio.ch.Net.bind0(Native Method)
        at java.base/sun.nio.ch.Net.bind(Unknown Source)
        at java.base/sun.nio.ch.Net.bind(Unknown Source)
        at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:563)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1332)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:488)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:473)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:984)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:259)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Unknown Source)
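
The bottom of the trace shows `ServerSocketChannelImpl.bind` failing with `Operation not permitted`. As a rough way to check whether a given sandbox permits listening on loopback at all, here is a minimal sketch of the same kind of bind. The class name `LoopbackBindProbe` and the method `tryBind` are hypothetical and are not part of Bazel or gRPC.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

/** Hypothetical probe, not part of Bazel: tries to listen on a loopback address. */
public class LoopbackBindProbe {

  /**
   * Attempts to bind a server socket channel to the given address on an ephemeral port.
   * Returns null on success, or "ExceptionName: message" on failure.
   */
  static String tryBind(InetAddress address) {
    try (ServerSocketChannel channel = ServerSocketChannel.open()) {
      // Port 0 asks the OS to pick any free port.
      channel.bind(new InetSocketAddress(address, 0));
      return null;
    } catch (IOException e) {
      // SocketException ("Operation not permitted") is a subclass of IOException.
      return e.getClass().getSimpleName() + ": " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    String error = tryBind(loopback);
    System.out.println(
        error == null ? "bind to " + loopback + " succeeded" : "bind failed: " + error);
  }
}
```

Running this inside and outside a "block-network" sandbox should make it clear whether the bind itself is what the sandbox rejects, independent of the rest of the server startup.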

@jmmv
Contributor

jmmv commented Nov 26, 2019

wat. And this brings us to:

commit deee73f24883df51a65248c9409f51531739c498
Author: Philipp Wollermann <[email protected]>
Date:   Wed Aug 3 13:13:15 2016 +0000

    Prefer IPv6 when binding the gRPC server to localhost, because the OS X sandbox will not allow us to bind to the IPv4 one. Automatically falls back to IPv4 if binding to [::1] fails.
    
    --
    MOS_MIGRATED_REVID=129206917

... the plot thickens.
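
The behavior described in that commit message (try `[::1]` first, fall back to IPv4 if the bind fails) can be sketched roughly as follows. This is not the code from that commit; `LoopbackFallback`, `Binder`, `bindFirstAvailable`, and `bindLoopback` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.List;

/** Hypothetical sketch of "prefer IPv6 loopback, fall back to IPv4"; not Bazel's code. */
public class LoopbackFallback {

  /** Something that can try to bind to a host and may fail with an IOException. */
  interface Binder<T> {
    T bind(String host) throws IOException;
  }

  /** Tries each host in order and returns the result of the first successful bind. */
  static <T> T bindFirstAvailable(List<String> hosts, Binder<T> binder) throws IOException {
    IOException lastFailure = null;
    for (String host : hosts) {
      try {
        return binder.bind(host);
      } catch (IOException e) {
        lastFailure = e; // Remember the failure and move on to the next candidate.
      }
    }
    throw new IOException("could not bind to any of " + hosts, lastFailure);
  }

  /** Binds a real server socket to the given literal address on an ephemeral port. */
  static ServerSocket bindLoopback(String host) throws IOException {
    ServerSocket socket = new ServerSocket();
    try {
      socket.bind(new InetSocketAddress(InetAddress.getByName(host), 0));
      return socket;
    } catch (IOException e) {
      socket.close(); // Do not leak the unbound socket.
      throw e;
    }
  }

  public static void main(String[] args) throws IOException {
    List<String> candidates = List.of("::1", "127.0.0.1");
    try (ServerSocket socket = bindFirstAvailable(candidates, LoopbackFallback::bindLoopback)) {
      System.out.println("listening on " + socket.getLocalSocketAddress());
    }
  }
}
```

A fallback like this only helps when at least one of the two loopback binds is permitted; the trace above suggests that, under the new sandbox rules, neither was.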

@jmmv
Contributor

jmmv commented Nov 26, 2019

I see references to (allow network-bind) in /usr/share/sandbox/* but this does not seem sufficient. However, doing (allow network-inbound) does work. The fix is trivial, but let me see if I can easily add a test as well.

@jmmv
Contributor

jmmv commented Nov 26, 2019

Alright, sending out a fix for review with something a bit less loose than what I mentioned earlier.
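
For readers unfamiliar with how such a rule ends up in a profile, here is a sketch of a generator that emits the network section of a macOS sandbox profile. The thread does not show the actual fix, so the loopback-only filter `(local ip "localhost:*")` is an assumption about what "less loose" than a blanket `(allow network-inbound)` could look like, and `NetworkProfileSketch` and `networkRules` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generator for the network part of a macOS sandbox profile; not Bazel's code. */
public class NetworkProfileSketch {

  /** Returns the profile lines to emit; an empty list means no network restrictions. */
  static List<String> networkRules(boolean allowNetwork) {
    List<String> rules = new ArrayList<>();
    if (!allowNetwork) {
      // Block all network operations by default.
      rules.add("(deny network*)");
      // Assumption: permit inbound traffic on loopback only, rather than
      // the blanket (allow network-inbound) mentioned in the thread.
      rules.add("(allow network-inbound (local ip \"localhost:*\"))");
    }
    return rules;
  }

  public static void main(String[] args) {
    networkRules(false).forEach(System.out::println);
  }
}
```

Scoping the rule to loopback keeps the intent of "block-network" (no access to remote hosts) while still letting a server started inside the sandbox listen locally.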

@laszlocsomor
Contributor Author

Thanks for investigating.

For the record, I bisected this issue without having seen your replies and can confirm the culprit is 8e7c349.

@ngeor

ngeor commented Feb 15, 2022

Hi folks,

I seem to have stumbled upon this issue with Bazel 5.0 and macOS Monterey 12.2.1.

Only on Mac, unit tests that try to listen on localhost are failing with "block-network".

They pass fine on CI (Linux).

@philwo
Member

philwo commented Feb 15, 2022

@ngeor Could you please file a new issue and add log output etc.? This makes it easier for us to prioritize and handle it than a comment on an old and already closed issue. Feel free to reference this one as a prior occurrence, of course! Thank you 😊
