Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5.0rc1: android_ndk_repository is fetched eagerly #92

Open
brentleyjones opened this issue Nov 11, 2021 · 27 comments
Open

5.0rc1: android_ndk_repository is fetched eagerly #92

brentleyjones opened this issue Nov 11, 2021 · 27 comments
Assignees
Labels

Comments

@brentleyjones
Copy link

Description of the problem / feature request:

In bazel 5.0rc1 android_ndk_repository is fetched even if nothing depends on it. In 4.2.1, this wasn't the case.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Add android_ndk_repository(name = "androidndk", api_level = 29) to WORKSPACE, don't have the ANDROID_NDK_HOME environment variable set. Build anything, and get this error:

ERROR: /Users/brentleyjones/Developer/Lyft/eager_repo_fetching_bug/WORKSPACE:19:23: fetching android_ndk_repository rule //external:androidndk: Either the path attribute of android_ndk_repository or the ANDROID_NDK_HOME environment variable must be set.
INFO: Repository remote_coverage_tools instantiated at:
  /DEFAULT.WORKSPACE.SUFFIX:3:13: in <toplevel>
Repository rule http_archive defined at:
  /private/var/tmp/_bazel_brentleyjones/256d770c92848b1788719fcd229cde09/external/bazel_tools/tools/build_defs/repo/http.bzl:364:31: in <toplevel>
ERROR: Analysis of target '//:test' failed; build aborted: Either the path attribute of android_ndk_repository or the ANDROID_NDK_HOME environment variable must be set.

In 4.2.1 this error wouldn't happen.

What operating system are you running Bazel on?

macOS 11.6.1

What's the output of bazel info release?

release 5.0.0rc1

Have you found anything relevant by searching the web?

No.

@brentleyjones
Copy link
Author

@Wyverald Wyverald self-assigned this Nov 11, 2021
@Wyverald
Copy link
Member

Do you have a minimum repro case? I tried the following:

$ echo 'android_ndk_repository(name = "androidndk", api_level = 29)' > WORKSPACE
$ echo 'filegroup(name="haha")' > BUILD
$ USE_BAZEL_VERSION=5.0.0rc1 bazelisk build :haha

And nothing failed. (I don't have the NDK installed or the env var set.)

@brentleyjones
Copy link
Author

@brentleyjones
Copy link
Author

$ ./repo.sh
ERROR: /Users/brentleyjones/Developer/Lyft/eager_repo_fetching_bug/WORKSPACE:19:23: fetching android_ndk_repository rule //external:androidndk: Either the path attribute of android_ndk_repository or the ANDROID_NDK_HOME environment variable must be set.
ERROR: Analysis of target '//:test' failed; build aborted: Either the path attribute of android_ndk_repository or the ANDROID_NDK_HOME environment variable must be set.
INFO: Elapsed time: 0.144s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 4 targets config\
ured)
    currently loading: @androidsdk//
    Fetching @remote_coverage_tools; fetching
    Fetching @local_config_cc_toolchains; fetching
    Fetching @local_jdk; fetching

@Wyverald
Copy link
Member

Thanks, it looks like it's somehow related to rules_swift. I'll take a look now

@Wyverald
Copy link
Member

Pretty sure this is due to bazelbuild/bazel@7398e33.

This commit made the android_ndk_repository repo rule register some toolchains, which it didn't in 4.2.1. Now because it registers toolchains defined within the repo, any rule that uses toolchains will cause the repo to be fetched, be it Android-related rules, Swift rules, or even Java rules for example. (That's also why my initial repro attempt using filegroup didn't work.)

@katre any thoughts? I think this is somewhat WAI, although it's weird that android_ndk_repository fails to fetch but android_sdk_repository doesn't.

@brentleyjones
Copy link
Author

And is there any way to conditionally run a repository rule? For example, with a --define/build setting/select(). I've needed that ability for some other rules.

@katre
Copy link
Member

katre commented Nov 16, 2021

Why do you have an android_ndk_repository in your WORKSPACE when you're not setting it up?

There are two parts to a toolchain: the actual underlying toolchain (in the case of the Android NDK, that's a cc_toolchain), and the declaration, which uses the toolchain target.

There is no direct dependency from the toolchain to the cc_toolchain, to prevent eager analysis, but in this case the toolchain and the cc_toolchain are in the same package (see https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/bazel/rules/android/android_ndk_cc_toolchain_template.txt?ss=bazel&q=android_ndk) and so the observed eager fetching behavior is present.

The easiest fix is to separate them into two different repositories: one that defines the toolchain targets (and which will be locally created by the android_ndk_repository rule, so it's okay that it is always loaded), and another that defines the cc_toolchain targets (and which will then only be configured when needed).

However, I don't think this will address your problem: the android_ndk_repository rule will still execute, and so it will still give an error that it's not valid. I am, frankly, surprised that this ever failed to give an error before.

@brentleyjones
Copy link
Author

Why do you have an android_ndk_repository in your WORKSPACE when you're not setting it up?

Because we have various different CI jobs, and they only setup what is needed for the job. Our iOS jobs don't setup the JDK or Android SDK, because it's not needed for that build. Local developers that only develop on the iOS stuff don't need to setup the JDK or Android SDK.

@katre
Copy link
Member

katre commented Nov 16, 2021

Bazel's CI system only adds the android_sdk_repository and android_ndk_repository to the WORKSPACE when the test environment supports it and those tests are run. Is this an approach that you could look into?

I'm very nervous about either conditional repository rules, or lazy loading of repository rules, although @Wyverald is the expert on that and can answer about whether it makes sense.

@brentleyjones
Copy link
Author

Bazel's CI system only adds the android_sdk_repository and android_ndk_repository to the WORKSPACE when the test environment supports it and those tests are run. Is this an approach that you could look into?

While that might address the CI issue, it's a huge pain for local developers then.

Another case for conditional/lazy loading repository rules: I want to only download SDKs for my execution platforms, which might change depending on if I'm using RBE.

@Wyverald
Copy link
Member

The easiest fix is to separate them into two different repositories

Like you said later, I don't think that would help -- IIUC, as soon as you have some toolchains in @androidndk registered, then any rule that uses any toolchain type will cause RegisteredToolchainsFunctions to run, which would in turn try to fetch @androidndk.

@katre any idea why android_sdk_repository always succeeds to fetch?

I'm very nervous about either conditional repository rules, or lazy loading of repository rules

I'm similarly nervous. There was a long discussion about conditional external dependencies in the Bzlmod design doc, and the rabbit hole goes a mile down. In today's WORKSPACE there is technically a way to do conditional dependencies based on environment variables, but it's more of a hack than something we're comfortable supporting. Anyhow, even if we were to support conditional dependencies, it wouldn't make it into 5.0.

I want to only download SDKs for my execution platforms

For what it's worth, this is possible today if you carefully separate the execution platform definition and the actual SDK binary into two separate repos. The execution platform definition repo is always fetched (because it has the platform targets in it), but the actual SDK binaries are only downloaded if the platform is actually used. This is how the remote JDKs that come with Bazel are set up today.

@brentleyjones
Copy link
Author

For what it's worth, this is possible today if you carefully separate the execution platform definition and the actual SDK binary into two separate repos. The execution platform definition repo is always fetched (because it has the platform targets in it), but the actual SDK binaries are only downloaded if the platform is actually used. This is how the remote JDKs that come with Bazel are set up today.

That's good to hear. Someone should update rules_go to do this then 😄.

@katre
Copy link
Member

katre commented Nov 16, 2021

When we first added this system, that's exactly how rules_go worked (because it was the testbed): there was the actual @io_bazel_rules_go repository, which defined the toolchain type, actual rules, platforms, and toolchain entries, and each registered SDK was a separate repository which would be downloaded only when used.

If you're seeing something different (and if Bazel is downloading all several dozen go SDKs, you'll notice), that should definitely be filed as a bug to investigate.

@brentleyjones
Copy link
Author

If you're seeing something different (and if Bazel is downloading all several dozen go SDKs, you'll notice), that should definitely be filed as a bug to investigate.

The default behavior only downloads SDKs for the host platform. If I have an execution platform that is different, one way to do that is to set my host platform to that execution platform. The repository rule uses uname though, so to get around that I have to manually call go_sdk_download for each architecture. These are all downloaded. I can file a bug over there.

@katre
Copy link
Member

katre commented Nov 16, 2021

@Wyverald I'm not sure why the NDK shows an error and the SDK doesn't. At a guess, the NDK error is just that it's not fully configured: there's no path to the NDK, and no environment variable. If the SDK isn't reporting errors, then either there is a path or the environment variable is set (even if it's not set to anything useful).

One way forward would be to add an attribute to android_ndk_repository (and android_sdk_repository, for completeness) to disable errors when improperly configured.

@katre
Copy link
Member

katre commented Nov 16, 2021

@brentleyjones Hmm, that's a bit odd, but I haven't followed rules_go recently. Definitely follow up with the maintainers and see if there's an easier way.

@brentleyjones
Copy link
Author

If the SDK isn't reporting errors, then either there is a path or the environment variable is set (even if it's not set to anything useful).

I don't have any setup for the Android SDK: no env variable and no setup.

@Wyverald
Copy link
Member

^ neither do I.

@Wyverald
Copy link
Member

I'll take this off the release blocker list for now, as there doesn't seem to be a short-term solution (unless @katre you think we could work out a way to make android_ndk_repository not fail, like android_sdk_repository). Unfortunately this means that people have to manually comment out the line in WORKSPACE, or change their CI setup to insert the line, which is definitely not great :(

@jpsim
Copy link

jpsim commented Feb 3, 2022

The time has now come for Envoy Mobile to update to Bazel 5 so we're likely going to comment out the android_ndk_repository configuration in our WORKSPACE file, requiring any CI jobs or local developers that need to build for Android to uncomment that line.

This isn't too problematic for CI, but will lead to some other issues:

  1. Envoy Mobile developers building for Android will need to manually uncomment this line. This is cumbersome and we risk accidentally checking in this change to source control. Although CI will fail if this happens, it's still going to add friction.
  2. If any consumers of Envoy Mobile for Android are building from source, they'll need to mirror this pre-processing in their build system or CI.

Many of the workaround ideas that were suggested in this thread would work for us.

For example, @brentleyjones suggested a way to conditionally run a repository rule with a --define, build setting, select() or environment variable.

@katre's suggestion to add an attribute to android_ndk_repository to disable errors when improperly configured would also be suitable for us, considering that any misconfiguration will fail at a later step if it was needed. I'd likely be able to implement this option if maintainers of the rule are open to accepting such a change.

@keith
Copy link
Member

keith commented Feb 4, 2022

I think there needs to be some solution to this, even if it's the one mentioned above just making the NDK not fail. I think many projects are setup in this way for android where maybe not all codepaths require it, but some do, for example grpc:

https://github.com/grpc/grpc/blob/608970f78309b219ef3c0219f1f1ed81a25a7f1a/WORKSPACE#L56-L67

Based on this PR grpc/grpc#28683 they already hit this during their upgrade attempt.

@keith
Copy link
Member

keith commented Feb 4, 2022

For reference here's a workaround that I think will work in the meantime envoyproxy/envoy-mobile#2039

jpsim referenced this issue in envoyproxy/envoy-mobile Feb 4, 2022
* Update Envoy: envoyproxy/envoy@71248e5...bbcd487
* Remove various `remotejdk_11` configurations in .bazelrc which we now inherit from Envoy
* Work around https://github.com/bazelbuild/bazel/issues/14260 androidndk fetching issue

Signed-off-by: JP Simard <[email protected]>
Co-authored-by: Keith Smiley <[email protected]>
@jiawen
Copy link

jiawen commented Feb 28, 2022

I just tested the workarounds listed above (grpc/grpc#28683, envoyproxy/envoy-mobile#2039) and they work. What isn't obvious to a novice is that you can't call register_toolchains("@androidndk:all") in WORKSPACE as that will turn lazy back into eager and fail when the Android environment variables aren't set.

The way to defer this would be to pass --extra_toolchains=@androidndk:all on the command line or in .bazelrc.

@ahumesky
Copy link
Collaborator

We're replacing the native version of android_ndk_repository with a starlark-based implementation at https://github.com/bazelbuild/rules_android_ndk you might consider trying this version out to see if it avoids the problems here (it also supports the latest ndk version)

@ahumesky
Copy link
Collaborator

hmm actually since this is about registering toolchains, and you still register toolchains with the starlark version of android_ndk_repository, it might still have the same problem

@ahumesky ahumesky added the P2 label Oct 25, 2022
jpsim referenced this issue in envoyproxy/envoy Nov 29, 2022
* Update Envoy: 71248e5...bbcd487
* Remove various `remotejdk_11` configurations in .bazelrc which we now inherit from Envoy
* Work around https://github.com/bazelbuild/bazel/issues/14260 androidndk fetching issue

Signed-off-by: JP Simard <[email protected]>
Co-authored-by: Keith Smiley <[email protected]>
Signed-off-by: JP Simard <[email protected]>
@ahumesky
Copy link
Collaborator

Sounds like this can be solved with #63, we should revisit that PR

@ahumesky ahumesky transferred this issue from bazelbuild/bazel Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants