Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rules Apple Fails on Intel x86 Mac when using CpuInfo (Tensorflow) #1954

Open
alexbeattie42 opened this issue May 3, 2023 · 6 comments
Open

Comments

@alexbeattie42
Copy link

Problem

If you are attempting to use any rule from @build_bazel_rules_apple on an Intel x86_64 Mac with anything that relies on the cpuinfo project (like Tensorflow) you will face the build error described by this mediapipe issue and this tensorflow issue.

Root Cause

The root cause of the problem falls with the complex interplay between cpuinfo, tensorflow, bazel, and the @build_bazel_rules_apple library. Bazel has reverted the default cpu value on x86 macOS to "darwin" in this commit.

This means that bazel expects Intel x86_64 Macs to report their cpu as darwin. @build_bazel_rules_apple will report the cpu as darwin_x86_64. This cpu does not exist in the cpuinfo project as it expects darwin. Although there is an unmerged pull request to fix this behavior, it is unlikely to be resolved soon.

Solution

In order to fix the problem, I have created a patch so that the library can read the host_cpu value which will report as darwin on an Intel x86_64 Mac. If this value is present, it will return darwin instead of darwin_x86_64.

The Fix

It is necessary to add the --incompatible_enable_apple_toolchain_resolution flag to your build command and modify@build_bazel_rules_apple with the following patch:

diff --git a/apple/internal/transition_support.bzl b/apple/internal/transition_support.bzl
index 65f51b89..7da7ade3 100644
--- a/apple/internal/transition_support.bzl
+++ b/apple/internal/transition_support.bzl
@@ -102,6 +102,9 @@ def _cpu_string(*, cpu, platform_type, settings = {}):
             return "ios_sim_arm64"
         return "ios_x86_64"
     if platform_type == "macos":
+        host_cpu = settings["//command_line_option:host_cpu"]
+        if host_cpu == "darwin": 
+            return "darwin"
         if cpu:
             return "darwin_{}".format(cpu)
         macos_cpus = settings["//command_line_option:macos_cpus"]
@@ -342,6 +345,7 @@ _apple_rule_common_transition_inputs = [
     "//command_line_option:apple_crosstool_top",
 ]
 _apple_rule_base_transition_inputs = _apple_rule_common_transition_inputs + [
+    "//command_line_option:host_cpu",
     "//command_line_option:cpu",
     "//command_line_option:ios_multi_cpus",
     "//command_line_option:macos_cpus",
@keith
Copy link
Member

keith commented May 3, 2023

does pytorch/cpuinfo#139 fix this issue as well? I see your mention of it but does it work? maybe we could poke some folks to get it reviewed if so

@alexbeattie42
Copy link
Author

alexbeattie42 commented May 3, 2023

Yes that PR would fix the problem for anyone who will update to the version of tensorflow which will eventually contain the cpuinfo fix when it is merged. The problem would still exist retroactively for anyone using older versions of tensorflow (which would be on the older versions of cpuinfo that do not have that fix).

I have written a post which shows both ways to patch the problem. The harder way is using the same technique that the cpuinfo MR uses and replicating it across all 3 places cpuinfo is referenced. This creates a double nested patch for tensorflow that consists of 3 patches itself and is quite cumbersome to maintain.

@keith
Copy link
Member

keith commented May 3, 2023

FYI darwin as a CPU has been removed again on 7.x here bazelbuild/bazel#17547

what change broke this?

@alexbeattie42
Copy link
Author

alexbeattie42 commented May 4, 2023

As noted by your comment, there are still 3 projects with open PRs that rely on the darwin means x86_64 convention. Additionally ruy is not listed but is another dependency of tensorflow that does.

It will take a long time for the change from darwin to darwin_x86_64 to percolate downstream and for people to adopt it. The original issue which manifested as a symptom of this problem has been open for 2+ years without traction or resolution from the bazel, mediapipe, or tensorflow teams.

It may be okay to force the change from darwin to darwin_x86_64 for Bazel 7.x and forward but this library should not be broken for people who choose to use tensorflow. The point of my PR is that if a system (for me Bazel 6.1) self reports as darwin, and not darwin_x86_64, then don't attempt to override it because it breaks downstream projects (tensorflow, ruy, xnnpack, etc.) in the pipeline.

@keith
Copy link
Member

keith commented May 4, 2023

thanks for linking that other library, i submitted a fix there too google/ruy#331

im fine with merging your change, but i do think it would be worth trying to push the other real solutions at the same time, if nothing else it will block any future bazel upgrades until it's solved.

@schmidt-sebastian
Copy link

MediaPipe is also affected by this: google-ai-edge/mediapipe#2324

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants