Adding a cc_library to the constraint_values of a platform causes a crash #22996

Closed
tpudlik opened this issue Jul 11, 2024 · 9 comments
Labels
team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug untriaged


@tpudlik
Contributor

tpudlik commented Jul 11, 2024

Description of the bug:

On Bazel 8.0.0-pre.20240618.2, adding a cc_library to the constraint_values of a platform causes Bazel to crash with an unhelpful error message:

java.lang.IllegalStateException: java.lang.RuntimeException: Unrecoverable error while evaluating node 'ConfiguredTargetKey{label=//:library, config=BuildConfigurationKey[6f97ef646cfec4f33129d2a43267b717f71213ca1cec09d0d350528022f27781]}' (requested by nodes 'ConfiguredTargetKey{label=//:platform, config=BuildConfigurationKey[6f97ef646cfec4f33129d2a43267b717f71213ca1cec09d0d350528022f27781]}')                            
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.evaluateSkyKeys(SkyframeExecutor.java:1996)                                                                                                                                                                                                                                                                                                                          
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.evaluateSkyKeys(SkyframeExecutor.java:1972)                                                                                                                                                                                                                                                                                                                          
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.createBuildConfigurationKey(SkyframeExecutor.java:1928)                                                                                                                                                                                                                                                                                                              
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.getConfiguration(SkyframeExecutor.java:1852)                                                                                                                                                                                                                                                                                                                         
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.createConfiguration(SkyframeExecutor.java:1608)                                                                                                                                                                                                                                                                                                                      
        at com.google.devtools.build.lib.analysis.BuildView.update(BuildView.java:258)                                                                                                                                                                                                                                                                                                                                                  
        at com.google.devtools.build.lib.buildtool.AnalysisAndExecutionPhaseRunner.runAnalysisAndExecutionPhase(AnalysisAndExecutionPhaseRunner.java:227)                                                                                                                                                                                                                                                                               
        at com.google.devtools.build.lib.buildtool.AnalysisAndExecutionPhaseRunner.execute(AnalysisAndExecutionPhaseRunner.java:125)                                                                                                                                                                                                                                                                                                    
        at com.google.devtools.build.lib.buildtool.BuildTool.buildTargetsWithMergedAnalysisExecution(BuildTool.java:358)                                                                                                                                                                                                                                                                                                                
        at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:197)                                                                                                                                                                                                                                                                                                                                           
        at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:543)                                                                                                                                                                                                                                                                                                                                         
        at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:511)                                                                                                                                                                                                                                                                                                                                         
        at com.google.devtools.build.lib.runtime.commands.BuildCommand.exec(BuildCommand.java:105)                                                                                                                                                                                                                                                                                                                                      
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:683)                                                                                                                                                                                                                                                                                                                
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:252)                                                                                                                                                                                                                                                                                                                           
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:607)                                                                                                                                                                                                                                                                                                                                  
        at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:676)                                                                                                                                                                                                                                                                                                                                    
        at io.grpc.Context$1.run(Context.java:566)                                                                                                                                                                                                                                                                                                                                                                                      
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)                                                                                                                                                                                                                                                                                                                                                  
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)                                                                                                                                                                                                                                                                                                                                                 
        at java.base/java.lang.Thread.run(Unknown Source)                                                                                                                                                                                                                                                                                                                                                                               
Caused by: java.lang.RuntimeException: Unrecoverable error while evaluating node 'ConfiguredTargetKey{label=//:library, config=BuildConfigurationKey[6f97ef646cfec4f33129d2a43267b717f71213ca1cec09d0d350528022f27781]}' (requested by nodes 'ConfiguredTargetKey{label=//:platform, config=BuildConfigurationKey[6f97ef646cfec4f33129d2a43267b717f71213ca1cec09d0d350528022f27781]}')                                                  
        at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:557)                                                                                                                                                                                                                                                                                                                
        at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:426)                                                                                                                                                                                                                                                                                                             
        at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(Unknown Source)                                                                                                                                                                                                                                                                                                                                       
        at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)                                                                                                                                                                                                                                                                                                                                                           
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)                                                                                                                                                                                                                                                                                                                                           
        at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)                                                                                                                                                                                                                                                                                                                                                             
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)                                                                                                                                                                                                                                                                                                                                                        
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)                                                                                                                                                                                                                                                                                                                                                      
Caused by: com.google.common.base.VerifyException: expected a Starlark exec transition definition                                                                                                                                                                                                                                                                                                                                       
        at com.google.common.base.Verify.verifyNotNull(Verify.java:503)                                                                                                                                                                                                                                                                                                                                                                 
        at com.google.devtools.build.lib.analysis.config.ExecutionTransitionFactory.create(ExecutionTransitionFactory.java:99)                                                                                                                                                                                                                                                                                                          
        at com.google.devtools.build.lib.analysis.config.ExecutionTransitionFactory.create(ExecutionTransitionFactory.java:50)                                                                                                                                                                                                                                                                                                          
        at com.google.devtools.build.lib.analysis.producers.DependencyProducer.step(DependencyProducer.java:175)                                                                                                                                                                                                                                                                                                                        
        at com.google.devtools.build.skyframe.state.TaskTreeNode.run(TaskTreeNode.java:94)                                                                                                                                                                                                                                                                                                                                              
        at com.google.devtools.build.skyframe.state.Driver.drive(Driver.java:87)                                                                                                                                                                                                                                                                                                                                                        
        at com.google.devtools.build.lib.skyframe.DependencyResolver.computeDependencies(DependencyResolver.java:663)                                                                                                                                                                                                                                                                                                                   
        at com.google.devtools.build.lib.skyframe.DependencyResolver.evaluate(DependencyResolver.java:391)                                                                                                                                                                                                                                                                                                                              
        at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.compute(ConfiguredTargetFunction.java:265)                                                                                                                                                                                                                                                                                                                   
        at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:468)

@katre

Which category does this issue belong to?

Configurability

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Minimal example: https://github.com/tpudlik/constraint_value_crash.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 8.0.0-pre.20240618.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

Kind of? On Bazel 7.2.1 I get an error instead of a crash:

ERROR: /usr/local/google/home/tpudlik/.cache/bazel/_bazel_tpudlik/259e376b1fd08ce7249a45dd7f913f12/external/bazel_tools/src/conditions/BUILD:145:27: in config_setting rule @@bazel_tools//src/conditions:host_windows_arm64_constraint: all rules of type config_setting require the presence of all of [PlatformConfiguration], but these were all disabled in configuration ab8cba103302292a2b1ba4f5816adc45d4e4bc4d1f3a0e0ddbfd94da0f3bd83b
ERROR: /usr/local/google/home/tpudlik/.cache/bazel/_bazel_tpudlik/259e376b1fd08ce7249a45dd7f913f12/external/bazel_tools/src/conditions/BUILD:145:27: Analysis of target '@@bazel_tools//src/conditions:host_windows_arm64_constraint' failed                                                                                                                                                                                            
ERROR: /usr/local/google/home/tpudlik/.cache/bazel/_bazel_tpudlik/259e376b1fd08ce7249a45dd7f913f12/external/bazel_tools/src/conditions/BUILD:145:27: errors encountered resolving select() keys for @@bazel_tools//src/conditions:host_windows                                                                                                                                                                                          
ERROR: /usr/local/google/home/tpudlik/.cache/bazel/_bazel_tpudlik/259e376b1fd08ce7249a45dd7f913f12/external/bazel_tools/tools/def_parser/BUILD:11:10: errors encountered resolving select() keys for @@bazel_tools//tools/def_parser:def_parser                                                                                                                                                                                         
ERROR: /usr/local/google/home/tpudlik/src/constraint_value_crash/BUILD.bazel:14:10: Target //:platform was referenced as a platform, but does not provide PlatformInfo                                                                                                                                                                                                                                                                  
ERROR: Analysis of target '//:binary' failed; build aborted                                                                                                                                                                                                                                                                                                                                                                             
INFO: Elapsed time: 0.181s, Critical Path: 0.00s                                                                                                                                                                                                                                                                                                                                                                                        
INFO: 1 process: 1 internal.                                                                                                                                                                                                                                                                                                                                                                                                            
ERROR: Build did NOT complete successfully 

This error is not especially informative, but at least it suggests the problem is with the platform, somehow.

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

github-actions bot added the team-Configurability platforms, toolchains, cquery, select(), config transitions label Jul 11, 2024
@tpudlik
Contributor Author

tpudlik commented Jul 11, 2024

I realized I never described what the expected behavior is. I would have hoped Bazel would print an error message like,

"The constraint_values attribute of //:platform includes the target //:library which is not a constraint_value."

@katre
Member

katre commented Jul 15, 2024

Reproduced the failure, investigating now.

@katre
Member

katre commented Jul 15, 2024

The actual cause of this is fairly basic: any target with the standard definition of compatible_with tries to use the exec transition, and in this case the target is being resolved in the empty config. And, of course, the empty config doesn't define an exec transition.

Possible fixes:

  1. Don't use the exec config for compatible_with: NoConfigTransition is arguably the right option anyway.
  2. Fix dependency analysis to handle a request for cfg = "exec" when the Starlark exec flag is unset, probably by re-using the current (empty) config.
  3. Figure out what changed that now //tools/cpp:current_cc_toolchain is being analyzed in the empty config: this is probably due to wider use of NoConfigTransition.
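
A minimal sketch of what fix 2 could look like, written as a toy model rather than Bazel's real code. Every name here (`Config`, `EMPTY_CONFIG`, `strictExecConfig`, `lenientExecConfig`, the `//hypothetical:exec_transition` label) is an assumption made for illustration, and `IllegalStateException` stands in for the `VerifyException` seen in the stack trace.

```java
import java.util.Optional;

// Hypothetical, simplified model: none of these types are Bazel's real classes.
final class ExecTransitionFallbackSketch {

    /** A configuration reduced to a name plus an optional exec transition label. */
    static final class Config {
        final String name;
        final Optional<String> execTransition;

        Config(String name, Optional<String> execTransition) {
            this.name = name;
            this.execTransition = execTransition;
        }
    }

    /** Stand-in for the empty config, which defines no exec transition. */
    static final Config EMPTY_CONFIG = new Config("empty", Optional.empty());

    /** Mirrors the crash: an unchecked exception when the transition is missing. */
    static Config strictExecConfig(Config current) {
        String transition = current.execTransition.orElseThrow(
            () -> new IllegalStateException("expected a Starlark exec transition definition"));
        return new Config("exec(" + transition + ")", current.execTransition);
    }

    /** Sketch of fix 2: with no exec transition defined, stay in the current config. */
    static Config lenientExecConfig(Config current) {
        if (!current.execTransition.isPresent()) {
            return current;
        }
        return strictExecConfig(current);
    }

    public static void main(String[] args) {
        // The empty config is reused instead of triggering an exception.
        System.out.println(lenientExecConfig(EMPTY_CONFIG).name);
        // A config that does define an exec transition still transitions.
        Config target = new Config("target", Optional.of("//hypothetical:exec_transition"));
        System.out.println(lenientExecConfig(target).name);
    }
}
```

The sketch only shows the shape of the fallback; whether silently staying in the current config is the right behavior in every case is exactly the open question.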

@katre
Member

katre commented Jul 15, 2024

I was initially surprised that there isn't an earlier error: PlatformRule.constraint_values does declare that dependencies must declare ConstraintValueInfo.

However, after a lot (a lot) of code tracing, it looks like this is checked too late:

  1. The mandatoryProviders method populates a RequiredProviders object
  2. But RequiredProviders.getMissing is only ever called from RuleContext.Builder.validateDirectPrerequisite (with a lot of methods in between, but those only have one caller).
  3. validateDirectPrerequisite (and all of RuleContext.Builder) is part of analysis, which comes after dependencies are themselves analyzed.
  4. This means that if the dependency is in error, the validation never happens. So even though the cc_library should be ignored and never analyzed, Bazel attempts to analyze it, fails, and reports that error instead.

A fix would be to check for mandatory providers much earlier, before dependencies are computed (and so based only on advertised providers, not actual providers). I'm not sure this is easy to do, and I'm not sure what else we'd break if we added it.
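
A rough sketch of the idea of checking earlier, as a toy model and not Bazel's actual API. The `Dep` class, the `checkBeforeAnalysis` method, and the example labels are all hypothetical; the provider names are used purely as strings. The check looks only at what each dependency's rule class advertises, so it can run before any dependency is analyzed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model: none of these types are Bazel's real classes.
final class EarlyProviderCheckSketch {

    /** A dependency described only by what its rule class advertises up front. */
    static final class Dep {
        final String label;
        final Set<String> advertisedProviders;

        Dep(String label, String... advertisedProviders) {
            this.label = label;
            this.advertisedProviders = new HashSet<>(Arrays.asList(advertisedProviders));
        }
    }

    /** Returns one error per dependency that does not advertise the required provider. */
    static List<String> checkBeforeAnalysis(
            String owner, String attribute, String requiredProvider, List<Dep> deps) {
        List<String> errors = new ArrayList<>();
        for (Dep dep : deps) {
            if (!dep.advertisedProviders.contains(requiredProvider)) {
                errors.add("The " + attribute + " attribute of " + owner
                    + " includes the target " + dep.label
                    + " which does not advertise " + requiredProvider + ".");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Dep> deps = Arrays.asList(
            new Dep("//:library", "CcInfo"),           // stands in for the cc_library
            new Dep("//:cpu", "ConstraintValueInfo")); // hypothetical valid constraint_value
        for (String error : checkBeforeAnalysis(
                "//:platform", "constraint_values", "ConstraintValueInfo", deps)) {
            System.out.println(error);
        }
    }
}
```

One consequence visible in the sketch: a rule that returns the provider without advertising it would be rejected, which may be one of the things such a change would break.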

@katre
Member

katre commented Jul 15, 2024

The summary here is that we can avoid the crash but I'm not sure we can give a reasonable error message without some work.

@gregestren
Contributor

https://bazel-review.googlesource.com/c/bazel/+/254333 applies fix #1 from #22996 (comment).

Does that help?

@katre
Member

katre commented Jul 15, 2024

It will stop the crash, but it doesn't help report a better error.

@katre
Member

katre commented Jul 16, 2024

To summarize: there are really two issues here.

Bazel Crash

When the ExecutionTransitionFactory doesn't get the Starlark exec transition, it crashes Bazel. This is a major problem and we need to avoid it, probably by reporting an error but keeping the Bazel server alive. I have a few ideas towards this:

  1. Fix ExecutionTransitionFactory to just return an instance of NoConfigTransition when the Starlark exec transition isn't present.
    1. @gregestren has suggested only doing this when the incoming config is also the empty config. This is possible but tricky.
  2. Fix ExecutionTransitionFactory to report an error directly (either by throwing an EvalException or returning an error message): this requires changing the TransitionFactory interface and potentially some cleanup of callers everywhere to handle this case.
  3. Fix DependencyProducer to directly check whether the transition factory is ExecutionTransitionFactory and the config is invalid. This is probably the most targeted but feels like a hack.

I'd like @gregestren's opinion on which of these is cleanest: I am leaning towards number 2 personally.
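
To make option 2 concrete, here is a small sketch of the "return an error message" variant, as a toy model and not Bazel's real TransitionFactory interface. `Result`, `createExecTransition`, `resolveDependency`, and the `//hypothetical:exec_transition` label are all invented for illustration. The point is that a missing exec transition becomes an ordinary value the caller can report, rather than an unchecked exception that takes the server down.

```java
// Hypothetical, simplified model: none of these types are Bazel's real classes.
final class TransitionResultSketch {

    /** Either a transition description or an error message, never both. */
    static final class Result {
        final String transition; // null when this is an error
        final String error;      // null when this is a success

        private Result(String transition, String error) {
            this.transition = transition;
            this.error = error;
        }

        static Result ok(String transition) {
            return new Result(transition, null);
        }

        static Result error(String message) {
            return new Result(null, message);
        }
    }

    /** Factory that reports a missing exec transition instead of throwing. */
    static Result createExecTransition(String execTransitionOrNull) {
        if (execTransitionOrNull == null) {
            return Result.error("exec transition requested, but this configuration "
                + "does not define a Starlark exec transition");
        }
        return Result.ok("exec(" + execTransitionOrNull + ")");
    }

    /** Caller turns a factory error into a normal, reportable analysis failure. */
    static String resolveDependency(String label, String execTransitionOrNull) {
        Result result = createExecTransition(execTransitionOrNull);
        if (result.error != null) {
            return "ERROR: " + label + ": " + result.error;
        }
        return label + " -> " + result.transition;
    }

    public static void main(String[] args) {
        // Empty config: an error line is produced and the process keeps running.
        System.out.println(resolveDependency("//:library", null));
        // Normal config: the transition is applied.
        System.out.println(resolveDependency("//:library", "//hypothetical:exec_transition"));
    }
}
```

The cost shown here is the one noted in option 2: every caller of the factory has to handle the error branch.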

Invalid dependencies in constraint_values

It'd be very nice to correctly note (before analysis) that cc_library doesn't provide ConstraintValueInfo and error out early: right now even with the crash removed, the dependencies will be fully analyzed before platform checks whether they are appropriate.

This looks fairly difficult to manage, but I'm going to file a separate issue to track it and raise it with the appropriate teams. It's not solvable in the short term, so I propose using this issue to only track the actual crash.

@tpudlik, does this make sense to you?

@tpudlik
Contributor Author

tpudlik commented Jul 16, 2024

Yes, this all makes sense. Could you cc me on the separate better-error issue?

Thank you for digging into this! I didn't expect it would prove such a rabbit hole, but I'm glad we uncovered these issues (the ExecutionTransitionFactory problem in particular sounds like it might manifest in other contexts, too).

copybara-service bot pushed a commit that referenced this issue Jul 22, 2024
…on path.

Also fix incompatible flags that need to have values preserved.

Part of #22996.

PiperOrigin-RevId: 654704954
Change-Id: I90902a5a8df232f5b8e2d79904cb21553a14877d
copybara-service bot pushed a commit that referenced this issue Jul 22, 2024
Part of #22996.

PiperOrigin-RevId: 654859209
Change-Id: I865c086c0b1433ce51caa156d160d35e6d1f982d
copybara-service bot pushed a commit that referenced this issue Jul 22, 2024
Currently this is only usable within the builtin rules.

This is intended to reduce the complexity of the StarlarkDefinedConfigTransition, by making it easier to tell if a transition has the extra capabilities of the exec transition.

Work towards composable starlark transitions: #22248. Also part of fixing #22996.

PiperOrigin-RevId: 654886740
Change-Id: I1d5292bc2e5e8646e5563f0a2cd3afd9b2157659
copybara-service bot pushed a commit that referenced this issue Jul 26, 2024
…tion.

Needed for the execution transition factory.

Part of #22996.

PiperOrigin-RevId: 656477022
Change-Id: Id7f9702bc4767aff330b7990afc793dca593445a
copybara-service bot pushed a commit that referenced this issue Jul 26, 2024
Part of #22996.

PiperOrigin-RevId: 656502048
Change-Id: Ib56cd9374c7d2cc964689c3029c865b294247da8