-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental incorrectness on transitions + build setting defaults #15732
Comments
I think this is the culprit: bazel/src/main/java/com/google/devtools/build/lib/analysis/config/ConfigurationResolver.java Lines 105 to 115 in c7c3fa1
Transition cache keys include a) the transition logic, b) the attributes of the target the transition is attached to, and c) the input flag values. I believe the problem is input flag defaults appear as |
Alternatively: the input flag doesn't appear in |
I took a brief look b/c saw this email: I can think of 2-3 fixes thus far:
|
Also: bazel/src/main/java/com/google/devtools/build/lib/analysis/config/ConfigurationResolver.java Lines 471 to 472 in c7c3fa1
where the transition loads its input flags and their defaults. Currently, this executes only after a cache miss. So, as you say, we somehow need to integrate it into the cache checking logic. Note that if we just move it before caching - so it executes unconditionally - this might slow down speed for transition-heavy builds. Performance optimization is exactly why we added the transition cache in the first place. Even a few Skyrame calls is more overhead than nothing. I'm not sure if it makes a practical difference. Re 3: my original approach indeed used a Skyframe function: I still have the prototype code - bazel/src/main/java/com/google/devtools/build/lib/analysis/config/ConfigurationResolver.java Line 169 in c7c3fa1
which is presumably more memory efficient? |
I think I disagree on 471-472 executing only under a cache miss. It looks like this executes unconditionally. bazel/src/main/java/com/google/devtools/build/lib/analysis/config/ConfigurationResolver.java Lines 471 to 472 in c7c3fa1
applyTransitionWithSkyframe is called first, unconditionally, via various callers (resolveGenericTransition and collectTransitionKeys). In applyStarlarkTransition lines 496-503 handle creating the cache key and looking for a cache hit. bazel/src/main/java/com/google/devtools/build/lib/analysis/config/ConfigurationResolver.java Lines 496 to 503 in c7c3fa1
Then 504+ handle a cache miss (including reading buildSettingsPackages to apply defaults). So it seems all the work of actually building up buildSettingsPackages is always done, regardless of cache hit or miss. This makes Option 1 a tempting quick fix. Also note then there is a nice symmetry between the cache key and the inputs to applyStarlarkTransition (which gives a better visual assurance the cache key is correct). PS: I do agree that Option 3 is worth exploring; I just fear it may take time to think through and experiment with converting transition application to a SkyFunction. In particular, I am concerned with the performance impact of restarts. PPS: I put Option 2 in the list but it seems so fiddly and potentially error prone that I am retracting it as a suggested option. |
Oh you're right! All the better! |
Please ignore if you already know this: Skyframe recently got the capability to maintain state through restarts. So unless the restarts would happen during the execution of a Starlark function, they should be much less costly than in the past. |
Two main goals of this CL: * Properly include 'metadata about Starlark build settings' in the transition application cache. * Better isolate where and why the transition infrastructure reads BUILD file information. This unifies previously duplicated logic in SkyframeExecutor. First, the transition application cache, on a miss, would also read defaults, type, aliasing information, etc. from BUILD files but failed to include any of this info in the cache key. This lead to failure to re-evaluate a transition when that build setting information in BUILD files changes. This CL fixes that by introducing a new type StarlarkBuildSettingsDetailsValue, which contains all that information for a given set of Starlark build settings. That value is now included in the cache key. Second, rather than having ConfiguredTargetFunction (via machinery in StarlarkTransition) itself do the Skyframe calls and subsequent computations necessary to compute StarlarkBuildSettingsDetailsValue, a new SkyFunction StarlarkBuildSettingsDetailsFunction was created to do so. This provides a few benefits: 1. ConfiguredTargetFunction now is only registering an edge onto a single StarlarkBuildSettingsDetailsFunction, rather than all the PackageValue needed to resolve the StarlarkBuildSettings. StarlarkBuildSettingsDetailsValue more narrowly describes how sensitive the transition is to change. This means irrelevant changes to a package involved in build setting resolution will not cause as much invalidation. This information is now cached so StarlarkTransition (called via ConfiguredTargetFunction) can more quickly compute the cache key. (Beforehand, it was always unconditionally following any build setting alias chains). 2. SkyframeExecutor can just evaluate StarlarkBuildSettingsDetailsValue rather than having a slightly different implementation getBuildSettingsPackages. (The new getStarlarkBuildSettingsDetailsValue functions that must be synced are much simpler and easier to keep in sync.) As part of this unification, the error messages are now consistent regardless of whether the problematic build settings are asked for when transitioning the top-level target or an underlying target. 3. The split between reading Packages to compute StarlarkBuildSettingsDetailsValue and then later consuming it in StarlarkTransition should be easier to read and maintain. PS: As part of the refactor, significant issues were found in PrepareAnalysisPhaseFunction. Warnings have been added as comments and upcoming work will be deleting that file entirely. This fixes caching issue described at #15732 PiperOrigin-RevId: 462741847 Change-Id: I517a9a841735e6cff26faea5373f50a164334f86
I took a brief look b/c saw this email:
I can think of 2-3 fixes thus far:
1. Add the defaults to the cache key (or, similarly, the
buildSettingsPackages parameter). This seems easiest and safest. Need to
verify hashability of the added parameter.
2. Move `addDefaultStarlarkOptions` to before the caching. This means
fromOptions will have the proper defaults instead of null. HOWEVER, also
will need to move the call validate (or some subset that handles nulling
out option defaults) after the cache hit. This is in case the defaults
change, want to make sure the nulling out of the default options is done
properly. See
https://github.com/bazelbuild/bazel/blob/c7c3fa1e97ad73ec95c310af2ffb692069a565eb/src/main/java/com/google/devtools/build/lib/analysis/config/ConfigurationResolver.java#L505
3. Just make transition application a SkyFunction. There are some
performance implications related to restarting though....
…On Thu, Jun 23, 2022 at 9:16 PM Greg ***@***.***> wrote:
Alternatively: the input flag doesn't appear in fromOptions at all
because it was never set on the command line.
—
Reply to this email directly, view it on GitHub
<#15732 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAHX76EVB623NQLOV4FSLT3VQUD5LANCNFSM5ZWFEZBQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Description of the bug:
When a configuration transition depends on a user-defined flag, the transition doesn't update when the flag's default value updates.
This produces incorrect output for incremental builds (build a target, change the flag default, build the target again).
Workarounds:
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
https://gist.github.com/gregestren/3c6d67a3f23ea2a0b0baac34d9917dfb
Which operating system are you running Bazel on?
Linux
What is the output of
bazel info release
?release 5.2.0
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.No response
What's the output of
git remote get-url origin; git rev-parse master; git rev-parse HEAD
?Have you found anything relevant by searching the web?
Forked from #15653 (comment)
Any other information, logs, or outputs that you want to share?
No response
The text was updated successfully, but these errors were encountered: