Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent action conflicts for exec-starlark-exec transitions #13915

Closed

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Aug 29, 2021

With this commit, an execution transition will recompute the transition
hash appended to the output directory name if set by any previous
Starlark transition. This also takes into account that the Starlark
transition may transitively affect an option --foo_bar after the exec
transition if it previously affected the corresponding host option
--host_foo_bar.

Previously, an execution transition would reuse the existing hash, which
no longer reflected the current state of native options correctly and
thus led to action conflicts.

Fixes #13464.

AUTHORS Outdated Show resolved Hide resolved
CONTRIBUTORS Outdated Show resolved Hide resolved
@aiuto aiuto added Starlark configuration Starlark transitions, build_settings team-Configurability platforms, toolchains, cquery, select(), config transitions labels Sep 5, 2021
@aiuto aiuto requested a review from sdtwigg September 5, 2021 18:09
@katre katre added the P1 I'll work on this now. (Assignee required) label Sep 8, 2021
@katre
Copy link
Member

katre commented Sep 8, 2021

@sdtwigg is reviewing this.

@gregestren
Copy link
Contributor

Continuing the conversation at #13464 to clarify approaches.

@sdtwigg
Copy link
Contributor

sdtwigg commented Sep 9, 2021

To clarify, I/we have not forgotten about this. There are just a lot of potential concerns (mostly with where to take this API in the future) that we want to address before proceeding.

@fmeum
Copy link
Collaborator Author

fmeum commented Sep 25, 2021

@sdtwigg Do you think that we'll be able to get some version of this (possibly without the host_* accounting) into 5.0?

With this commit, an execution transition will recompute the transition
hash appended to the output directory name if set by any previous
Starlark transition. This also takes into account that the Starlark
transition may transitively affect an option --foo_bar after the exec
transition if it previously affected the corresponding host option
--host_foo_bar.

Previously, an execution transition would reuse the existing hash, which
no longer reflected the current state of native options correctly and
thus led to action conflicts.

Fixes bazelbuild#13464.
@fmeum
Copy link
Collaborator Author

fmeum commented Sep 27, 2021

@gregestren I'm seeing unexpected NullPointerExceptions in the tests: https://storage.googleapis.com/bazel-untrusted-buildkite-artifacts/36b11777-1957-4dc8-9914-3bdd82aa6e4a/src%5Ctest%5Cjava%5Ccom%5Cgoogle%5Cdevtools%5Cbuild%5Clib%5Crules%5Ccpp%5CCcLibraryConfiguredTargetTest%5Cattempt_1.log.
Do you have an idea why? I would have expected updateTransitionDirectoryNameFragment to be safe to call at any time.

@gregestren
Copy link
Contributor

I'm not sure. Which line is FunctionTransitionUtil.java:442 in your dev build?

@fmeum
Copy link
Collaborator Author

fmeum commented Sep 29, 2021

I'm not sure. Which line is FunctionTransitionUtil.java:442 in your dev build?

This one:

@gregestren
Copy link
Contributor

I'm not sure why the tests get NullPointerExceptions, I'm sorry. Maybe some mock test infrastructure is missing an important field?

More broadly, can we pause this while we work out #13464 (comment) and the discussion at #13587 (comment)?

The right next steps, in my mind, are for @sdtwigg and I to seed a proper design doc for all of us to focus our thoughts on. I want to be careful not to eliminate action conflicts by an endless layer of ad hoc changes.

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 7, 2021

I'm not sure why the tests get NullPointerExceptions, I'm sorry. Maybe some mock test infrastructure is missing an important field?

More broadly, can we pause this while we work out #13464 (comment) and the discussion at #13587 (comment)?

The right next steps, in my mind, are for @sdtwigg and I to seed a proper design doc for all of us to focus our thoughts on. I want to be careful not to eliminate action conflicts by an endless layer of ad hoc changes.

Sounds very good to me, the proposal seems very reasonable and would resolve a cluster of usability and performance issues around transitions, beyond action conflicts.

Let me know when you have started a draft of the design doc and I will see what I can contribute.

Should I close the current PR in favor of this effort?

@sdtwigg
Copy link
Contributor

sdtwigg commented Nov 1, 2021

Ok, I was originally confused by the part of the this change (and corresponding discussion) regarding transitioning on --host_* values. (For reference, that seems tricky but we should discuss the merits as a separate issue.)

I need to find a proper issue to post the longer-form summary of our plan around ST-hash computation; however, for this specific sub-issue, I was planning to fix this issue by moving transitionDirectoryNameFragment out of BuildOptions and into BuildConfigurationValue (previously called BuildConfiguration).

transitionDirectoryNameFragment should wholly be a function of BuildOptions and it is fragile to require all transitions to remember to update it when it touches associated options, native or otherwise. BuildConfigurationValue is already a container for potentially-expensive-to-compute values based off BuildOptions so movement there is a natural fit (e.g. notably OutputDirectories, which ultimately consumed transitionDirectoryNameFragment is already part of BuildConfigurationValue!)

This essentially makes this PR irrelevant (although considering it did expose other potential performance problems with the the execution transition).

@sdtwigg
Copy link
Contributor

sdtwigg commented Nov 1, 2021

See #14023 (comment)

@fmeum
Copy link
Collaborator Author

fmeum commented Nov 2, 2021

Ok, I was originally confused by the part of the this change (and corresponding discussion) regarding transitioning on --host_* values. (For reference, that seems tricky but we should discuss the merits as a separate issue.)

I need to find a proper issue to post the longer-form summary of our plan around ST-hash computation; however, for this specific sub-issue, I was planning to fix this issue by moving transitionDirectoryNameFragment out of BuildOptions and into BuildConfigurationValue (previously called BuildConfiguration).

transitionDirectoryNameFragment should wholly be a function of BuildOptions and it is fragile to require all transitions to remember to update it when it touches associated options, native or otherwise. BuildConfigurationValue is already a container for potentially-expensive-to-compute values based off BuildOptions so movement there is a natural fit (e.g. notably OutputDirectories, which ultimately consumed transitionDirectoryNameFragment is already part of BuildConfigurationValue!)

This essentially makes this PR irrelevant (although considering it did expose other potential performance problems with the the execution transition).

I fully agree that moving the transitionDirectoryNameFragment computation is a much better solution. I will close this PR in favor of this approach. Depending on how it is implemented, the host_* options might lead to action conflicts again, but we can just keep that in mind and e.g. reuse the test case from this PR.

@fmeum fmeum closed this Nov 2, 2021
bazel-io pushed a commit that referenced this pull request Nov 5, 2021
…nValue

As per discussion in b/203470434, transitionDirectoryNameFragment should completely depend on the current values of the (rest of) the BuildOptions class. Thus, it is far better to have this always computed from BuildOptions when building a BuildConfigurationValue  than rely on users keeping it consistent. (Other results of BuildConfigurationValue itself are themselves wholly computed from BuildOptions so placement there is a natural fit.)

This naturally fixes the exec transition forgetting to update transitionDirectoryNameFragment.

This fixes and subsumes #13464 and #13915, respectively. This is related to #14023

PiperOrigin-RevId: 407913175
@fmeum
Copy link
Collaborator Author

fmeum commented Nov 6, 2021

@sdtwigg Should I create a new PR containing just the test (probably only the one without --host_ flags)? It could serve as a useful regression test for upcoming changes to transition logic.

fmeum pushed a commit to fmeum/bazel that referenced this pull request Nov 9, 2021
…nValue

As per discussion in b/203470434, transitionDirectoryNameFragment should completely depend on the current values of the (rest of) the BuildOptions class. Thus, it is far better to have this always computed from BuildOptions when building a BuildConfigurationValue  than rely on users keeping it consistent. (Other results of BuildConfigurationValue itself are themselves wholly computed from BuildOptions so placement there is a natural fit.)

This naturally fixes the exec transition forgetting to update transitionDirectoryNameFragment.

This fixes and subsumes bazelbuild#13464 and bazelbuild#13915, respectively. This is related to bazelbuild#14023

PiperOrigin-RevId: 407913175
Wyverald pushed a commit that referenced this pull request Nov 11, 2021
* Move transitionDirectoryNameFragment calculation to BuildConfigurationValue

As per discussion in b/203470434, transitionDirectoryNameFragment should completely depend on the current values of the (rest of) the BuildOptions class. Thus, it is far better to have this always computed from BuildOptions when building a BuildConfigurationValue  than rely on users keeping it consistent. (Other results of BuildConfigurationValue itself are themselves wholly computed from BuildOptions so placement there is a natural fit.)

This naturally fixes the exec transition forgetting to update transitionDirectoryNameFragment.

This fixes and subsumes #13464 and #13915, respectively. This is related to #14023

PiperOrigin-RevId: 407913175

* Properly account for StarlarkOptions at their default (=null) when calculating ST-hash

Previously, the hash calculation was skipping including StarlarkOptions that happened to be at their default values.

This is wrong since those values may still be in "affected by Starlark transition" (because either the commandline set them and the Starlark transition reset them to their Starlark defaults thus still requiring a hash change OR the commandline did not set them but a series of Starlark transitions did an default->B->default anyways causing the Starlark option to still be 'stuck' in "affected by Starlark transition").

Resolves #14239

PiperOrigin-RevId: 408701552

Co-authored-by: twigg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes P1 I'll work on this now. (Assignee required) Starlark configuration Starlark transitions, build_settings team-Configurability platforms, toolchains, cquery, select(), config transitions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conflicting actions for cc_proto_library with transition in Starlark rule dependency
5 participants