rule_implementation_hash is unsound when a rule definition site is a macro in b.bzl, but that macro is invoked from a.bzl, and that invocation passes along e.g. an implementation function object #12086
Comments
@linzhp Thank you very much for filing this! @michajlo saw this GitHub entry and suspected the underlying bug was the explanation for some very mysterious nondeterminism we've been recently seeing internally at Google with a system that relies on caches of the
@benjaminp Thank you very much for your prompt and spot-on insights, as always. You are completely correct about the culprit commit. The bug here is pretty blatant. Here is the setup of my more minimal repro:
I've also edited the issue title to more precisely describe the bug. |
GitHub isn't letting me assign to @alexjski (GitHub "org" membership issue?) but, rest assured, Alex (or someone else) will work on this with very high priority tomorrow or Monday. |
@alandonovan , for awareness |
The rule definition environment (RDE---a concept that desperately needs an explanatory doc comment somewhere, not least because the term is overloaded as the name of an unrelated Java class) is based on the hash of the source of the .bzl file containing the immediate call to bazel/src/main/java/com/google/devtools/build/lib/analysis/starlark/StarlarkRuleClassFunctions.java Line 352 in 460ab68
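The consequence of hashing only the file that contains the immediate rule call can be shown with a small model. This is a sketch, not Bazel's code: SHA-256 stands in for whatever digest Bazel actually uses, and the file contents, `digest`, and `call_site_hash` are hypothetical names made up for illustration. The macro lives in b.bzl and the implementation function is passed in from a.bzl, as in the issue title.

```python
import hashlib


def digest(source: str) -> str:
    """Return a hex SHA-256 digest of a .bzl file's source text (stand-in digest)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def call_site_hash(files: dict, rule_call_file: str) -> str:
    """Model of the unsound scheme: hash only the file that contains the rule() call."""
    return digest(files[rule_call_file])


# Hypothetical contents: b.bzl hosts the macro, a.bzl supplies the implementation.
B_BZL = "def make_rule(impl):\n    return rule(implementation = impl)\n"
A_BZL_V1 = (
    'load(":b.bzl", "make_rule")\n'
    "def _impl(ctx):\n"
    "    pass\n"
    "my_rule = make_rule(_impl)\n"
)
# Second version of a.bzl: only the body of the implementation function changes.
A_BZL_V2 = A_BZL_V1.replace("pass", 'print("changed")')

before = call_site_hash({"a.bzl": A_BZL_V1, "b.bzl": B_BZL}, "b.bzl")
after = call_site_hash({"a.bzl": A_BZL_V2, "b.bzl": B_BZL}, "b.bzl")

# The implementation in a.bzl changed, yet the modeled hash did not.
print(before == after)
```

In this model the hash is blind to any edit in a.bzl, which is the behavior that makes a cache keyed on it unsafe.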
BTW, Nathan: |
What is the use for that definition? In the most extreme thought experiment, every rule in the universe could have the same hash if they all were defined through the I suggest the innermost module at rule export time as a more intuitive and—in the context of this bug, anyway—correct definition. (I suppose that's the same as the old method because module globals are immutable after creation.)
|
+1 to what @benjaminp said. I was going to say the same thing (and give a more extreme version of the thought experiment where Alan, chat me internally if you want to see some real examples of code like my
I admittedly wasn't part of that discussion, nor did I read through it last night before posting here. Do you happen to remember if the situation in this GitHub issue's title was considered during the design discussion? If not, why not? And if so, why was the consequence considered WAI and not-bad?
Awesome, thanks! |
Good question; I'm struggling to page back into memory our lengthy conversations about this, which unfortunately left no residue in the code. A superficial observation: the offending CL at least made the treatment of the label portion consistent with the digest portion. Before, the label came from the innermost .bzl file and the digest from the outermost .bzl file on the call stack. At least a part of the conversation was about the infeasibility of deeply hashing Starlark function values, and the assumption that Starlark will ~soon support nested def statements with lexical scope, aka closures, which means that a rule class created by a function in Q.bzl may be a closure over values supplied by a function in P.bzl, where P loads Q. This seems to argue for the RDE of a RuleClass being a property of the label + digest of the outermost frame, which is the opposite of what the change did. Alex, do you remember more?
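The innermost-versus-outermost distinction can be sketched as follows. This is an illustrative model under stated assumptions, not Bazel's implementation: the labels, `transitive_digest`, and `rde_key` are hypothetical, and the digest is assumed to cover a file plus everything it loads. P.bzl loads Q.bzl, and a function in Q.bzl creates the rule class using values supplied by P.bzl.

```python
import hashlib
from typing import Dict, List, Tuple


def transitive_digest(label: str, sources: Dict[str, str], loads: Dict[str, List[str]]) -> str:
    """Digest of a file's source combined with the digests of the files it loads."""
    h = hashlib.sha256(sources[label].encode("utf-8"))
    for dep in sorted(loads.get(label, [])):
        h.update(transitive_digest(dep, sources, loads).encode("utf-8"))
    return h.hexdigest()


def rde_key(call_stack: List[str], sources: Dict[str, str],
            loads: Dict[str, List[str]], frame: str) -> Tuple[str, str]:
    """Return (label, digest) taken from one frame; call_stack is outermost-first."""
    label = call_stack[0] if frame == "outermost" else call_stack[-1]
    return (label, transitive_digest(label, sources, loads))


# Hypothetical load graph and call stack: P loads Q, and P calls into Q.
LOADS = {"//pkg:P.bzl": ["//pkg:Q.bzl"]}
STACK = ["//pkg:P.bzl", "//pkg:Q.bzl"]
Q_SRC = "def make_rule(impl):\n    return rule(implementation = impl)\n"
V1 = {"//pkg:P.bzl": "def _impl(ctx):\n    pass\n", "//pkg:Q.bzl": Q_SRC}
V2 = {"//pkg:P.bzl": "def _impl(ctx):\n    fail()\n", "//pkg:Q.bzl": Q_SRC}

inner_v1 = rde_key(STACK, V1, LOADS, "innermost")
inner_v2 = rde_key(STACK, V2, LOADS, "innermost")
outer_v1 = rde_key(STACK, V1, LOADS, "outermost")
outer_v2 = rde_key(STACK, V2, LOADS, "outermost")

print(inner_v1 == inner_v2)  # innermost frame misses the edit in P.bzl
print(outer_v1 == outer_v2)  # outermost frame sees it
```

Under these assumptions the outermost frame's key changes when P.bzl changes, while the innermost frame's key does not, which matches the argument above for keying on the outermost frame.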
If the purpose of the hash is to detect changes in the logic of a rule, then that's not necessarily wrong, as all the rules will have the same load graph of bzl sources. |
@benjaminp, you are absolutely right that my change (9f2cab5) caused the issue described -- that's an excellent find. @linzhp, thank you for filing that issue and for the repro. We have a fix for the bug already (b9bb102). We will cherry-pick that into Bazel 3.5.1 release. Personally, I would like to apologize to everyone involved for having caused that issue. I did not realize this edge case when working on that change, which I thought was a pure cleanup effort. I am sorry to everyone who has been affected by failures related to this and spent time diagnosing/recovering from problems which were caused by it. |
Thanks for getting this resolved so quickly |
Alex, the fault was mine: you worked on the clean-up at my behest, and the suggestion to make the behavioral change along with the refactoring was mine based on a misunderstanding of the inconsistency between the label and digest portions. Thanks again for making the change, and for unmaking it when the problem emerged. |
Am I wrong or did the fix not make it into 3.5.1 after all? |
@rohansingh what makes you think so? I tried the opening example with bazelisk and 3.5.0 and could repro. After switching to 3.5.1 the hash changed. |
I was just going by the fix commit that @alexjski mentioned above, b9bb102. That commit doesn't seem to be in 3.5.1 according to GitHub. |
Ah, ok - see the list of commits here: https://github.com/bazelbuild/bazel/commits/3.5.1 |
Ah, my mistake. Thanks! |
This is still an issue in 3.6. Did you forget to include the fix? |
@linzhp I fear you are correct, that commit is missing from 3.6.0. cc @laurentlb |
Verified the issue is fixed in 3.7 |
Description of the problem / feature request:
Since Bazel 3.4, rule_implementation_hash in bazel query's proto output doesn't always change for rule implementation changes.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
What operating system are you running Bazel on?
macOS
What's the output of bazel info release?
release 3.5.0-homebrew