Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rule_implementation_hash is unsound when a rule definition site is a macro in b.bzl, but that macro is invoked from a.bzl, and that invocation passes along e.g. an implementation function object #12086

Closed
linzhp opened this issue Sep 11, 2020 · 19 comments
Assignees
Labels
P0 This is an emergency and more important than other current work. (Assignee required)

Comments

@linzhp
Copy link
Contributor

linzhp commented Sep 11, 2020

Description of the problem / feature request:

Since Bazel 3.4, rule_implementation_hash in bazel query's proto output doesn't always change for rule implementation changes

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

-- ./WORKSPACE --
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "io_bazel_rules_go",
    sha256 = "08c3cd71857d58af3cda759112437d9e63339ac9c6e0042add43f4d94caf632d",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/rules_go/releases/download/v0.24.2/rules_go-v0.24.2.tar.gz",
        "https://github.com/bazelbuild/rules_go/releases/download/v0.24.2/rules_go-v0.24.2.tar.gz",
    ],
)

load("@io_bazel_rules_go//go:deps.bzl", "go_register_toolchains", "go_rules_dependencies")

go_rules_dependencies()

go_register_toolchains()

-- ./rules/heatpipe.bzl --
load("@io_bazel_rules_go//go:def.bzl", "go_rule")

def _heatpipe_impl(ctx):
    ctx.actions.write(ctx.outputs.types_out, "types1")
    ctx.actions.write(ctx.outputs.fx_out, "fx1")

    return [
        DefaultInfo(files = depset([ctx.outputs.types_out, ctx.outputs.fx_out])),
    ]

_heatpipe_rule = go_rule(
    _heatpipe_impl,
    attrs = {
        "topic": attr.string(
            doc = "The topic for the schema",
            mandatory = True,
        ),
        "types_out": attr.output(
            doc = "The __types.go file to be generated",
            mandatory = True,
        ),
        "fx_out": attr.output(
            doc = "The __fx.go file to be generated",
            mandatory = True,
        ),
    },
)

def heatpipe(name, topic, types_out = "types.go", fx_out = "fx.go"):
    _heatpipe_rule(
        name = name,
        topic = topic,
        types_out = types_out,
        fx_out = fx_out,
    )

-- ./user_created/BUILD.bazel --
load("//rules:heatpipe.bzl", "heatpipe")

heatpipe(
    name = "heatpipe",
    topic = "hp-yodlee-integration-user_created",
)

-- ./rules/BUILD.bazel --
$ bazel query //user_created:heatpipe --output=proto | protoc --decode_raw | grep rule_implementation_hash -A2
Loading: 0 packages loaded
Loading: 1 packages loaded

Loading: 1 packages loaded
      1: "$rule_implementation_hash"
      2: 2
      5: "45b15fdaa24bca7551749305bd313cb60bd329deabd30dc53d653fc3145d5d1d"
$ sed -i '' "s/types1/types2/" rules/heatpipe.bzl
$ bazel query //user_created:heatpipe --output=proto | protoc --decode_raw | grep rule_implementation_hash -A2
Loading: 0 packages loaded
Loading: 1 packages loaded

Loading: 1 packages loaded
      1: "$rule_implementation_hash"
      2: 2
      5: "45b15fdaa24bca7551749305bd313cb60bd329deabd30dc53d653fc3145d5d1d"

What operating system are you running Bazel on?

macOS

What's the output of bazel info release?

release 3.5.0-homebrew

@benjaminp
Copy link
Collaborator

@alexjski 9f2cab5 is suspect as the rule hash is now the transitive hash of the defining module not necessarily the top-level loading one.

@haxorz haxorz added P0 This is an emergency and more important than other current work. (Assignee required) team-Starlark labels Sep 11, 2020
@haxorz
Copy link
Contributor

haxorz commented Sep 11, 2020

@linzhp Thank you very much for filing this! @michajlo saw this Github entry and suspected the underlying bug was the explanation for some very mysterious nondeterminism we've been recently seeing internally at Google with a system that relies on caches of the rule_implementation_hash thing. After staring at your repro (ty for that!) and 9f2cab5, I'm pretty sure this issue is relevant.

@benjaminp Thank you very much for your prompt and spot-on insights, as always. You are completely correct about the culprit commit. The bug here is pretty blatant.

Here is the setup of my more minimal repro:

$ find hash -type f | while read P; do echo --- $P ---; cat $P; done
--- hash/a/a.bzl ---
load("//hash/b:b.bzl", "make_a")
def f():
    return "old"
a = make_a(f)
--- hash/a/BUILD ---
--- hash/b/b.bzl ---
def make_a(impl):
    return rule(
        implementation = impl
    )
--- hash/b/BUILD ---
--- hash/BUILD ---
load("//hash/a:a.bzl", "a")
a(name = "t")

I've also edited the issue title to more precisely describe the bug.

@haxorz haxorz changed the title rule_implementation_hash doesn't detect rule changes rule_implementation_hash is unsound when a rule definition site is a macro in b.bzl, but that macro is invoked from a.bzl, and that invocation passes along e.g. an implementation function object Sep 11, 2020
@haxorz
Copy link
Contributor

haxorz commented Sep 11, 2020

Github isn't letting me assign to @alexjski (Github "org" membership issue?) but, rest assured, Alex (or someone else) will work on this with very high priority tomorrow or Monday.

@haxorz
Copy link
Contributor

haxorz commented Sep 11, 2020

@alandonovan , for awareness

@alandonovan
Copy link
Contributor

alandonovan commented Sep 11, 2020

The rule definition environment (RDE---a concept that desperately needs an explanatory doc comment somewhere, not least because the term is overloaded as the name of an unrelated Java class) is based on the hash of the source of the .bzl file containing the immediate call to rule(), plus all bzl files it transitively loads. It is not based on the .bzl file that is being initialized. In other words, it's the innermost frame on the call stack, not the outermost. This was discussed at length in the context of cr/311173293 aka 9f2cab5 (see

) and the sequence of changes of which it was a part; Alex Jurkowski @alexjski has the freshest knowledge here.

BTW, Nathan: head dir/* is your friend for listing files preceded by their file name. (Add -n 999 if any file is long.)

@benjaminp
Copy link
Collaborator

benjaminp commented Sep 11, 2020

The rule definition environment (RDE---a concept that desperately needs an explanatory doc comment somewhere, not least because the term is overloaded as the name of an unrelated Java class) is based on the hash of the source of the .bzl file containing the immediate call to rule(), plus all bzl files it transitively loads. It is not based on the .bzl file that is being initialized. In other words, it's the innermost frame on the call stack, not the outermost. This was discussed at length in the context of cr/311173293 aka 9f2cab5 (see


) and the sequence of changes of which it was a part; Alex Jurkowski @alexjski has the freshest knowledge here.

What is the use for that definition? In the most extreme thought experiment, every rule in the universe could have the same hash if they all were defined through the make_a in haxorz 's example.

I suggest the innermost module at rule export time as a more intuitive and—in the context of this bug, anyway—correct definition. (I suppose that's the same as the old method because module globals are immutable after creation.)

BTW, Nathan: head dir/* is your friend for listing files preceded by their file name. (Add -n 999 if any file is long.)

head -v for the filenames.

@haxorz
Copy link
Contributor

haxorz commented Sep 11, 2020

+1 to what @benjaminp said. I was going to say the same thing (and give a more extreme version of the thought experiment where make_a uses **kwargs). The current/new meaning of "rule definition environment" is not useful. The old meaning was more useful. Let's go back to something as useful as the old meaning.

Alan, chat me internally if you want to see some real examples of code like my make_a.

[@alandonovan] This was discussed at length in the context of cr/311173293 aka 9f2cab5 and the sequence of changes of which it was a part

I admittedly wasn't part of that discussion, nor did I read through it last night before posting here. Do you happen to remember if the situation in this Github issue's title was considered during the design discussion? If not, why not? And if so, why was the consequence considered WAI and not-bad?

[@alandonovan] BTW, Nathan: head dir/* is your friend for listing files preceded by their file name. (Add -n 999 if any file is long.)

Awesome, thanks!

@alandonovan
Copy link
Contributor

alandonovan commented Sep 11, 2020

What is the use for that definition?

Good question; I'm struggling to page back into memory our lengthy conversations about this, which unfortunately left no residue in the code. A superficial observation: the offending CL at least made the treatment of the label portion consistent with the digest portion. Before, the label came from the innermost .bzl file and the digest from the outermost .bzl file on the call stack.

At least a part of the conversation was about the infeasibility of deeply hashing Starlark function values, and the assumption that Starlark will ~soon support nested def statements with lexical scope, aka closures, which means that a rule class created by a function in Q.bzl may be a closure over values supplied by a function in P.bzl, where P loads Q. This seems to argue for the RDE of a RuleClass being a property of the label + digest of the outermost frame, which is the opposite of what the change did. Alex, do you remember more?

every rule in the universe could have the same hash if they all were defined through the make_a in haxorz 's example.

If the purpose of the hash is to detect changes in the logic of a rule, then that's not necessarily wrong, as all the rules will have the same load graph of bzl sources.

@alexjski alexjski self-assigned this Sep 11, 2020
@alexjski
Copy link
Contributor

@benjaminp, you are absolutely right that my change (9f2cab5) caused the issue described -- that's an excellent find. @linzhp, thank you for filing that issue and for the repro.

We have a fix for the bug already (b9bb102). We will cherry-pick that into Bazel 3.5.1 release.

Personally, I would like to apologize to everyone involved for having caused that issue. I did not realize this edge case when working on that change, which I thought was a pure cleanup effort. I am sorry to everyone who has been affected by failures related to this and spent time diagnosing/recovering from problems which were caused by it.

@linzhp
Copy link
Contributor Author

linzhp commented Sep 11, 2020

Thanks for getting this resolved so quickly

@alandonovan
Copy link
Contributor

Alex, the fault was mine: you worked on the clean-up at my behest, and the suggestion to make the behavioral change along with the refactoring was mine based on a misunderstanding of the inconsistency between the label and digest portions. Thanks again for making the change, and for unmaking it when the problem emerged.

@rohansingh
Copy link

Am I wrong or did the fix not make it into 3.5.1 after all?

@meisterT
Copy link
Member

meisterT commented Oct 1, 2020

@rohansingh what makes you think so? I tried the opening example with bazelisk and 3.5.0 and could repro. After switching to 3.5.1 the hash changed.

@rohansingh
Copy link

I was just going by the fix commit that @alexjski mentioned above, b9bb102. That commit doesn't seem to be in 3.5.1 according to GitHub.

@meisterT
Copy link
Member

meisterT commented Oct 1, 2020

Ah, ok - see the list of commits here: https://github.com/bazelbuild/bazel/commits/3.5.1
It shows f1f9411 which is the same as the one that you reference - it has a different hash since it was a cherrypick. You can see the 3.5.1 tag there.

@rohansingh
Copy link

Ah, my mistake. Thanks!

@linzhp
Copy link
Contributor Author

linzhp commented Oct 14, 2020

This is still an issue in 3.6. Did you forget to include the fix?

@meisterT
Copy link
Member

@linzhp I fear you are correct, that commit is missing from 3.6.0. cc @laurentlb

@linzhp
Copy link
Contributor Author

linzhp commented Oct 24, 2020

Verified the issue is fixed in 3.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 This is an emergency and more important than other current work. (Assignee required)
Projects
None yet
Development

No branches or pull requests

7 participants