
Ability to always rerun a repo rule #3041

Open

kchodorow opened this issue May 23, 2017 · 33 comments
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@kchodorow
Contributor

I could have sworn that the stamp attribute combined with --stamp and --workspace_status_command did this, but the rule isn't being run even though bazel-out/volatile-status.txt changes. See https://groups.google.com/d/msgid/bazel-discuss/c1a4f036-19e6-45e1-b1eb-e6c280f3c6a5%40googlegroups.com.

@iirina iirina added category: misc > misc category: rules > misc native type: bug P2 We'll consider working on this in future. (Assignee optional) labels May 26, 2017
@lberki
Contributor

lberki commented Jun 27, 2017

I don't think we should implement this. Bazel builds are supposed to be reproducible and it's very easy to introduce an "always run" genrule somewhere deep in your dependency graph, thus making everything that depends on it build slowly.

We also don't want to make Bazel a general purpose workflow system, so I'd much rather keep the invariant that inputs are files, full stop.

@lberki lberki closed this as completed Jun 27, 2017
@lberki
Contributor

lberki commented Jun 27, 2017

@kchodorow : we actually special-case volatile-status.txt so that if that's the only input of an action that's changed, said action is not re-run.

@philwo
Member

philwo commented Jun 27, 2017

"We also don't want to make Bazel a general purpose workflow system"

Who decides that? I personally would like to be able to use Bazel as a general purpose workflow system, especially when this bug is the only thing in the way of having that use case work.

I don't think it's kind to close this without further discussion.

Why do you think that having a mechanism to always run rules harms reproducibility of builds?

@philwo philwo reopened this Jun 27, 2017
@c-parsons
Contributor

c-parsons commented Jun 27, 2017

Drive-by comment: Apple systems are somewhat notorious for hiding system information outside of the normal filesystem. It would be nice if our current local cc-osx and xcode repository rules could be invalidated based on the output of a script. Right now we recommend a bazel clean at the user's discretion, which is a very poor alternative.

This is at least something we should consider.

@lberki
Contributor

lberki commented Jun 28, 2017

I thought that was decided a good while ago; that's why we don't have the corresponding feature internally. I obviously can't link to that discussion here (I also tried to look for it but couldn't seem to find it), but that's why we don't have unconditionally evaluated genrules there, either.

We also had a thread about it on bazel-discuss (at that time, I was in the opposite corner): https://groups.google.com/forum/#!msg/bazel-discuss/L40qovu-d1s/P08-N2xZDgAJ

I haven't given a lot of deep thought to the use case of @c-parsons, but it seems like it would be better encapsulated as a repository rule. After all, that's where we capture non-hermeticity.

Routing this to core folk for another look.

@lberki lberki assigned ericfelly and unassigned lberki Jun 28, 2017
@kchodorow
Contributor Author

kchodorow commented Jun 28, 2017

A repository rule doesn't actually help @c-parsons's case, as it will still be cached build-to-build (there's no file that changes that we could depend on). But maybe repository rules would be the right place to add a "force to run every time" option.

@c-parsons
Contributor

I admit that allowing us to have an always-run repository rule would be fine for my use case (allowing always-run non-repository-rule functionality would be unnecessary for me).

@ericfelly ericfelly assigned haxorz and ericfelly and unassigned ericfelly Jul 18, 2017
@ericfelly
Contributor

Nathan can you take a look at this? Thanks.

@excavador

I definitely need an always-run rule.

@haxorz
Contributor

haxorz commented Nov 2, 2017

[i hadn't set up email filters for bazel github emails until last night, so i'm only seeing this github issue now]

@lberki - what's being asked of me and/or the core team here? are you looking for another opinion on this FR? or are you looking for a statement on the viability of implementing this FR?

i think this FR is fine, as long as the documentation is very clear that this feature should be very rarely used and will be hostile to performance. i think this sort of FR is in the same category as bazel's current support for arbitrary external symlinks, which is to say "it's correct, but we don't care about optimizing this use case because we don't encourage this use case".

as for the implementation, the existing "ErrorTransienceValue" in skyframe could be generalized to the aptly named "TransienceValue"; we'd make the ActionExecutionValue for an action from such a rule have a direct dep on the TransienceValue (and also give the action a unique & random action cache key), and then everything works for free. but here's where the performance "concern" above comes in: on incremental builds, we will unconditionally invalidate the UTC of the TransienceValue. down the line, we'll rerun the action. skyframe change pruning will still come into play for the output and UTC of this output, but at non-zero cost.

@excavador

excavador commented Nov 2, 2017

@haxorz sounds great.

I need something like that for the following:

  • inspecting the actual docker images
  • inspecting the actual docker containers
  • inspecting the actual process list.

I need all of these things:

  • for writing build rules - for instance, I have a rule which transforms a bunch of SQL files into a YAML file. It is implemented using docker (for postgresql) and pyrseas (a third-party tool which inspects the postgresql information schema and generates a YAML file).
  • for writing tests - I need to set up a bunch of docker containers and processes for testing. I obviously CAN implement it the way bazel wants, but it works awkwardly - about 30-40 seconds just to build the postgresql pg_data with the database schema BEFORE I can do anything.
  • for writing run targets - to wake up the local backend stand, which is used by the frontend team for integration.

Yes, I understand all the penalties, why it is wrong, etcetera. But it is the last piece needed to make bazel the only build system for all tasks related to build & test.

@haxorz haxorz assigned lberki and unassigned haxorz Nov 2, 2017
@haxorz
Contributor

haxorz commented Nov 2, 2017

assigning back to @lberki for prioritization and re-assignment

@lberki lberki assigned dslomov and unassigned lberki Nov 2, 2017
@lberki
Contributor

lberki commented Nov 2, 2017

@haxorz: I wanted to get an opinion from you folks about exposing the "always execute this action" bit to Skylark. I think it's inevitable, but it's a powerful tool and I'm not sure the use cases we know of warrant the inclusion of such a footgun in Bazel.

In particular, I am worried about two things:

  • That people will use the "always run" bit as a way to get away with neglecting to declare dependencies correctly
  • That such actions will creep into every build, thus slowing builds down

@dslomov, what do you think? IMHO we should do it, but if we do, we'll eventually need diagnostics to figure out which actions during a particular build were run because they were tagged as "always run".

@aiuto
Contributor

aiuto commented May 13, 2020

I was almost going to close this out because I am not convinced it is needed, but the discussion is a good one for posterity.
Moving to the product and starlark teams to think about the merits of this.

My alternate strawman design:
This is deliberately hard for an arbitrary rule to use. That prevents random foot-guns.

  • bazel can read a list of paths from <WORKSPACE_TOP>/.bazel_volatile_paths (a sketch of such a file follows this list)
  • the file system interface treats those files as always changing. The time stamp and content for each is different on each call.
  • we can use visibility on those files to lock down their use.
  • A CI system could swap the volatile paths file before the build to get some level of determinism.
  • No starlark semantic changes.
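
As an illustration of the strawman above (this file and its format are purely hypothetical; nothing like it exists in Bazel today), the volatile-paths file might simply list workspace-relative paths, one per line:

# <WORKSPACE_TOP>/.bazel_volatile_paths (hypothetical strawman, not a real Bazel feature)
# Files listed here would be treated as changed on every read.
tools/volatile/xcode_path.txt
tools/volatile/build_timestamp.txt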

@golvok

golvok commented Nov 18, 2020

Just to add another possible use case, though I kinda agree that any breaks from the ideal of reproducibility should probably be avoided.

While transitioning to bazel from Make, it is sometimes easier to have local = 1 genrules with side-effects that copy files into the source tree where the old Makefiles expect them. However, since these are side-effects (i.e. not an output of the rule), bazel will not re-run the rule if the in-tree file gets deleted or something. For us, it would be sufficient to be able to do something like bazel run-it-anyway //path/to:genrule to fix up the tree.

This can actually be achieved by using another genrule to produce an executable script, which is then run by the original genrule (bazel run will work for that script). However, you lose access to some features (e.g. 'make' variable substitutions) and may have to double-escape quotes, etc.

@brandjon
Member

Maybe I misunderstand this thread, but I thought we already had a concept of repo rules that always run.

Is this FR only related to repository rules, and not the rest of the build (where we really should be hermetic)? If so, we should deprioritize, as repo rules will be deprecated by bzlmod. (Triaging on that basis.)

@brandjon brandjon added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Build-Language and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Bazel General Bazel product/strategy issues team-Starlark labels Feb 15, 2021
@brandjon brandjon changed the title Have a way to always run a rule Ability to always rerun a repo rule Feb 15, 2021
@keithl-stripe
Contributor

Isn't the no-cache tag designed to do just this? Does that tag not always work?

@burdiyan

@keithl-stripe I think it doesn't help here. The no-cache tag disables disk-based or remote caching, but Bazel will still only re-run the rule if its inputs have changed.

@burdiyan

burdiyan commented Sep 21, 2021

To solve some of the problems mentioned in this thread, and to balance the convenience/{correctness,speed,reproducibility} tradeoff more toward the convenience side for us, I've been experimenting with some "impure" Bazel rules.

Those rules are explicitly no-cache, local, and no-sandbox. They escape from the execroot into the source tree and can execute arbitrary commands. Of course this is a huge foot-gun, and you must know what you're doing (basically, don't forget to track your inputs). But it has been useful for us.

Our Bazel targets are a lot more coarse-grained than usual, but that's fine for most people who are not at Google's scale. Of course remote execution and shared caching are lost here. It basically works a bit like make, but with support for hashing inputs/outputs, and Starlark.

To solve the problem @golvok mentions (#3041 (comment)), we symlink in-tree outputs into Bazel's output directory. So when files disappear or are modified in-tree, Bazel knows to re-run the rule. This may be non-portable outside Unix-like systems.

Using these rules, we're able to be "friends" with all the different tools from different ecosystems, trading off some of Bazel's properties like fine-grained caching.

For example, we use them to run yarn install when package.lock changes, and we can generate code from protobuf and store it in-tree, all with bazel build.

Of course all of this is very against Bazel's philosophy, but I'm still surprised that it even worked.

Initially I wanted to develop a general-purpose task orchestration system that knows how to track inputs and outputs, supports hashing and Starlark, and can run arbitrary scripts. But it turned out that I was able to make Bazel do this.

Do not repeat at home! :)

@burdiyan

One thing that would still be great to have in Bazel is the ability to declare inputs after the execution phase. I don't know if it's even possible to implement in Bazel's current architecture, but it could be useful to improve convenience for people, even without doing the "impure" things I mentioned above.

Similar to Ninja's depfile, or add_dep post-build hooks in Please.

So the idea is that one would only need to declare direct source files for the package, and imported sources could be declared as inputs after the fact.

For example, the go_binary rule could have srcs = glob(["*.go"]), and after the binary is built it could run go list to declare the imported packages as dependencies dynamically, and Bazel would have to remember them as inputs too. This way, when sources change, or when external packages change, the rule would re-run, but you wouldn't have to touch your BUILD files every time you add a new import statement in the code.

@Wyverald
Member

"Always rerun" needs to be defined here. Is it run for every bazel build command, or every Bazel server restart? The latter is supported by repo rules already (the poorly-documented "local" attribute); the former is not AFAIK.

In any case, #3041 (comment) is relevant, despite the fact that repo rules aren't being deprecated after all. I suspect that in most cases, a repo rule that reruns for every Bazel server restart is good enough for these use cases.
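
A minimal sketch of that "rerun on every server restart" setup, with hypothetical names (the local attribute and the repository_ctx calls are real Bazel API; the rule and file names are illustrative):

# host_info.bzl (hypothetical): a repository rule with local = True is
# re-evaluated on every Bazel server restart instead of being cached across them.
def _host_info_impl(repository_ctx):
    result = repository_ctx.execute(["uname", "-sr"])
    repository_ctx.file("BUILD", 'exports_files(["host_info.txt"])')
    repository_ctx.file("host_info.txt", result.stdout)

host_info_repo = repository_rule(
    implementation = _host_info_impl,
    local = True,  # marks the rule as depending on the local machine
)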

@uri-canva
Contributor

I agree that "always rerun repository rule" doesn't sound like a well-defined feature, and the example use cases in this thread are very different; some even seem to relate to rules, not repository rules. I thought I'd summarise some of the existing bazel features that can help in some situations where one might want this feature:

  1. --workspace_status_command lets you pass any input you'd like into a rule, assuming you can serialise it. You can even generate files, then pass the sha and path of the files you generated. Stable keys are considered inputs that affect the cache, so you can use this to declare a lot of inputs that aren't handled by the usual bazel input mechanisms (see the sketch after this list). One possible improvement here is for rules to be able to declare which values of the status output they depend on; right now, changing any of the stable values will cause all rules that use any of the values to be rebuilt.
  2. environ in repository_rule lets you declare which environment variables the repository rule depends on, and Bazel will re-evaluate the rule if any of them change. You can use this to pass inputs to your repository rules, or to pass some sha / timestamp the rule can use to track whether some input it uses has changed. You can generate this environment variable value in a tools/bazel wrapper.
  3. managed_directories allows you to declare directories in the source tree that can be managed by external processes (there's not much documentation for it, but see the original proposal https://docs.google.com/document/d/1uaeQpPrSH4q46zGuXLgtzMkLVXk7Er9I-P_-ScwwGLk/edit#heading=h.xgjl2srtytjt).
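
A minimal sketch of option 1, with hypothetical names, assuming (as the original report above suggests) that a genrule with stamp = 1 can read bazel-out/stable-status.txt, and that tools/status.sh is a script printing lines such as "STABLE_SCHEMA_SHA <sha>":

# BUILD (hypothetical sketch); build with:
#   bazel build //:schema_version --workspace_status_command=tools/status.sh
genrule(
    name = "schema_version",
    outs = ["schema_version.txt"],
    stamp = 1,  # exposes the workspace status files to the command
    cmd = "grep STABLE_SCHEMA_SHA bazel-out/stable-status.txt > $@",
)

Because STABLE_* keys are treated as inputs, changing what tools/status.sh prints re-runs the action, whereas volatile keys would not.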

@brandjon brandjon added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. untriaged and removed team-Build-Language labels Nov 4, 2022
@blorente

blorente commented Mar 2, 2023

At the risk of reviving a dead thread, I think we have a use case for "run on every bazel invocation" repository rules, as detailed in #17640:

We'd like the ability to re-run a repository rule every time your Xcode path changes (every time the result of xcode-select -p changes), so that we can configure their build accordingly.

These changes are so frequent (in some cases) and the cost of checking so low that we would be willing to pay the cost of always re-running them at every bazel invocation.

We work with a variety of teams, not all of whom can provide access to their code. We face several limitations in implementing @uri-canva's suggestions:

  • We have limited control over their bazel installation, which limits the usefulness of the tools/bazel wrapper approach of passing the value of xcode-select -p as an input through environ. We would like to avoid mandating a bazel wrapper.
  • On macOS, the actual value of xcode-select -p is stored in a part of the machine that would require disabling System Integrity Protection to read. We would like to avoid imposing that on all users, ruling out the managed_directories approach. Furthermore, as far as I know, managed_directories is going away anyway.

Currently, the best we can offer is a troubleshooting tool that will check that their Xcode installation is the same as the one bazel thinks it is, but it would be very nice to have things just work.

@hanikesn

To add another use case:
I just migrated our container image builds from rules_docker to rules_oci, using their stamp_tags example to stamp our push rules with a unique tag based on the hash of the current state of the git repo, to make sure we always have a unique name when triggering a (test) deploy locally. But caching is now so good that the tag isn't re-pushed every time.

@uri-canva
Contributor

@hanikesn See bazel-contrib/rules_oci#269. If it's not clear from the documentation, open an issue on rules_oci.

@bcsgh

bcsgh commented Dec 14, 2023

Yet another use case: I'm generating a chain of x509 test certs that expire and need to be regenerated every day or so, even if the code used to generate them hasn't changed. I could use bazel clean, but I'd rather have something that's a bit more surgical, can be used much more often than actually needed, and doesn't cause everything to rebuild.

What I'm currently playing with is:

_BUILD = "exports_files({files})"

def _status_repository_impl(ctx):
    VERSIONS = {
      "year":   "+%Y",
      "month":  "+%Y-%m",
      "week":   "+%Gw%V",
      "day":    "+%Y-%m-%d",
      "hour":   "+%Y-%m-%d %H",
      "minute": "+%Y-%m-%d %H:%M",
      "second": "+%Y-%m-%d %H:%M:%S",
    }

    date = ctx.which("date")
    DATES = dict([
      (i, ctx.execute([date, "--universal", f]))
      for i,f in VERSIONS.items()
    ])

    err = dict([
      (i, f.stderr.strip())
      for i,f in DATES.items()
      if f.return_code
    ])
    if err: fail(err)

    for f, d in DATES.items(): ctx.file(f, d.stdout.strip())
    ctx.file("BUILD", _BUILD.format(files=str(sorted(DATES.keys()))))
    return

volatile_repository = repository_rule(
    doc = """Create a repository with files that change at predictable time.

        NOTE: Bazel makes it hard to get this to re-run every time
        because that could very easily become a foot-gun. See:
        https://github.com/bazelbuild/bazel/issues/3041

        To force the regeneration, run:
        $ bazel sync --configure
    """,
    implementation = _status_repository_impl,
    local = True,
    configure = True,
)
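
A possible usage sketch for the rule above (the load path, workspace name, and cert-generation script are hypothetical): instantiate the repository in WORKSPACE and make a target depend on one of its date files, so the consuming action is invalidated whenever the chosen granularity rolls over (after bazel sync --configure).

# WORKSPACE (hypothetical):
#   load("//tools:volatile.bzl", "volatile_repository")
#   volatile_repository(name = "volatile")

# BUILD (hypothetical): regenerate the certs at most once per day.
genrule(
    name = "test_certs",
    srcs = ["@volatile//:day"],  # unused by the command; only forces daily invalidation
    outs = ["test_certs.pem"],
    tools = ["gen_certs.sh"],  # hypothetical cert-generation script
    cmd = "$(location gen_certs.sh) > $@",
)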

@burdiyan

@bcsgh If you can afford to use a wrapper script for Bazel (which you could place in tools/bazel in your repo, to make Bazel automatically use it), the easiest way I've found to achieve something like this is to set an environment variable inside the wrapper script, with a random value on every invocation, and make the repository rule depend on that environment variable.

Be careful, though, if any of your build actions use the default shell env: make sure you use the --incompatible_strict_action_env flag, otherwise you'll invalidate lots of build actions unnecessarily.
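
A minimal sketch of that approach, with hypothetical names (the environ parameter and repository_ctx.os.environ are real Bazel API; the variable name and the wrapper are illustrative):

# cache_buster.bzl (hypothetical). A tools/bazel wrapper would export something
# like MY_CACHE_BUSTER="$RANDOM$RANDOM" before delegating to the real bazel binary.
def _cache_buster_impl(repository_ctx):
    # Reading the variable (also listed in environ below) makes this repository
    # re-evaluate whenever its value changes, i.e. on every wrapped invocation.
    value = repository_ctx.os.environ.get("MY_CACHE_BUSTER", "")
    repository_ctx.file("BUILD", 'exports_files(["buster.txt"])')
    repository_ctx.file("buster.txt", value)

cache_buster_repo = repository_rule(
    implementation = _cache_buster_impl,
    environ = ["MY_CACHE_BUSTER"],
)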
