Ability to always rerun a repo rule #3041
Comments
I don't think we should implement this. Bazel builds are supposed to be reproducible and it's very easy to introduce an "always run" genrule somewhere deep in your dependency graph, thus making everything that depends on it build slowly. We also don't want to make Bazel a general purpose workflow system, so I'd much rather keep the invariant that inputs are files, full stop. |
@kchodorow : we actually special-case |
Who decides that? I personally would like to be able to use Bazel as a general purpose workflow system, especially when this bug is the only thing in the way of having that use case work. I don't think it's kind to close this without further discussion. Why do you think that having a mechanism to always run rules harms reproducibility of builds? |
Drive-by comment: Apple systems are somewhat notorious for hiding system information outside of the normal filesystem. It would be nice if our current local cc-osx and xcode repository rules could be invalidated based on the output of a script. Right now we recommend a This is at least something we should consider. |
I thought that has been decided since a good while ago; that's why we don't have the corresponding feature internally. I obviously can't link to that discussion here (I also tried to look for it but couldn't seem to find it), but that's why we don't have unconditionally evaluated genrules there, either. We also had a thread about it on bazel-discuss (at that time, I was in the opposite corner): https://groups.google.com/forum/#!msg/bazel-discuss/L40qovu-d1s/P08-N2xZDgAJ I haven't given a lot of deep thought to the use case of @c-parsons but it seems like it'd better be encapsulated as a repository rule. After all, that's where we capture non-hermeticity. Routing this to core folk for another look. |
A repository rule doesn't actually help @c-parsons's case, as it still will be cached build-to-build (there's no file that changes we can depend on). But maybe repository rules would be the right place to add a "force to run every time" option. |
I admit that allowing us to have an always-run repository rule would be fine for my use-case (allowing always-run non-repository-rule functionality would be unnecessary for me) |
Nathan can you take a look at this? Thanks. |
I definitely need an always-run rule |
[i hadn't set up email filters for bazel github emails until last night, so i'm only seeing this github issue now] @lberki - what's being asked of me and/or the core team here? are you looking for another opinion on this FR? or are you looking for a statement on the viability of implementing this FR? i think this FR is fine, as long as the documentation is very clear that this feature should be very rarely used and will be hostile to performance. i think this sort of FR is in the same category as bazel's current support for arbitrary external symlinks, which is to say "it's correct, but we don't care about optimizing this use case because we don't encourage this use case". as for the implementation, the existing "ErrorTransienceValue" in skyframe could be generalized to the aptly named "TransienceValue", and make the ActionExecutionValue for an action from rule have a direct dep on the TransienceValue (and also have the action have a unique & random action cache key) and then everything works for free. but here's where the performance "concern" above comes in here: on incremental builds, we will unconditionally invalidate the UTC of the TransienceValue. down the line, we'll rerun the action. skyframe change pruning will still come into play for the output and UTC of this output, but at non-zero cost. |
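As a rough illustration of the trade-off described above, here is a toy Python model of an always-dirty node followed by change pruning. It is not Skyframe code: `ToyGraph`, `probe`, and the consumer step are hypothetical names made up for this sketch. The always-run step executes on every build, while the downstream step re-executes only when the digest of the step's output changes.

```python
import hashlib


class ToyGraph:
    """Tiny stand-in for an incremental build graph with change pruning."""

    def __init__(self):
        self.action_runs = 0       # times the always-run action executed
        self.consumer_runs = 0     # times the downstream step executed
        self._last_digest = None   # digest of the action's previous output
        self._consumer_out = None  # cached downstream result

    def build(self, probe):
        # The always-run node is unconditionally dirty, so re-execute it.
        output = probe()
        self.action_runs += 1
        digest = hashlib.sha256(output.encode()).hexdigest()
        # Change pruning: redo downstream work only if the output changed.
        if digest != self._last_digest:
            self._last_digest = digest
            self._consumer_out = output.upper()
            self.consumer_runs += 1
        return self._consumer_out
```

Calling `build` twice with a probe that returns the same string runs the probe twice but the consumer only once, which is the "non-zero cost" case: the rerun is paid on every build even when nothing downstream changes.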
@haxorz sounds great. I need something like that for the following:
I need all these things:
Yes, I understand all the penalties, why it is wrong, etc. But it is the last piece needed to make Bazel the only build system for all tasks related to build & test. |
assigning back to @lberki for prioritization and re-assignment |
@haxorz: I wanted to get an opinion from you folks about exposing the "always execute this action" bit to Skylark. I think it's inevitable, but it's a powerful tool and I'm not sure the use cases we know of warrant the inclusion of such a footgun in Bazel. In particular, I am worried about two things:
@dslomov , what do you think? IMHO we should do it, but if we do do it, we'll eventually need diagnostics to figure out which actions during a particular build were run because they were tagged as "always run". |
I was almost going to close this out because I am not convinced it is needed, but the discussion is a good one for posterity. My alternate strawman design:
|
Just to add another possible use-case, though I kinda agree that any breaks from the ideal of reproducibility should possibly be avoided. While transitioning to bazel from Make, it is sometimes easier to have This can actually be achieved by using another genrule to produce an executable script, which is then run by the original genrule. ( |
Maybe I misunderstand this thread, but thought we already had a concept of repo rules that always run. Is this FR only related to repository rules, and not the rest of the build (where we really should be hermetic)? If so we should deprioritize as repo rules will be deprecated by bzlmod. (Triaging under that basis.) |
Isn't the |
@keithl-stripe I think it doesn't help here. The |
To solve some of the problems mentioned in this thread, and to shift the convenience/{correctness,speed,reproducibility} tradeoff further toward convenience for us, I've been experimenting with some "impure" Bazel rules. Those rules are explicitly no-cache, local, and no-sandbox. They escape from the execroot into the source tree and can execute arbitrary commands. Of course this is a huge foot gun, and you must know what you're doing (basically, don't forget to track your inputs). But it has been useful for us. Our Bazel targets are a lot more coarse-grained than usual, but that's fine for most people who are not at Google's scale. Of course remote execution and shared caching are lost here. It basically works a bit like To solve the problem @golvok mentions (#3041 (comment)), we symlink in-tree outputs into Bazel's output directory. So when files disappear or are modified in-tree, Bazel knows to re-run the rule. This may be non-portable outside Unix-like systems. Using these rules, we're able to be "friends" with all the different tools from different ecosystems, trading off some of Bazel's properties like fine-grained caching. For example we use them to do Of course all of this is very much against Bazel's philosophy, but I'm still surprised that it even worked. Initially I wanted to develop a general-purpose task orchestration system that knows how to track inputs and outputs, supports hashing and Starlark, and can run arbitrary scripts. But it turned out that I was able to make Bazel do this. Do not repeat at home! :) |
One thing that still would be great to have in Bazel is the ability to declare inputs after the execution phase. I don't know if it's even possible to implement in Bazel's current architecture, but it could be useful to improve convenience for people, even without doing "impure" things I mentioned above. Similar to Ninja's depfile, or So the idea is that one would only need to declare direct source files for the package, and imported sources could be declared as inputs after the fact. For example |
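To sketch what "declaring inputs after the fact" could look like, here is a small Python example that reads a simplified Makefile-style depfile (the format Ninja consumes) and reports which discovered inputs were not declared up front. This is an illustration only, not a Bazel feature: `parse_depfile` and `undeclared_inputs` are hypothetical helpers, and the parser assumes a single target and ignores escaped spaces and paths that contain colons.

```python
def parse_depfile(text):
    """Return (target, deps) from a simplified Makefile-style depfile."""
    # Join backslash-continued lines into one logical line.
    joined = text.replace("\\\n", " ")
    target, _, deps = joined.partition(":")
    return target.strip(), deps.split()


def undeclared_inputs(declared, depfile_text):
    """Inputs the tool reported reading that were not declared up front."""
    _, discovered = parse_depfile(depfile_text)
    return sorted(set(discovered) - set(declared))
```

A build system with this capability would add the undeclared entries to the action's input set for the next incremental build, so only direct sources would need to be listed by hand.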
"Always rerun" needs to be defined here. Is it run for every In any case, #3041 (comment) is relevant, despite the fact that repo rules aren't being deprecated after all. I suspect that in most cases, a repo rule that reruns for every Bazel server restart is good enough for these use cases. |
I agree that "always rerun repository rule" doesn't sound like a well defined feature, and the example use cases in this thread are very different, and some even seem to relate to rules, not repository rules. I thought I'd summarise some of the existing bazel features that can help in some situations where one might want this feature:
|
At the risk of reviving a dead thread, I think we have a use case for "run on every bazel invocation" repository rules, as detailed in #17640: We'd like the ability to re-run a repository rule every time your Xcode path changes (every time the result of These changes are so frequent (in some cases) and the cost of checking so low that we would be willing to pay the cost of always re-running them at every bazel invocation. We work with a variety of teams, not all of whom can provide access to their code. We have several limitations to implement @uri-canva 's suggestions:
Currently, the best we can offer is a troubleshooting tool that will check that their Xcode installation is the same as the one bazel thinks it is, but it would be very nice to have things just work. |
To add another use-case: |
@hanikesn See bazel-contrib/rules_oci#269. If it's not clear from the documentation, open an issue on |
Yet another use case: I'm generating a chain of x509 test certs that expire and need to be regenerated every day or so, even if the code used to generate them hasn't changed. I could use What I'm currently playing with is:
|
@bcsgh If you can afford to use a wrapper script for Bazel (which you could place in Be careful though if any of your build actions use default shell env, i.e. make sure you use |
I could have sworn that the `stamp` attribute combined with `--stamp` and `--workspace_status_command` did this, but the rule isn't being run even though `bazel-out/volatile-status.txt` changes. See https://groups.google.com/d/msgid/bazel-discuss/c1a4f036-19e6-45e1-b1eb-e6c280f3c6a5%40googlegroups.com.
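One likely explanation, going by Bazel's documentation of workspace status: keys whose names start with `STABLE_` are written to the stable status file, and all other keys go to the volatile file. Bazel treats the volatile file as if it never changes, so a change there alone does not rerun stamped actions. Below is a minimal Python sketch of a status command that emits one key of each kind. The key names `STABLE_DAY_BUCKET` and `EPOCH_SECONDS` and the once-per-day bucketing are assumptions made for illustration, not something the thread prescribes.

```python
import time


def status_lines(now=None):
    """Build "KEY value" lines in the format a workspace status command prints."""
    now = time.time() if now is None else now
    day = int(now // 86400)  # changes once per UTC day
    return [
        # STABLE_ prefix: intended to invalidate stamped actions when it changes.
        f"STABLE_DAY_BUCKET {day}",
        # No STABLE_ prefix: volatile, so changes alone do not trigger reruns.
        f"EPOCH_SECONDS {int(now)}",
    ]


if __name__ == "__main__":
    print("\n".join(status_lines()))
```

If the script were saved as an executable and passed via `--workspace_status_command`, the expectation under these assumptions is that stamped actions rerun when the day bucket changes, not on every invocation.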