Make transition directory name fragment reflect output directory name #13587
If the output directory name is overridden, the output path no longer contains the transition directory name fragment. The calculation of the transition directory name fragment does not reflect this situation, so we can instead get conflicts caused by this name itself. This patch makes sure that the transition directory name fragment gets a unique hash for a given output directory name.
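As a rough illustration of the idea in the description, the following Python sketch models a fragment whose hash also covers the overridden output directory name. It is not Bazel's implementation: the function name, the input encoding, and the 12-character truncation are assumptions made for this example. Only the `ST-` prefix comes from the discussion.

```python
import hashlib


def transition_directory_name_fragment(changed_options, output_directory_name=None):
    """Hypothetical model of an 'ST-<hash>' fragment (not a Bazel API).

    The overridden output directory name, when present, is mixed into the
    hash input so that different directory names yield different fragments.
    """
    # Sort the options so the result does not depend on insertion order.
    parts = [f"{key}={changed_options[key]}" for key in sorted(changed_options)]
    if output_directory_name is not None:
        parts.append(f"output directory name={output_directory_name}")
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return "ST-" + digest[:12]


# Same changed options, two different overridden directory names.
same_options = {"//my:flag": "on"}
fragment_a = transition_directory_name_fragment(same_options, "gen-a")
fragment_b = transition_directory_name_fragment(same_options, "gen-b")
```

With identical changed options, the two fragments differ only because the directory name is part of the hashed input.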
I read #12171 (comment) but can you clarify more precisely what problem you're addressing? Broadly speaking we'd like to keep the choice of output directory names outside of the public API as much as possible, to maximize the ability to optimize it (and do so methodically and correctly) from within Bazel's core algorithms.
@torgil
@gregestren We have used the "output directory name" transition from the start. Besides the current issues and the different set of problems we faced back then, it gives us flexibility with naming conventions to avoid duplicated actions (such as generated platform-independent files ending up as platform-dependent actions because of their path). It also allows for more predictable output paths. Since you have removed this feature, we'll review the situation again and create a separate issue for this.
@aiuto Agreed. Now we might have to take a step back and reintroduce the ability to set the output directory name (see above). The idea is that we don't use the ST-suffix and don't want it to affect the configuration hash. I tried to revert this patch and got hit by #14023
I'm sorry, I didn't realize The combined work that @sdtwigg (to make transition ordering sequences irrelevant) and I (to support platform-independent output paths) are doing covers your use cases, I think. We're trying to tackle those in more deeply principled ways. But that comes at the cost of much more involved work. What kind of platform-independent files do you have? Of course I strongly prefer the longer term, more principled solutions. But I recognize they have unknown ETAs and the issues you experience happen now. Making sense of the right balance there is IMO the crucial question.
@torgil We were also using The best solution I have found (which seems to work ok, but isn't ideal) is to define an This is ugly for a few reasons:
@gregestren / @sdtwigg I think the ideal solution here is to define As a (maybe) simpler fallback, just allowing a transition to directly specify
@gregestren This sounds analogous to "output directory name". Is there any documentation or discussion on this available?
Mostly generated source files (test mocks, proto files, ...), code metrics, documentation and other metadata.
@eric-skydio What scenario is that workaround solving?
This workaround solves the problem where we have generated source code that is depended on from targets in several different configurations, and we want the code generation to run only once, rather than separately several times with different output roots. The workaround I provided accomplishes that by always transitioning to a "standard" configuration, making a diamond in the configuration graph and also saving analysis time. That standard configuration ends up with an output root of This is preferable to allowing different configurations and just forcing the output root to match, because it avoids analyzing the targets twice and means we don't need to worry about accidentally producing different generated code in some circumstances and breaking build hermeticity.
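The "diamond" effect described in this comment can be modeled in a few lines of Python. This is only a conceptual sketch: the configuration tuples, `STANDARD_CONFIG`, `to_standard_config`, `build_codegen`, and the `out/...` path layout are all hypothetical and do not correspond to Bazel APIs or real output roots.

```python
# Hypothetical model of a normalizing transition; none of these names are Bazel APIs.
STANDARD_CONFIG = (("compilation_mode", "opt"), ("cpu", "k8"))


def to_standard_config(config):
    """Transition that ignores the incoming configuration and returns a fixed one."""
    return STANDARD_CONFIG


def build_codegen(target, consumer_config, normalize, cache):
    """Build `target` once per distinct configuration, memoized in `cache`."""
    config = to_standard_config(consumer_config) if normalize else consumer_config
    key = (target, config)
    if key not in cache:
        # Made-up output root derived from the configuration values.
        cache[key] = "out/" + "-".join(value for _, value in config) + "/" + target
    return cache[key]


# Two consumers in different configurations depend on the same generated code.
consumers = [
    (("compilation_mode", "dbg"), ("cpu", "k8")),
    (("compilation_mode", "opt"), ("cpu", "arm64")),
]
without_transition = {}
with_transition = {}
for consumer in consumers:
    build_codegen("mocks", consumer, False, without_transition)
    build_codegen("mocks", consumer, True, with_transition)
```

Without the normalizing transition the model ends up with one entry per consumer configuration; with it, both consumers converge on a single configured target, which is the point of the workaround.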
This is exactly what I'd like to do. I don't think the configuration should depend on the sequence of transitions that got you there. I've been thinking about default values vs. command-line set values (i.e. with
No. But we can formalize our ideas more. Let me talk with @sdtwigg to get that started. I keep on seeing themes of you, @eric-skydio and others basically reinventing the output path algorithm with horrible hacks. This makes me further think we should settle on a principled core algorithm and do that. The change as I see it would not offer per-action config paths. We can explore that, but that may be easy or complicated depending on exactly what you want out of it. For IDEs, it would at least be more stable in terms of the top-level configuration - and any configuration that comes back to the top-level configuration after transitions - would be hash-free. What else would you want from the IDE perspective?
It sounds like we're on the same page, and I'm excited to get away from the hack that was directly specifying the output directory and towards a transition that actually modifies the configuration in the desired ways. Ideally, I should be able to use a Starlark transition to return to any hash-free directory name produced by a native configuration change. For example, a Starlark transition that modifies only the CPU value should result in a hash-free directory, since CPU is already encoded in the output directory name (assuming the configuration exactly matches what would be produced by just specifying the --cpu flag), and similarly for compilation mode and Python version, which are also directly encoded in the name. The trickier case is when The reason I suggested comparing against the command line configuration is to ensure that
For reference, One use case for Transitioning from the command line parameters to something else and then back again does not take you back to If we know that certain flags do not affect the commands, then
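The round-trip problem raised in the last two comments, and the suggestion to compare against the command-line configuration, can be contrasted in a small Python model. This is a sketch under stated assumptions, not Bazel's algorithm: both functions, the `//my:flag` setting, and the hash encoding are hypothetical, and the "history" variant is only one reading of why a round trip does not return to a hash-free name.

```python
import hashlib


def _st_hash(pairs):
    """Made-up encoding: hash sorted key=value pairs into an 'ST-' fragment."""
    text = ";".join(f"{key}={value}" for key, value in sorted(pairs))
    return "ST-" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def fragment_from_history(touched_options, config):
    """Hash every option a transition has ever touched, even if it was set back."""
    if not touched_options:
        return ""
    return _st_hash((key, config[key]) for key in touched_options)


def fragment_from_baseline(command_line_config, config):
    """Hash only the options whose values differ from the command-line configuration."""
    diff = [(k, v) for k, v in config.items() if command_line_config.get(k) != v]
    return _st_hash(diff) if diff else ""


command_line = {"//my:flag": "off"}

# Transition away from the command-line value, then back again.
config = dict(command_line)
touched = set()
for new_value in ("on", "off"):
    config["//my:flag"] = new_value
    touched.add("//my:flag")

history_fragment = fragment_from_history(touched, config)
baseline_fragment = fragment_from_baseline(command_line, config)
```

In this model the history-based fragment stays non-empty after the round trip, while the baseline comparison yields an empty (hash-free) fragment because the final configuration equals the command-line one.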
OK. Does this mean you transition
Sounds good. Be aware that these "horrible hacks" can include domain/implementation specific components. If we have for instance a build setting affecting only D below, we might want to include that in the output path in D only to avoid duplicated actions in A, B, C, E.
The issue I'm thinking of is rules that generate several actions with different configuration dependencies (like generate + compile). There is a cost associated with splitting these rules into several rules and cluttering the target graph.
Without having the whole picture, I guess one common use case is to be able to configure "find coverage files in this directory" or "find generated sources in this directory" in a way that hopefully survives a few rebases. A separate build setting for this use-case may be acceptable.
@torgil Yes, we are successfully transitioning that on master, as well as
@moroten It is currently the case that a Starlark transition cannot return to
The invariant you must maintain to do this safely is actually quite a bit more strict than that. In particular, because D is affected by the build setting, its outputs must go into separate directories for each configuration, which almost certainly means C will need to run twice no matter what (once with each set of inputs from D). The only (safe) efficiency wins come if you can promise that the output of C will not depend on which input files it was provided with from D, but in that case you probably should either not depend on D at all, or (once the changes proposed here land) use a transition to reset any configuration differences that don't matter, so C can always be built in a single configuration with a single output root. The edge cases where this doesn't work relate to cases where there are important analysis-time behavior differences between rules (probably runfiles-related) which don't cause any execution-time behavior differences in one or more of the actions they create, which is important to solve and does come up for us occasionally. Having a way to unsafely force the output root for those cases (particularly if we could do it per-file or per-action) would still be nice, but is a separate issue. In practice, we've found the invariant above to be sufficiently subtle that developers get it wrong more often than they get it right, which is why I'm excited to switch to a safe option that actually normalizes the configuration.
Will this apply to exec hashes as well? Given that build settings set on the command line don't add an ST-hash, this would require the ability for a transition to set the top-level value rather than the default value (as suggested in another thread).
True, with the exception that it's sufficient to have separate output paths for all produced outputs, including filenames. For some use cases, for instance "debug compile subsystem D", the build setting value will (for that subsystem) not add additional output directory naming complexity compared to "-c dbg". C includes D and has to be duplicated if the build setting is changed above (not controlled by) C. Edit: Today both A and C need to take care of these build settings and reset them on other dependencies to avoid conflicts or duplicated actions (in B, E). The root cause for this seems to be that the actions conflict due to conflicting configurations. A solution should be to not require the configuration hash to be equal, which also makes this PR redundant.
I would like to de-duplicate the discussion here into #14023 (comment) (I'll leave you to close this if you agree.) Note that
@eric-skydio I'm following @sdtwigg's advice to consolidate at #14023 (comment). I'm especially curious for your and @fmeum's buy-in since you've both been closely involved with the discussion. If anything at the other bug misses a concern please comment!
@sdtwigg Good job. There are two other issues remaining in the discussion here:
I wrote #14236 to clarify and possibly solve these issues (would be a nice fit for 2). Together with #14023 (both making this PR obsolete), I agree to close this issue.