-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the ability to emit *.runfiles_manifest as JSON #14676
Add the ability to emit *.runfiles_manifest as JSON #14676
Conversation
49b33c9
to
2ec1149
Compare
@EdSchouten do you have a recommended reviewer for this PR? Or at least team (https://bazel.build/maintaining/maintainers-guide.html#team-labels)? I can route as make sense to me, but if we can route to anyone we already know is paying attention that offers a better chance of timely response. |
Hey Greg! Thanks for responding!
I suspect team-Core would be the best fit for this PR, considering that runfiles handling is pretty core to Bazel. Happy to bring other teams in the loop when it comes to reviewing some of the runfiles libraries changes, but we can always do that later. |
I believe I assigned Generic runfiles I think suffers from weak ownership. |
So basically, I'm trying to get this proper attention in accordance with https://bazel.build/maintaining/maintainers-guide.html#lifecycle-of-a-pull-request. Ping me specifically if you worry about responsiveness, but as a heads up I'd expect this one to take some time. |
2ec1149
to
0222c9c
Compare
The *.runfiles_manifest files currently use a pretty simple format: two pathnames separated by a space character. This can be problematic at times. On Windows, usernames may contain spaces, meaning home directories with spaces are not uncommon. If you look at the macOS ecosystem, it's also not uncommon to have applications or packages containing files with spaces in their names. The goal of this change is to start addressing this issue, by moving *.runfiles_manifest to newline delimited JSON. Runfiles libraries for each programming language will need to be adjusted to support this new format. Because not all runfiles libraries are bundled/versioned together with Bazel, those libraries will need to seamlessly support both formats. Tuples are encoded as lists, JSON entries will always start with an opening square bracket. Because paths are generally prefixed with a workspace name, it's extremely unlikely that this causes any conflict. Runfiles libraries can thus distinguish both formats as follows: if (line[0] == '[') { // Parse as JSON. } else { // Parse as legacy format. } To prevent any breakage, we place this feature behind a flag named --incompatible_json_source_manifests. Even if this flag is disabled, we emit JSON entries for paths that are not representable in the legacy format. This should at least make it possible to build targets and instantiate their runfiles directories with build-runfiles, regardless of whether the manifest can be parsed at runtime. Before considering JSON, I experimented with CSV instead. Though easier to parse, RFC 4180 requires the use of carriage return newlines. Because both Bazel and runfiles libraries tend to open runfiles manifests in text mode, this turned out to be hard to get right. Given the fact that Bazel does not permit non-printable characters and backslashes in pathnames, parsing the resulting JSON turns out to be remarkably easy. If it's not realistic to let a given runfiles library depend on a third-party library to do the JSON parsing, it's most likely acceptable to handroll a simplified parser. This PR already contains updates to build-runfiles, the Bash runfiles library and the Python runfiles library to support the new format. Updates to the C++ and Java runfiles libraries remain to be done. Issue: bazelbuild#4327
0222c9c
to
9ecb5ba
Compare
I was pinged. So I'll give this a closer look. |
Please note I'm not super-familiar with the runfiles implementation logic, so I have some pretty basic questions. Is build-runfiles the logic that Bazel uses to construct a target's How much of #4327 does this PR address? The issue is titled as runfiles-related, but I think it's actually a broader issue with spaced path names in general, of which runfiles is one major but not the only example? i.e. the failure of Do you know what's going on at #4327 (comment) - which is setting Aside from these specifics I'm curious how many logic holes exist throughout the code base w.r.t. spaces in paths, and how much harder of a problem that is vs. what I think is a subset of the problem being addressed here. |
Hey Greg, Sorry for the late response; I missed the notification.
Yes. But there are also cases in which Bazel constructs 'foo.runfiles' directories without going through 'build-runfiles'. It may construct the runfiles directory directly, based on its in-memory bookkeeping.
It by no means addresses all issues related to whitespace in filenames. For my specific use case I want to drag in a large directory hierarchy with pre-generated contents into the runfiles directory of a tool of ours. My intent is not to call
So if I understand it correctly,
Yeah, there are likely many other things that are broken. That said, we don't necessarily need to support all these use cases. My hope is that we can at least support being able to I hope that answers all your questions! |
Yes, these are great answers. Thank you. I'm caught up again and I'll summarize where I'm at. And my thoughts on trying to move this effectively forward:
So in summary I support this PR but want to run it through the team review. I'll get that scheduled in and ensure all three of us have discussed it at our next regular sync. It's a little bit of bureaucracy but we hope it ultimately helps move these changes forward more effectively. |
Hi @EdSchouten . @brandjon and I discussed this yesterday. Would you be willing to output both a If runfiles libraries aren't released with Bazel itself, we may also need them to look for the The ultimate end game could be to remove the |
Hey Greg,
Just so I get this right: Does that mean that we should also add a foo.runfiles/MANIFEST.ndjson? I suppose we should also introduce a |
Does your current patch adjust both files? Are you preferring ndjson vs. full json because it's easier to implement, or you think it's cleaner, or both? It's probably most consistent to update both
I'd suggest the latter. I'm wary of reliance of environment variables for reproducibility reasons, but maybe I'm not appreciating this use case for them. |
Yes!
NDJSON has the advantage that it can be parsed line by line. Especially for stuff like runfiles.bash it's easier if we allow that. Allowing arbitrary JSON in there (which may contain newlines at arbitrary spots) can be tricky to deal with in those languages.
Yeah, I think they also have to be in sync. I seem to remember there are some places in the tree where runfiles_manifest and runfiles/MANIFEST are compared for equality. If they don't match, the runfiles directory is rebuilt. The runfiles/MANIFEST file is thus used as a marker to indicate whether the runfiles directory is up to date.
Gotcha. Will try to get this PR updated next week. Thanks! |
Any update, @EdSchouten? Really appreciate the work you have put in here; it's going to help a lot of people! |
Hi there! Thank you for contributing to the Bazel repository. We appreciate your time and effort. We're doing a clean up of old PRs and will be closing this one since it seems to have stalled. Please feel free to reopen if you’re still interested in pursuing this or if you'd like to discuss anything further. We’ll respond as soon as we have the bandwidth/resources to do so. |
FYI still happy to review when there's the energy to move forward. |
@keertk @gregestren how can we reopen this? |
@matthewjh we just need ownership. Would you be willing to check around to see who might want to take this on, or if @EdSchouten wants to re-focus? Ask around on Slack, https://github.com/bazelbuild/bazel/discussions, etc? |
The *.runfiles_manifest files currently use a pretty simple format: two
pathnames separated by a space character. This can be problematic at
times. On Windows, usernames may contain spaces, meaning home
directories with spaces are not uncommon. If you look at the macOS
ecosystem, it's also not uncommon to have applications or packages
containing files with spaces in their names. The goal of this change is
to start addressing this issue, by moving *.runfiles_manifest to newline
delimited JSON.
Runfiles libraries for each programming language will need to be
adjusted to support this new format. Because not all runfiles libraries
are bundled/versioned together with Bazel, those libraries will need to
seamlessly support both formats. Tuples are encoded as lists, JSON
entries will always start with an opening square bracket. Because paths
are generally prefixed with a workspace name, it's extremely unlikely
that this causes any conflict. Runfiles libraries can thus distinguish
both formats as follows:
To prevent any breakage, we place this feature behind a flag named
--incompatible_json_source_manifests. Even if this flag is disabled, we
emit JSON entries for paths that are not representable in the legacy
format. This should at least make it possible to build targets and
instantiate their runfiles directories with build-runfiles, regardless
of whether the manifest can be parsed at runtime.
Before considering JSON, I experimented with CSV instead. Though easier
to parse, RFC 4180 requires the use of carriage return newlines. Because
both Bazel and runfiles libraries tend to open runfiles manifests in
text mode, this turned out to be hard to get right.
Given the fact that Bazel does not permit non-printable characters and
backslashes in pathnames, parsing the resulting JSON turns out to be
remarkably easy. If it's not realistic to let a given runfiles library
depend on a third-party library to do the JSON parsing, it's most likely
acceptable to handroll a simplified parser.
This PR already contains updates to build-runfiles, the Bash runfiles
library and the Python runfiles library to support the new format.
Updates to the C++ and Java runfiles libraries remain to be done.
Issue: #4327