-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Watch arbitrary file in repo rules #21230
Conversation
b4272a0
to
5f88d98
Compare
cc @ismell It's a long change so before poring over it, I have some high-level questions:
|
No I haven't -- and I'm not sure what to benchmark against, what corpus to use, etc. If you have any pointers, please let me know!
Yeah this is just the first in a series of PRs. I plan to add watching to
Hmm. It might do, it might not. The reason is not really important but for eg. the module extension part we'd need multiple dicts instead of just the one.
|
An interesting benchmark that doubles as a pretty useful feature could be to add watching of built-in include directories to the auto-configured C++ toolchain (to remove the need to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My brain seems to be on leave but I tried my best.
src/main/java/com/google/devtools/build/lib/rules/repository/RepoRecordedInput.java
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/rules/repository/RepositoryDelegatorFunction.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/rules/repository/RepoRecordedInput.java
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/rules/repository/RepoRecordedInput.java
Outdated
Show resolved
Hide resolved
...ava/com/google/devtools/build/lib/bazel/repository/starlark/StarlarkBaseExternalContext.java
Show resolved
Hide resolved
@@ -444,6 +445,20 @@ public Location read(JsonReader jsonReader) throws IOException { | |||
} | |||
} | |||
|
|||
private static final TypeAdapter<RepoRecordedInput.File> REPO_RECORDED_INPUT_FILE_TYPE_ADAPTER = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same vague dissatisfaction with this as with time stamps in lockfiles: this encodes non-hermetic information in lockfiles, which are supposed to be checked in. Although in this case, one could say that that could be avoid by carefully not depending on absolute paths...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have two mechanisms for non-hermetic operations, repository rules and module extensions, we should think about where we want to draw the boundary between them. If module extensions are meant to be optimized for "reproducibility in the face of non-hermeticity", i.e., locking down the result of potentially non-hermetic operations in a way that is portable across machines via the lockfile, then we may want to purposefully limit their capabilities.
How about adding the watch capability to repo rules only for now? We can always gather some data from user feedback before we extend it to module extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good points, and I'm not totally sure what's the best way forward here. It's awkward to add watch only to repo rules because rctx.read(watch=True)
is in StarlarkBaseExternalContext
which is shared by both contexts. We could alternatively just filter out any "absolute file" watches when we write the lockfile. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very knowledgeable about the philosophical difference between module extensions and repositories, but at least as far as the implementation goes, it'd be pretty easy to add an "access to absolute paths is allowed" flag to StarlarkBaseExternalContext
, which would be set to "true" for repositories and "false" for module extensions.
Between filtering out absolute file watches and erroring out when they are attempted, I'd rather we do the latter because if we ever decide that this is a useful thing, it's much more easy to turn an error into a non-error than to subtly change the behavior of code that Bazel already accepts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the situation is a bit more nuanced -- I'm adding a boolean parameter watch
to rctx.read
which defaults to True. That is, any file read by rctx.read
will be watched, be it a label-addressed file or an absolute path file.
Now we can't really just make it an error to attempt to watch absolute path files in module extensions, because it is legal to read an absolute path file in module extensions today. And yes, we could make watch
default to True for repo rules and False for module extensions, but I'm not sure that's better (as we would by default not watch even files inside the workspace).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my newest idea: make rctx.read(watch)
a tristate: true - always watch (same as today's rctx.read(...); rctx.watch(...);
); false - never watch; auto - watch if appropriate ("appropriate" = it's a file inside the workspace OR we're in a repo rule context). Defaults to 'auto'. WDYT? (EDIT: implemented this, see code for concept.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about watch()
defaulting to false everywhere, at least for the time being? That would mean that there is no behavior change and it would also resolve the module extension / repository conundrum.
re: your tristate approach, I'm neutral; I think it's a bit of DWIM, but only a bit so I could live with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to enable watch by default because the only change in behavior is that we refetch a bit more often, which is really an improvement in correctness. It also reduces surprises -- people have been asking why label-addressed files are watched but others aren't.
src/main/java/com/google/devtools/build/lib/rules/repository/RepoRecordedInput.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/rules/repository/RepoRecordedInput.java
Show resolved
Hide resolved
I'm hoping to use the So if you wanted a test, you can checkout the chromiumos repository. I would recommend cloning the internal manifest. You can then try "watching" the following directories:
That would be the worst case. I'm hoping we can return the specific directories ("overlays") we looked at when generating the repository so we can narrow the scope. We should probably ignore the |
2f16de2
to
3cc0dcc
Compare
FileValue fileValue = | ||
(FileValue) env.getValueOrThrow(FileValue.key(pair.second), IOException.class); | ||
if (fileValue == null) { | ||
throw new NeedsSkyframeRestartException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that if you watch N files previously unseen, it's O(N^2) time and re-doing the rest of the work the repository does N times? You could get around this by just recording the watch request, then requesting all FileValue
s at the end in one single getValues()
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only O(N^2) if we actually do restart skyframe; we're now using loom by default so it's actually linear. Given Loom will be the way forward, I don't want to spend more engineering time for the no-Loom case (i.e. the one-single-getValues()-call-at-the-end thing); in fact I want to remove more workarounds for repo restarts (e.g. enforceLabelAttributes
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to say that there is still one little corner case where we can't ignore JVMs that don't support virtual threads, but it turns out that that use case went away a few hours ago, so I'm fine with this compromise.
cc @zhengwei143
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it turns out that that use case went away a few hours ago
just out of curiosity, what is that use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The builds of the Android kernel. I got word overnight that they finished their upgrade to JVM 21.
The only hard blocker I see is the O(n^2) behavior I mentioned above. I'd prefer watching to default to false so as not to rock the boat in 7.x, but I'd accept a decision to the contrary. |
d0bbb74
to
b6a4f7c
Compare
* Starlark API changes * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups. * Existing method `rctx.read()` gets a new optional string parameter `watch` that defaults to `auto`. This causes the file read to be watched as well by default, even if it's not addressed by a label. See `rctx.read()` docs for the semantics of `watch`. * Both changes apply to `mctx` as well, except that no files outside the current Bazel workspace may be watched for `mctx`. * Marker file format changes * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package. * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile. * Code changes * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning. * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more. * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions) * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that. RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo. Work towards #20952.
b6a4f7c
to
463cd52
Compare
* Starlark API changes * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups. * Existing method `rctx.read()` gets a new optional string parameter `watch` that defaults to `auto`. This causes the file read to be watched as well by default, even if it's not addressed by a label. See `rctx.read()` docs for the semantics of `watch`. * Both changes apply to `mctx` as well, except that no files outside the current Bazel workspace may be watched for `mctx`. * Marker file format changes * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package. * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile. * Code changes * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning. * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more. * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions) * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that. Work towards #20952. Closes #21230. RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo. PiperOrigin-RevId: 607097422 Change-Id: Ib96a49461deddd7f4c786bd1c227de18b2dd71d7
* Starlark API changes * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups. * Existing method `rctx.read()` gets a new optional string parameter `watch` that defaults to `auto`. This causes the file read to be watched as well by default, even if it's not addressed by a label. See `rctx.read()` docs for the semantics of `watch`. * Both changes apply to `mctx` as well, except that no files outside the current Bazel workspace may be watched for `mctx`. * Marker file format changes * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package. * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile. * Code changes * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning. * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more. * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions) * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that. Work towards #20952. Closes #21230. RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo. PiperOrigin-RevId: 607097422 Change-Id: Ib96a49461deddd7f4c786bd1c227de18b2dd71d7
rctx.watch()
that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.rctx.read()
gets a new optional boolean parameterwatch
that defaults to True. This causes the file read to be watched as well by default, even if it's not addressed by a label.mctx
as well.FILE:
marker file entries can now beFILE:
followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is not under a package.accumulatedFileDigests
sections of the module lockfile.RepoRecordedInput.File
subclasses are renamed toFileOutsideWorkspace
(absolute paths) andFileInsideWorkspace
(label-like things) instead to better reflect their meaning.FileOutsideWorkspace
has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.Work towards #20952.