Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow repositories that they depend on the contents of files/directories on the local file system #20952

Closed
lberki opened this issue Jan 19, 2024 · 19 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@lberki
Copy link
Contributor

lberki commented Jan 19, 2024

Description of the feature request:

An important use case for Starlark repositories is configuring the build for the local machine.

This sometimes requires reading files, calling readdir(), i.e. interacting with the local file system. In some cases, it's desirable that the repository is not re-fetched if the local state changes (if it's expensive to do so), but in some other case, it is and we currently don't provide a good mechanism to do so:

  • rctx.read() and friends can only declare dependencies on files in other repositories
  • local_repository() and new_local_repository() are only invalidated when the top-level directory of the repository changes and they can't pull in individual files (this matters for large directories e.g. /etc). The rationale behind the first limitation is that one would glob() in the repository and declare dependencies on the files below the first level that way, but that doesn't work for other repository rules.

The only workaround at the moment is to declare a dependency on an environment variable that always changes, which has other undesirable effects (see #20951).

There is a separate, ancient feature request to always re-run a repository rule (#3041) but I think that's dangerous because it results in sloppiness with declaring dependencies.

Which category does this issue belong to?

No response

What underlying problem are you trying to solve with this feature?

No response

Which operating system are you running Bazel on?

No response

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@lberki lberki added type: feature request P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. labels Jan 19, 2024
@meteorcloudy
Copy link
Member

This is a blocker for #18285

@Wyverald
Copy link
Member

Linking the design doc here again: https://docs.google.com/document/d/17RZKMuMjAIgNfFdrsLhqNsMTQDSS-BBA-S-fCSBPV7M/edit# I'll start impl work now.

@lberki
Copy link
Contributor Author

lberki commented Feb 1, 2024

@ismell brought up an interesting point today: if your level of abstraction is system calls (readdir() and stat()), you need to issue a Starlark function call for each file / directory you want to watch, which in his case is about 10K, all on one thread. (because you can't do multithreading in Starlark, at least not without jumping through an excessive number of hoops, e.g. by sharding it over multiple repositories)

If your abstraction was glob() (i.e. something that's semantically "recursive"), Bazel could in theory parallelize "behind the scenes" to its heart's content.

@Wyverald Wyverald added this to Bzlmod Feb 2, 2024
@github-project-automation github-project-automation bot moved this to Todo in Bzlmod Feb 2, 2024
Wyverald added a commit that referenced this issue Feb 2, 2024
Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded inputs is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and how to parse itself from a string.
- We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards #20952 and #12227.
copybara-service bot pushed a commit that referenced this issue Feb 6, 2024
Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded inputs is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and how to parse itself from a string.
- We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards #20952 and #12227.

Closes #21182.

PiperOrigin-RevId: 604522692
Change-Id: Idc18ab202adb601cda47914c48642a6c9039da40
Wyverald added a commit that referenced this issue Feb 7, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional boolean parameter `watch` that defaults to True. This causes the file read to be watched as well by default, even if it's not addressed by a label.
  * Both changes apply to `mctx` as well.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object. Refactored code to reflect that.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 7, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional boolean parameter `watch` that defaults to True. This causes the file read to be watched as well by default, even if it's not addressed by a label.
  * Both changes apply to `mctx` as well.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object. Refactored code to reflect that.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 7, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional boolean parameter `watch` that defaults to True. This causes the file read to be watched as well by default, even if it's not addressed by a label.
  * Both changes apply to `mctx` as well.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object. Refactored code to reflect that.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 8, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional boolean parameter `watch` that defaults to True. This causes the file read to be watched as well by default, even if it's not addressed by a label.
  * Both changes apply to `mctx` as well.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 9, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional boolean parameter `watch` that defaults to True. This causes the file read to be watched as well by default, even if it's not addressed by a label.
  * Both changes apply to `mctx` as well.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 13, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional boolean parameter `watch` that defaults to True. This causes the file read to be watched as well by default, even if it's not addressed by a label.
  * Both changes apply to `mctx` as well.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 13, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional string parameter `watch` that defaults to `auto`. This causes the file read to be watched as well by default, even if it's not addressed by a label. See `rctx.read()` docs for the semantics of `watch`.
  * Both changes apply to `mctx` as well, except that no files outside the current Bazel workspace may be watched for `mctx`.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 13, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional string parameter `watch` that defaults to `auto`. This causes the file read to be watched as well by default, even if it's not addressed by a label. See `rctx.read()` docs for the semantics of `watch`.
  * Both changes apply to `mctx` as well, except that no files outside the current Bazel workspace may be watched for `mctx`.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 13, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional string parameter `watch` that defaults to `auto`. This causes the file read to be watched as well by default, even if it's not addressed by a label. See `rctx.read()` docs for the semantics of `watch`.
  * Both changes apply to `mctx` as well, except that no files outside the current Bazel workspace may be watched for `mctx`.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 13, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional string parameter `watch` that defaults to `auto`. This causes the file read to be watched as well by default, even if it's not addressed by a label. See `rctx.read()` docs for the semantics of `watch`.
  * Both changes apply to `mctx` as well, except that no files outside the current Bazel workspace may be watched for `mctx`.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.

Work towards #20952.
@brentleyjones
Copy link
Contributor

Wouldn't expanded archives with fixed mtimes break the optimization, similar to what used to be broken with #14723?

@lberki
Copy link
Contributor Author

lberki commented Feb 15, 2024

They would (in fact, @tjgq and myself just ran into this yesterday). This is why I listed mtime, ctime and inode number in my comment above when calculating the space requirements for the lockfile.

@meteorcloudy
Copy link
Member

Do we think "watching the stat() of all transitive children under a tree" is a generally useful API that we'll get a second user?

I think maybe we can use it to watch the cc toolchain directory, so that local_config_cc is automatically fetched when the system compiler is updated?

@lberki
Copy link
Contributor Author

lberki commented Feb 15, 2024

For the cc toolchain use case, isn't what we are ultimately interested in the contents? (it looks like @ismell says that hat's the case for them, too, so I think the point is moot)

Wyverald added a commit that referenced this issue Feb 15, 2024
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.
copybara-service bot pushed a commit that referenced this issue Feb 15, 2024
- `rctx.watch()` now supports watching a path even if it's a directory or nonexistent.
  - a path's status changing counts triggers a refetch.
- added `path.is_dir` so that users can tell whether a path points to a directory or a file.
- added `watch_X` parameters to the following methods, with behavior similar to the `watch` parameter in `rctx.read()`
  - `rctx.extract()`, `rctx.patch()`, `rctx.template()` (for the template file), `rctx.symlink()` (for the symlink target)

Work towards #20952.

Closes #21339.

PiperOrigin-RevId: 607415094
Change-Id: Iebb6bc28174d05277a034ba3cf5e0c9bf90ce027
Wyverald added a commit that referenced this issue Feb 15, 2024
- Added a new parameter `watch` to `path.readdir()` that allows one to watch for changes in entries under a given directory.
  - only names are watched; 'entry types' are not. This somewhat matches the return value of `path.readdir()`, which only contains entry names
  - same restrictions as `rctx.watch()` in terms of which paths can be watched and which cannot; similarly for `mctx`.
- Made a big-ish refactor that eliminated the two `RepoRecordedInput.File` subclasses, and pulled the difference into a separate `RepoRecordedInput.RepoCacheFriendlyPath` instead. This new path class is used by the new `RepoRecordedInput.Dirents` as well.

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 15, 2024
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 15, 2024
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
  - In the future we could add glob patterns to this method.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.
Wyverald added a commit that referenced this issue Feb 15, 2024
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
  - In the future we could add glob patterns to this method.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.
copybara-service bot pushed a commit that referenced this issue Feb 16, 2024
- Added a new parameter `watch` to `path.readdir()` that allows one to watch for changes in entries under a given directory.
  - only names are watched; 'entry types' are not. This somewhat matches the return value of `path.readdir()`, which only contains entry names
  - same restrictions as `rctx.watch()` in terms of which paths can be watched and which cannot; similarly for `mctx`.
- Made a big-ish refactor that eliminated the two `RepoRecordedInput.File` subclasses, and pulled the difference into a separate `RepoRecordedInput.RepoCacheFriendlyPath` instead. This new path class is used by the new `RepoRecordedInput.Dirents` as well.

Work towards #20952.

Closes #21341.

PiperOrigin-RevId: 607772207
Change-Id: Ibba2b3389acd23e0a703818fec2cd58321a9b896
Wyverald added a commit that referenced this issue Feb 16, 2024
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
  - In the future we could add glob patterns to this method.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.
copybara-service bot pushed a commit that referenced this issue Feb 20, 2024
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
  -  In the future we could add glob patterns to this method.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.

Closes #21362.

PiperOrigin-RevId: 608667062
Change-Id: Ibacbb7af4cf4d7628fe8fcf06e2c4fa50e811e4e
@github-project-automation github-project-automation bot moved this from Todo to Done in Bzlmod Feb 20, 2024
Wyverald added a commit that referenced this issue Feb 20, 2024
Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded inputs is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and how to parse itself from a string.
- We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards #20952 and #12227.

Closes #21182.

PiperOrigin-RevId: 604522692
Change-Id: Idc18ab202adb601cda47914c48642a6c9039da40
Wyverald added a commit that referenced this issue Feb 20, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional string parameter `watch` that defaults to `auto`. This causes the file read to be watched as well by default, even if it's not addressed by a label. See `rctx.read()` docs for the semantics of `watch`.
  * Both changes apply to `mctx` as well, except that no files outside the current Bazel workspace may be watched for `mctx`.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.

Work towards #20952.

Closes #21230.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.
PiperOrigin-RevId: 607097422
Change-Id: Ib96a49461deddd7f4c786bd1c227de18b2dd71d7
Wyverald added a commit that referenced this issue Feb 20, 2024
- `rctx.watch()` now supports watching a path even if it's a directory or nonexistent.
  - a path's status changing counts triggers a refetch.
- added `path.is_dir` so that users can tell whether a path points to a directory or a file.
- added `watch_X` parameters to the following methods, with behavior similar to the `watch` parameter in `rctx.read()`
  - `rctx.extract()`, `rctx.patch()`, `rctx.template()` (for the template file), `rctx.symlink()` (for the symlink target)

Work towards #20952.

Closes #21339.

PiperOrigin-RevId: 607415094
Change-Id: Iebb6bc28174d05277a034ba3cf5e0c9bf90ce027
Wyverald added a commit that referenced this issue Feb 20, 2024
- Added a new parameter `watch` to `path.readdir()` that allows one to watch for changes in entries under a given directory.
  - only names are watched; 'entry types' are not. This somewhat matches the return value of `path.readdir()`, which only contains entry names
  - same restrictions as `rctx.watch()` in terms of which paths can be watched and which cannot; similarly for `mctx`.
- Made a big-ish refactor that eliminated the two `RepoRecordedInput.File` subclasses, and pulled the difference into a separate `RepoRecordedInput.RepoCacheFriendlyPath` instead. This new path class is used by the new `RepoRecordedInput.Dirents` as well.

Work towards #20952.

Closes #21341.

PiperOrigin-RevId: 607772207
Change-Id: Ibba2b3389acd23e0a703818fec2cd58321a9b896
Wyverald added a commit that referenced this issue Feb 20, 2024
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
  -  In the future we could add glob patterns to this method.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.

Closes #21362.

PiperOrigin-RevId: 608667062
Change-Id: Ibacbb7af4cf4d7628fe8fcf06e2c4fa50e811e4e
Wyverald added a commit that referenced this issue Feb 20, 2024
Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded inputs is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and how to parse itself from a string.
- We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards #20952 and #12227.

Closes #21182.

PiperOrigin-RevId: 604522692
Change-Id: Idc18ab202adb601cda47914c48642a6c9039da40
Wyverald added a commit that referenced this issue Feb 20, 2024
* Starlark API changes
  * New method `rctx.watch()` that watches a path. As of right now, said path must exist and be a regular file, but that will change in follow-ups.
  * Existing method `rctx.read()` gets a new optional string parameter `watch` that defaults to `auto`. This causes the file read to be watched as well by default, even if it's not addressed by a label. See `rctx.read()` docs for the semantics of `watch`.
  * Both changes apply to `mctx` as well, except that no files outside the current Bazel workspace may be watched for `mctx`.
* Marker file format changes
  * `FILE:` marker file entries can now be `FILE:` followed by either an absolute path (starts with a singular '/'), or a label-like ... thing (which is a label but has a slash in place of a colon). This is because we might be watching something inside a repo that is _not_ under a package.
  * Same change will happen to the `accumulatedFileDigests` sections of the module lockfile.
* Code changes
  * `RepoRecordedInput.File` subclasses are renamed to `FileOutsideWorkspace` (absolute paths) and `FileInsideWorkspace` (label-like things) instead to better reflect their meaning.
  * Unified the file watching/digest checking logic in SingleExtensionEvalFunction with the RepoRecordedInput stuff. Follow-ups will unify more.
  * Moved RepoRecordedInput to its own BUILD target as it's being used outside the repo fetching machinery (namely, module extensions)
  * Since `FileOutsideWorkspace` has to be an absolute path (never relative), we don't need a "base directory" for the RepoRecordedInput parser. What we really need is just the FileSystem object, which we can get from the PathPackageLocator available in Skyframe. Refactored code to reflect that.

Work towards #20952.

Closes #21230.

RELNOTES: Added a new method, `repository_ctx.watch()`, which asks Bazel to watch for changes to an arbitrary file. When said file changes, Bazel will refetch the repo.
PiperOrigin-RevId: 607097422
Change-Id: Ib96a49461deddd7f4c786bd1c227de18b2dd71d7
Wyverald added a commit that referenced this issue Feb 20, 2024
- `rctx.watch()` now supports watching a path even if it's a directory or nonexistent.
  - a path's status changing counts triggers a refetch.
- added `path.is_dir` so that users can tell whether a path points to a directory or a file.
- added `watch_X` parameters to the following methods, with behavior similar to the `watch` parameter in `rctx.read()`
  - `rctx.extract()`, `rctx.patch()`, `rctx.template()` (for the template file), `rctx.symlink()` (for the symlink target)

Work towards #20952.

Closes #21339.

PiperOrigin-RevId: 607415094
Change-Id: Iebb6bc28174d05277a034ba3cf5e0c9bf90ce027
Wyverald added a commit that referenced this issue Feb 20, 2024
- Added a new parameter `watch` to `path.readdir()` that allows one to watch for changes in entries under a given directory.
  - only names are watched; 'entry types' are not. This somewhat matches the return value of `path.readdir()`, which only contains entry names
  - same restrictions as `rctx.watch()` in terms of which paths can be watched and which cannot; similarly for `mctx`.
- Made a big-ish refactor that eliminated the two `RepoRecordedInput.File` subclasses, and pulled the difference into a separate `RepoRecordedInput.RepoCacheFriendlyPath` instead. This new path class is used by the new `RepoRecordedInput.Dirents` as well.

Work towards #20952.

Closes #21341.

PiperOrigin-RevId: 607772207
Change-Id: Ibba2b3389acd23e0a703818fec2cd58321a9b896
Wyverald added a commit that referenced this issue Feb 20, 2024
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
  -  In the future we could add glob patterns to this method.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.

Closes #21362.

PiperOrigin-RevId: 608667062
Change-Id: Ibacbb7af4cf4d7628fe8fcf06e2c4fa50e811e4e
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request
Projects
Archived in project
Development

No branches or pull requests

8 participants