Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a hint for indicating that a Tree is topologically sorted #230

Merged
merged 2 commits into from
Oct 21, 2022

Conversation

EdSchouten
Copy link
Collaborator

@EdSchouten EdSchouten commented Sep 30, 2022

I'm currently trying to improve the performance of handling of large output directories (Tree messages), having sizes in the order of hundreds of megabytes. In the process, I have realised that there is a lot of value in enforcing that the Directory messages contained in them are topologically sorted. Two practical use cases:

  • When instantiating the contents of a Tree on a local file system, having the Tree be topologically sorted allows you to immediately create files and directories in the right place.

  • When needing to resolve the properties of a single file by path, a topologically sorted Tree permits resolution (of paths without ..) by doing a simple forward scan.

Especially when other features like compression are taken into account, it's useful if Tree messages can be processed in a streaming manner.

One practical issue is that most Protobuf libraries don't offer APIs for processing messages in a streaming manner. This means that implementors who want to achieve these optimisations will need to write their own message parsers; at least for the Tree itself. To make this as painless as possible, we also require that the Tree is stored in some normal form.

Fixes: #229

@EdSchouten EdSchouten force-pushed the eschouten/20220930-toposort branch 2 times, most recently from dac402b to 0266a4b Compare September 30, 2022 08:11
@EdSchouten
Copy link
Collaborator Author

Cc @werkt, as he also wanted to make improvements to Tree in #159.

@EdSchouten EdSchouten force-pushed the eschouten/20220930-toposort branch from 0266a4b to e81e679 Compare October 1, 2022 16:25
@EdSchouten
Copy link
Collaborator Author

Also cc @peterebden, as he made changes in this area as well in #223

@EdSchouten EdSchouten force-pushed the eschouten/20220930-toposort branch 2 times, most recently from 01a52cc to cd66598 Compare October 2, 2022 17:42
@EdSchouten EdSchouten force-pushed the eschouten/20220930-toposort branch from cd66598 to e37ee76 Compare October 11, 2022 14:47
@EdSchouten
Copy link
Collaborator Author

Thanks everyone for your time discussing this proposal today. I've made the change suggested by @sstriker, documenting that manual construction of Tree objects without using the Protobuf library is recommended.

Copy link
Contributor

@mostynb mostynb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned that this implies a limit of 16(?) possible fields in this message- I think it might be worth mentioning that so we don't forget about it. Otherwise this lgtm.

@EdSchouten
Copy link
Collaborator Author

You mentioned that this implies a limit of 16(?) possible fields in this message- I think it might be worth mentioning that so we don't forget about it. Otherwise this lgtm.

I think this was @bergsieker asking whether or not we should require that the tag is encoded as a single byte. In my opinion this is acceptable, for the following reason.

The tag is essentially just a variable length integer, having value (fieldNumber << 3) | fieldType. In my proposal I'm stating that the field number can only be 1 or 2, with no room for future extensions. This means that the tag can only be a 5-bit number, which can always be encoded in a single byte.

I have just extended the phrasing to clarify this.

I'm currently trying to improve the performance of handling of large
output directories (Tree messages), having sizes in the order of
hundreds of megabytes. In the process, I have realised that there is a
lot of value in enforcing that the Directory messages contained in them
are topologically sorted. Two practical use cases:

- When instantiating the contents of a Tree on a local file system,
  having the Tree be topologically sorted allows you to immediately
  create files and directories in the right place.

- When needing to resolve the properties of a single file by path, a
  topologically sorted Tree permits resolution by doing a simple forward
  scan.

Especially when other features like compression are taken into account,
it's useful if Tree messages can be processed in a streaming manner.

One practical issue is that most Protobuf libraries don't offer APIs for
processing messages in a streaming manner. This means that implementors
who want to achive these optimisations will need to write their own
message parsers; at least for the Tree tree itself. To make this as
painless as possible, we also require that the Tree is stored in some
normal form.

Fixes: bazelbuild#229
@EdSchouten EdSchouten force-pushed the eschouten/20220930-toposort branch from e37ee76 to c3e4d01 Compare October 13, 2022 07:35
EdSchouten added a commit to EdSchouten/bazel that referenced this pull request Oct 13, 2022
remote-apis PR 230 added a way where producers of Tree messages can
indicate that the directories contained within are stored in topological
order. The advantage of using such an ordering is that it permits
instantiation of such objects onto a local file system in a streaming
fashion. The same holds for lookups of individual paths.

Even though Bazel currently does not gain from this, this change at
least modifies Bazel's REv2 client to emit topologically sorted trees.
This makes it possible for tools such as Buildbarn's bb-browser to
process them more efficiently.

More details:
- bazelbuild/remote-apis#229
- bazelbuild/remote-apis#230
@EdSchouten
Copy link
Collaborator Author

Draft PR against Bazel to start generating topologically sorted trees: bazelbuild/bazel#16463

I will mark it ready for review as soon as this PR lands.

EdSchouten added a commit to buildbarn/bb-remote-execution that referenced this pull request Oct 14, 2022
I have submitted the following PR against remote-apis, which adds a hint
to ActionResult to indicate that the Directory messages contained in a
Tree are topologically sorted:

bazelbuild/remote-apis#230

The advantage of having Tree objects in this form is that it makes it
possible by doing instantiation of Tree objects on a local file system
using a simple forward scan.

As this change hasn't been merged yet, this comment only adds the logic
for generating topologically sorted Tree objects. I will only add the
code for announcing the hint as soon as the linked PR gets merged.
EdSchouten added a commit to buildbarn/bb-remote-execution that referenced this pull request Oct 14, 2022
I have submitted the following PR against remote-apis, which adds a hint
to ActionResult to indicate that the Directory messages contained in a
Tree are topologically sorted:

bazelbuild/remote-apis#230

The advantage of having Tree objects in this form is that it makes it
possible by doing instantiation of Tree objects on a local file system
using a simple forward scan.

As this change hasn't been merged yet, this comment only adds the logic
for generating topologically sorted Tree objects. I will only add the
code for announcing the hint as soon as the linked PR gets merged.
@EdSchouten
Copy link
Collaborator Author

@bergsieker Would you be interested in merging this PR for me? Thanks!

@bergsieker bergsieker removed the request for review from buchgr October 21, 2022 20:39
@bergsieker bergsieker merged commit f608918 into bazelbuild:main Oct 21, 2022
@EdSchouten EdSchouten deleted the eschouten/20220930-toposort branch October 23, 2022 02:14
EdSchouten added a commit to EdSchouten/bazel that referenced this pull request Oct 23, 2022
remote-apis PR 230 added a way where producers of Tree messages can
indicate that the directories contained within are stored in topological
order. The advantage of using such an ordering is that it permits
instantiation of such objects onto a local file system in a streaming
fashion. The same holds for lookups of individual paths.

Even though Bazel currently does not gain from this, this change at
least modifies Bazel's REv2 client to emit topologically sorted trees.
This makes it possible for tools such as Buildbarn's bb-browser to
process them more efficiently.

More details:
- bazelbuild/remote-apis#229
- bazelbuild/remote-apis#230
EdSchouten added a commit to EdSchouten/bazel that referenced this pull request Oct 27, 2022
remote-apis PR 230 added a way where producers of Tree messages can
indicate that the directories contained within are stored in topological
order. The advantage of using such an ordering is that it permits
instantiation of such objects onto a local file system in a streaming
fashion. The same holds for lookups of individual paths.

Even though Bazel currently does not gain from this, this change at
least modifies Bazel's REv2 client to emit topologically sorted trees.
This makes it possible for tools such as Buildbarn's bb-browser to
process them more efficiently.

More details:
- bazelbuild/remote-apis#229
- bazelbuild/remote-apis#230
copybara-service bot pushed a commit to bazelbuild/bazel that referenced this pull request Oct 27, 2022
remote-apis PR 230 added a way where producers of Tree messages can
indicate that the directories contained within are stored in topological
order. The advantage of using such an ordering is that it permits
instantiation of such objects onto a local file system in a streaming
fashion. The same holds for lookups of individual paths.

Even though Bazel currently does not gain from this, this change at
least modifies Bazel's REv2 client to emit topologically sorted trees.
This makes it possible for tools such as Buildbarn's bb-browser to
process them more efficiently.

More details:
- bazelbuild/remote-apis#229
- bazelbuild/remote-apis#230

Partial commit for third_party/*, see #16463.

Signed-off-by: Sunil Gowroji <[email protected]>
copybara-service bot pushed a commit to bazelbuild/bazel that referenced this pull request Nov 9, 2022
remote-apis PR 230 added a way where producers of Tree messages can    indicate that the directories contained within are stored in topological    order. The advantage of using such an ordering is that it permits    instantiation of such objects onto a local file system in a streaming    fashion. The same holds for lookups of individual paths.

Even though Bazel currently does not gain from this, this change at    least modifies Bazel's REv2 client to emit topologically sorted trees.    This makes it possible for tools such as Buildbarn's bb-browser to    process them more efficiently.

More details:
- bazelbuild/remote-apis#229
- bazelbuild/remote-apis#230

Closes #16463.

PiperOrigin-RevId: 487196375
Change-Id: Iafcfd617fc101fec7bfa943552113ce57ab8041b
ShreeM01 pushed a commit to bazelbuild/bazel that referenced this pull request Dec 1, 2022
remote-apis PR 230 added a way where producers of Tree messages can    indicate that the directories contained within are stored in topological    order. The advantage of using such an ordering is that it permits    instantiation of such objects onto a local file system in a streaming    fashion. The same holds for lookups of individual paths.

Even though Bazel currently does not gain from this, this change at    least modifies Bazel's REv2 client to emit topologically sorted trees.    This makes it possible for tools such as Buildbarn's bb-browser to    process them more efficiently.

More details:
- bazelbuild/remote-apis#229
- bazelbuild/remote-apis#230

Closes #16463.

PiperOrigin-RevId: 487196375
Change-Id: Iafcfd617fc101fec7bfa943552113ce57ab8041b
ShreeM01 pushed a commit to bazelbuild/bazel that referenced this pull request Dec 1, 2022
remote-apis PR 230 added a way where producers of Tree messages can
indicate that the directories contained within are stored in topological
order. The advantage of using such an ordering is that it permits
instantiation of such objects onto a local file system in a streaming
fashion. The same holds for lookups of individual paths.

Even though Bazel currently does not gain from this, this change at
least modifies Bazel's REv2 client to emit topologically sorted trees.
This makes it possible for tools such as Buildbarn's bb-browser to
process them more efficiently.

More details:
- bazelbuild/remote-apis#229
- bazelbuild/remote-apis#230

Partial commit for third_party/*, see #16463.

Signed-off-by: Sunil Gowroji <[email protected]>
meteorcloudy pushed a commit to bazelbuild/bazel that referenced this pull request Dec 2, 2022
* Emit Tree objects in topological order

remote-apis PR 230 added a way where producers of Tree messages can    indicate that the directories contained within are stored in topological    order. The advantage of using such an ordering is that it permits    instantiation of such objects onto a local file system in a streaming    fashion. The same holds for lookups of individual paths.

Even though Bazel currently does not gain from this, this change at    least modifies Bazel's REv2 client to emit topologically sorted trees.    This makes it possible for tools such as Buildbarn's bb-browser to    process them more efficiently.

More details:
- bazelbuild/remote-apis#229
- bazelbuild/remote-apis#230

Closes #16463.

PiperOrigin-RevId: 487196375
Change-Id: Iafcfd617fc101fec7bfa943552113ce57ab8041b

* Emit Tree objects in topological order

remote-apis PR 230 added a way where producers of Tree messages can
indicate that the directories contained within are stored in topological
order. The advantage of using such an ordering is that it permits
instantiation of such objects onto a local file system in a streaming
fashion. The same holds for lookups of individual paths.

Even though Bazel currently does not gain from this, this change at
least modifies Bazel's REv2 client to emit topologically sorted trees.
This makes it possible for tools such as Buildbarn's bb-browser to
process them more efficiently.

More details:
- bazelbuild/remote-apis#229
- bazelbuild/remote-apis#230

Partial commit for third_party/*, see #16463.

Signed-off-by: Sunil Gowroji <[email protected]>

Signed-off-by: Sunil Gowroji <[email protected]>
Co-authored-by: Ed Schouten <[email protected]>
upmorpheus pushed a commit to upmorpheus/Golang-dev that referenced this pull request Oct 14, 2023
I have submitted the following PR against remote-apis, which adds a hint
to ActionResult to indicate that the Directory messages contained in a
Tree are topologically sorted:

bazelbuild/remote-apis#230

The advantage of having Tree objects in this form is that it makes it
possible by doing instantiation of Tree objects on a local file system
using a simple forward scan.

As this change hasn't been merged yet, this comment only adds the logic
for generating topologically sorted Tree objects. I will only add the
code for announcing the hint as soon as the linked PR gets merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace message Tree with a topologically sorted varint delimited stream of Directory messages
3 participants