EXPERIMENT: build jujutsu with buck2 #1997
Thanks for bringing the write-up from https://gist.github.com/thoughtpolice/6fcc0b102e10ac968a22b420b540f607 to here. I also fell into the build systems rabbit hole a few years ago and haven't recovered. From a technical standpoint I like the idea, but I'm pretty sure that @martinvonz and other Google developers use Copybara to import the changes from GitHub to the Google monorepo, where they have a Blaze/Bazel … The drawbacks section speaks for itself and sadly makes it an immediate no-go from an infrastructure standpoint, as the lack of Windows and non-Nix Linux builds makes it hard to present a good case for it. If we had something like a developer policy, like LLVM, it could be here to stay as a developer-supported target or, in rustc terms, as a Tier 3 or Tier 4 target. I want to share your enthusiasm about buck2 and also understand the pains of Cargo, but this is a non-trivial ask and would probably need an RFC process if one existed. Sidenote: once again, thanks for an amazing write-up in this PR.
Out of curiosity, I wonder: is it possible to have it export usable ones for public consumption here? Could we have those? I'd love to actually give Bazel vs. Buck2 a whirl on a real project (there are a few others in public that have BUCK and BUILD rules) and would be very interested in seeing how the BUILD rules work for jj. Though maybe it's just exactly like my BUCK files here...
Actually, Windows is pretty well supported by Buck2; it's a pretty high priority internally with multiple Windows devs assigned directly to it, and I know personally that Rust works with MSVC in this setup. (I'm not sure what Bazel's Windows support looks like, but my understanding was that Windows was a long-tail project that is now pretty well supported.) I probably overstated the problem in my original description. The real thing is just that the scripts like … Frankly, it probably makes sense to even replace that script with a Rust binary. The Nix thing is, some minor nits aside, mostly because it just magically gives me a buck2 binary in … I do agree, though, that unless this has working multi-platform builds that at least work in debug mode OOTB, it's far too experimental to merge into the main repo. But it's not bad as a motivating case study to get those issues polished/fixed (my own projects are typically Linux-exclusive, which helps a lot).
Yeah, I think that's probably good a lot later on, so for now just keeping this in a branch is OK with me. Honestly, I don't foresee us getting any major complications anytime soon that would make rebasing this hard.
You got it!
Actually, I'd say my minimum requirement for merging this would be even higher: a unified build cache for all 3 platforms. That's not only a high bar but also something that is immediately useful for everyone.
I have no answer to that, as I don't work for/at Google. But I think LLVM went a similar way (partial Google open-sourcing) and now it's community-maintained.
I think both Buck2 and Bazel have good Windows support nowadays, but I'm the wrong person to address it here, as I've had no production experience with them yet. Notably missing from the Bazel Windows version is, I think, the built-in sandbox, which is what makes it attractive for local sandboxed builds. The whole prelude import is also blocked on glens submodule work, as a job updating the prelude is not available yet. Bootstrapping is less of a problem if it's easily doable with a single script. So I agree with pretty much everything.
Major 👍
Time for
That part is actually covered by newer GitHub runners, I think the … All in all, I'd wait for the real stakeholders to show up.
By the way, I just merged #2115, which adds a way for custom binaries to bake extra configs into the binary. I would have liked to have a test for that, but since Cargo can't yet depend on binaries from another crate, I just had to skip tests. I have never used Buck2, but I assume it would have handled that without a problem.
Smallish update on this:
We could use ahornby's hashbang to distribute Buck2 for …
As an update, I randomly decided to pick this back up and reached an important milestone: I can build a working copy of … First, install …
This will put 3 tools in your path from
Assuming that the stars align, this should work. And your output will look like this; a clean build for me took 1m30s:
I had to do a surprisingly small amount of work to get here. Some notes.
So this is still a huge experiment. But I consider a reliable build to be a big milestone! And the binary does seem to mostly work fine.
It works today with a bit of configuration, but once I land this branch into rust-analyzer, the overall experience should be much more seamless and Cargo-like.
These are basically always useful in vscode; turn them on. Note that there is an editorconfig plugin for vscode and we do have a `.editorconfig` file, but these options aren't set due to an old intellij-rust bug. Signed-off-by: Austin Seipp <[email protected]>
This lays the basic groundwork to invoke buck2 in a way that barely works and builds nothing. The `jj.bzl` code will be used in some upcoming diffs to add `BUCK` files to the various crates.
`libgit2` requires `libssh2`, which in turn requires `openssl-sys`. OpenSSL is notoriously hard to vendor for a number of reasons including its build system. In contrast, while BoringSSL does not make compatibility guarantees, it is easy to vendor and is designed to be used with Bazel. The goal is that we can substitute BoringSSL for OpenSSL in `openssl-sys` as the underlying library, and `libssh2` will still work.
Summary: `buck2 build third-party//rust` works fine here.
Summary: These will be needed for a lot of upcoming dependencies as the Reindeer dependencies and Cargo dependencies are unified.
This includes a very simple script to do the synchronization between the workspace Cargo file and the Buck2-specific Cargo file, automatically.
This cfg value isn't understood by Cargo, so it needs to have the warning suppressed by default. We could also add an entry to `build.rs`, but not every package has one. To be used by upcoming diffs.
`buck run -v0 tools/scripts:unused_workspace_deps`
This adds a new step to the `synchronize.py` script that synchronizes dependencies between `Cargo.toml` and `BUCK` files. In this model, Cargo remains the source of truth.
This is needed to emit the `.rs` files into the right build directory in a follow-up diff to add `BUCK` files.
The `grammar` macro from `pest_derive` doesn't actually interpret the given file as relative in our case, so we have to give it the fully qualified relative path which exists in the `buck-out/` dir.
This is an experimental and fully untested branch, which attempts to add build rules for building `jj` with Buck2, AKA `BUCK` files. The goal of such a technology would be to have a fully unified and integrated system for development of Jujutsu, no matter the build platform or client of the build system, even as the project expands and includes more and more features, more developers, and possibly more languages and tools.

Please note that I don't think this is a branch that should be merged in its current state (or ever will be? Assuming all goes well?) While I was partially motivated to unify things in `workspace.dependencies` recently because it would help here, that wasn't really the main goal. Part of it was to see if it could even remotely work at all; I am happily using buck2 for my own (private) Rust projects, so it was also an attempt to see if I could bring some of that magic over. I'm happy to say it seems promising, but I've run into a (big) issue. In the meantime, I am publishing an interim report and PR on it.

If something like this is truly needed one day, I assume that's because Jujutsu will be wildly successful beyond our dreams, to the point that this sort of tech is a requirement, for reasons I'll go over. But maybe there's more to it than just that? This is above all an experiment, but I'll elaborate.
Because of the wide-reaching impact this kind of thing can have (on project direction, on developers, on support, et cetera), the remainder of this description is a very long novel about why you might be interested in this (or maybe why you would hate it).
Note: This patch series shouldn't be taken as any kind of endorsement or commitment to its maintenance. If you're not interested, don't worry, it's not going anywhere. It may break, get rewritten, or otherwise behave incorrectly. Do not taunt Happy Fun Ball.
Prologue
In the beginning (literally, since the dawn of time) there was `make`. `make` is actually not one program but many, but all variants of `make` have a similar idea: to represent the compilation of software, or software artifacts, as a directed acyclic graph, where each node represents an "action" to perform that results in outputs, and where most actions (non-`.PHONY` ones) can be cached in the filesystem. `make` is a good tool and I have a fond spot for it. However, in practice, it is very difficult to write correct Makefiles that work, at scale, for large software projects, without exceedingly austere requirements on the layout of the project, its language, and its specific design points, and while having something that is portable and easy to understand.

For many years many programming languages, including high-level ones (including my beloved Haskell), used Make to drive software compilation; but at some point around the turn of the millennium, it seems engineers began to move away from this model when dealing with high-level languages (C/C++ users continue to wallow in misery to this day). There are some decent reasons behind this, I believe, when you back up a bit:
- `make` isn't one thing, but a family of things. There is `gmake`, `bmake`, there used to be `imake`, and there is also the venerable `mk` from Plan 9 (which, unlike the others, actually had the fortitude to fix one of `make`'s worst sins: its interaction with shell escaping in the body of a rule).
- … `mtime`) to reimplement than to piggyback on a tool with an impedance mismatch.
- … `mk` solves this by simply never doing any substitution or interpolation in the body of a rule. It writes the body of a rule, literally, to a temporary file, and then executes it while setting environment variables that are set by the `mkfile`. This was figured out in 1991 or something, but we've suffered with the alternative every day since.
- … `.c` file from another file, but then you need to scan the `.c` file for `.h` files it includes, in order to get a correct build graph. Because `make` can't dynamically discover this information, you have to repeat the logic of the generator program in the build rules, which is tedious and error-prone; you can easily mis-synchronize the generator program and the rules. There is no reuse possible.
- … `make` alone, among others. For example, good luck handling C++20 modules. Even for simpler cases, capturing module dependencies is often a bunch of work in Make; for Haskell you need to make sure the compiler emits dependency information that is `.include`'d by Make, similar to `gcc -MMD` in C or C++.

This is ignoring the packaging and distribution situation for any given language, which is often handled by the build tool too, muddling its responsibilities. But ultimately, I think so many language-specific dependency managers were designed for the above reasons, among many others: Make just doesn't fit well into many of the patterns modern languages enjoy, and so most decided to go their own way and reimplement the good bits (a parallel, toposorted, caching DAG) on their own.
What's the modern standard for language-based build systems?
In my mind, a lot of the "modern" build tools we enjoy sprouted from the seeds originally planted by solutions like `pip` for Python and `rbenv`/`bundler` for Ruby. Those tools had a simple insight: every project should have its own list of dependencies and needed tools, and they should be provisioned for that project correctly, in an isolated manner.

This was a big step up for language-specific tools that `make` also couldn't match. In practice, the development velocities of these communities became decoupled from traditional package managers in the FOSS world, where making sure every dependency X worked with every other dependency Y had become difficult. There are other cultural reasons for this, but a big motivation for the "isolation" concept came from that.

Actually, among the modern alternatives, I think `rustup` and `cargo` for Rust, as well as `ghcup` and `cabal` in Haskell, are about the best examples of the modern standard, and I give `cabal` the edge somewhat there. (It's not surprising, by the way, that one of the creators of Bundler, Yehuda Katz, was also one of the creators of Cargo.)

But the language-specific-ness remains. Every tool must implement its own build engine, must integrate with language-specific features, and has its own distinct UX and lexicon. And those tools also tend to miss out on the greater picture or lack advanced features...
OK, so what's the sales pitch for buck2?
From its own `README.md` (which was written by yours truly):

- … `.c` file needs a `.h` file that isn't correctly specified), the build will fail. This enforced correctness helps avoid entire classes of errors that most build systems allow, and helps ensure builds work everywhere for all users. And Buck2 correctly tracks dependencies with far better accuracy than Buck1, in more languages, across more scenarios. That means "it compiles on my machine" can become a thing of the past.
- … `make` and tie together `dune` to `pip` and `cargo`. But then how do you run test suites, code coverage, or query code databases? Buck2 is designed to support multiple languages from the start, with abstractions for interoperation. And because it's completely scriptable, and users can implement language support, it's incredibly flexible. Now your Python library can depend on an OCaml library, and your OCaml library can depend on a Rust crate, and with a single build tool, you have a consistent UX to build and test and integrate all of these components.

Why use a large-scale integrated build system like this? What becomes possible? What becomes easy?
Assuming this all worked out, and everyone was on board, here's an example of a few things this could offer us.
- … `jj` repo. Any other developer can run `jj git fetch` and then check out those changes. They run `buck2 build` and their build completes instantly, thanks to cache hits.
- … `BUCK` files, all commands can be executed remotely on remote systems in a correct way. This makes many distinctions, e.g. around platform-specific development, much simpler, because alternative platforms fail in your local development loop just like they would in CI.
- … `buck2 build`, using `BUCK` files that can be programmatically queried and introspected.
- … `Make` tie the definition of the rule and the body of the rule together. Due to the deficiencies described earlier, this often means that rule dependencies are dependent on the implementation of rules. Want to refactor a rule? Get ready to change all the dependents.

A good example of this is that in large-scale projects, multi-language support becomes essentially necessary, and by then abstraction capabilities are really useful.
For example, the above generality allows much deeper levels of code generation than most tools allow. We can have Rust code depend on a code generator written in Python that depends on C++ that depends on Rust libraries, all with a unified UX. With techniques like this, we could automatically generate web pages, man pages, and documentation indices from the source code itself.
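One way to picture the separation between a rule's interface and its implementation (the thing `Make` rules lack, per the list above) is a small Rust sketch: dependents consume a typed description of what a target provides, loosely analogous to the "providers" that Buck2 rules return, and never the commands used to produce it. Everything here (`RustLibInfo`, `LibraryRule`, `FromSource`, `Prebuilt`, `binary_command`, and the paths) is hypothetical and far simpler than the real prelude.

```rust
/// What a Rust library target exposes to its dependents. Dependents see only
/// this, never how the library was produced.
#[derive(Debug, Clone, PartialEq)]
struct RustLibInfo {
    crate_name: String,
    rlib: String, // path to the compiled library
}

/// A rule implementation: the commands it runs are private to it; what it
/// provides is the public contract.
trait LibraryRule {
    fn actions(&self) -> Vec<String>;
    fn provides(&self) -> RustLibInfo;
}

/// Build the library from source.
struct FromSource {
    crate_name: String,
    root: String,
}

/// Use an already-built artifact (e.g. a vendored or cached build).
struct Prebuilt {
    crate_name: String,
    path: String,
}

impl LibraryRule for FromSource {
    fn actions(&self) -> Vec<String> {
        vec![format!(
            "rustc --crate-type=rlib --crate-name {} -o out/lib{}.rlib {}",
            self.crate_name, self.crate_name, self.root
        )]
    }
    fn provides(&self) -> RustLibInfo {
        RustLibInfo { crate_name: self.crate_name.clone(), rlib: format!("out/lib{}.rlib", self.crate_name) }
    }
}

impl LibraryRule for Prebuilt {
    fn actions(&self) -> Vec<String> {
        Vec::new() // nothing to run
    }
    fn provides(&self) -> RustLibInfo {
        RustLibInfo { crate_name: self.crate_name.clone(), rlib: self.path.clone() }
    }
}

/// A dependent rule: it only consumes `RustLibInfo`, so changing how a
/// dependency is built never requires touching this function.
fn binary_command(bin: &str, main_rs: &str, deps: &[&dyn LibraryRule]) -> String {
    let mut cmd = format!("rustc {main_rs} -o {bin}");
    for dep in deps {
        let info = dep.provides();
        cmd.push_str(&format!(" --extern {}={}", info.crate_name, info.rlib));
    }
    cmd
}

fn main() {
    let from_source = FromSource { crate_name: "jj_lib".into(), root: "lib/src/lib.rs".into() };
    let prebuilt = Prebuilt { crate_name: "jj_lib".into(), path: "cache/libjj_lib.rlib".into() };
    for lib in [&from_source as &dyn LibraryRule, &prebuilt] {
        println!("actions: {:?}", lib.actions());
        println!("link:    {}", binary_command("jj", "cli/src/main.rs", &[lib]));
    }
}
```

Swapping `FromSource` for `Prebuilt` changes what gets run, but `binary_command` and every other dependent stay untouched, which is the refactoring property a `make`-style rule does not give you.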
A more concrete example? A theoretical `jj web` that I keep talking about on Discord. We might have TypeScript code that uses React and depends on HTML, and hell, maybe all that TypeScript also depends on Rust compiled to WebAssembly. The sky is literally the limit in the webdev world. And we need the Rust code to depend on it, as a distributable. We need to run linters, formatters, and test suites on that code just like we do with Rust today, to the same standards and quality. And all that can be achieved with a unified UX that behaves the same way. How do you build the HTML frontend? `buck2 build`. Testing? `buck2 test`. Running the server with every change? `buck2 run`.

Interlude: Cargo woes
One of my biggest constant issues with Cargo is that it does not do truly global content-addressable caching. What this means is that you can get into a scenario where things are spuriously recompiled. For example:
1. Depend on `x=1.2.3`.
2. Build `x`, resulting in an `rlib`.
3. Enable a feature `foobar` on `x=1.2.3` and build again.
4. Cargo rebuilds `x`, as expected.
5. Disable `foobar`, and build `x` again.
6. Cargo rebuilds `x` without `foobar`, even though it already did that once before.

In other words, even though the content of the final build step above is the same as a previous result that was built, Cargo didn't know that and rebuilt it instead. This in turn means that sharing Cargo directories between projects is not possible, because one project may require `serde_json` with one feature while another requires `serde_json` with another feature. At a very high level, Cargo doesn't identify things by content, but by name; therefore there can only be one `x` at a time, regardless of features.

In contrast, Buck2 correctly handles the above scenario, so the second recompilation of `x` without `foobar` will be a no-op and complete instantly. Actually, Cargo seems to really like spuriously recompiling things for many reasons beyond my comprehension; I assume this is because it is being safe and conservative to ensure the build succeeds, likely for reasons like this.

You might say "this seems theoretical", but it isn't! It's a specific case of a general problem, and it happens all the time. Any time you `jj edit` a new commit, you may get a completely new set of dependencies and versions, due to a change downstream. For example, switching between two branches where one has a new feature will cause spurious dependency recompilation, like above.

... And what becomes hard? (OpenSSL edition)
Dependencies, mostly. One of the drawbacks of the approach that tools like Buck, Bazel, and even Nix take is that they truly want the most complete view of the build graph possible, and this means they want to understand all inputs and outputs to the system in a deterministic way. Tools that play well with this assumption are few and far between: too many programs write files or talk to the network or do any number of other things under the hood in order to be more useful. A simple corollary to keep in mind: small conveniences like these sleights of hand actually cost a lot in the long run.
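That strictness is also what makes caching precise. If every input to an action is declared, the action can be identified by a hash of those inputs, and the `x`/`foobar` scenario from the interlude becomes a cache hit. Below is a toy model in Rust comparing a content-addressed cache with a cache that keeps one slot per crate name, which is how the interlude describes Cargo behaving; it is not Cargo's actual fingerprinting logic, and `CompileAction`, `ContentCache`, `NameCache`, and `simulate` are invented for this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeSet, HashMap, HashSet};
use std::hash::{Hash, Hasher};

/// Everything that can affect a crate's compiled output, declared up front.
#[derive(Hash)]
struct CompileAction<'a> {
    crate_name: &'a str,
    version: &'a str,
    features: BTreeSet<&'a str>, // sorted, so feature order doesn't change the key
    source: &'a str,             // stands in for the contents of every input file
}

/// Identify an action by a hash of its inputs. DefaultHasher is only for
/// illustration; real build systems use a cryptographic digest such as SHA-256.
fn action_key(action: &CompileAction) -> u64 {
    let mut h = DefaultHasher::new();
    action.hash(&mut h);
    h.finish()
}

/// Content-addressed cache: every distinct input set gets its own entry.
#[derive(Default)]
struct ContentCache {
    entries: HashSet<u64>,
    compiles: usize,
}

impl ContentCache {
    fn build(&mut self, action: &CompileAction) {
        // `insert` returns true only for a key never seen before: a cache miss.
        if self.entries.insert(action_key(action)) {
            self.compiles += 1;
        }
    }
}

/// Name-keyed cache: one slot per crate, overwritten whenever its inputs change.
#[derive(Default)]
struct NameCache {
    slots: HashMap<String, u64>,
    compiles: usize,
}

impl NameCache {
    fn build(&mut self, action: &CompileAction) {
        let key = action_key(action);
        // Reuse only if the single slot still holds exactly this configuration.
        if self.slots.insert(action.crate_name.to_string(), key) != Some(key) {
            self.compiles += 1;
        }
    }
}

/// Build crate `x` once per feature configuration and return the
/// (content-addressed, name-keyed) compile counts.
fn simulate(configs: &[Vec<&str>]) -> (usize, usize) {
    let (mut by_content, mut by_name) = (ContentCache::default(), NameCache::default());
    for features in configs {
        let action = CompileAction {
            crate_name: "x",
            version: "1.2.3",
            features: features.iter().copied().collect(),
            source: "pub fn f() {}",
        };
        by_content.build(&action);
        by_name.build(&action);
    }
    (by_content.compiles, by_name.compiles)
}

fn main() {
    // Build x, enable `foobar`, then disable it again.
    let (content, name) = simulate(&[vec![], vec!["foobar"], vec![]]);
    println!("content-addressed: {content} compiles, name-keyed: {name} compiles");
}
```

With the configurations `[]`, `["foobar"]`, `[]`, the content-addressed cache compiles twice and the name-keyed one three times. The model only works because `CompileAction` lists everything that affects the output; an input a build step reads without declaring it would silently be missing from the key.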
In this version of the design, what this means is that Cargo crates are compiled by Buck, and not really by Cargo. Their build rules are automatically generated by a tool called reindeer, which translates `Cargo.toml` files into `BUCK` files, effectively translating the static build graph from Cargo to Buck.

This is fantastic in general, but there's a small nit: Rust `build.rs` scripts, which can range from doing things like "Compile 1 million lines of C++ code on demand" to "Write the current `HEAD` revision to a file." In general, `build.rs` scripts are simultaneously: …

Because of that, and because of the impedance mismatch between Cargo and Buck2, this patch series contains a commit called `buck: add initial reindeer scaffold for rust compilation`, which adds roughly 90-100 lines of YAML files in order to work around these sorts of problems; the YAML files are there to inform `reindeer` about what kind of `build.rs` scripts the crates in question are using, so it can properly emulate some of their features.

And right now, this Buck2 build is actually broken, and guess why? Because `openssl-sys` has a `build.rs` script that leans into the former category: it tries to build and vendor OpenSSL itself, compiling a ton of C code "out of band". I haven't found a way around this yet, but it is a hard dependency, because `libgit2` (via `libssh2`) needs OpenSSL. Therefore, if this cannot be worked around, this kind of dependency would effectively prevent Buck from ever being used for Jujutsu, until we did something like switch to Gitoxide (its pure-Rust nature alleviates many of these issues and makes many build scripts simpler; C/C++ compilation is a number-one cause of many woes like this).

This is something we take for granted as modern developers: dependencies are easier than ever to add and rely on, but in some sense, we also do not perceive these things as a kind of power that also restricts us and binds us. This is a form of path dependence. This is one of the ways that it can come back to stop promising experiments or alternative avenues dead in their tracks. It's worth keeping in mind.
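For a sense of why `build.rs` scripts are hard to translate, consider the mild end of the range mentioned above: a script that writes the current `HEAD` revision to a file. The following is a hypothetical `build.rs` sketch, not taken from any crate in the jj dependency tree. Its real inputs (a `git` binary and the `.git` directory) never appear in `Cargo.toml`, and Cargo only learns about one of them through the `cargo:rerun-if-changed` line the script prints.

```rust
// build.rs (hypothetical): embed the current Git revision into the crate.
use std::{env, fs, path::Path, process::Command};

/// Render the generated source file for a revision string.
fn render(rev: &str) -> String {
    // `{:?}` produces a quoted, escaped Rust string literal.
    format!("pub const GIT_REVISION: &str = {rev:?};\n")
}

fn main() {
    // Hidden input #1: a `git` binary on $PATH. Hidden input #2: the repository's
    // `.git` directory. Neither appears anywhere in Cargo.toml.
    let rev = Command::new("git")
        .args(["rev-parse", "HEAD"])
        .output()
        .ok()
        .filter(|out| out.status.success())
        .map(|out| String::from_utf8_lossy(&out.stdout).trim().to_string())
        .unwrap_or_else(|| "unknown".to_string());

    match env::var("OUT_DIR") {
        // Under Cargo, write into the build script's output directory.
        Ok(dir) => fs::write(Path::new(&dir).join("revision.rs"), render(&rev))
            .expect("failed to write revision.rs"),
        // Not running under Cargo: just show what would be generated.
        Err(_) => print!("{}", render(&rev)),
    }

    // Ask Cargo to rerun when HEAD moves. This assumes the crate sits at the
    // repository root, and it is only an approximation: HEAD changes on checkout,
    // not on every commit to the current branch. A build system that only trusts
    // declared inputs cannot see this dependency without extra configuration.
    println!("cargo:rerun-if-changed=.git/HEAD");
}
```

A crate would pull the constant in with `include!(concat!(env!("OUT_DIR"), "/revision.rs"));`. A translator like reindeer, working from `Cargo.toml` alone, has no way to discover the `git` dependency, which is the kind of gap the YAML files described above exist to paper over.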
For the most part, I genuinely find the `buck2` workflow to be pretty smooth and far less corruption/recompilation-prone than the equivalent `Cargo` workflows, ignoring bits like IDE integration.

Why not Bazel? Could we use it?
Actually, I think Bazel is pretty cool. But I have little experience with it. I'm using Buck2 because I think it's a good project, written by talented people, I trust the pedigree it came from, and I think many of its design choices mean it might actually be able to give Bazel a true run for its money. Whereas many alternative solutions either seem to be catching up or do not match its functionality, Buck2 seems to be hitting all the right high notes and offers many things Bazel itself hasn't or can't.
Rules in userspace. Bazel rules for a language are often written by third parties, except for one set of libraries: the C/C++/ObjC and Java rules, which are written in a special variant of Starlark and built into Bazel. These rules cannot be modified or rewritten.
In contrast, all build rules for all languages in buck2 are written in Starlark in "userspace"; the `buck2` binary knows nothing about any programming language. Morally, it's a binary that only knows how to build a graph of commands and run them (possibly remotely). This makes things like atomic upgrades between versions, rollbacks, and many other long-tail scenarios like patching the build rules very easy in practice.

Buck Extension Language, or "BXL". An entire extension framework built on top of the build engine, which can be driven by Starlark, allowing code-driven introspection, automation, and querying. This makes features like LSPs, code search indices, and other "related" tooling significantly easier to develop and support; e.g. you can feasibly develop your own BXL-based LSP integration for `rust-analyzer` or TypeScript.

Virtual filesystem support. While this is minor and tangential, virtualized jj repositories have been a recurring topic; Buck2 includes direct support for them by way of Watchman and integration with Sapling's EdenFS. I don't believe there's any reason we couldn't extend that to Jujutsu as well. I think that makes this a great dogfooding opportunity, and a chance for more "vertical" integration between them. I think this could be super exciting if executed correctly.

Native binary. Again, minor: I don't think it's a huge deal in our case, and I don't care too much about things like GC, but the fast startup times from Buck2 being a native executable written in Rust are really wonderful.
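The "rules in userspace" design can be sketched as a core that only knows how to plan commands, plus rule implementations looked up by name. In Buck2 those implementations are Starlark code in the prelude; in this Rust sketch they are plain functions registered at runtime. `Core`, `Target`, `Action`, and the two rules are hypothetical names chosen for illustration; the rule names echo real prelude rules but are far simpler.

```rust
use std::collections::BTreeMap;

/// The only thing the core understands: a command with declared inputs and outputs.
#[derive(Debug, Clone, PartialEq)]
struct Action {
    argv: Vec<String>,
    inputs: Vec<String>,
    outputs: Vec<String>,
}

/// A target as written in a build file: a rule name, a target name, and attributes.
struct Target {
    rule: String,
    name: String,
    attrs: BTreeMap<String, Vec<String>>,
}

/// A rule implementation turns a target into actions.
type RuleImpl = fn(&Target) -> Vec<Action>;

/// The language-agnostic core: it looks rules up by name and never matches on a language.
#[derive(Default)]
struct Core {
    rules: BTreeMap<String, RuleImpl>,
}

impl Core {
    /// Registering (or replacing) a rule never requires changing the core.
    fn register(&mut self, name: &str, imp: RuleImpl) {
        self.rules.insert(name.to_string(), imp);
    }

    fn plan(&self, target: &Target) -> Result<Vec<Action>, String> {
        let imp = self
            .rules
            .get(&target.rule)
            .ok_or_else(|| format!("unknown rule `{}`", target.rule))?;
        Ok(imp(target))
    }
}

/// A "userspace" rule for Rust libraries.
fn rust_library(t: &Target) -> Vec<Action> {
    let srcs = t.attrs.get("srcs").cloned().unwrap_or_default();
    let out = format!("lib{}.rlib", t.name);
    let mut argv = vec!["rustc".to_string(), "--crate-type=rlib".to_string(), "-o".to_string(), out.clone()];
    argv.extend(srcs.first().cloned()); // the crate root
    vec![Action { argv, inputs: srcs, outputs: vec![out] }]
}

/// A "userspace" rule that runs an arbitrary shell command.
fn genrule(t: &Target) -> Vec<Action> {
    let cmd = t.attrs.get("cmd").map(|c| c.join(" ")).unwrap_or_default();
    vec![Action { argv: vec!["sh".into(), "-c".into(), cmd], inputs: vec![], outputs: vec![t.name.clone()] }]
}

fn main() {
    let mut core = Core::default();
    core.register("rust_library", rust_library);
    core.register("genrule", genrule);
    let lib = Target {
        rule: "rust_library".into(),
        name: "jj_lib".into(),
        attrs: BTreeMap::from([("srcs".to_string(), vec!["lib/src/lib.rs".to_string()])]),
    };
    println!("{:?}", core.plan(&lib));
}
```

Patching or upgrading a rule amounts to registering a different function; nothing in `Core` changes, which is the property that makes upgrades and rollbacks of build rules cheap.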
As for whether we could? Well, I suppose. But I don't know how to juggle it, so I can't write it or maintain it.
Should this happen later, instead of now?
Initially it might seem like this is jumping the gun: if we wanted all those great features, we could just wait until later for them, right? What's the issue with doing it years down the line when Jujutsu has 20 paid developers and +1 Jujillion installations worldwide?
But there's the rub: by the time you need a thing like this, you are presumably hitting the walls of technical debt that make moving forward on difficult obstacles nearly infeasible, and where existing human-scale constraints make "lateral movement" to new tools like this extremely difficult. It is actually easiest to adopt a tool like this early on, to the most aggressive degree possible, and thus avoid the conundrum entirely, if you believe that you will need it.
A good metaphor is trekking through an unknown jungle. If you walk 10 miles and realize you took the wrong turn 8 miles back, you can of course just begin the long trek back and take the better path. But it cost you a lot of time. And it surely wasn't an easy choice to make, unless the alternative was another 150 miles.
Luckily, the actual code needed to support the build right now is actually pretty darn small if you ignore the prelude. Keep reading for more details on that.
And what about Cargo?
I actually don't think Cargo should be deprecated, if possible, at least not for a very long time, especially because its IDE support through `rust-analyzer` is fantastic to have for now. Assuming that can be handled (not easy, but also not supremely hard), I think many other problems could be solved; for example, we could invert the whole process and auto-generate `Cargo.toml` files from `BUCK` files instead of the other way around, then use those in crates published on https://crates.io.

Or not. I think we could actually get pretty far, with a lot of nice bonuses, while supporting both. Ideally we would settle on one eventually, I guess.
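Inverting the process could start as simply as rendering a `Cargo.toml` from a target's declared metadata. This Rust sketch assumes a hypothetical, already-parsed `RustTarget` (parsing actual `BUCK` files is out of scope), and the package name, version, and dependencies used with it are illustrative only.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// A hypothetical, already-parsed view of one Rust library target from a BUCK file.
struct RustTarget {
    package: String,
    version: String,
    edition: String,
    /// Third-party dependencies: crate name -> version requirement.
    deps: BTreeMap<String, String>,
}

/// Render a minimal Cargo.toml for publishing. A BTreeMap keeps dependency order
/// stable, so regenerating the file produces no spurious diffs. `{:?}` quoting is
/// fine for plain ASCII values; a real tool should use a TOML library instead.
fn to_cargo_toml(t: &RustTarget) -> String {
    let mut s = String::new();
    // Writing to a String cannot fail, so these unwraps never panic.
    writeln!(s, "[package]").unwrap();
    writeln!(s, "name = {:?}", t.package).unwrap();
    writeln!(s, "version = {:?}", t.version).unwrap();
    writeln!(s, "edition = {:?}", t.edition).unwrap();
    if !t.deps.is_empty() {
        writeln!(s, "\n[dependencies]").unwrap();
        for (name, req) in &t.deps {
            writeln!(s, "{name} = {req:?}").unwrap();
        }
    }
    s
}

fn main() {
    let target = RustTarget {
        package: "jj-lib".into(),
        version: "0.1.0".into(),
        edition: "2021".into(),
        deps: BTreeMap::from([
            ("thiserror".to_string(), "1".to_string()),
            ("itertools".to_string(), "0.12".to_string()),
        ]),
    };
    print!("{}", to_cargo_toml(&target));
}
```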
Should this happen at all?
Maybe not. There are good arguments that Jujutsu might never outgrow something like Cargo, and that the costs of adopting Buck would be wasted; for example, it's just OK to have some manual scripts or to repeat yourself a few times, where Buck would allow abstractions. On the other hand, in such a case where you never need its power, Buck would likely never impose much of a maintenance burden, so removal would be easy.
A good example of this is again the `jj web` example from before. How do we drive TypeScript, React, and all those tools? What if we wanted those code generators to build e.g. option or command-line-flag indices for a website? We'd probably have to write a bunch of `.bzl` code ourselves to do it in a robust way. So it's a bunch more work for us, though ostensibly we could make it more widely available, too. But that's a lot more work than a Makefile that's 30 lines and mostly works (except when it breaks).

Ultimately, though, it does hinge on whether or not those advantages seem big enough in the grand scale of things. If you're not going to lean into a tool like Buck, and it only ever plays second fiddle to a tool like Cargo, you'll never truly reap its advantages. Doing so requires real commitment, and it is not clear to me if this commitment is the right choice.
So what's the catch? And the risks?
Plenty, right now. Major drawbacks:

- … `nix develop` exclusively
- … `rust-analyzer` support (though it's a high priority for the upstream project, since it's also written in Rust)

Warts include:

- … `Cargo.toml`, so it's really only about 100 lines: …
- … `Cargo.toml` for Reindeer, even after I went through Herculean *cough* efforts to get rid of that this week.

Risks include:

- … `N` developers, and you're deep in the red pretty quickly. (This means the final product needs to have high quality across the board.)

Can I use this branch right now?
No. The build is currently broken, because I have not found a way to work around the `openssl-sys` crate being broken. This is just a WIP snapshot so it does not get lost. But one day, you'll be able to build and run `jj-cli` like so, perhaps: …

What's the goal of publishing this?
Because I don't want to lose my work, and I would be interested in what people think about all this. I suspect there are a lot of opinions, from "sounds awesome" to "OK" to "I'm not convinced".
Will you keep working on it?
Maybe. Actually, the current total time on getting this branch as far as I did took less time than writing up this PR description. So it's not like I've put unbelievable effort into it.
And, here's the thing: I'm committed to Buck2 for the long run for my projects because I do believe it solves many chronic issues I face, to a much greater and more refined degree than its predecessors. Similarly, after a lot of playing over the past few weeks, I firmly believe I'm also committed to Jujutsu for the long run, for the same reasons: it works like how I think a tool of its class should, and does so far better than the competition. It's 10x better for me, not 1x or 2x. In my opinion, you might think of them like peanut butter and jelly. So I'm happy to see them go together, too.
So I might continue this, returning to it occasionally when it's promising, and if it's considered intriguing by fellow developers who might also be miffed by some of the above issues. It can't be merged in its current state, I think, at least until full builds work. And assuming it went well, I can see myself maintaining it for a good long time, documenting it, and helping out with any/all issues with it. But if nobody is interested, or it's only a resounding "meh", I'll probably shelve it for a good while until it's turnkey or I've established my own changes across the codebase more, or something.