-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace submodules with flags for fetching and building grammars #1659
replace submodules with flags for fetching and building grammars #1659
Conversation
I really like this idea personally. Much tighter integration
This is a huge improvement in my opinion. Great idea. |
f391fc8
to
70a78d3
Compare
91f458d
to
fc61878
Compare
There's a snag with updating the flake.nix: this line breaks |
Do we need two flags to have separate fetch and build steps ? IMO replacing them both with a single flag to do the whole thing in one go would be more ergonomic (and more common). |
The separation is mostly for the sake of packagers. For example, nix should fetch the grammar repositories itself and then call out to helix to build them during packaging. I would be open to adding a flag that does both in addition to these flags though. |
fd10571
to
3985b72
Compare
0295eea
to
157cd4b
Compare
See following comment: |
I started porting the So I'll try to cleaning up the typing with Edit: with the |
ec7a0fe
to
ca785c4
Compare
Yeah I'd like to avoid an openssl dependency, it's nice that there's no build time requirements right now (git2 bundles libgit2 if needed). Here's an explainer on the Rust + NixOS + OpenSSL issue: https://nixos.wiki/wiki/Rust#Building_Rust_crates_that_require_external_system_libraries |
Don't we have a build time dependency in git (and thus openssl) now? The build host and target host both need git the CLI tool, so they should already have openssl. And wouldn't we need that anyway for downloading zip files? Sticking with shelling out isn't a bad choice, since these are relatively simple commands, but it seems odd to decide against libgit2 because NixOS makes easy things really hard. |
For the
It's not a decision based on NixOS, I've tried to avoid external dependencies because it simplifies the build process. It feels wrong to pull in openssl just for fetching grammars. |
77598dd
to
963e7bc
Compare
The submodules system is being replaced with a command-line flag hx --fetch-grammars Which shallow-clones grammar repositories at the given revision and hx --build-grammars For building grammars separate of the initial compilation of helix. Why remove submodules? * Cloning helix in general takes a long time because of the submodules, especially when the submodules are not fetched as shallow * Packaging is consistently painful no matter the package-manager * It is quite difficult to devise a scheme where users can declare a desired set of grammars and implement it with submodules This commit fully removes the existing tree-sitter submodules from the tree (as well as the .gitmodules file which is no longer used).
Here we add syntax to the languages.toml languge [[grammar]] name = "<name>" source = { .. } Which can be used to specify a tree-sitter grammar separately of the language that defines it, and we make this distinction for two reasons: * In later commits, we will separate this code from helix-core and bring it to a new helix-loader crate. Using separate schemas for language and grammar configurations allows for a nice divide between the types needed to be declared in helix-loader and in helix-core/syntax * Two different languages may use the same grammar. This is currently the case with llvm-mir-yaml and yaml. We could accomplish a config that works for this with just `[[languages]]`, but it gets a bit dicey with languages depending on one another. If you enable llvm-mir-yaml and disable yaml, does helix still need to fetch and build tree-sitter-yaml? It could be a matter of interpretation.
helix-syntax mostly existed for the sake of the build task which checks and compiles the submodules. Since we won't be relying on that process anymore, it doesn't end up making much sense to have a very thin crate just for some functions that we could port to helix-core. The remaining build-related code is moved to helix-term which will be able to provide grammar builds through the --build-grammars CLI flag.
This is not strictly speaking necessary. tree_sitter_library was used by just one grammar: llvm-mir-yaml, which uses the yaml grammar. This will make the language more consistent, though. Each language can explicitly say that they use Some(grammar), defaulting when None to the grammar that has a grammar_id matching the language's language_id.
build_grammars adapts the functionality that previously came from helix-syntax to be used at runtime from the command line flags. fetch_grammars wraps command-line git to perform the same actions previously done in the scripts in helix-editor#1560.
The vision with 'use-grammars' is to allow the long-requested feature of being able to declare your own set of grammars that you would like. A simple schema with only/except grammar names controls the list of grammars that is fetched and built. It does not (yet) control which grammars may be loaded at runtime if they already exist.
This is a rather large refactor that moves most of the code for loading, fetching, and building grammars into a new helix-loader module. This works well with the [[grammars]] syntax for languages.toml defined earlier: we only have to depend on the types for GrammarConfiguration in helix-loader and can leave all the [[language]] entries for helix-core.
This is a bit of a micro-optimization: in the current setup we waste a thread in the pool for a local grammar only to println! a message saying we're skipping fetching because it's a local grammar.
This restores much of the behavior that existed before this PR: helix will build the grammars when compiling. The difference is that now fetching is also done during the build phase and is done much more quickly - both shallow and in parallel.
This commit replaces the out-of-date builder in the flake which relied on submodules for fetching and the compiler for building. Now we disable fetching and building explicitly with the environment variable and then use builtins.fetchGit and a derivation mostly derived from upstream to compile the grammars. Anecdotally, this is still quite slow as builtins.fetchGit does not seem to do shallow clones. I'm not sure I see a way around it though without recording sha256s, which seems cumbersome.
The old flags were a bit long. --grammar is also aliased to -g to make it even easier.
963e7bc
to
14124e4
Compare
Here we perform a shallow fetch using builtins.fetchTree. In order to make this work, we need to specify the `ref' for any repository that doesn't have `master' as its default branch (I'm not sure why this limitation exists since we don't need this when performing the shallow fetch in `--grammar build') This `ref' field is ignored by helix, so I have left it undocumented for now, but I could be open to documenting it.
14124e4
to
af0383d
Compare
Looks like this was rebased a few hours ago and now the 789a171 revision no longer exists.
Great work! 👍🏻 |
This PR adds a non-interactive
--grammar {fetch|build}
(aliased to-g
) flag to thehx
binary which replaces the current mechanisms for fetching tree-sitter grammars (git submodule ..
) and building grammars (a compile-time step).This is an alternative to the scripts1 approach in #1560.
Closes #1549.
Here's an outline of what changes:
💥 user-facing
You no longer need to install all grammars! Add a
use-grammars
key to the top of yourlanguages.toml
:Now running
will fetch and build only the grammars allowed by
use-grammars
.However now note that it is now possible to build or package helix without any grammars. You may need to invoke these new commands yourself if syntax highlighting is not working.
Are you building from source and don't like how long it takes to fetch the submodules? Now they're gone, and
hx --grammar fetch
is much faster: it takes me 5 seconds to fetch the entire set of grammars on a 10 Mbps connection ⚡🏗️ development
We can now customize the set of grammars we'll build in CI for example. This makes the CI much faster: this branch consistently takes less than 1 minute for Check and Lint, 1-2 minutes for Doc, and 1-5 minutes for the Test workflow.
Grammar developers like myself rejoice! You can now point to an in-progress grammar along-side its
[[language]]
entry in thelanguages.toml
like so:Develop your grammar and then run
hx --grammar build
to compile it. Launchhx
on a file of your language and viola, interactive grammar development! When you're ready to add the language upstream, specify the revision and remote of the grammar.Also, now all configuration for a grammar lives in
languages.toml
rather than in the build code, which affects the grammars forAll grammars that need subdirectories within their repositories to be built. This can be configured now like so:
📦 packaging
The biggest change is that there are no more submodules, which has been a pain-point for at least nix (#1448).
Most packagers will not see a difference outside of the submodules change: grammars are fetched and compiled automatically in the helix-term build phase. Nix (and other packagers that sandbox network connectivity) may struggle with this. Set a
HELIX_DISABLE_AUTO_GRAMMAR_BUILD
enviroment variable to disable the automatic fetching and building of grammars.If a packager would like to package the grammars themselves, you may parse the
languages.toml
for[[grammar]]
entries and invoke a C compiler on the grammars.❄️ nix packaging
This PR takes packaging with nix into special consideration. Because of sandboxing, nix builds should disable automatic grammar fetching with by setting a
HELIX_DISABLE_AUTO_GRAMMAR_BUILD
environment variable. Instead Nix should parselanguages.toml
for the[[grammar]]
entries, download the grammars, and put them into the build'sruntime/grammars/sources
directory under the grammar'sname
key as each directory name. I am using an overlay in my dotfiles based on the derivation in nixpkgs which does this.Note though that there was a bug in nix's built-in toml parser (
builtins.fromTOML
) that was only fixed in v2.6. To work around this, nix builds should convertlanguages.toml
into JSON and usebuiltins.fromJSON
. (See this commit for an example.)You may also run over the tree-sitter grammars with a C(++) compiler with Nix, in order to fully control their packaging.
./grammars.nix
from this PR provides an example.🔮 future work
fetchFromGitHub
)https://github.com/helix-editor/helix/archive/<rev>.zip
Footnotes
Why not scripts? Scripts are difficult because (1) you need to maintain separate scripts for windows and unix, (2) it can be difficult to parse configuration data like build paths within grammar directories, and (3) its makes an arguably unergonomic and bespoke CLI for people to learn. ↩