Introduce tree-sitter-language crate for grammar crates to depend on #3069
Oh this is interesting, I think I like this quite a lot!
If we're willing to take on a backwards-incompatible change in the tree-sitter crate, I think there might be a way to accomplish the same goal (language grammar crates don't need to depend on tree-sitter) without introducing the new tree-sitter-language crate.
Instead, you would update the signature of Parser::set_language to:
    type LanguageFn = unsafe extern "C" fn() -> *const ();
    impl Parser {
        pub fn set_language(&mut self, language: LanguageFn) -> Result<(), LanguageError> {
            /* ... */
        }
    }
So the same type as you've put into ts-lang, but without a struct wrapper around it, and just put it into tree-sitter crate directly. (It's the named struct wrapper that requires the new crate in your original formulation.) And then set_language takes in that factory function directly (and calls it to get the language pointer to then pass on to ts_parser_set_language).
The call would end up looking like
    parser.set_language(tree_sitter_python)?;
And the language grammar crates would be updated to change the return type of their language constructor from tree_sitter::Language to *const ().
What do you think?
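For readers following along, here is a minimal, self-contained sketch of the shape this proposal describes. It is not the tree-sitter implementation: `Parser` and `LanguageError` are hypothetical stand-ins, the null check is only there to give the sketch an error path, and `tree_sitter_fake` / `tree_sitter_broken` are made-up constructors written in Rust so the example compiles without any C code.

```rust
/// Bare function-pointer type from the proposal above.
pub type LanguageFn = unsafe extern "C" fn() -> *const ();

/// Hypothetical stand-in for the error type in the `tree-sitter` crate.
#[derive(Debug, PartialEq)]
pub struct LanguageError;

/// Hypothetical stand-in for `tree_sitter::Parser`; it only records the pointer.
pub struct Parser {
    language: *const (),
}

impl Parser {
    pub fn new() -> Self {
        Parser { language: std::ptr::null() }
    }

    /// Calls the factory function to obtain the language pointer.
    /// The real method would hand that pointer to `ts_parser_set_language`.
    pub fn set_language(&mut self, language: LanguageFn) -> Result<(), LanguageError> {
        // SAFETY: the caller passes a grammar constructor that has no
        // preconditions and returns a pointer to static data (or null).
        let ptr = unsafe { language() };
        if ptr.is_null() {
            // Assumption for illustration only: treat null as a failure.
            return Err(LanguageError);
        }
        self.language = ptr;
        Ok(())
    }
}

/// Stand-in for the static table a generated parser would export.
static FAKE_LANGUAGE: u32 = 14;

/// Made-up constructor playing the role of a symbol like `tree_sitter_python`.
pub unsafe extern "C" fn tree_sitter_fake() -> *const () {
    &FAKE_LANGUAGE as *const u32 as *const ()
}

/// Made-up constructor that fails, to exercise the error path.
pub unsafe extern "C" fn tree_sitter_broken() -> *const () {
    std::ptr::null()
}
```

With these stand-ins, `parser.set_language(tree_sitter_fake)` has the same call shape as the `parser.set_language(tree_sitter_python)?` line in the comment, and the grammar side needs nothing from the `tree-sitter` crate.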
We might also want to keep a version of
@dcreager The drawback I see with that approach is that I think I admit that this extra crate is kinda weird, and I don't totally love it. And the backward-incompatible change is going to introduce some busy work for my use case at Zed as well. But I would like to properly use the
Oof, that's a great point, I hadn't thought of that. Though I think that means the
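Yeah, in the latest commit, I have LanguageFn::from_raw as a const unsafe function that is called in the generated code in the language crates.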
    unsafe { tree_sitter_PARSER_NAME() }
}
/// The tree-sitter [`LanguageFn`] for this grammar.
pub const LANGUAGE: LanguageFn = unsafe { LanguageFn::from_raw(tree_sitter_PARSER_NAME) };
> Yeah, in the latest commit, I have LanguageFn::from_raw as a const unsafe function that is called in the generated code in the language crates.
Nice! I'm convinced. With the unsafe call now living in the grammar crates, I think this is the least gross solution. 😂
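As a rough illustration of the arrangement agreed on here, the sketch below shows a `LanguageFn` wrapper with a `const unsafe fn from_raw` constructor and the kind of constant a generated grammar crate could export. The `into_raw` accessor, the `FAKE_TABLE` static, and the `tree_sitter_fake` constructor are assumptions added so the example stands alone; the thread itself only mentions `from_raw` and the `LANGUAGE` constant.

```rust
/// Sketch of the wrapper type that would live in the small shared crate.
#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct LanguageFn(unsafe extern "C" fn() -> *const ());

impl LanguageFn {
    /// Wraps a raw grammar constructor.
    ///
    /// # Safety
    /// `f` must return a pointer to a valid language table.
    pub const unsafe fn from_raw(f: unsafe extern "C" fn() -> *const ()) -> Self {
        Self(f)
    }

    /// Returns the wrapped function pointer (accessor name is an assumption).
    pub const fn into_raw(self) -> unsafe extern "C" fn() -> *const () {
        self.0
    }
}

// --- What a generated grammar crate might contain ---

/// Stand-in for the table normally defined in the generated C parser.
static FAKE_TABLE: u8 = 0;

/// Made-up constructor; a real grammar crate would declare the C symbol
/// in an `extern "C"` block instead of defining it in Rust.
unsafe extern "C" fn tree_sitter_fake() -> *const () {
    &FAKE_TABLE as *const u8 as *const ()
}

/// The single `unsafe` call sits in the grammar crate, so the consumer
/// never has to write `unsafe` to obtain a `LanguageFn`.
pub const LANGUAGE: LanguageFn = unsafe { LanguageFn::from_raw(tree_sitter_fake) };
```

The point of the design is visible in the last line: the grammar crate vouches for its own constructor once, at compile time, and depends only on the tiny wrapper type.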
cli/src/generate/templates/lib.rs
Outdated
//! let mut parser = tree_sitter::Parser::new();
//! parser.set_language(&tree_sitter_PARSER_NAME::language()).expect("Error loading CAMEL_PARSER_NAME grammar");
//! parser.set_language(&tree_sitter_PARSER_NAME::LANGUAGE.into()).unwrap();
What do you think about set_language taking in LanguageFn then? That would simplify this part to
    parser.set_language(tree_sitter_PARSER_NAME::LANGUAGE).unwrap();
It's just aesthetic really but I like how it reduces the noise a bit.
Yeah, it's true the into() thing is a bit noisy.
My thing with set_language is that now we have a second code path for creating Language objects, which doesn't go through a LanguageFn: loading languages from WASM files via WasmStore::load_language. So I wouldn't want to change the type to take a LanguageFn.
Although, we could (I guess) make it take an impl Into<Language>, so that you could pass either a Language or a LanguageFn.
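A small sketch of what the `impl Into<Language>` idea could look like, using simplified stand-in types rather than the real `tree-sitter` ones. Here `Language` is just a copyable pointer wrapper, error handling is left out, and `tree_sitter_fake` is a made-up constructor; all of these are assumptions for illustration.

```rust
/// Stand-in for the wrapper type from the shared crate.
#[derive(Clone, Copy)]
pub struct LanguageFn(unsafe extern "C" fn() -> *const ());

/// Simplified stand-in for `tree_sitter::Language`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Language(*const ());

impl From<LanguageFn> for Language {
    fn from(f: LanguageFn) -> Self {
        // SAFETY: a `LanguageFn` is assumed to wrap a valid grammar constructor.
        Language(unsafe { (f.0)() })
    }
}

/// Hypothetical parser that accepts anything convertible into a `Language`.
pub struct Parser {
    language: Option<Language>,
}

impl Parser {
    pub fn new() -> Self {
        Parser { language: None }
    }

    /// Accepts a `Language` (for example one loaded from WASM) or a `LanguageFn`.
    pub fn set_language(&mut self, language: impl Into<Language>) {
        self.language = Some(language.into());
    }

    pub fn language(&self) -> Option<Language> {
        self.language
    }
}

/// Stand-in for a generated language table and its constructor.
static FAKE_TABLE: u8 = 0;

unsafe extern "C" fn tree_sitter_fake() -> *const () {
    &FAKE_TABLE as *const u8 as *const ()
}

/// What a grammar crate would export in this sketch.
pub const LANGUAGE: LanguageFn = LanguageFn(tree_sitter_fake);
```

Both `parser.set_language(LANGUAGE)` and `parser.set_language(some_language)` compile against this signature, because `Into<Language>` is implemented for `Language` itself through the standard blanket impl.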
> My thing with set_language is that now we have a second code path for creating Language objects, which doesn't go through a LanguageFn: loading languages from WASM files via WasmStore::load_language. So I wouldn't want to change the type to take a LanguageFn.
Oh that's nice! I hadn't been following the wasm stuff closely. So that's how you're doing dynamic linking, even though it's not very well standardized on wasm yet?
> Although, we could (I guess) make it take an impl Into<Language>, so that you could pass either a Language or a LanguageFn.
That would definitely be a nice way to provide backwards-compatibility. The new tree-sitter crate with this change would still work with old language grammar releases, since you'd just have the Language directly to pass in. You can update your grammar dependencies at your leisure and it would be a one-liner for each to change how you're passing the language into your parser.
Also, if you went with TryInto instead I think that could even subsume the error handling from WasmStore::load_language. So you'd end up creating a wrapper along the lines of
    pub struct WasmLanguage<'a> {
        store: &'a mut WasmStore,
        name: &'a str,
        bytes: &'a [u8],
    }
    impl<'a> TryFrom<WasmLanguage<'a>> for Language {
        type Error = WasmError;
        fn try_from(wasm_lang: WasmLanguage<'a>) -> Result<Self, WasmError> {
            // the body of `WasmStore::load_language`
            todo!()
        }
    }
    impl WasmStore {
        pub fn language<'a>(&'a mut self, name: &'a str, bytes: &'a [u8]) -> WasmLanguage<'a> {
            WasmLanguage { store: self, name, bytes }
        }
    }
    // let mut store: WasmStore = ...;
    // parser.set_language(store.language("python", python_bytes))?;
I think I'm wrong about my suggestion that Into or TryInto would provide transparent backwards compatibility, since set_language currently takes a reference to a language. Given that, I retract my suggestions. I think the dependency inversion on its own is a huge win, and worth getting in ASAP. We can iterate on the set_language ergonomics later if it turns out to be an actual pain point.
[Apologies if my comments are coming across as nit-picking — I really like the dependency inversion in this change and want to make sure we make it as ergonomic to use as possible]
This is looking very cool! Let me try to check my understanding of what the dependency structure will look like with these changes in:
Results:
This would improve things a lot! I wonder if it would make sense to try to support multiple ABI versions of the same grammar easily. If I understand things correctly, the cli supports generating artifacts for older ABI versions, and the lib also supports reading multiple ABI versions. If we made the ABI part of the grammar artifact crate name, it would be easy to provide a grammar for multiple ABI targets, and they could all be checked into the repo, or published to crates.io. Would that make sense to do?
@maxbrunsfeld I think this is ready - I've rebased it for you and fixed conflicts, as well as updated the generating files code to update
Awesome, thanks @amaanq . I guess let’s go for it.
…end on Co-authored-by: Conrad <[email protected]> Co-authored-by: Marshall <[email protected]> Co-authored-by: Amaan Qureshi <[email protected]>
🥳
Update to tree-sitter v0.23.0. This update contains breaking changes, which are explained in the links below. It removes the dependency on tree-sitter from the bindings, so clients can now choose their own tree-sitter version to use the bindings with. Most changes are from the auto-updated bindings. This commit also fixes the Go build. See:
- alex-pinkus/tree-sitter-swift#435
- tree-sitter/tree-sitter#3069
This new crate tree-sitter-language just provides a LanguageFn type that grammar crates like tree-sitter-json can create instances of. Formerly, those grammar crates depended on tree-sitter itself, which was bad, because they had to depend on a specific version of the library, even though they don't use the library.
This is a breaking change. Grammar repos will need to regenerate their Rust bindings.