Generate treesitter grammar #14527

Open
vzarytovskii opened this issue Jan 2, 2023 · 21 comments
Labels: Area-Tooling (generic tooling bugs and suggestions that do not fall into any existing category), Feature Request, help wanted

Comments

@vzarytovskii
Member

vzarytovskii commented Jan 2, 2023

Is your feature request related to a problem? Please describe.

More and more tools and editors rely on treesitter for navigation, parsing, and semantic highlighting (e.g. in-browser VS Code, nvim, GitHub). We should provide a treesitter (TS) grammar for F#.

Describe the solution you'd like

The TS grammar should, if possible, be generated from our fsl/fsy files and hosted in the repo.

Links
Treesitter docs: https://tree-sitter.github.io/tree-sitter/
Existing grammars, including some whitespace-sensitive ones:
OCaml: https://github.com/tree-sitter/tree-sitter-ocaml
Python: https://github.com/tree-sitter/tree-sitter-python
Yaml: https://github.com/ikatyang/tree-sitter-yaml
Haskell: https://github.com/tree-sitter/tree-sitter-haskell

@github-actions bot added this to the Backlog milestone on Jan 2, 2023
@vzarytovskii added the Area-Tooling and help wanted labels on Jan 2, 2023
@ShalokShalom
Contributor

Helix also uses it, exclusively.

@auduchinok
Member

auduchinok commented Jan 2, 2023

@vzarytovskii Do you have any thoughts about how that would work with the lex filter?
Perhaps we could look at the Python implementation for inspiration, as it also has whitespace-sensitive syntax.

@vzarytovskii
Member Author

> @vzarytovskii Do you have any thoughts about how that would work with the lex filter?
> Perhaps we could look at the Python implementation for inspiration, as it also has whitespace-sensitive syntax.

Yeah, no specific ideas just yet; we should probably figure it out when we start working on it.

@vzarytovskii changed the title from "Generate treesitter grammar from fslexyacc output" to "Generate treesitter grammar" on Jan 2, 2023
@Eliemer

Eliemer commented Jan 27, 2023

How can someone help get this started? I'm interested in contributing.

@adelarsq

@Eliemer This documentation has some context on how to proceed: https://tree-sitter.github.io/tree-sitter/creating-parsers

@NatElkins
Contributor

@vzarytovskii
Member Author

vzarytovskii commented Jan 29, 2023

> https://github.com/baronfel/tree-sitter-fsharp

I am aware of this grammar, but if you look at the README, you'll see that it does not cover all language features or the whitespace-sensitive aspect.

Generating it from the fslexyacc files and the lexfilter (if possible, of course) has the benefit of keeping it up to date whenever we add new features.

@ShalokShalom
Contributor

While trying to find an ANTLR grammar for F#, I discovered a few things that might be interesting. First, there are a gazillion similar formats, obviously. 😊

So I dug deep into this ecosystem, and there are all sorts of compilers going in every direction; some are better maintained than others.

As an example, I discovered an EBNF <--> Treesitter compiler.

There is also a similar project that goes only from Treesitter to EBNF, and it includes an already generated EBNF file for OCaml:

https://github.com/mingodad/plgh/blob/main/tree-sitter-ocaml.ebnf

What seems obvious to me is that EBNF is a considerably easier format.

So, at that point, it seems that editing the existing EBNF of OCaml and then translating it to Treesitter might be an option. 🤷🏻‍♂️

I don't know how it compares to generating from Yacc and Lex 🙈

I also found a couple of other very interesting projects that would help generate the ANTLR file I am striving to create for OneDev.

So if going the route from EBNF to Treesitter sounds acceptable, this would provide a path for both ANTLR and Treesitter.

P.S.:

And if all that doesn't help, I also stumbled across a couple of articles that might help with implementing a Treesitter grammar directly and understanding its format.

https://derek.stride.host/posts/comprehensive-introduction-to-tree-sitter

https://gist.github.com/Aerijo/df27228d70c633e088b0591b8857eeef

@vzarytovskii
Member Author

The OCaml syntax does not account for whitespace sensitivity (i.e. the lexfilter), so unfortunately it won't be much help here.
I think that if we don't want to generate it straight away, but rather write a grammar manually first, we should be looking at the one for Python.
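
For readers unfamiliar with why a plain context-free grammar is not enough here, the following is a minimal sketch of the general idea behind Python-style layout handling: a pre-pass keeps a stack of indentation widths and turns changes in indentation into explicit INDENT/DEDENT tokens that a grammar can then consume. It is only an illustration under stated assumptions. It is not the F# lexfilter (whose offside rules are richer) and not the actual scanner of any existing grammar; the function name `indent_tokens` and the token names are hypothetical, and only spaces are counted as indentation.

```python
def indent_tokens(source: str) -> list[tuple[str, str]]:
    """Turn indentation changes into explicit INDENT/DEDENT tokens.

    Hypothetical helper for illustration only; real layout rules
    (tabs, continuation lines, inconsistent dedents) are not handled.
    """
    stack = [0]  # indentation widths of the currently open blocks
    tokens: list[tuple[str, str]] = []
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines never open or close a block
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # deeper than the enclosing block: open a new one
            stack.append(width)
            tokens.append(("INDENT", ""))
        while width < stack[-1]:
            # shallower: close blocks until we are back at an enclosing width
            stack.pop()
            tokens.append(("DEDENT", ""))
        tokens.append(("LINE", raw.strip()))
    # close every block that is still open at end of input
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens


if __name__ == "__main__":
    sample = "let f x =\n    let y = x + 1\n    y\nf 1\n"
    for kind, text in indent_tokens(sample):
        print(kind, text)
```

The point of the sketch is that the block structure lives in the token stream rather than in the grammar rules, which is why a grammar file alone cannot express it.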

@ShalokShalom
Contributor

Yeah, I actually considered another way now.

Going from .fsy to EBNF and then to Treesitter.

This doesn't involve OCaml at all.
I will try to get this running soonish.

@vzarytovskii
Member Author

> Yeah, I actually considered another way now.
>
> Going from .fsy to EBNF and then to Treesitter.
>
> This doesn't involve OCaml at all.
> I will try to get this running soonish.

Fsy to EBNF likely won't work either; it won't cover whitespace sensitivity.

@Nsidorenco

If anyone is interested, I've been slowly working on an F# treesitter grammar that supports indentation-based scoping.

@vzarytovskii
Member Author

> If anyone is interested, I've been slowly working on an F# treesitter grammar that supports indentation-based scoping.

Nice

@vzarytovskii
Member Author

vzarytovskii commented Feb 9, 2023

> If anyone is interested, I've been slowly working on an F# treesitter grammar that supports indentation-based scoping.

I would like to help with testing and improving it.
@Nsidorenco do you have any to-do items in mind (or are the ones in the README up to date)? I can start using it in my day-to-day work on the compiler and maybe also start fixing things.

@ShalokShalom
Contributor

> > Yeah, I actually considered another way now.
> > Going from .fsy to EBNF and then to Treesitter.
> > This doesn't involve OCaml at all.
> > I will try to get this running soonish.
>
> Fsy to EBNF likely won't work either; it won't cover whitespace sensitivity.

How is whitespace significance breaking either of the protocols?

Or do you think it's lost in the translation?

@vzarytovskii
Member Author

vzarytovskii commented Feb 9, 2023

> > > Yeah, I actually considered another way now.
> > > Going from .fsy to EBNF and then to Treesitter.
> > > This doesn't involve OCaml at all.
> > > I will try to get this running soonish.
> >
> > Fsy to EBNF likely won't work either; it won't cover whitespace sensitivity.
>
> How is whitespace significance breaking either of the protocols?
>
> Or do you think it's lost in the translation?

Yeah, I think there's a possibility of losing a bunch of info during the conversions. Besides, fslexyacc alone doesn't carry the indent/whitespace info.

@ShalokShalom
Contributor

Yeah, I will see.

Considering that Python is popular, I guess this info is not lost.
The Yacc-to-EBNF converter has not been updated in 2 years, whereas the EBNF-to-Treesitter converter is very well maintained.

> Besides, fslexyacc alone doesn't carry the indent/whitespace info.

What else does?

Chet told me the files are in the compiler repo:

https://github.com/dotnet/fsharp/blob/main/src/Compiler/pars.fsy
https://github.com/dotnet/fsharp/blob/main/src/Compiler/pppars.fsy

@Nsidorenco

@vzarytovskii any help is very welcome.
The README is relatively up to date. Off the top of my head, the biggest remaining parts are:

  1. testing
  2. improving the precedence of rules (to reduce parser size)
  3. adding missing language features, like annotations
  4. improving the external scanner to open a new indent scope on brackets and braces
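
One possible reading of item 4 can be sketched as follows: the scanner keeps a stack of layout scopes, pushes a new scope whenever it sees an opening bracket or brace, and on the matching closer discards that scope together with any indentation scopes opened inside it. This is an assumption about the intent, not the grammar's actual implementation. Real treesitter external scanners are typically written in C; Python is used here only for readability. The names `scope_events` and `scopes`, the `"indent"` scope kind, and the event strings are hypothetical, and strings and comments are not handled.

```python
OPEN = {"(": ")", "[": "]", "{": "}"}
CLOSE = set(OPEN.values())


def scope_events(line: str, scopes: list[tuple[str, int]]) -> list[str]:
    """Record the scope changes caused by brackets on a single line.

    `scopes` is a stack of (kind, column) pairs and is mutated in place.
    Kinds are either an opening bracket or the hypothetical tag "indent".
    """
    events = []
    for col, ch in enumerate(line):
        if ch in OPEN:
            # contents of the bracket are laid out relative to the next column
            scopes.append((ch, col + 1))
            events.append(f"open {ch}@{col + 1}")
        elif ch in CLOSE:
            # locate the innermost bracket scope still open, if any
            idx = max(
                (i for i, (kind, _) in enumerate(scopes) if kind in OPEN),
                default=None,
            )
            if idx is not None and OPEN[scopes[idx][0]] == ch:
                kind, start = scopes[idx]
                # the bracket and every indentation scope inside it end here
                del scopes[idx:]
                events.append(f"close {kind}@{start}")
    return events


if __name__ == "__main__":
    stack = [("indent", 0)]
    print(scope_events("let r = { X = 1 }", stack))
    print(stack)
```

An unmatched closer is ignored in this sketch, so it never pops the enclosing indentation scope; a real scanner would have to decide how to recover in that case.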

@vzarytovskii
Member Author

> What else does?

The lexfilter in the repo.

@ShalokShalom
Contributor

Yeah, I already found your previous comment on Discord about that, many thanks.
I think Nsidorenco is already very far along, so generating seems to serve no purpose at this point.

@Nsidorenco I am testing it with Helix, but I am unsure why it currently fails. So I can't provide you with any meaningful feedback as of now, but I hope I can do so in the future.

Thanks a lot for developing this, you're great 🥳

@vzarytovskii
Member Author

Subjectively, the easiest way to test it is with nvim-treesitter and nvim-treesitter/playground, which has a great way of visualizing the tree (prim-types.fs is probably overkill as a test, since FSharp.Core is a bit special).

[Image: screenshot attached to the comment]
