-
Notifications
You must be signed in to change notification settings - Fork 120
Update draft tree-sitter grammar in #303 #438
base: master
Are you sure you want to change the base?
Update draft tree-sitter grammar in #303 #438
Conversation
Oh, hmm. Should I not be including the merge commit from
All of the other changes are from master... |
Also, any assistance on debugging the window's builds would be much appreciated. Specs are passing on linux and macOS, but it looks like something is causing the tree-sitter grammar to not load on Windows because the specs appear to be using the existing TM grammar instead of the TS grammar. Possibly related: atom/atom#22675 |
Best would be to rebase changes after current master's head. Just make sure you don't change commits from master. Easiest way which comes to mind: You can make patch from your commits (you can squash them) apply onto original And make PR to master. About windows build: cleanup commits and I'll try this on my end, but you should not be concerned about that if this is atom's problem. |
d283f7a
to
48fa0fb
Compare
OK thanks! I've rebased this branch (including original commits from #303) onto B/c this is still draft, I would prefer to keep my commits unsquashed as I work, but I agree that they should be squashed before (if?) this gets merged. |
Looks good now. |
Seems like windows problems are caused by different node versions used for tree-sitter-php compilation. It will probably solve itself when that package will be bundled with atom itself, what will happen after update of package.json on master. |
I'm ported over as many of the existing tests as possible, and things are largely working OK, aside from the BCs mentioned in the description, many of which are out of our control at this point. Two things that are in our control, though are:
|
About punctuation problems: get a few most popular themes, get some php source codes like symfony, and check if it's looking ok. TM has more scopes that is ever used, so if not needed - move on.
Function and constant names are out of date by now. I'm thinking about regenerating these from php source. For the first release could be probably skipped. I'll try to get tree-sitter running on my machine to help with some grammar, but maybe you could get some help in |
Hi, quick update: this seems to be working just fine for day-to-day use for non-template PHP files. I'm playing w/ how to support HTML+PHP in the same document. More on this later, but the main scope of this grammar appears to need to be I'm also playing w/ a I'm working on developing a tree-sitter phpdoc parser so that we can get phpdoc scopes working. There was already some code out there (https://github.com/john-nguyen09/tree-sitter-phpdoc) that has gotten me started. So far this is also working fine with the exception that the parser is looking for "correct" syntax, which means that certain things the current PHPdoc tests were allowing won't work b/c they were actually invalid phpdoc. (Ie Now I have an important question I need some input on: unless we get a BC release of tree-sitter-php, this PR will be stuck w/o support for lots of modern PHP features until Atom updates it's tree-sitter support. As an alternative, we could maintain a lightweight fork of tree-sitter-php that simple downgrades the version of tree-sitter that is used to generate the grammar, thereby allowing the modern syntax from tree-sitter-php@master to work to work w/I current Atom. I don't love the idea of maintaining fork like that, but I also doubt that it would be too much work, and I like that it would remove one (of many) blocking issues related to this work. (I'm using this approach locally to add scopes for arrow functions, etc and it's working just fine.) What do you think?
OK, maybe what I'll do is just manually add the built-ins that I encounter as I work just to get started, when we can work on a more robust a/o long term solution for later releases. |
Another quick update: this is not finished, but it has been my daily PHP driver for ~4 months now and -- as far as I can tell -- seems to work just fine. There are still a number of regressions between the TS grammar and the TM grammar, and most of these are noted in the PR description. There are 2 main things that have changed since I last posted an update:
I think that these are mostly working correctly, but there are a number of differences from the TM grammar that are flagged in the NOTE/WARNING/WATCH OUT
I know that this is not maintainable in the long run, but it felt like a decent way to get the ball rolling until something more sustainable can be figured out. A few months have passed since I worked on this code, and I see now that there are a few updates to bring in from @sadick254 @darangi Would greatly appreciate any input or guidance on this, if you can spare the time. |
It looks like CI is failing, but the tests are passing for me locally ... of course. 🙄 I'll try to look into it, but I won't have time to do so today. |
I've been chipping away at some issues w/ the embedding grammar the past few days. This isn't something that I've looked at much before, but I'm tinkering w/ having some add'l tests to confirm that the embedding tree-sitter grammar is up to snuff w/ the embedding TextMate grammar. In particular, shebangs (eg |
@claytonrcarter I came here just by coincidence from the linked issue in the ts php parser. But I stumbled over hash bang handling in the js parser yesterday: https://github.com/tree-sitter/tree-sitter-javascript/blob/master/grammar.js#L92 Maybe it helps. |
Also remove scope for variable and for syntax-error
Fixes scopes of anonymous nodes Scopes constant.language.php for language constants Scope the class of catch expressions as a class name Change object_creation_expression to scope as entity.name.type.class
This uses a custom, "off label" package that is just the up-to-date tree-sitter-php rebuilt to use an older tree-sitter ABI that is compatible w/ the older verion that ships w/ Atom.
These are exhaustive to tests ported from the TextMate grammar test for maximum compatibility. Never-the-less, there are still lots of FIXME comments to flag area that still need improvement or work.
84666a6
to
5a7255d
Compare
Just rebased this to update it and squashed my ~70 commits down into a more reasonable handful. Though it's not without some issues, this is still my daily driver. |
Requirements
Description of the Change
mb-tree-sitter-parser
branch from #303. If this is merged, #303 can be closed.master
, resolving conflictstree-sitter-php
to v0.16.2source.php
instead oftext.html.php
. This matches the TM grammar and fixes things like snippets.I say that this "finishes" the tree-sitter grammar because there are still many current lanuage features that aren't available in
tree-sitter-php
0.16.2. Updating to 0.19 (which is itself currently blocked by atom/atom#22130) won't even help w/ these, as many of them have only been added to TS-php recently. Until this is resolved, the following language features are being parsed as errors:??=
, and moreCurrent Status
This is still very much⚠️ draft ⚠️ , but appears to be usable enough for day to day work (at least for me). Most of the TM grammar specs were ported to TS w/o issue, though many were left out b/c of the current status of
tree-sitter-php
. I am using this for my day to day work, but otherwise I would suggest that it's blocking on atom/atom#22130 and on a new release oftree-sitter-php
that includes all of the modern language goodies.TODO
tree-sitter
tree-sitter
, not this grammar.)//
,/*
,*/
and/**
\
(as inFoo\Bar
)"
and'
.
in1.23
Breaking Changes:
scopes for opening & closing punctuation: tree-sitter grammar includes opening & closing punctuation in the meta scopes, while the original TM grammar does not. For example, in
function foo($arg) { echo 'bar'; }
, the TM grammar applies scopemeta.function.parameters.php
to$arg
andmeta.function.body.php
toecho 'bar';
, while this TS grammar applies scopemeta.function.parameters.php
to($arg)
andmeta.function.body.php
to{ echo 'bar'; }
. This is due to how TS parses and not b/c of any decision of mine.In order to maintain BC on this, we would need to select "everything but the opening and closing punctuation (brace, paren, bracket, etc)" and I don't believe that this is possible w/ Atom's current TS selector implementation. This applies to lots of things: class definitions, use declarations, arrays, function/method parameters, etc.
scopes for semicolons: same as above. The TM grammar did not include the semicolon in it's meta scopes, but TS does include them in statements. This means that, for example,
use A;
will now havemeta.use.php
applied to the semicolon, whereas the TM grammar would stop that scope atA
.scopes for typehinted parameters: the TM grammar supports 3 scopes for function/method parameter defns:
meta.function.parameter.no-default.php
when no default value is provided,meta.function.parameter.default.php
when adefault value is provided, and
meta.function.parameter.typehinted.php
when a typehint is provided and which overrides the previous 2 regardless of whether a default value has been provided. Because of how TS parses, I don't know how to duplicate this exact behavior. Instead, this grammar leaves thedefault/no-default
scopes in place for these parameters and then applies an additionalmeta.function.parameter.typehint.php
scope just to the typehint.For example, in
function foo($arg1, $arg2 = 2, int $arg3, int $arg4 = 4) {}
, the TM grammar would scope the first param asno-default
, the second asdefault
, and the 3rd and 4th would both be scoped astypehinted
. This TS grammar instead scopes the 1st and 3rd params both asno-default
, the 2nd and 4th asdefault
, and then applies thetypehint
scope only to theint
typehints.In order to maintain BC on this, we would need to select "parameter nodes that contain a typehint node" and I don't believe that this is possible w/ Atom's current TS selector implemenation.
heredoc and nowdoc indetifiers: as far as I can tell, TS does not expose the heredoc/nowdoc identifiers as a named node, which makes it impossible to scope them as the TM grammar does. For example, in
<<<HEREDOC
, TM would scopeHEREDOC
askeyword.operator.heredoc.php
, but this grammar is unable to do so. The overall heredoc/nowdoc string is scoped correctly, but the identifiers are no longer scoped.string interpolation: along w/ newer language features, string interpolation is currently unsupported by TSPHP, so is not supported here either.
PHPDoc &
language-todo
: PHPDoc and TODOs are no longer highlighted. Regarding PHPDoc, see issue @ TSPHP regarding PHPDoc. Basically, the recommendation is to use a separate tree-sitter parser just for phpdocs. As forlanguage-todo
, it looks like it just doesn't work w/ any tree-sitter grammars, according to this long standing issue. There are initial drafts of TS parsers for phpdoc and TODO, but I haven't explored what it would take to get them updated ad usable for this.Alternate Designs
The main alternate I thought about was leaving the new specs in Coffeescript to match the existing specs. They would probably be alot easier to read in Coffeescript, as JS+prettier is turning 1 line of CS into 5-8+ lines of JS.
Also, this PR is attempting to maintain the legacy/textmate scopes as closely as possible. At the risk of introducing some breaking changes, we could instead follow some of the other tree-sitter grammars and implement a much simpler, stripped down set of scopes. For example, the JS and C TS grammars scope all operators as
keyword.operator
(+/- a language scope like.js
) while this grammar is applying add'l scopes like.arithmetic
,.assignment
, etc based on the type of operator. Another example: the JS grammar seems to scope all parens/brackets/braces the same, while this grammar applies add'l scopes based on if the punctuation is used in a function, class, method, use, namescape, etc. Not bad, per se, but it does set this grammar apart from the others a little bit, as far as I can tell. See this draft RFC: atom/atom#19623 and this issue for more background: atom/language-javascript#649Benefits
Finishes up a long-standing PR; speeds up PHP highlighting, folding, indenting, etc in Atom.
Possible Drawbacks
Until atom/atom#22130 is resolved, and
tree-sitter-php
is released w/ support for newer language features, the tree-sitter grammar will lack support for newer language features that are already supported by the existing TM grammar. This could be seen as a regression.Applicable Issues
Previous PR this is based on: #303
tree-sitter-php
issue to add full language support: tree-sitter/tree-sitter-php#58PR to update Atom to
tree-sitter
0.19.0: atom/atom#22130