Tree-sitter fixes for December (including a PHP grammar!) #852
Hi this is cool! I have a couple of fiddly comments on some of the injections code, but nothing really major there.
As you know, I'm new to Pulsar and have always struggled trying to wrangle Atom builds with Electron. That said, I'm getting an error when I run this branch; it seems that the first layout of the file is sort of OK, but then subsequent updates don't change anything. This seems to also be affecting injections. I'm running the latest release (1.112.1) but I'm struggling to build from source, so it could be that this is not happening on `master`, but only on 1.112.
When I open a php file w/ all the tree-sitter stuff turned on, I get this in the console:
Note that I get this 3-4 times per highlight (eg per keystroke). At this point, I've got like 1800 of them in the console for the little test file I pasted below.
And here it is pasted:
/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:617 Uncaught (in promise) RuntimeError: abort(Assertion failed: undefined). Build with -s ASSERTIONS=1 for more info.
at abort (/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:617:29)
at assert (/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:343:25)
at getDylinkMetadata (/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:801:25)
at loadWebAssemblyModule (/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:966:36)
at /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:2739:52
at WASMTreeSitterGrammar.getLanguage (/Applications/Pulsar.app/Contents/Resources/app.asar/src/wasm-tree-sitter-grammar.js:139:24)
at LanguageLayer.update (/Applications/Pulsar.app/Contents/Resources/app.asar/src/wasm-tree-sitter-language-mode.js:3286:11)
at async Promise.all (index 1)
at LanguageLayer.update (/Applications/Pulsar.app/Contents/Resources/app.asar/src/wasm-tree-sitter-language-mode.js:3290:9)
abort @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:617
assert @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:343
getDylinkMetadata @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:801
loadWebAssemblyModule @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:966
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:2739
Promise.then (async)
seek @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/wasm-tree-sitter-language-mode.js:2374
buildScreenLines @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/text-buffer/lib/screen-line-builder.js:92
getScreenLines @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/text-buffer/lib/display-layer.js:637
queryScreenLinesToRender @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:907
updateSyncBeforeMeasuringContent @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:402
updateSync @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:279
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:225
performDocumentUpdate @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/view-registry.js:264
requestAnimationFrame (async)
requestDocumentUpdate @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/view-registry.js:250
updateDocument @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/view-registry.js:208
scheduleUpdate @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:224
didRequestAutoscroll @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:2283
scrollToScreenRange @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:5143
autoscroll @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/cursor.js:795
changePosition @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/cursor.js:785
setScreenPosition @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/cursor.js:66
moveLeft @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/cursor.js:313
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/selection.js:276
modifySelection @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/selection.js:1222
selectLeft @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/selection.js:276
backspace @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/selection.js:596
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1786
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1801
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1800
transact @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/text-buffer/lib/text-buffer.js:1320
transact @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:2467
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1799
mergeSelections @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:4053
mergeIntersectingSelections @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:4015
mutateSelectedText @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1798
backspace @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1786
object.<computed> @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/underscore-plus/lib/underscore-plus.js:77
core:backspace @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/register-default-commands.js:441
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/register-default-commands.js:691
transact @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/text-buffer/lib/text-buffer.js:1320
transact @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:2467
newCommandListeners.<computed> @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/register-default-commands.js:691
handleCommandEvent @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/command-registry.js:405
module.exports.KeymapManager.dispatchCommandEvent @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/atom-keymap/lib/keymap-manager.js:617
module.exports.KeymapManager.handleKeyboardEvent @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/atom-keymap/lib/keymap-manager.js:408
handleDocumentKeyEvent @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/window-event-handler.js:153
Show 16 more frames
Here is my test file in TextMate mode:
And here it is in tree-sitter mode:
Note that:
- the `TODO` is highlighted
- the phpdoc is not
- the heredocs are not highlighted as injections, only PHP strings
- the nowdoc is not highlighted even as a string?
After this first layout, if I add anything, it's not highlighted. See lines 10&11 in this pic:
I'll keep tinkering with this. Like I said, it may just be an issue on 1.112 and not on master. Or maybe just something on my system? I'm not in safe mode, but I did remove all plugins except a link to the php package.
atom.grammars.addInjectionPoint('text.html.php', {
  type: 'comment',
  language: (node) => {
    return TODO_PATTERN.test(node.text) ? 'todo' : undefined;
  },
  content: (node) => node,
  languageScope: null
});
I see 3 separate `comment` injections, all of which look very similar aside from the injection language they're looking for and some syntax. I think that these can be consolidated into 1, what do you think?
(The only part that's new to me and that I'm unfamiliar with is `languageScope: null` and what that does; I see that it's `null` for the first 2 injections (TODO and hyperlinks), but not for phpDoc. Maybe that means that it's not possible to consolidate these?)
atom.grammars.addInjectionPoint('text.html.php', {
  type: 'comment',
  language: (node) => {
    return TODO_PATTERN.test(node.text) ? 'todo' : undefined;
  },
  content: (node) => node,
  languageScope: null
});

atom.grammars.addInjectionPoint('text.html.php', {
  type: 'comment',
  language(node) {
    if (TODO_PATTERN.test(node.text)) return 'todo';
    if (isPhpDoc(node)) return 'phpdoc';
    if (HYPERLINK_PATTERN.test(node.text)) return 'hyperlink';
  },
  content: (node) => node,
  languageScope: null
});
These can't be consolidated because we might need to inject more than one of them into each comment. If we want both `TODO`s and URLs to be highlighted inside an ordinary PHP comment, we need to create two injection points, or else only one or the other will be picked.
`languageScope: null` means that the root scope of the injected grammar (the one specified in the grammar's CSON file) isn't actually added to the injection's ranges. This makes sense for the TODO and hyperlink injections because there's no point in adding a `text.todo` scope or a `text.hyperlink` scope into the buffer.
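To illustrate why one consolidated injection point can't highlight both a TODO and a URL in the same comment, here is a minimal, self-contained model. It is not Pulsar's implementation: `resolveInjections`, both regular expressions, and the plain-object "node" are assumptions made purely for illustration. The only behavior it models is the one described above, namely that each injection point's `language` callback names at most one language per node.

```javascript
// Hypothetical patterns; the real TODO_PATTERN and HYPERLINK_PATTERN may differ.
const TODO_PATTERN = /\b(TODO|FIXME)\b/;
const HYPERLINK_PATTERN = /\bhttps?:\/\//;

// Simplified stand-in for how injection points are consulted: every point
// whose type matches the node gets to name at most one language.
function resolveInjections(points, node) {
  return points
    .filter((point) => point.type === node.type)
    .map((point) => point.language(node))
    .filter((language) => language !== undefined);
}

// One consolidated point: the first matching branch wins.
const consolidated = [
  {
    type: 'comment',
    language(node) {
      if (TODO_PATTERN.test(node.text)) return 'todo';
      if (HYPERLINK_PATTERN.test(node.text)) return 'hyperlink';
      return undefined;
    },
  },
];

// Two separate points: each can match independently of the other.
const separate = [
  {
    type: 'comment',
    language: (node) => (TODO_PATTERN.test(node.text) ? 'todo' : undefined),
  },
  {
    type: 'comment',
    language: (node) =>
      HYPERLINK_PATTERN.test(node.text) ? 'hyperlink' : undefined,
  },
];

const comment = { type: 'comment', text: '// TODO: see https://example.com' };

console.log(resolveInjections(consolidated, comment)); // [ 'todo' ]
console.log(resolveInjections(separate, comment)); // [ 'todo', 'hyperlink' ]
```

In this model the consolidated point reports only `todo` for a comment that contains both a TODO and a URL, while the two separate points report both languages.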
inject more than one of
Oh! I never even considered that. It's really cool that that works!
I kept
Yes, I think that this is the case. I poked around and realized that the error is coming from the
You don't need to build, luckily. You just need to check out the repo, do a
OK, addressed feedback. Also updated the
👍 Now that I have my build fixed, I'll play with this PR with some of my PHP files and see if anything turns up. This is really great, thank you!
OK, so you did ask me to be a little nitpicky. 😄 Here's a round of things I noticed after taking screenshots of a file I loaded up and doing a visual diff.
Changes that don't affect highlighting
Changes that do affect highlighting
use Exception;
use Foo;
use Illuminate\Support\Arr;
Files
text mate screenshot
example code from screenshots
<?php
namespace SDK;
use Exception;
use Foo;
use Illuminate\Support\Arr;
use Illuminate\Support\Collection;
use Illuminate\Support\Str;
class Coupon
{
private string $code;
private string $description;
private string $expires;
private bool $single_use;
private ?int $amount_flat;
private ?int $amount_percent;
private ?bool $free_shipping;
private ?int $minimum_subtotal;
private null|string|array $branch;
private ?string $country;
private static array $config;
private function __construct(array $attributes, string $code)
{
foreach (get_class_vars(get_class($this)) as $key => $value) {
switch ($key) {
case 'config':
break;
case 'code':
$this->$key = $code;
break;
default:
$this->$key = $attributes[$key] ?? null;
}
}
}
public function __get($name)
{
1111111;
if ($name === 'config') {
throw new Exception('Call Coupon::find() or ::getConfig() instead');
}
return $this->$name;
}
public function __isset($name)
{
$result = null;
try {
$result = $this->__construct($name);
} catch (Exception $e) {
// Swallow the exception the __get() can throw.
} catch (Exception) {
// Swallow the exception the __get() can throw.
}
return isset($this->$name) && !is_null($result);
}
public static function find(string $code): ?self
{
$normalizedCode = Str::upper(trim($code));
$code_lookup = self::getConfig()[$normalizedCode] ?? null;
if (!$code_lookup) {
return null;
}
return new self($code_lookup, $normalizedCode);
}
/** @return Collection<Coupon> */
public static function all(): Collection
{
function($item) { $item };
return collect(self::getConfig())->map(
fn($config, $code) => new self($config, $code),
);
}
}
BTW, I was playing with the tests that I originally wrote for my old PR at atom/language-php#438 and I was able to get them working w/ this WASM grammar. In case they may be useful, they allow specs to be written like this:
it("should tokenize = correctly", async () => {
  await editor.setPhpText('$test = 1;');
  expect(editor).toHaveScopesAtPosition([1, 0], '$', ["source.php", "variable.other.php", "punctuation.definition.variable.php"]);
  expect(editor).toHaveScopesAtPosition([1, 5], ' ', ["source.php"]);
  expect(editor).toHaveScopesAtPosition([1, 6], '=', ["source.php", "keyword.operator.assignment.php"]);
  expect(editor).toHaveScopesAtPosition([1, 8], '1', ["source.php", "constant.numeric.decimal.integer.php"]);
  expect(editor).toHaveScopesAtPosition([1, 9], ';', ["source.php", "punctuation.terminator.expression.php"]);
});
And then failures look like this:
Happy to share if that seems like it may be helpful.
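For readers who haven't seen this style of scope assertion, the core of such a check can be sketched as below. This is not the matcher from atom/language-php#438: `makeFakeEditor`, `scopesAt`, and `checkScopesAtPosition` are hypothetical names, the editor is a plain lookup table rather than a real text editor, and the sketch skips the character argument that the specs above pass. It only shows the general idea of comparing an expected scope list to the actual one and producing a readable failure message.

```javascript
// Hypothetical stand-in for an editor: maps "row,column" to a scope list.
function makeFakeEditor(scopesByPosition) {
  return {
    scopesAt([row, column]) {
      return scopesByPosition[`${row},${column}`] ?? [];
    },
  };
}

// Compares expected scopes with actual ones and builds a readable message.
function checkScopesAtPosition(editor, position, expected) {
  const actual = editor.scopesAt(position);
  const missing = expected.filter((scope) => !actual.includes(scope));
  const unexpected = actual.filter((scope) => !expected.includes(scope));
  const pass = missing.length === 0 && unexpected.length === 0;
  const message = pass
    ? `scopes at [${position}] match`
    : `scopes at [${position}] differ; missing: [${missing}]; unexpected: [${unexpected}]`;
  return { pass, message };
}

const editor = makeFakeEditor({
  '1,6': ['source.php', 'keyword.operator.assignment.php'],
});

// A passing check, then a failing one that names the offending scopes.
console.log(
  checkScopesAtPosition(editor, [1, 6], ['source.php', 'keyword.operator.assignment.php'])
);
console.log(
  checkScopesAtPosition(editor, [1, 6], ['source.php', 'keyword.operator.logical.php'])
);
```

Returning a `{ pass, message }` pair is a common shape for custom matchers, which is why the sketch uses it; a real matcher would read scopes from the editor instead of a lookup table.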
OK, the main takeaway here is that a lot of these decisions will end up being different from what the TM grammar did. Part of the purpose of this whole thing is that it’s a rare opportunity to flex some muscle and enforce more scope consistency across languages than what we’ve had in the past. Consistency means that syntax theme authors can make high-level decisions about colors without having to check everything in every language and add a ton of language-specific exceptions. I wrote up a whole reference document for scopes and probably should’ve pointed you to it before this review.
It’s also a chance to break with some bad conventions that had emerged and enforce some clarity and separation. For instance, most community TextMate grammars had decided that all function calls (not just function definitions) would be scoped as
My tactical rules were something like:
Your list is great for helping me realize where I’d missed stuff, so thanks a bunch. There are still a lot of places where I deviated from what the TM grammar did. If the outcome is nonetheless catastrophic for your editor experience, that’s the point at which we’d talk about changes to the theme. (We made a handful of changes to built-in themes to ensure continuity even when scopes changed.)
Here’s what I addressed (changes soon to be added to the PR):
Open to more dialogue here, so let me know if any of this feels flat-out wrong. And thanks again; I wish all language packages could have someone advocating for them like this.
I might take you up on that. Hardly any of the language packages are testing the new grammars because I didn't want to codify a bunch of tests while scope decisions were still up in the air. There are other styles of scope testing that have historically been used, but I don't love them for various reasons.
This is very cool. Between atom/language-php#303 and atom/language-php#438, I've been looking forward to this for a long time and I'm really excited to see it happening! I drove with this today and it seemed stable and consistent. I found a few more small issues, and have a few questions, too.
Scope issues I noticed
Classes scoped differently in different places
Take this example:
function (Aaa $x): Bbb { Ccc::x(); }
use Ddd;
use Eee\Fff;
class Ggg extends Hhh implements Jjj {}
In TM:
In this PR:
I'm also noting that the highlighting here in GitHub doesn't really line up with either of these screenshots. 😆
Responses
This makes sense and I'm totally on board with this.
No, not at all. And if it changes something, I think that just means that my theme needs a post-merge update. 😄
I think how you handled this (adding them as a regex) makes sense. It's a shortish, well-defined list that doesn't change very often.
Oops. I had this wrong ... I think I copied from the wrong thing. The visual changes I'm seeing aren't a big deal; I'm not sure the TM highlighting is even right so much as it's just what I have to compare against. 😄 In my theme, here's a few examples of how these are a little all over the place in TM (w/ my theme):
OK, I think this makes sense. Question though, why would
You had me scared for a minute 😄, but no, that's the official name of them in phpdocumentor. (See docs or tag reference docs at phpdocumentor.) I don't have a strong opinion here; I can see why TM called them keywords, but I think that
Thank you again for working on this!
No! Let's fix it.
Ugh. I'm looking around for what we do for similar constructs in other languages and there is just no consistency or logic. You're right that
Yes. I've never understood the argument that
My feeling is that I want
Consider this block on which I’ve purposefully omitted syntax highlighting:
At a glance it’s a bunch of words and some punctuation. Obviously they won’t all have the same name unless you’re a masochist, but the more hints you have to tell things apart, the better. The
If that’s not how some people approach highlighting, that’s fine; this approach doesn’t preclude other philosophies. A syntax theme can choose to scope
Scoping is about assigning meaning, and the aesthetics come later. I would be happy to get some more detail onto the type annotation scope names if we can figure out how; since
Whoops, I missed the whole first section of your last comment. Looking again now.
Quite so. Changing!
Yeah, I changed the syntax for interpolating the node type a long time ago and this was a weird holdover. Well spotted!
Also addressed.
These cover some of the cases mentioned at pulsar-edit#852 (comment)
Oh, one more before I turn in:
PHP is bizarre. Just ensured
These cover some of the cases mentioned at pulsar-edit#852 (comment)
* Show `def self.foo` methods as `self.foo` in the symbols list.
* Highlight the `foo` in `def self.foo` as a method name.
* Properly highlight `..` and `...` range operators.
* Highlight keyword parameters as `variable.parameter.keyword`.
When the cursor is placed right where an injection _starts_, and the user presses Return, we usually shouldn't use the injection layer for hinting.
…when auto-indenting the entire buffer.
…when straddling injection boundaries.
Spotted it falling down on a very large JSON file I had. Seems to be fixed on `master`. Not sure which version I originally built it from.
16a6b4b
to
50bfa51
Compare
Just rebased after landing #855. I hope it didn't make anything explode.
I have placed two small
…when destructuring an array.
Hi there, I think the recent rebase had a few casualties, at least in the PHP grammar: https://github.com/pulsar-edit/pulsar/compare/16a6b4b577366f5b1f09ba7698bebd53e30e7dfd..50bfa5141eb2cb95a35c8375b5e003fdf43ef854#diff-9d0f8cee91ca39fc6d6b6e98a21367194ae76b7df77aa5f7bd6b26f1cabd952b I wasn't following the rest of the changes in this PR, so I didn't check on them. Also, I've been crash coursing myself through tree-sitter grammars and I have a PR that I will try to push soon that adds and fixes a few things. I'll target it at this one for your review.
Yup. Not sure how that happened. Let me try to apply those commits again.
Amazingly, I think it was just one commit that got skipped somehow. Thanks for catching it.
Thank you again for all of your work on this, and for putting up w/ my constant nitpicking on behalf of PHP. 😆 FYI I opened savetheclocktower#2 w/ some updates to this. It's targeted at this branch.
Another question on scope decisions: in namespaces, why is
These cover some of the cases mentioned at pulsar-edit#852 (comment)
No, it's just a difference of opinion. You'll just as often see stuff like that scoped as I was on the fence about this until my experience with the JavaScript Likewise, in PHP, once
A couple very minor updates to the PHP grammar, then I promise I'll be quiet about this so you can land it. 😄
"final"
"implements"
"namespace"
[question] is this because `namespace` controls how we address the class in usage (eg by changing its namespace)? I would have pegged it more as a keyword (which is how TM scopes it).
`namespace` declares an entity. Just like `class` or `function`. If those are `storage`, so is `namespace`.
[
"&&"
"||"
"??"
I don't think that this is correct. `??` isn't a logical operator like `&&` and `||`. I suggest moving it to `keyword.operator.comparison.php`. [docs]
And if you agree, then it looks like `<>` and `<=>` could also be added as comparison operators: https://www.php.net/manual/en/language.operators.comparison.php
If `||` is a logical operator, so is `??`. The former will return the left-hand side unless it's falsy. The latter returns the left-hand side unless it's `null`.
`<>` and `<=>`, on the other hand, are comparison operators, so I'll add them as such.
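The falsy-versus-null distinction drawn above can be seen in a short sketch. It is written in JavaScript, which has its own `||` and `??` operators, so it is an analogy and not PHP itself; the helper names `orFallback` and `coalesce` are made up for illustration. One caveat when mapping it back to PHP: PHP's `||` evaluates to a boolean instead of returning an operand, and PHP's `??` also tolerates unset variables, so only the "when does the fallback kick in" behavior carries over.

```javascript
// Falls back whenever the left-hand side is falsy (0, '', false, null, ...).
function orFallback(left, fallback) {
  return left || fallback;
}

// Falls back only when the left-hand side is null or undefined.
function coalesce(left, fallback) {
  return left ?? fallback;
}

// Candidate left-hand values, including falsy-but-not-null ones.
const samples = [0, '', false, null, 'value'];

for (const left of samples) {
  console.log(
    JSON.stringify(left),
    '=> ||:',
    JSON.stringify(orFallback(left, 'fallback')),
    ' ??:',
    JSON.stringify(coalesce(left, 'fallback'))
  );
}
```

The two helpers disagree for `0`, `''`, and `false`, and agree for `null` and for truthy values.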
This looks great, as far as I can tell! I have several small items for improvement, but I don't know that they are critical.
I didn't look at the symbols-view changes very hard, because I'm only familiar with the tree-sitter stuff, and new to it at that. The indent changes seemed to make sense from a read through, but I didn't test them specifically. (Except that I've been driving with this branch this week, and it's been good, and better every day.)
Suggestion: the Release Notes should also mention that many other languages got new TODO and URL highlighting.
@@ -0,0 +1 @@
; placeholder
[question] In the new php grammar, `empty.scm`, `folds.scm` and `highlights-html.scm` are all empty. Can they be removed instead of left empty? Or is there a strategic reason to have them empty?
Ah, shit, `folds.scm` shouldn't be empty. I just forgot about it.
`highlights-html.scm` doesn't need to exist, and `empty.scm` is empty on purpose, as the name should hint at :)
And to answer why we need `empty.scm`: I thought I'd made it so that no queries were mandatory, but Pulsar complained when I tried to omit `highlightsQuery`, so I'm doing an empty one for now and I'll fix it later.
Added folds queries for everything I could think of. All classes, functions, control loops, conditionals, and enums should be foldable now.
> And to answer why we need `empty.scm`: I thought I'd made it so that no queries were mandatory, but Pulsar complained when I tried to omit `highlightsQuery`, so I'm doing an empty one for now and I'll fix it later.
Might I suggest that we add this as a TODO, just in case someone is searching for TODOs in the project, they can see that it should be removed eventually? Might help future-you
packages/language-php/grammars/tree-sitter/queries/highlights.scm
(tag_name) @entity.name.tag.phpdoc.php
(named_type) @storage.type.instance.phpdoc.php
(variable_name) @variable.other.phpdoc.php
(uri) @markup.underline.link.phpdoc.php
In my experience so far, the phpdoc highlights have been working really well as is!
Looks good to me, for what that's worth, and with the previously stated caveats re symbols view etc. Thank you!
I personally don't have any code that I can test any of these changes with, however, glancing through the changes there were not any obvious errors.
Most of the changes seem to be the addition of TODO and URL highlighting, as well as Tree-sitter scope standardization.
I had a single question which should be posted in a comment after this review is posted.
From a theoretical reading, this change seems fine and the package tests succeeded as well.
NOTE: I could spin the new PHP grammar off into its own PR, but I'm not sure what the point would be. The rest of this description is for the PHP grammar itself; read the other fixes’ commit descriptions for further detail.
Description of the Change
This is the modern Tree-sitter PHP grammar that’s been sitting on my hard drive for months.
The base PHP grammar is `text.html.php`. Anything that looks like HTML gets handled in an HTML injection. The `tree-sitter-php` parser is weeeeeird in how it decides to arrange and hierarchize nodes, so in order to properly identify PHP blocks (the parts between, and including, `<?php` and `?>`), we’re re-injecting the PHP grammar into itself.
Why is this necessary? For the same reason that there’s no simple way to describe the group consisting of you, your second cousin once removed, your great grand-uncle, and your three unmarried aunts on your father’s side. There’s no name for that group because that’s just a random sampling of your family tree. Now pretend that you had to create one contiguous region of the buffer from the boundaries described by six seemingly random nodes. Same concept!
The ~~least worst~~ best way to do this is via `addInjectionPoint`, and even then I had to invent a couple of features to pull it off, specifically the `includeAdjacentWhitespace` option to `addInjectionPoint` and the ability to apply an injection’s root scope to its “content ranges” rather than the entire extent of its injection.
Appreciations go to @claytonrcarter for wandering into our Discord and causing me to become aware of his Tree-sitter parser for PHPDoc. We now have good highlighting of documentation comments in PHP, much like the JSDoc injection into JavaScript/TypeScript documentation comments.
I had to rebuild `web-tree-sitter` to add another C stdlib function, so I also took the opportunity to move us to version 0.20.8. This is a modified Tree-sitter version that has some extra externals as described here. It also includes a fix that’s currently pending on the main `tree-sitter` repo (authored by yours truly) to make some of the newly-introduced predicates work correctly in the `web-tree-sitter` bindings.
Alternate Designs
Technically, the PHP-injected-into-itself layer doesn’t need to do any parsing. All the parsing is already handled by the base layer. The purpose of creating an injection layer here is just to identify the buffer ranges that need to be annotated with `source.php`, so once the injection is in place, the parser might as well be a no-op. Since that doesn’t exist, I’ve just chosen to have the injection use `tree-sitter-php`.
In theory, this means some unnecessary work, but it shouldn’t have any meaningful impact on user experience. I’d care more about finding a solution to this if I thought that solution would have any applicability in any scenario other than this one strange one.
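As a rough picture of what "identify the buffer ranges that need to be annotated with `source.php`" involves, the sketch below turns several separate node ranges into contiguous regions by letting each range absorb the whitespace that touches it. It is a conceptual model only: `mergeWithAdjacentWhitespace` and the hard-coded node offsets are hypothetical, and the real `includeAdjacentWhitespace` option may behave differently in its details.

```javascript
// Extend each [start, end) range over whitespace that touches it, then merge
// ranges that overlap or touch, yielding contiguous regions.
function mergeWithAdjacentWhitespace(text, ranges) {
  const isSpace = (index) => /\s/.test(text[index] ?? '');
  const widened = ranges
    .map(([start, end]) => {
      while (start > 0 && isSpace(start - 1)) start--;
      while (end < text.length && isSpace(end)) end++;
      return [start, end];
    })
    .sort((a, b) => a[0] - b[0]);

  const merged = [];
  for (const [start, end] of widened) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1]) {
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}

const text = '<?php echo 1; ?>';
// Hypothetical node ranges for `<?php`, `echo 1;`, and `?>`.
const nodes = [[0, 5], [6, 13], [14, 16]];
console.log(mergeWithAdjacentWhitespace(text, nodes)); // [ [ 0, 16 ] ]
```

Without the whitespace step, the three ranges would never touch and the PHP block would be split into pieces; with it, the block becomes one region, while non-whitespace content such as HTML between two PHP blocks still keeps them apart.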
Possible Drawbacks
There’s an annoying thing I realized about the `tree-sitter-html` grammar only today: it assumes it’s looking at an entire HTML document. Open up a blank document, set the grammar to HTML (either modern or legacy Tree-sitter), then type `</div>`. It won’t highlight until you put an opening `<div>` before it!
I only realized this now because I was doing sanity checking on my local WordPress codebase. WordPress themes often break up parts of the page between different PHP includes, so a file is not guaranteed to have a `<div>` for every `</div>`, and vice versa. Otherwise this isn’t really PHP’s fault or the fault of this new grammar.
This is pretty silly on `tree-sitter-html`’s part and I might venture over there to see if this has been discussed.
Verification Process
Check out this PR and throw the weirdest PHP files you have at it. Make sure modern Tree-sitter grammars are enabled as described here, including the `grammar-selector` setting change so that you can switch between TextMate and Tree-sitter grammars.
Compare the old grammar and the new grammar in how they highlight code on all bundled syntax themes.
Release Notes
Added a modern Tree-sitter grammar for PHP.