-
-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix errors in grammar syntax #416
Conversation
I have absolutely no idea how this happened, but there were a ton of regex special characters that were missing the second backslash(they need to be double escaped because reasons). I recopied the grammar from @AlfishSoftware's extension, fixed our handful of differences, and it's working again. I replaced the invalid lookbehinds and it seems to work. Next, I'll run |
I followed the steps to set up linguist, ran
@lildude does this mean that your compiler is happy now? @Calinou I think this is done, but it might be safer to wait for the thumbs-up from the |
By the way, I didn't test it in VSCode (no time at the moment), so I don't know if the dot changes affect |
I'll definitely double check that, thanks for the tip. |
It was doing exactly this. Fantastic catch, as usual. |
Oh. I just realized review comments aren't published right away, it seems? Sorry about that, I guess you didn't see the first one (marked outdated) before. |
I replied to every comment I can see, and I marked the first one as 'resolved' after I (tried to) fix the relevant bits. If there are other comments I didn't interact with, then I definitely blame github's rather obtuse UI. |
94fdc33
to
8f318f9
Compare
syntaxes/GDScript.tmLanguage.json
Outdated
"match": "\\b(?<=for\\s[\\w]*\\s)(in)\\b", | ||
"match": "\\b(?:for\\s[\\w]*\\s)(in)\\b", | ||
"name": "keyword.control.gdscript" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test this, but I suspect that it will now consider the whole thing as a keyword.
E.g. for i in array
then for i in
is a keyword. It shouldn't mark the loop variable (i
in this case) as a keyword.
The correct thing to do here is to join this with the for
keyword so that one rule recognizes both keywords. You don't need to remove for
from control_flow
(for
should still be a keyword even before in
is typed) as long as this is before so it has higher priority; but change this rule to something along the lines of:
{
"match": "\\b(for)\\s+([a-zA-Z_]\\w*)\\s+(in)\\b",
"captures": {
"1": { "name": "keyword.control.gdscript" },
"2": { "name": "variable.name.gdscript" },
"3": { "name": "keyword.control.gdscript" }
}
},
I changed \w
(which matches [a-zA-Z0-9_]
including numbers) since I believe identifiers can't start with a number (correct me if I'm wrong). It's probably worth checking this on other places in the code too. I'm assuming identifiers allow ASCII only.
I also allowed more than one space in the syntax.
Btw, using a tool like https://regex101.com/ while you write it really helps with understanding nuances in regexes and checking that it will really do what is expected, and prevent against corner cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot believe that I missed this. You were correct, i
was purple.
Your rule worked great. I removed the middle capture group and it's name block, because we don't actually use the variable.name
tag for variables. Python doesn't tag variable names, they're just white, so that's the behavior I want to imitate.
I changed \w (which matches [a-zA-Z0-9_] including numbers) since I believe identifiers can't start with a number (correct me if I'm wrong). It's probably worth checking this on other places in the code too. I'm assuming identifiers allow ASCII only.
Many other places used the full form [a-zA-Z_][a-zA-Z_0-9]*
, but I just replaced them with [a-zA-Z_]\\w*
. I wish we could give a block like this a custom name so we don't have to copy it everywhere.
Btw, using a tool like regex101.com while you write it really helps with understanding nuances in regexes and checking that it will really do what is expected, and prevent against corner cases.
I started off doing this, but having to add all the extra escapes to move a rule over to vscode was too annoying so I stopped. I've thought about switching to yaml files, since the regex strings won't need escapes, but there was some problem with that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth checking this on other places in the code too.
Many other places used the full form
[a-zA-Z_][a-zA-Z_0-9]*
, but I just replaced them with[a-zA-Z_]\\w*
.
I was actually referring to places capturing identifiers with just \w+
, which would allow names starting with numbers, e.g. 1whatever
, 0e2f
, 0xFACADE
. I took a look and saw it in 2 places.
So, unless this is on purpose for some reason, this seems like it makes more sense:
"match": "(:)\\s*([a-zA-Z_]\\w*)?",
"match": "(setget)\\s+([a-zA-Z_]\\w*)?(?:[,]\\s*([a-zA-Z_]\\w*)?)?",
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another good catch, thanks. I found some other places where identifier names were being matched with [[:alpha:]_]\\w*
, so I standardized all of them to [a-zA-Z_]\\w*
.
I checked this latest commit with linguist again, and there's no warnings/errors. I've decided that I'm definitely going to set up that grammar unit testing tool, but I don't have time right now. |
I checked the latest changes with linguist and it looks like they're good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Cool! Later I'll replicate those changes to gdshader (removing look-behinds) back in my own extension too, just in case. |
Does this PR fix #415? |
It does, but there's been some delay in communicating with the Github LInguist project to make sure the original source of the reported issue is fixed. I intended to close all the related items together but then time got away from me. Thanks for noticing my miss! |
Per #415, this PR fixes some regexes that appeared to work in VSCode/onigumura but do not work in linguist using PCRE.
When this PR is done, I need to remember to post in github-linguist/linguist#3924 and let the linguist project know that we've fixed our grammars.
Additionally, @lildude has documented a process here that we can use to verify that these changes satisfy
linguist
's requirements. It's worth investigating if this process or a derivative can be integrated into our CI, which will at least provide syntax validation of our grammar files.All three of the errors are invalid uses of positive lookbehind using a non-fixed length.
Problem areas:
godot-vscode-plugin/syntaxes/GDScript.tmLanguage.json
Line 298 in e7674c1
godot-vscode-plugin/syntaxes/GDShader.tmLanguage.json
Line 345 in e7674c1
godot-vscode-plugin/syntaxes/GDShader.tmLanguage.json
Line 349 in e7674c1
I've already fixed the first problem, in the GDScript grammar, but unfortunately I've discovered that there was a regression sometime between
1.3.1
and now. The GDShader syntax doesn't even run, and there aren't any error messages about failing to load the grammar.I'll have to find the cause of this regression before I can fix the two regex errors in GDShader.