Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nix: add language injections #3594

Merged
merged 2 commits into from
Sep 6, 2022
Merged

Conversation

nrdxp
Copy link
Contributor

@nrdxp nrdxp commented Aug 29, 2022

By simply placing a comment with the name of the desired language above a multi-line string, that language will be injected.

Opening as a draft because I'm sure there must be a better way to write this, and hopefully someone with more experience can give me some pointers. Tried using repetition via * but I couldn't seem to get it right.

@the-mikedavis
Copy link
Member

Unfortunately queries don't have a way to express arbitrary nesting. Queries I've seen in neovim handle it similar to this.

I'm not sure this will work for many languages out of the box though because some injection regexs are very specific like:

injection-regex = "^(ts|typescript)$"

We may want to remove the ^ and $ parts from those.

@nrdxp
Copy link
Contributor Author

nrdxp commented Aug 29, 2022

Unfortunately queries don't have a way to express arbitrary nesting

Well I guess that's why I couldn't figure it out then. I wonder if it might be better to try and parse the first line of an indented string and parse the shebang (if one exists). Not sure if there is a way to grab the first line easily though 🤔

@the-mikedavis
Copy link
Member

Sub-selecting in nodes is also not really well supported by tree-sitter queries 😅

For injection in particular it's best to have the language injection token expressed by the parser - for example some node under string or comment that contains the first word. Then you could capture that as @injection.content. Without that we would need to add some predicates to sub-select nodes. Neovim has some of those if I remember correctly but I'm not sure how I feel about them. To me it seems better to always have the parser express things you want to capture with queries

@nrdxp
Copy link
Contributor Author

nrdxp commented Aug 31, 2022

I'm moving out of draft as I have been able to accomplish injections in many common places in Nix where shellcode is typically expected. The comment based injection still works as well, and if you are okay with me modifying other language injection regex's I can do that to support more languages, but many already work as is.

edit
I noticed some issues with interpolation under certain cases. Specifically, if there is interpolation inside a quoted string like:

''
  "${toString 1}"
''

Then it stops highlighing as bash for the rest of the string. This doesn't happen if there is no ", so I'm not 100% sure yet if I can fix this in the query or if it needs to be addressed upstream.

edit 2

So I've identified that the issue lies with the the first ", which isn't picked up by the bash query, as it get's conflated with the escape sequence ${. It also doesn't matter how much space or other characters there are between the first " and the ${. One can confirm this by adding a third " like so: "${toString 1}"", and the contents following that interpolation will again be highlighted properly by bash. It doesn't happen in other languages, so I'm wondering if the bug lies in the bash tree-sitter implementation.

@nrdxp
Copy link
Contributor Author

nrdxp commented Sep 1, 2022

I opened a PR upstream to add these same queries there, so maybe we don't need to merge this, but just uncomment the injections in the upstream binding if it is accepted?

runtime/queries/nix/injections.scm Outdated Show resolved Hide resolved
runtime/queries/nix/injections.scm Outdated Show resolved Hide resolved
@the-mikedavis
Copy link
Member

I opened a PR upstream to add these same queries there, so maybe we don't need to merge this, but just uncomment the injections in the upstream binding if it is accepted?

We don't use rust bindings or inherit the queries from the repo. It's good to upstream the changes so that other tree-sitter consumers can get the benefits too but for helix the queries only need to be updated in-repo

@nrdxp nrdxp force-pushed the nix-inject branch 2 times, most recently from f74db4b to 18f603d Compare September 3, 2022 00:24
@nrdxp
Copy link
Contributor Author

nrdxp commented Sep 4, 2022

So I've identified the reason for the bash quoting issue described above, and it should have been obvious but I guess I wasn't thinking correctly 😅 .

Essentially the two strings on either side of an interpolation are highlighted seperately, so instead of having a balanced quote count, the string on the right side thinks it starts with an opening ", instead of matching with the " in the previous fragment. Not sure if there is a way to tell tree-sitter to concatenate these strings together before applying highlighting, but that'd be the best solution for that case. The other option is to just not enclose an interpolation with quotes, but there is already a lot of code in nixpkgs which does this so it doesn't help while reading, and also doesn't help if we genuinely want to avoid wordsplitting in the actual script.

Also found another issue outline in nix-community/tree-sitter-nix#32 but that will have to be worked out upstream.

@nrdxp nrdxp force-pushed the nix-inject branch 2 times, most recently from b2f9f88 to 8314a3f Compare September 6, 2022 22:05
By simply placing a comment with the name of the desired language just
before a multi-line string, that language will be injected.

Also, common functions and attributes which are expected to be shell
code are automatically injected.
@nrdxp
Copy link
Contributor Author

nrdxp commented Sep 6, 2022

Turns out injection.combined is what I need to avoid the quoting imbalance, and I simplified the queries to remove any possibility for nesting. If users want to put arbitrary nested expressions in buildPhase and the like, they can just use the comment style to mark where the bash script starts. I think this is fine since 90% of the time, you just immediatley define the string.

Copy link
Member

@the-mikedavis the-mikedavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look super sweet, thanks for working on this!

@mmlb
Copy link
Contributor

mmlb commented Dec 8, 2022

This is pretty nice, thanks for adding it in. I'm just messing around with this and it seems a little clunky because it seems like the multi-line string needs to be the first non-whitespace chars on the line right?

I've tried the following and both work:

k = #shell
''exec ${kubectl}/bin/kubectl "$@"'';

and

k = 
#shell
''
exec ${kubectl}/bin/kubectl "$@"
'';

It'd be nice if the following worked too

#shell
k = ''exec ${kubectl}/bin/kubectl "$@"'';

Would that be possible with tree-sitter? (I am not versed in tree-sitter in the slightest :( ).

@mmlb
Copy link
Contributor

mmlb commented Dec 8, 2022

Actually I think something might be funky with the current code, I just noticed this when writing out the previous comment:

k = #shell
''exec ${kubectl}/bin/kubectl "$@"'';
kk = #shell
''exec ${kubectl}/bin/kubectl "$@"'';

the string for k highlights exec nicely, but kk's is not. Should I open this as a bug?

@nrdxp
Copy link
Contributor Author

nrdxp commented Dec 8, 2022

@mmlb I ran into this myself when writing the queries. Tree-sitter queries do not have a way to express arbitrary nesting, probably because it would lead to many false positives. I tried to work around that by simple mimicing queries multiple times inside a many nested levels and we hit some false positives ourselves that way so we decided to leave it explicit.

In order to make this as nice as possible though, there are certain names for k that will automatically consider the value a bash script. For example, you could use script or command and the value will highlight as a bash string without requiring the #shell comment marker.

@mmlb
Copy link
Contributor

mmlb commented Dec 9, 2022

Ah I see, I figured it would be something along those lines, 👍. Thanks for doing this really loving this! ❤️

@nrdxp
Copy link
Contributor Author

nrdxp commented Dec 12, 2022

Ah I see, I figured it would be something along those lines, +1. Thanks for doing this really loving this! heart

Glad you find it useful, if we can also get #3970 merged then we can also do highlights based on shebang or even file name in expressions like writeScript "foo.sh" "echo foo", which would highlight as bash based on the .sh in the first arg.

@mmlb
Copy link
Contributor

mmlb commented Dec 12, 2022

Ah I see, I figured it would be something along those lines, +1. Thanks for doing this really loving this! heart

Glad you find it useful, if we can also get #3970 merged then we can also do highlights based on shebang or even file name in expressions like writeScript "foo.sh" "echo foo", which would highlight as bash based on the .sh in the first arg.

Yep that would be very nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants