Support parsing unterminated statements #65
Comments
To clarify, what we actually need is in addition to parsing a full
So essentially, we would need a way to choose which root rule to use at runtime, which isn't really specific to tree-sitter-c. I wonder if that is even theoretically possible with how the code generator works. Alternatively, we will have to use two grammars for this, where one is the original tree-sitter-c and the other is conceptually what is shown in the issue description (which of course breaks parsing regular translation_units).
Maybe it's worth transferring the issue to the tree-sitter repository then? @maxbrunsfeld
When you need to parse a fragment of incomplete source code (like a
For example, to parse a
There is a long-standing Tree-sitter issue about selecting alternative root rules at runtime, but that is going to be complex to implement, and this workaround actually seems quite straightforward and scalable in cases where you have many different rules that you want to try.
Appending char *x;
But we could in theory use a cast, so assuming we want to parse void a() { (const char *[42])x; }
The reason why in practice we can't do this is that the string we want to parse could perform some sort of injection and easily escape our wrapping, for example when we try to parse
But I think the first workaround proposed in tree-sitter/tree-sitter#870, which is to always prepend some magic string that tells the parser how to proceed, could work very well for us.
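To make the injection concern concrete, here is a minimal sketch in JavaScript. It only builds the strings that would be handed to the parser and does not call tree-sitter at all. The helper names `wrapAsCast` and `prefixWithMarker` and the `hostile` input are made up for illustration; the marker token is the `__TYPE_EXPRESSION` literal from the grammar change posted in this thread.

```javascript
// Hypothetical helpers for illustration; not part of tree-sitter or tree-sitter-c.

// Workaround 1: wrap the type in a cast inside a dummy function body.
function wrapAsCast(typeText) {
  return `void a() { (${typeText})x; }`;
}

// Workaround 2: prepend a marker token that a modified grammar
// could recognize as the start of a type expression.
const TYPE_MARKER = '__TYPE_EXPRESSION';
function prefixWithMarker(typeText) {
  return `${TYPE_MARKER} ${typeText}`;
}

const benign = 'const char *[42]';
// Assumed attacker-controlled input that closes the cast and the function early.
const hostile = 'int)x; } void b() { (int';

console.log(wrapAsCast(benign));
console.log(wrapAsCast(hostile));
console.log(prefixWithMarker(hostile));
```

With the cast wrapper, the hostile input expands to `void a() { (int)x; } void b() { (int)x; }`, which is two well-formed function definitions, so a clean parse would not tell us that the input was not a type. With the marker prefix there is no closing delimiter to escape from, so presumably the same input would simply fail to match the type rule and show up as an error in the tree.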
Just for the record, this is what I came up with:
[$._type_specifier, $._expression],
[$._type_specifier, $._expression, $.macro_type_specifier],
[$._type_specifier, $.macro_type_specifier],
+ [$.type_expression, $._abstract_declarator],
+ [$.type_expression],
[$.sized_type_specifier],
],
word: $ => $.identifier,
rules: {
- translation_unit: $ => repeat($._top_level_item),
+ translation_unit: $ => choice(
+ repeat1($.type_expression),
+ repeat1($._top_level_item)
+ ),
+
+ type_expression: $ => seq(
+ '__TYPE_EXPRESSION',
+ repeat($.type_qualifier),
+ field('type', $._type_specifier),
+ repeat($.abstract_pointer_declarator),
+ repeat($.abstract_array_declarator),
+ repeat($.abstract_pointer_declarator),
+ ),
You can see examples of what it can parse here: XVilka@fed7bd0
Currently the parser is only able to successfully parse terminated statements, like:
But if you feed something like
or
it emits an error.
It would be beneficial to support parsing such statements too.
@thestr4ng3r proposed the following change in the grammar: