Discarding white-space (or any other matched rule) by not creating a node. #66
Placing the In contrast, saying that a rule should never generate a tree node, wherever it is used, would break any use of the rule in a non-sequence position, where a tree node is required. For example, consider this grammar:
If rule I also think this would produce confusing behaviour where the effect of using a rule isn't clear without reading the rule itself, and makes it unclear how many child elements a sequence will actually have when parsed. This is currently clear from reading just the sequence rule itself, whereas this addition would require reading any rules the sequence uses, and so on recursively. I'm not sure what you mean by this example:
The effect of the |
Ignoring the last example and looking just at the first, it is unclear why @ on rule names would "not make sense in general." The example you provide against the use of @ on rule names is a simple error in the intent of the person crafting the definition, not a proof for exclusion. (Think back to the use of predicate analysis rule generation you will have encountered at university: the exclusion of a rule which leads to the null functional response is an error.)
Saying this would require recursive scanning of the grammar by the developer is not true: running the generator tells us exactly where the error is located, and the error is only of the type specified above. Further, such issues are already a part of Canopy.
In the case I present, using @ on the rule name would be equivalent to placing the @ on every usage of that rule.
with
Now extrapolate that to a full language grammar with dozens of keywords, clauses, and sub-clauses. As it is, I have implemented this in my version of Canopy. It is saving me hours of debugging the definition of a query and analysis language which I am developing in parallel with my own PEG parser generator, which creates Rust code (code complete as of a week ago). It is also saving my not so young eyes from going @'y when I try to find a needling bug in an @ stack. ;-)
As before, I ask for an academically complete reason why using @ on rule names will lead to side-effects which are not present when using @ muting in the rule definition, before I make a complete fool of myself.
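One way to read the equivalence claimed above is as a rewrite over the grammar: a mute marker on a rule definition is pushed onto every reference to that rule that sits inside a sequence. The sketch below assumes a made-up tuple representation of a grammar (it is not Canopy's internal format), and the rule names `assign`, `value`, `ws` and `name` are hypothetical. It also collects references that are not in sequence position, where there is no per-use @ to rewrite to.

```python
# Hypothetical grammar representation (NOT Canopy's internal format):
#   ("str", text)        literal string
#   ("ref", name)        reference to a named rule
#   ("mute", expr)       per-use mute, like @expr inside a sequence
#   ("seq", [exprs])     sequence
#   ("choice", [exprs])  ordered choice

def push_mutes(expr, muted_rules, in_seq=False, problems=None):
    """Rewrite references to rule-level-muted rules into per-use mutes.

    Returns (new_expr, problems), where problems lists muted rule names
    that were referenced outside of a sequence.
    """
    if problems is None:
        problems = []
    kind = expr[0]
    if kind == "ref":
        if expr[1] in muted_rules:
            if in_seq:
                return ("mute", expr), problems
            problems.append(expr[1])  # nothing to attach the mute to
        return expr, problems
    if kind in ("seq", "choice"):
        items = []
        for item in expr[1]:
            new_item, _ = push_mutes(item, muted_rules, kind == "seq", problems)
            items.append(new_item)
        return (kind, items), problems
    # literals and already-muted elements are left untouched
    return expr, problems


# Hypothetical rules: "ws" is the rule marked as muted at its definition.
grammar = {
    "assign": ("seq", [("str", "let"), ("ref", "ws"), ("ref", "name")]),
    "value": ("choice", [("ref", "ws"), ("ref", "name")]),
}
assign, assign_problems = push_mutes(grammar["assign"], {"ws"})
value, value_problems = push_mutes(grammar["value"], {"ws"})
```

In this sketch the `assign` sequence is rewritten so that its `ws` reference carries a per-use mute, while the `value` choice is returned unchanged and `ws` is reported as a problem, which is the kind of generator-time error mentioned above.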
In general, Canopy assumes that all parsing expressions return a value if they match the input. An expression that doesn't match produces the special internal value So to begin with, we would need another special internal value to signal that an expression matched, but produced no value. Let's call this value In Canopy grammars, a reference to a named rule, for example your
The basic intention of rules is just to give a name to a parsing expression, so it can be referenced in other expressions. In general it should be safe to replace any reference to a rule with its definition, except for rules that are self-recursive for which this would be impossible. You should get exactly the same parsing result by using a rule by name vs using its definition directly. This makes grammars act like normal programs and is important for debugging and refactoring. So for this proposal to make sense, all the above possible uses of rule references would need to be able to accommodate an expression that produces no value. For some of these uses, this causes a problem:
Consider this grammar:
This produces a situation where rule This is why I've emphasised that
In theory we could add logic to the grammar parser that checks that muted rules are only ever used in sequence position, either directly or indirectly. However, I expect this would be complicated to implement, especially considering recursive and cyclic rules, and it's likely to produce confusing error messages. In general I'm not keen to add these sorts of non-local effects to the system, and I'd rather not make rule definitions any more special than they already are.
The main alternative idea I have is to add an operator that indicates which nodes you want to keep from a sequence, rather than those you want to discard. I don't have a lot of usability feedback to go on here. Do you have examples of some of the Stack Exchange threads you mentioned where people are having trouble with this feature?
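The difficulty with a "matched, but produced no value" signal can be sketched with a few parser combinators. This is not Canopy's generated code: `FAIL`, `NOTHING`, and the helpers `lit`, `muted`, `seq` and `choice` are hypothetical stand-ins for the two internal signals discussed in this thread. A sequence has a list of elements it can leave the muted match out of, but a choice (or a top-level rule) has nowhere to drop it and hands the sentinel back to its caller.

```python
FAIL = object()     # stand-in: the expression did not match
NOTHING = object()  # hypothetical: matched, but produced no node


def lit(text):
    """Match a literal string and return it as the node."""
    def parse(s, i):
        if s.startswith(text, i):
            return text, i + len(text)
        return FAIL, i
    return parse


def muted(p):
    """Rule-level mute: consume input but yield NOTHING instead of a node."""
    def parse(s, i):
        node, j = p(s, i)
        return (FAIL, i) if node is FAIL else (NOTHING, j)
    return parse


def seq(*parts):
    """Sequence: collects child nodes, skipping any that are NOTHING."""
    def parse(s, i):
        nodes, j = [], i
        for p in parts:
            node, j = p(s, j)
            if node is FAIL:
                return FAIL, i
            if node is not NOTHING:  # a sequence has a list to drop from
                nodes.append(node)
        return nodes, j
    return parse


def choice(*alts):
    """Ordered choice: returns the first alternative's result unchanged."""
    def parse(s, i):
        for p in alts:
            node, j = p(s, i)
            if node is not FAIL:
                return node, j  # NOTHING leaks out to the caller here
        return FAIL, i
    return parse


ws = muted(lit(" "))
in_sequence = seq(lit("a"), ws, lit("b"))
in_choice = choice(ws, lit("b"))
```

Calling `in_sequence("a b", 0)` yields only the two letter nodes, while `in_choice(" ", 0)` returns the `NOTHING` sentinel to whatever called it, so every consumer of a choice would need to know how to handle a value that is neither a node nor a failure.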
Consider a simple grammar for a language which expects white-space as a delimiter.
e.g.
let @x = 5
let @q= 10
In order to reduce the number of white-space TreeNodes to one per white-space sequence we can create rules for white-space as:
However, the white-space itself is of no importance to the parser; it just uses memory and takes time to skip over when executing nodes.
Is it wise to add a symbol on the parser rule which means "consume, but do not add a TreeNode for this match"?
e.g.
Another use may be for language specific annotation.
Note: this is different to using the current muting '@' as that mutes parts of an expression rather than an entire rule. Yes, it would be possible to add muting to all uses of wp or wps, but that makes the grammar appear messy.
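To make the cost concrete, here is a minimal sketch (plain Python with regular expressions, not Canopy output) of a sequence matcher where each element carries a keep flag, which plays the role of per-use muting. The `let_rule` layout and the patterns in it are assumptions made up for the `let @x = 5` example above.

```python
import re


def parse_sequence(parts, text):
    """Match each (regex, keep) pair in order; collect only the kept matches."""
    nodes, pos = [], 0
    for pattern, keep in parts:
        m = re.compile(pattern).match(text, pos)
        if m is None:
            return None  # the sequence as a whole fails
        if keep:
            nodes.append(m.group())
        pos = m.end()
    return nodes


def let_rule(mute_ws):
    """Hypothetical rule: "let" ws name ws? "=" ws? number."""
    keep_ws = not mute_ws
    return [
        (r"let", True),
        (r"[ \t]+", keep_ws),
        (r"@[a-z]+", True),
        (r"[ \t]*", keep_ws),
        (r"=", True),
        (r"[ \t]*", keep_ws),
        (r"[0-9]+", True),
    ]


with_ws = parse_sequence(let_rule(mute_ws=False), "let @x = 5")
without_ws = parse_sequence(let_rule(mute_ws=True), "let @x = 5")
```

With white-space kept, the sequence for `let @x = 5` has seven children, three of which are white-space; with it muted there are four. The flag has to be repeated at every white-space position in the rule, which is the clutter the proposal is trying to avoid.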
Thoughts?