Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Result location @$ is undefined in v0.6.0-188 #12

Closed
nwhetsell opened this issue Aug 28, 2017 · 9 comments
Closed

Result location @$ is undefined in v0.6.0-188 #12

nwhetsell opened this issue Aug 28, 2017 · 9 comments

Comments

@nwhetsell
Copy link

nwhetsell commented Aug 28, 2017

I’m not sure if I’m doing something wrong, but result locations @$ seem to be always undefined in v0.6.0-188. If I run

npm install https://github.com/GerHobbelt/jison/archive/0.6.0-188.tar.gz
cat <<'EOF' > test.jison
%lex
%%
. return 'CHARACTER';
/lex

%start characters

%%

characters
  : character
  | characters character
  ;

character: 'CHARACTER' { console.log(@$); };
EOF
node_modules/jison-gho/lib/cli.js test.jison
node --eval "require('$(pwd)/test.js').parse('abc')"

on macOS 10.12.6, the output is

undefined
undefined
undefined
@GerHobbelt
Copy link
Owner

@$ is the yylloc info result of the rule, similar to $$ being the value result.

Previously (0.4.18-xxx) you got away with this in most circumstances because 0.4.18 run-time always performed the old 'default actions':

$$ = $1;
@$ = @1;

before executing any rule action code chunk, so that $$ and @$ acted as identical copies of $1 and @1. v0.6.0-XXX doesn't do this any more (for run-time performance reasons) and only injects default actions when there's no user code at all (which is more like the bison behaviour, incidentally).

Anyway, long story short:

The correct way to reference any rule production term, is by numeric or named reference, e.g.

character: 'CHARACTER' { console.log(@1); }
               ;

but I never use the quoted token names as that is only extra overhead cost, hence your example would be written as (showcasing 'named reference' for CHARACTER, by the way):

%lex
%%
. return 'CHARACTER';
/lex

%start characters

// %token CHARACTER  <-- is *implicit* due to the way I wrote the character rule below
 
%%

characters
  : character
  | characters character
  ;

character: CHARACTER 
        { console.log(@CHARACTER); }
    ;

The next snippet is a self-contained executable, if you compile it with jison --main (see jison --help):

%lex
%%
. return 'CHARACTER';
/lex

%start characters

// %token CHARACTER  <-- is *implicit* due to the way I wrote the character rule below
 
%%

characters
  : character
  | characters character
  ;

character: CHARACTER 
        { console.log(@CHARACTER); }   // or use `@1` to ref the 1st term in the rule production
    ;


%%

// compile with `jison --main test.jison`
// then execute this `main()` with `node test.js`

parser.main = function () {
	parser.parse('abc');
};

@GerHobbelt
Copy link
Owner

GerHobbelt commented Aug 28, 2017

BTW:

@$ is useful when you construct your own location reference for the rule, e.g.

rule: a b c d e f 
    { @$ = yylexer.mergeLocationInfo(##a, ##f); $$ = $a + $b + $c + $d + $e + $f; }

## is a jison-specific advanced use prefix: it transforms the production term reference name/number into the index number for the parse stack(s) yyvstack (value stack, i.e. the array carrying all the $a/$b/$c/... values), yylstack (location infos @a,@b,@c,...), yysstack (parse state stack, in case you want to use these internals), etc.

The supported references in parser action code:

  • $a / $1 produces the value of the term (for a terminal, this is the value returned by the lexer in yytext in the matching lexer rule)

  • @a / @1 produces the location info (which MAY be NULL if you're not careful: this is related to the default action code injection and worth a chapter or at least a book section in itself 😉 )

  • #a / #1 (jison-specific, not available in bison/yacc/etc.) produces the token number for this term (handy when you have an external token definition set and have %import-ed that one into the lexer and the parser: this way you have one numeric value representing the same (terminal) token everywhere in your app code)

  • ##a / ##1 (jison-specific, not available in bison/yacc/etc.) produces the stack index number as mentioned above: this is handy when you're doing advanced stuff that needs full disclosure of the parser internals, particularly all parse stacks.

@nwhetsell
Copy link
Author

nwhetsell commented Aug 28, 2017

I actually am trying to access the location of the rule result, the “left hand side grouping” described in http://www.gnu.org/software/bison/manual/html_node/Actions-and-Locations.html#Actions-and-Locations. From that page:

…there is a default action for locations that is run each time a rule is matched. It sets the beginning of @$ to the beginning of the first symbol, and the end of @$ to the end of the last symbol.

I’m not sure that @$ was the same as @1 in Jison v0.4.18-186 and earlier. For example, here is a test of a Jison grammar created with Jison v0.4.18-184:

https://github.com/nwhetsell/linter-csound/blob/0494a0650bc93cbdd8a656239dd4376d42844754/lib/csound-parser/spec/orchestra-parser-spec.js#L453-L508

The part of the test of the object created from new parser.Instrument (https://github.com/nwhetsell/linter-csound/blob/0494a0650bc93cbdd8a656239dd4376d42844754/lib/csound-parser/spec/orchestra-parser-spec.js#L482-L505) checks how this part of the Jison grammar—

https://github.com/nwhetsell/linter-csound/blob/0494a0650bc93cbdd8a656239dd4376d42844754/lib/csound-parser/orchestra.jison#L511-L520

—parses this text (from https://github.com/nwhetsell/linter-csound/blob/0494a0650bc93cbdd8a656239dd4376d42844754/lib/csound-parser/spec/orchestra-parser-spec.js#L457-L459)

instr A440
  outc oscili(0.5 * 0dbfs, 440, giFunctionTableID)
endin

The first argument of the parser.Instrument constructor is @$. The test shows that the value of @$ in Jison v0.4.18-184 covered several lines starting at the instr token and ending at the endin token. If @$ was the same as @1, it would’ve been on one line, and only covered the instr token.

Is there any way to restore the default action for locations — that is, to bring back the old meaning of @$ — in Jison v0.6.0? Or do I need to pass the first and last token locations to my AST node constructors to get location information?

@GerHobbelt
Copy link
Owner

Nothing is set in stone. 😉

Given your need (and mine, which is a bit different), I am thinking about what would be the best way to tackle this.
The main reason to simplify/remove the default actions (particularly when you also have user actions for the given rule production) is speed: most grammars I know and use don't employ this behaviour but 'most' is certainly not 'everyone'.

The/my goal (fast parsers and lexers) is tackled through 'code analysis' and resulting selective code strippin/rewriting, which is now done using straight text manipulation via regexes and .replace() calls, remains as-is, this would imply that I need to have some sort of user-configurable 'switch' where I can simply select preferred behaviour in a few aspects.
Maybe something like

%options default-action='<setting>'

where setting can be one of the following: "none", "bison", "ast".
Or slightly more specific, as there's default value actions and default location actions:

%options default-action='<valuesetting>,<locationsetting>'
// valuesetting: none, bison, ast --> $$=undefined, $$=$1, $$=[$1....$n]
// locationsetting: none, bison  --> @$=undefined, @$=merge(@1..@n)

Anyway, that's just pondering how this might be made to look at the jison usage level.
Making it happen is another story: I'm looking into migrating the code analysis and rewriting/stripping to a recast/jscodeshift-based process, but that is non-trivial, so will take some time.

The quick fix for you would be to recover (most of) the old behaviour but that would negate all my performance gains through simplified code as most of my grammars don't use these default actions, thus they are surplus=overhead for me.
Let me think about it for a bit, maybe I come up with a decent 'quick fix' overnight.

@nwhetsell
Copy link
Author

Is there an option I can use to get this code to run? That looks like it should restore the default action.

@GerHobbelt
Copy link
Owner

Not yet, except of course writing that code every explicit action, which is a bother and doesn't help produce readable grammars.

The mistake I made is bundling $$ and @$ treatment together (see for related material: zaach#343): right now in build 188/189, when you have an explicit action for a rule, it is detected and ALL default actions are not executed, while the only breaking change compared to zaach#343 which should have been done is kill the $$=$1 default action.

Anyway, I'm working on this right now to provide a better (and when the user wants, backwards compatible) behaviour re default action injection in the generated grammar.

GerHobbelt added a commit that referenced this issue Sep 12, 2017
…fault-action-mode=<value_mode,location_mode>`.

- throw an exception in the jison compiler when you happen to feed it an obsoleted `no-default-action` option (`options.noDefaultAction` flag in the Jison library API); the exception message clearly points out the issue (obsoleted option) and what you should do to replace/fix it.
- add `--default-action-mode=<value_mode>,<location_mode>` option to the JISON CLI and include clear documentation in the `--help`
- JISON library API should be a *facade* for the underlying tools, so you only need to `require('jison-gho')` and have access to them all -- and be assured that you use the latest compatible versions of every tool too.
- add slightly more advanced action code analysis to properly inject value and location default action code in every rule's production action, without losing the performance gain we have obtained thus far: we **only want default action code where it is actually useful == will be used by the user-written action in this or other actions**: hence we analyze the code to inspect whether location tracking (`@$`, `@1`, etc.) constructs are used and only then do we insert (rather costly) location tracking code. Ditto for value tracking (`$$`, `$1`, etc.), though that code is, generally, less costly than the location tracking code.
- we now support a "default action" code which produces a very simple parse tree, rather than the rather useless bison `$$ = $1;` convention, which would have you loose quite a bit of information in any non-trivial grammar, if you haven't added plenty user-written action codes for your multi-term grammar rule productions. Now `--default-action-mode=ast` actually enables you to write grammars without any user-written action code for the grammar rules and still get something quite useful from the `parse()` operation! (Every rule's production bundles its terms' values in an array, thus producing a hierarchy of arrays as a parse result, which is equivalent to a (very simple) parse tree a.k.a. AST.
- tightened the tests: using `assert.strictEqual` rather than `assert.ok` where applicable: better failure reports when a test breaks!
- extended the API tests to (at least partly) cover the new `default-action-mode` option permutations.
- given the discussion in #12, we've implemented a close derivative of the concept mentioned in #12 (comment): this is the new `default-action-mode`. From the `--help`:

  `--default-action=ast,none`

  Supported value modes:
  - classic : generate a parser which includes the default "$$ = $1;" action for every rule.
  - ast     : generate a parser which produces a simple AST-like tree-of-arrays structure: every rule produces an array of its production terms' values. Otherwise it is identical to "classic" mode.
  - none    : JISON will produce a slightly faster parser but then you are solely responsible for propagating rule action "$$" results. The default rule value is still deterministic though as it is set to "undefined": "$$ = undefined;"
  - skip    : same as "none" mode, except JISON does NOT INJECT a default value action ANYWHERE, hence rule results are not deterministic when you do not properly manage the "$$" value yourself!

  Supported location modes:
  - merge   : generate a parser which includes the default "@$ = merged(@1..@n);" location tracking action for every rule, i.e. the rule's production 'location' is the range spanning its terms.
  - classic : same as "merge" mode.
  - ast     : ditto.
  - none    : JISON will produce a slightly faster parser but then you are solely responsible for propagating rule action "@$" location results. The default rule location is still deterministic though, as it is set to "undefined": "@$ = undefined;"
  - skip    : same as "none" mode, except JISON does NOT INJECT a default location action ANYWHERE, hence rule location results are not deterministic when you do not properly manage the "@$" value yourself!

  Notes:
  - when you do specify a value default mode, but DO NOT specify a location value mode, the latter is assumed to be the same as the former. Hence:
        --default-action=ast
    equals:
        --default-action=ast,ast
  - when you do not specify an explicit default mode or only a "true"/"1" value, the default is assumed: "ast,merge".
  - when you specify "false"/"0" as an explicit default mode, "none,none" is assumed. This produces the fastest deterministic parser.

  Default setting:
      [classic,merge]
@GerHobbelt
Copy link
Owner

release 0.6.0-191 has reverted to default 'classic' default behaviour ($$ = $1; @$ = merged;) and supports other default action modes (see jison --help for more on the new --default-action option).

This should fix this issue.


Notes (about $$ and @$ use in jison grammar actions since release 0.6.0-191)

Also note that jison now has better/tighter user-written action code analysis to help ensure a proper rule value result $$ and location result @$ are produced:

  1. when the user action code is absent OR does not set any of these ($$ and @$), then default action code will be injected for every action block = every rule production which doesn't have these assignments.

In other words: if your code doesn't provide a $$ and/or @$ value, then jison will. (Unless your default mode has been configured as skip for the one(s) missing.)

  1. If the action code analyzer determines that you happen to access (read) the $$ and/or @$ values before assigning a value to them in the same action block, then the relevant default action will be injected before your action code, again depending on selected --default-action mode.

In other words: if your code doesn't provide a $$ and/or @$ value to work with, then jison will. (Unless your default mode has been configured as skip for the one(s) missing.)

Note that from now on $$ is fundamentally different from $0 and happens to be more in line with the way yacc/bison treat(ed) this.

Also do note that read-accessing $$ and @$ before writing (assigning a value) to it in an action code block is very peculiar coding behaviour and certainly non-portable, not even in the yacc/bison world outside jison! (The same goes for $0 references, which are discouraged action code behaviour even though it is documented in a corner of the bison documentation. 😉 )

  1. If the action code analyzer determines that you happen to write-access (assign a value) to $$ and/or @$ values before you happen to read-access the same variable in the same action code block (which makes sense if your action code is performing non-trivial operations), jison WILL NOT inject the default action for that action as it would be unused anyway. Hence the analyzer attempts to produce the best performant parser code.

Note that the analyzer is currently still a rather 'dumb' analyzer in that it analyzes your action code using regexes rather than a full JavaScript parse (and AST scan), hence "wicked" action code like this will confuse the analyzer into not injecting default actions before the user-written action code:

function nasty(z) {
  $$ = z;  // this is analyzed as 'assignment before use' despite actual code flow!
}
var v = $$; // actual code flow: read before write!
nasty(v + $1);

@GerHobbelt
Copy link
Owner

@nwhetsell: ping?

Jison should be backwards compatible again since build 0.6.0-191; see also the description of treatment of $$ (and @$) in the comment above in this issue.

@nwhetsell
Copy link
Author

nwhetsell commented Oct 3, 2017

Yep, this appears to be fixed, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants