Parser: Build system to compare alternative parser implementations #6831

dmsnell · 2018-05-18T14:43:18Z

Please keep open - this has a purpose but not a very good home in the actual code repo yet

Description

This patch introduces a stupid benchmark to compare two
different implementations of a parser for the Gutenberg
grammar.

The purpose is to aid development and optimization of
parsers and to support a competitive and lively third-party
parser ecosystem, or to be used by core devs trying to
refine what becomes the final default parser.

This is meant to be served as a static HTML page and
performs naive benchmarks and gathers statistics about
random runs of each parser over preselected documents.
It starts by comparing the outputs to see if they are
structurally the same, then it runs random parses and
measures the time spent parsing random documents.
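The two phases described above (check that the outputs are structurally the same, then time random parses) could be sketched roughly as follows. This is not the comparator's actual code: `isStructurallyEqual`, `benchmark`, and `mean` are hypothetical names, the parsers are assumed to be plain synchronous functions, and `Date.now()` stands in for whatever clock the real page uses.

```javascript
// Deep structural comparison of two parse results (plain JSON-like values).
function isStructurallyEqual( a, b ) {
	if ( a === b ) {
		return true;
	}
	if ( typeof a !== 'object' || typeof b !== 'object' || a === null || b === null ) {
		return false;
	}
	if ( Array.isArray( a ) !== Array.isArray( b ) ) {
		return false;
	}
	const keysA = Object.keys( a );
	const keysB = Object.keys( b );
	if ( keysA.length !== keysB.length ) {
		return false;
	}
	return keysA.every(
		( key ) =>
			Object.prototype.hasOwnProperty.call( b, key ) &&
			isStructurallyEqual( a[ key ], b[ key ] )
	);
}

// Run each parser over randomly chosen documents and collect elapsed times (ms).
// `parsers` maps a label to a parse function (an assumption for this sketch).
function benchmark( parsers, documents, runs ) {
	const samples = {};
	for ( const name of Object.keys( parsers ) ) {
		samples[ name ] = [];
	}
	for ( let i = 0; i < runs; i++ ) {
		const doc = documents[ Math.floor( Math.random() * documents.length ) ];
		for ( const [ name, parse ] of Object.entries( parsers ) ) {
			const start = Date.now();
			parse( doc );
			samples[ name ].push( Date.now() - start );
		}
	}
	return samples;
}

// Arithmetic mean of the collected samples.
function mean( values ) {
	return values.reduce( ( sum, value ) => sum + value, 0 ) / values.length;
}
```

Comparing outputs first keeps the timing numbers meaningful: a parser that is fast but produces a different tree is not a fair competitor.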

Individual parsers expose themselves as web services
which take an input document as a POST body and they
return the actual parse and metrics about parsing the
document in their response.
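A client for such a parser service might look something like the sketch below. The helper name `requestParse` is hypothetical, and the response shape is an assumption: the `parse` key appears in the PHP runner quoted in the review thread, while the timing key is assumed to be either `us` or `µs`. The `fetchImpl` parameter exists only so the function can be exercised without a network.

```javascript
// Hypothetical helper: POST a document to a parser service and return its
// parse plus the reported runtime. Field names are assumptions, see above.
async function requestParse( serviceUrl, document, fetchImpl = fetch ) {
	const response = await fetchImpl( serviceUrl, {
		method: 'POST',
		body: document, // the raw document is the POST body
	} );
	if ( ! response.ok ) {
		throw new Error( 'Parser service responded with status ' + response.status );
	}
	const body = await response.json();
	return {
		parse: body.parse,
		// Accept either spelling of the microseconds key.
		microseconds: body.us ?? body[ 'µs' ],
	};
}
```

Because each parser sits behind the same HTTP contract, the comparator does not need to know whether a given implementation is written in PHP or JavaScript.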

The comparator runs against a library of pre-selected
documents and will continue to gather data until stopped
by pressing ESC.

Testing

URL for static-hosted comparator https://comparator-ywayevznec.now.sh/
URL for spec-PHP parser https://comparator-aiiktzpxbm.now.sh
URL for spec-JS parser https://comparator-mqdcztyzlm.now.sh
URL for default JS parser https://default-js-olratolpxh.now.sh

I could use your help!

Obviously this is somewhat junky. I need help with a few
things:

- it would be nice to get styles working
- I think we want this in the official docs somewhere? we
  can host it as a static build so that only the web services
  need to be started or selected
- the interface is awful, which is okay for dev work, but
  there is so much that could be done to improve it
- this is a first iteration that only looks at parser speed, we
  can expand that as we see fit

How has this been tested?

This is a static HTML file that doesn't enter the main
project's build. It's meant for documentation only.

Types of changes

Tool to compare competing parser implementation equality
Tool to compare competing parser implementation performance

Example output run after many iterations

This example uses the URLs provided above.

Checklist:

My code is tested.
My code follows the WordPress code style.
My code follows the accessibility standards.
My code has proper inline documentation.

mcsf

Thanks for the PR!

I'm inclined to keep momentum and have it merged relatively close to its current state—leaving refactorings, stylings, etc. for other stages—but a couple of things should be dealt with now.

mcsf · 2018-06-01T14:33:10Z

docs/parser/browser-comparator/index.js

+        () => compareOutputs( 
+        () => benchmark( 
+        () => l( 'Done' ) 
+    ) ) ) ) )();


wow, such lisp!

edit: to be clear, not a piece of feedback :)

mcsf · 2018-06-01T14:39:44Z

docs/parser/browser-comparator/defaults.js

+);
+
+document.getElementById( 'next-parser-content' ).value = (
+    "/*\n * Generated by PEG.js 0.10.0.\n *\n * http://pegjs.org/\n */\n(function() {\n  \"use strict\";\n\n  function peg$subclass(child, parent) {\n    function ctor() { this.constructor = child; }\n    ctor.prototype = parent.prototype;\n    child.prototype = new ctor();\n  }\n\n  function peg$SyntaxError(message, expected, found, location) {\n    this.message  = message;\n    this.expected = expected;\n    this.found    = found;\n    this.location = location;\n    this.name     = \"SyntaxError\";\n\n    if (typeof Error.captureStackTrace === \"function\") {\n      Error.captureStackTrace(this, peg$SyntaxError);\n    }\n  }\n\n  peg$subclass(peg$SyntaxError, Error);\n\n  peg$SyntaxError.buildMessage = function(expected, found) {\n    var DESCRIBE_EXPECTATION_FNS = {\n          literal: function(expectation) {\n            return \"\\\"\" + literalEscape(expectation.text) + \"\\\"\";\n          },\n\n          \"class\": function(expectation) {\n            var escapedParts = \"\",\n                i;\n\n            for (i = 0; i < expectation.parts.length; i++) {\n              escapedParts += expectation.parts[i] instanceof Array\n                ? classEscape(expectation.parts[i][0]) + \"-\" + classEscape(expectation.parts[i][1])\n                : classEscape(expectation.parts[i]);\n            }\n\n            return \"[\" + (expectation.inverted ? 
\"^\" : \"\") + escapedParts + \"]\";\n          },\n\n          any: function(expectation) {\n            return \"any character\";\n          },\n\n          end: function(expectation) {\n            return \"end of input\";\n          },\n\n          other: function(expectation) {\n            return expectation.description;\n          }\n        };\n\n    function hex(ch) {\n      return ch.charCodeAt(0).toString(16).toUpperCase();\n    }\n\n    function literalEscape(s) {\n      return s\n        .replace(/\\\\/g, '\\\\\\\\')\n        .replace(/\"/g,  '\\\\\"')\n        .replace(/\\0/g, '\\\\0')\n        .replace(/\\t/g, '\\\\t')\n        .replace(/\\n/g, '\\\\n')\n        .replace(/\\r/g, '\\\\r')\n        .replace(/[\\x00-\\x0F]/g,          function(ch) { return '\\\\x0' + hex(ch); })\n        .replace(/[\\x10-\\x1F\\x7F-\\x9F]/g, function(ch) { return '\\\\x'  + hex(ch); });\n    }\n\n    function classEscape(s) {\n      return s\n        .replace(/\\\\/g, '\\\\\\\\')\n        .replace(/\\]/g, '\\\\]')\n        .replace(/\\^/g, '\\\\^')\n        .replace(/-/g,  '\\\\-')\n        .replace(/\\0/g, '\\\\0')\n        .replace(/\\t/g, '\\\\t')\n        .replace(/\\n/g, '\\\\n')\n        .replace(/\\r/g, '\\\\r')\n        .replace(/[\\x00-\\x0F]/g,          function(ch) { return '\\\\x0' + hex(ch); })\n        .replace(/[\\x10-\\x1F\\x7F-\\x9F]/g, function(ch) { return '\\\\x'  + hex(ch); });\n    }\n\n    function describeExpectation(expectation) {\n      return DESCRIBE_EXPECTATION_FNS[expectation.type](expectation);\n    }\n\n    function describeExpected(expected) {\n      var descriptions = new Array(expected.length),\n          i, j;\n\n      for (i = 0; i < expected.length; i++) {\n        descriptions[i] = describeExpectation(expected[i]);\n      }\n\n      descriptions.sort();\n\n      if (descriptions.length > 0) {\n        for (i = 1, j = 1; i < descriptions.length; i++) {\n          if (descriptions[i - 1] !== descriptions[i]) {\n            
descriptions[j] = descriptions[i];\n            j++;\n          }\n        }\n        descriptions.length = j;\n      }\n\n      switch (descriptions.length) {\n        case 1:\n          return descriptions[0];\n\n        case 2:\n          return descriptions[0] + \" or \" + descriptions[1];\n\n        default:\n          return descriptions.slice(0, -1).join(\", \")\n            + \", or \"\n            + descriptions[descriptions.length - 1];\n      }\n    }\n\n    function describeFound(found) {\n      return found ? \"\\\"\" + literalEscape(found) + \"\\\"\" : \"end of input\";\n    }\n\n    return \"Expected \" + describeExpected(expected) + \" but \" + describeFound(found) + \" found.\";\n  };\n\n  function peg$parse(input, options) {\n    options = options !== void 0 ? options : {};\n\n    var peg$FAILED = {},\n\n        peg$startRuleFunctions = { Block_List: peg$parseBlock_List },\n        peg$startRuleFunction  = peg$parseBlock_List,\n\n        peg$c0 = peg$anyExpectation(),\n        peg$c1 = function(pre, b, html) { /** <?php return array( $b, $html ); ?> **/ return [ b, html ] },\n        peg$c2 = function(pre, bs, post) { /** <?php return peg_join_blocks( $pre, $bs, $post ); ?> **/\n            return joinBlocks( pre, bs, post );\n          },\n        peg$c3 = \"<!--\",\n        peg$c4 = peg$literalExpectation(\"<!--\", false),\n        peg$c5 = \"wp:\",\n        peg$c6 = peg$literalExpectation(\"wp:\", false),\n        peg$c7 = function(blockName, a) {\n            /** <?php return $a; ?> **/\n            return a;\n          },\n        peg$c8 = \"/-->\",\n        peg$c9 = peg$literalExpectation(\"/-->\", false),\n        peg$c10 = function(blockName, attrs) {\n            /** <?php\n            return array(\n              'blockName'  => $blockName,\n              'attrs'      => $attrs,\n              'innerBlocks' => array(),\n              'innerHTML' => '',\n            );\n            ?> **/\n\n            return {\n              blockName: 
blockName,\n              attrs: attrs,\n              innerBlocks: [],\n              innerHTML: ''\n            };\n          },\n        peg$c11 = function(s, children, e) {\n            /** <?php\n            list( $innerHTML, $innerBlocks ) = peg_array_partition( $children, 'is_string' );\n\n            return array(\n              'blockName'  => $s['blockName'],\n              'attrs'      => $s['attrs'],\n              'innerBlocks'  => $innerBlocks,\n              'innerHTML'  => implode( '', $innerHTML ),\n            );\n            ?> **/\n\n            var innerContent = partition( function( a ) { return 'string' === typeof a }, children );\n            var innerHTML = innerContent[ 0 ];\n            var innerBlocks = innerContent[ 1 ];\n\n            return {\n              blockName: s.blockName,\n              attrs: s.attrs,\n              innerBlocks: innerBlocks,\n              innerHTML: innerHTML.join( '' )\n            };\n          },\n        peg$c12 = \"-->\",\n        peg$c13 = peg$literalExpectation(\"-->\", false),\n        peg$c14 = function(blockName, attrs) {\n            /** <?php\n            return array(\n              'blockName' => $blockName,\n              'attrs'     => $attrs,\n            );\n            ?> **/\n\n            return {\n              blockName: blockName,\n              attrs: attrs\n            };\n          },\n        peg$c15 = \"/wp:\",\n        peg$c16 = peg$literalExpectation(\"/wp:\", false),\n        peg$c17 = function(blockName) {\n            /** <?php\n            return array(\n              'blockName' => $blockName,\n            );\n            ?> **/\n\n            return {\n              blockName: blockName\n            };\n          },\n        peg$c18 = \"/\",\n        peg$c19 = peg$literalExpectation(\"/\", false),\n        peg$c20 = function(type) {\n            /** <?php return \"core/$type\"; ?> **/\n            return 'core/' + type;\n          },\n        peg$c21 = /^[a-z]/,\n        
peg$c22 = peg$classExpectation([[\"a\", \"z\"]], false, false),\n        peg$c23 = /^[a-z0-9_\\-]/,\n        peg$c24 = peg$classExpectation([[\"a\", \"z\"], [\"0\", \"9\"], \"_\", \"-\"], false, false),\n        peg$c25 = \"{\",\n        peg$c26 = peg$literalExpectation(\"{\", false),\n        peg$c27 = \"}\",\n        peg$c28 = peg$literalExpectation(\"}\", false),\n        peg$c29 = \"\",\n        peg$c30 = function(attrs) {\n            /** <?php return json_decode( $attrs, true ); ?> **/\n            return maybeJSON( attrs );\n          },\n        peg$c31 = /^[ \\t\\r\\n]/,\n        peg$c32 = peg$classExpectation([\" \", \"\\t\", \"\\r\", \"\\n\"], false, false),\n\n        peg$currPos          = 0,\n        peg$savedPos         = 0,\n        peg$posDetailsCache  = [{ line: 1, column: 1 }],\n        peg$maxFailPos       = 0,\n        peg$maxFailExpected  = [],\n        peg$silentFails      = 0,\n\n        peg$result;\n\n    if (\"startRule\" in options) {\n      if (!(options.startRule in peg$startRuleFunctions)) {\n        throw new Error(\"Can't start parsing from rule \\\"\" + options.startRule + \"\\\".\");\n      }\n\n      peg$startRuleFunction = peg$startRuleFunctions[options.startRule];\n    }\n\n    function text() {\n      return input.substring(peg$savedPos, peg$currPos);\n    }\n\n    function location() {\n      return peg$computeLocation(peg$savedPos, peg$currPos);\n    }\n\n    function expected(description, location) {\n      location = location !== void 0 ? location : peg$computeLocation(peg$savedPos, peg$currPos)\n\n      throw peg$buildStructuredError(\n        [peg$otherExpectation(description)],\n        input.substring(peg$savedPos, peg$currPos),\n        location\n      );\n    }\n\n    function error(message, location) {\n      location = location !== void 0 ? 
location : peg$computeLocation(peg$savedPos, peg$currPos)\n\n      throw peg$buildSimpleError(message, location);\n    }\n\n    function peg$literalExpectation(text, ignoreCase) {\n      return { type: \"literal\", text: text, ignoreCase: ignoreCase };\n    }\n\n    function peg$classExpectation(parts, inverted, ignoreCase) {\n      return { type: \"class\", parts: parts, inverted: inverted, ignoreCase: ignoreCase };\n    }\n\n    function peg$anyExpectation() {\n      return { type: \"any\" };\n    }\n\n    function peg$endExpectation() {\n      return { type: \"end\" };\n    }\n\n    function peg$otherExpectation(description) {\n      return { type: \"other\", description: description };\n    }\n\n    function peg$computePosDetails(pos) {\n      var details = peg$posDetailsCache[pos], p;\n\n      if (details) {\n        return details;\n      } else {\n        p = pos - 1;\n        while (!peg$posDetailsCache[p]) {\n          p--;\n        }\n\n        details = peg$posDetailsCache[p];\n        details = {\n          line:   details.line,\n          column: details.column\n        };\n\n        while (p < pos) {\n          if (input.charCodeAt(p) === 10) {\n            details.line++;\n            details.column = 1;\n          } else {\n            details.column++;\n          }\n\n          p++;\n        }\n\n        peg$posDetailsCache[pos] = details;\n        return details;\n      }\n    }\n\n    function peg$computeLocation(startPos, endPos) {\n      var startPosDetails = peg$computePosDetails(startPos),\n          endPosDetails   = peg$computePosDetails(endPos);\n\n      return {\n        start: {\n          offset: startPos,\n          line:   startPosDetails.line,\n          column: startPosDetails.column\n        },\n        end: {\n          offset: endPos,\n          line:   endPosDetails.line,\n          column: endPosDetails.column\n        }\n      };\n    }\n\n    function peg$fail(expected) {\n      if (peg$currPos < peg$maxFailPos) { return; 
}\n\n      if (peg$currPos > peg$maxFailPos) {\n        peg$maxFailPos = peg$currPos;\n        peg$maxFailExpected = [];\n      }\n\n      peg$maxFailExpected.push(expected);\n    }\n\n    function peg$buildSimpleError(message, location) {\n      return new peg$SyntaxError(message, null, null, location);\n    }\n\n    function peg$buildStructuredError(expected, found, location) {\n      return new peg$SyntaxError(\n        peg$SyntaxError.buildMessage(expected, found),\n        expected,\n        found,\n        location\n      );\n    }\n\n    function peg$parseBlock_List() {\n      var s0, s1, s2, s3, s4, s5, s6, s7, s8, s9;\n\n      s0 = peg$currPos;\n      s1 = peg$currPos;\n      s2 = [];\n      s3 = peg$currPos;\n      s4 = peg$currPos;\n      peg$silentFails++;\n      s5 = peg$parseBlock();\n      peg$silentFails--;\n      if (s5 === peg$FAILED) {\n        s4 = void 0;\n      } else {\n        peg$currPos = s4;\n        s4 = peg$FAILED;\n      }\n      if (s4 !== peg$FAILED) {\n        if (input.length > peg$currPos) {\n          s5 = input.charAt(peg$currPos);\n          peg$currPos++;\n        } else {\n          s5 = peg$FAILED;\n          if (peg$silentFails === 0) { peg$fail(peg$c0); }\n        }\n        if (s5 !== peg$FAILED) {\n          s4 = [s4, s5];\n          s3 = s4;\n        } else {\n          peg$currPos = s3;\n          s3 = peg$FAILED;\n        }\n      } else {\n        peg$currPos = s3;\n        s3 = peg$FAILED;\n      }\n      while (s3 !== peg$FAILED) {\n        s2.push(s3);\n        s3 = peg$currPos;\n        s4 = peg$currPos;\n        peg$silentFails++;\n        s5 = peg$parseBlock();\n        peg$silentFails--;\n        if (s5 === peg$FAILED) {\n          s4 = void 0;\n        } else {\n          peg$currPos = s4;\n          s4 = peg$FAILED;\n        }\n        if (s4 !== peg$FAILED) {\n          if (input.length > peg$currPos) {\n            s5 = input.charAt(peg$currPos);\n            peg$currPos++;\n          } else {\n            
s5 = peg$FAILED;\n            if (peg$silentFails === 0) { peg$fail(peg$c0); }\n          }\n          if (s5 !== peg$FAILED) {\n            s4 = [s4, s5];\n            s3 = s4;\n          } else {\n            peg$currPos = s3;\n            s3 = peg$FAILED;\n          }\n        } else {\n          peg$currPos = s3;\n          s3 = peg$FAILED;\n        }\n      }\n      if (s2 !== peg$FAILED) {\n        s1 = input.substring(s1, peg$currPos);\n      } else {\n        s1 = s2;\n      }\n      if (s1 !== peg$FAILED) {\n        s2 = [];\n        s3 = peg$currPos;\n        s4 = peg$parseBlock();\n        if (s4 !== peg$FAILED) {\n          s5 = peg$currPos;\n          s6 = [];\n          s7 = peg$currPos;\n          s8 = peg$currPos;\n          peg$silentFails++;\n          s9 = peg$parseBlock();\n          peg$silentFails--;\n          if (s9 === peg$FAILED) {\n            s8 = void 0;\n          } else {\n            peg$currPos = s8;\n            s8 = peg$FAILED;\n          }\n          if (s8 !== peg$FAILED) {\n            if (input.length > peg$currPos) {\n              s9 = input.charAt(peg$currPos);\n              peg$currPos++;\n            } else {\n              s9 = peg$FAILED;\n              if (peg$silentFails === 0) { peg$fail(peg$c0); }\n            }\n            if (s9 !== peg$FAILED) {\n              s8 = [s8, s9];\n              s7 = s8;\n            } else {\n              peg$currPos = s7;\n              s7 = peg$FAILED;\n            }\n          } else {\n            peg$currPos = s7;\n            s7 = peg$FAILED;\n          }\n          while (s7 !== peg$FAILED) {\n            s6.push(s7);\n            s7 = peg$currPos;\n            s8 = peg$currPos;\n            peg$silentFails++;\n            s9 = peg$parseBlock();\n            peg$silentFails--;\n            if (s9 === peg$FAILED) {\n              s8 = void 0;\n            } else {\n              peg$currPos = s8;\n              s8 = peg$FAILED;\n            }\n            if (s8 !== 
peg$FAILED) {\n              if (input.length > peg$currPos) {\n                s9 = input.charAt(peg$currPos);\n                peg$currPos++;\n              } else {\n                s9 = peg$FAILED;\n                if (peg$silentFails === 0) { peg$fail(peg$c0); }\n              }\n              if (s9 !==


Can we move these to WordPress/gutenberg?

mcsf · 2018-06-01T14:43:09Z

docs/parser/browser-comparator/defaults.js

What are these defaults? Is one of them the current parser in core? What is the other?

Hywan · 2018-07-04T08:31:36Z

docs/parser/comparator/runner-php-sync.php

+header( 'Access-Control-Allow-Origin: *' );
+echo json_encode( [
+    'parse' => $parse,
+    'µs' => $runtime,


µs or us? cf https://github.com/WordPress/gutenberg/blob/c41a38fe1914de169aeaa25f62efbcd7e63cbcd3/docs/parser/comparator/README.md

good catch! I changed it because Visual Studio Code was inserting an unexpected character when I typed µ.
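
A minimal sketch of why a `µs` key can be fragile for consumers of the response. It assumes (the thread does not confirm this) that the "unexpected character" was a look-alike code point: Unicode has both MICRO SIGN (U+00B5) and GREEK SMALL LETTER MU (U+03BC), which render almost identically but are different strings. The `readRuntime` helper and the response shape are hypothetical, for illustration only.

```javascript
// Two visually similar characters that are distinct code points.
const MICRO_SIGN = '\u00B5'; // MICRO SIGN
const GREEK_MU = '\u03BC'; // GREEK SMALL LETTER MU

// Hypothetical helper: read the runtime from a parser response,
// accepting the ASCII key "us" or either mu variant followed by "s".
function readRuntime(response) {
  for (const key of ['us', MICRO_SIGN + 's', GREEK_MU + 's']) {
    if (typeof response[key] === 'number') {
      return response[key];
    }
  }
  return null; // no recognizable runtime key
}

// The two characters do not compare equal, so a key typed with one
// will not be found by a lookup written with the other.
const sameCharacter = MICRO_SIGN === GREEK_MU;

// Hypothetical response whose key uses the Greek mu variant.
const runtime = readRuntime({ parse: [], [GREEK_MU + 's']: 1250 });
```

An ASCII key such as `us` sidesteps the ambiguity entirely, at the cost of a slightly less readable unit label.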

Hywan · 2018-07-04T08:32:22Z

docs/parser/comparator/README.md

+		beforeParserInit: [number] // bytes right before initializing parser
+		afterParserInit: [number] // bytes right after initializing parser
+		end: [number] // bytes after parsing document the requested number of times
+	}


What about sentinel?

It's only there to de-optimize the loop, so the compiler can't do anything funny such as skipping code it decides has no effect.

it's not important to the output format. I considered adding it in there but I ended up leaving it irrelevant
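
The sentinel idea described above can be sketched as follows. This is not the comparator's actual runner: `parse` is a hypothetical stand-in for a real parser, and `timeParses` is an illustrative name. The point is only that each parse result feeds a value that is returned from the loop, so an optimizing engine cannot treat the calls as dead code.

```javascript
// Hypothetical stand-in for a real parser: counts block openers.
function parse(document) {
  return document.split('<!-- wp:').length - 1;
}

// Run `parse` `count` times and fold every result into a sentinel.
// Because the sentinel is returned, the work inside the loop is
// observable and cannot be skipped as having no effect.
function timeParses(document, count) {
  let sentinel = 0;
  const start = Date.now();
  for (let i = 0; i < count; i++) {
    sentinel += parse(document);
  }
  const elapsedMs = Date.now() - start;
  return { elapsedMs, sentinel };
}

const result = timeParses('<!-- wp:a /--><!-- wp:b /-->', 1000);
```

The sentinel itself carries no meaning for the benchmark output, which matches the reply above that it is not part of the documented format.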

* Parser: Propose new hand-coded PHP parser

  For some time we've needed a more performant PHP parser for the first stage of parsing the `post_content` document.

  See #1681 (early exploration)
  See #8044 (parser performance issue)
  See #1775 (parser performance, fixed in php-pegjs)

  I'm proposing this implementation of the spec parser as an alternative to the auto-generated parser from the PEG definition.

  This is not yet ready to go but I wanted to get the code in a branch so I can iterate on it and garner early feedback.

  This should eventually provide a setup fixture for #6831 wherein we are testing alternate parser implementations.

  - designed as a basic recursive-descent
  - but doesn't recurse on the call-stack, recurses via trampoline
  - moves linearly through document in one pass
  - relies on RegExp for tokenization
  - nested blocks include the nested content in their `innerHTML`; this needs to go away
  - create test fixture
  - figure out where to save this file

* Fix issue with containing the nested innerHTML
* Also handle newlines as whitespace
* Use classes for some static typing
* add type hints
* remove needless comment
* space where space is due
* meaningless rename
* remove needless function call
* harmonize with spec parser
* don't forget freeform HTML before blocks
* account for oddity in spec-parser
* add some polish, fix a thing
* comment it
* add JS version too
* Change `.` to `[^]` because `/s` isn't well supported in JS

  The `s` flag on the RegExp object informs the engine to treat a dot character as a class that includes the newline character. Without it newlines aren't considered in the dot. Since this flag is new to JavaScript and not well supported in different browsers I have removed it in favor of an explicit class of characters that _does_ include the newline, namely the open exclusion of `[^]` which permits all input characters.

  Hat tip to @Hywan for finding this.

* Move code into `/packages` directory, prepare for review
* take out names from RegExp pattern to not fail tests
* Fix bug in parser: store HTML soup in stack frames while parsing

  Previously we were sending all "HTML soup" segments of HTML between blocks to the output list before any blocks were processed. We should have been tracking these segments during the parsing and only spit them out when closing a block at the top level.

  This change stores the index into the input document at which that soup starts if it exists and then produces the freeform block when adding a block to the output from the parse frame stack.

* fix whitespace
* fix oddity in spec
* match styles
* use class name filter on server-side parser class
* fix whitespace
* Document extensibility
* fix typo in example code
* Push failing parsing test
* fix lazy/greedy bug in parser regexp
* Docs: Fix typos, links, tweak style.
* update from PR feedback
* trim docs
* Load default block parser, replacing PEG-generated one
* Expand `?:` shorthand for PHP 5.2 compat
* add fixtures test for default parser
* spaces to tabs
* could we need no assoc?
* fill out return array
* put that assoc back in there
* isometrize
* rename and add 0
* Conditionally include the parser class
* Add docblocks
* Standardize the package configuration
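
The `.` versus `[^]` change described in the commit message above can be illustrated with a small sketch. The patterns and the sample input below are hypothetical and much simpler than the parser's real tokenizer; they only show that `.` stops at a newline when the `s` flag is absent, while the empty negated class `[^]` matches any character.

```javascript
// Hypothetical sample input: block delimiters around two lines of content.
const input =
  '<!-- wp:paragraph -->\nline one\nline two<!-- /wp:paragraph -->';

// Without the `s` flag, `.` does not match "\n", so the content
// between the delimiters cannot be spanned and the match fails.
const withDot = /<!-- wp:paragraph -->(.*?)<!-- \/wp:paragraph -->/;

// `[^]` excludes nothing, so it matches every character, newline included.
const withAnyChar = /<!-- wp:paragraph -->([^]*?)<!-- \/wp:paragraph -->/;

const dotMatch = withDot.exec(input); // null for this multi-line input
const anyMatch = withAnyChar.exec(input); // captures the inner content
const inner = anyMatch ? anyMatch[1] : null;
```

Note that `[^]` is a JavaScript idiom; other regex flavors may reject an empty negated class, so this sketch applies to the JS version of the parser only.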

gziolo · 2019-02-07T13:58:42Z

@youknowriad or @mcsf - what's the future of this effort? Do you plan to invest some time into having it merged into Gutenberg? Who could potentially help to review it?

kwight · 2019-03-09T19:07:35Z

@dmsnell I believe you want to keep this open and moving forward. What could the next steps be, @youknowriad or @mcsf?

dmsnell · 2019-03-09T19:08:03Z

This could probably turn into a package inside of the packages directory but it's still in a pretty weird state when thinking about merging. It's more of a separate application that runs independent from everything else here. I would welcome advice on where it should eventually live.

mcsf · 2019-03-11T12:33:36Z

> This could probably turn into a package inside of the packages directory but it's still in a pretty weird state when thinking about merging. It's more of a separate application that runs independent from everything else here. I would welcome advice on where it should eventually live.

Some separation would be nice, but I'm also not sure whether a package would be the best solution. Maybe @gziolo knows best.

gziolo · 2019-03-12T08:13:47Z

> Clone the Calypso repository and run npm install

I see this in the setup instructions. There are also some references to Calypso in the code. Does it need to depend on Calypso? I think that would be the only reason why it wouldn't be ideal to have it located inside Gutenberg packages.

mcsf · 2019-03-13T11:40:39Z

> I see this in the setup instructions. There are also some references to Calypso in the code. Does it need to depend on Calypso? I think that would be the only reason why it wouldn't be ideal to have it located inside Gutenberg packages.

I didn't remember that. Indeed, if it turns out to be a necessary dependency, I'd move this out of Gutenberg.

paaljoachim · 2019-08-30T23:09:26Z

It would be great to get a status update on this PR, and to hear how we can move it forward. Thanks.
@mcsf @gziolo @dmsnell

mcsf · 2019-09-13T14:32:40Z

> It would be great to get a status update on this PR, and to hear how we can move it forward.

Right now all items in the WP 5.3 cycle have priority. We can keep this issue open for sure, but I don't see us investing in this soon.

youknowriad · 2020-03-18T10:31:08Z

Trying to triage PRs today. Given that we don't plan to invest time here soon, I'm going to close this PR for now. We can reopen if there's a change in priorities. Thanks all for your efforts.

dmsnell · 2020-03-18T15:39:16Z

Fair enough. Thanks for holding it open so long. At some point when/if we come back to it we can probably start anew without much loss.

dmsnell added the [Type] Developer Documentation and [Feature] Parsing labels May 18, 2018

dmsnell requested review from mcsf, mtias and aduth May 18, 2018 14:43

mcsf reviewed Jun 1, 2018

mcsf mentioned this pull request Jul 2, 2018

Packages: Create new spec-parser package #7664

Merged

4 tasks

dmsnell force-pushed the parser/add-browser-comparator branch 2 times, most recently from 8d0ec56 to c41a38f Compare July 3, 2018 20:17

dmsnell changed the title ~~Parser: Add simple in-browser JS parser comparator~~ Parser: Add simple in-browser parser comparator Jul 3, 2018

Hywan reviewed Jul 4, 2018

dmsnell mentioned this pull request Jul 20, 2018

Parser: Propose new hand-coded parser #8083

Merged

4 tasks

dmsnell changed the title ~~Parser: Add simple in-browser parser comparator~~ Parser: Build system to compare alternative parser implementations Jul 24, 2018

mcsf mentioned this pull request Jul 27, 2018

Overview of Short-term Parsing Enhancements #8244

Closed

11 tasks

swissspidy mentioned this pull request Oct 21, 2018

Extensive articles with over 1700 words and various blocks make Gutenberg very slow. #10418

Closed

dmsnell force-pushed the parser/add-browser-comparator branch from c41a38f to 4995c94 Compare November 1, 2018 18:01

This was referenced Nov 1, 2018

Parser: Use a non-greedy matcher to extract the block args #11355

Closed

Parser: Optimize JSON-attribute parsing #11369

Merged

dmsnell force-pushed the parser/add-browser-comparator branch 6 times, most recently from 1c9ddc8 to 0ef9850 Compare November 10, 2018 01:51

dmsnell mentioned this pull request Nov 15, 2018

SDK: Add generic builder script Automattic/wp-calypso#28471

Merged

dmsnell force-pushed the parser/add-browser-comparator branch from 0ef9850 to f989286 Compare November 27, 2018 00:56

rebase and flatten branch

84f3c87

dmsnell force-pushed the parser/add-browser-comparator branch from f989286 to 84f3c87 Compare November 29, 2018 22:23

gziolo added the Needs Decision label Feb 7, 2019

youknowriad closed this Mar 18, 2020

youknowriad deleted the parser/add-browser-comparator branch May 27, 2020 17:40
