[jsinterp] Actual JS interpreter #11272

Closed
wants to merge 127 commits into from
Changes from 15 commits
d328b8c
[jsinterp] Actual parsing
sulyi Nov 23, 2016
2c85715
[jsinterp] Handling comments
sulyi Nov 23, 2016
cc895cd
[jsinterp] Parsing expr (cleanup needed)
sulyi Nov 24, 2016
8c87a18
[jsinterp] Calling field and test
sulyi Nov 24, 2016
2076b0b
[jsinterp] Clean up
sulyi Nov 25, 2016
da73cd9
[jsinterp] Quick regex fixes (thx to yan12125)
sulyi Nov 25, 2016
71a485f
[jsinterp] Complex call test (thx to yan12125)
sulyi Nov 25, 2016
8842f08
[jsinterp] String literal regex change
sulyi Nov 26, 2016
c485fe7
[jsinterp] Reject method call when name is empty (+reminder TOTOs)
sulyi Nov 26, 2016
ba5a400
[jsinterp] Simpler regex regex (+more TOTO)
sulyi Nov 26, 2016
b089388
[jsinterp] Lexer overhaul
sulyi Nov 28, 2016
9bd5dee
[jsinterp] Value parsing
sulyi Nov 28, 2016
aa7eb3d
[jsinterp] No OrderedDict
sulyi Nov 30, 2016
a0fa6bf
[jsinterp] Parser mock up
sulyi Nov 30, 2016
67d5653
[jsinterp] Minor quick fixes
sulyi Nov 30, 2016
a89d490
[jsinterp] TokenStream, expression mock up
sulyi Dec 3, 2016
f6005dc
[jsinterp] Adding _operator_expression using reversed polish notation
sulyi Dec 3, 2016
f605783
[jsinterp] Parser - take one (untested)
sulyi Dec 4, 2016
f6ad8db
[jsinterp] Refactoring and minor fixes
sulyi Dec 4, 2016
7864078
[jsinterp] Preliminary fixes after some testing of ast
sulyi Dec 5, 2016
d422aef
[jsinterp] Very basic interpreter
sulyi Dec 6, 2016
ce4a616
[jsinterp] Token class for tokens
sulyi Dec 7, 2016
c426efd
[jsinterp] More tokens
sulyi Dec 7, 2016
c2f280d
[jsinterp] Compatibility fix
sulyi Dec 7, 2016
8ff8a70
[jsinterp] Str tokens are easier to deal with
sulyi Dec 7, 2016
599b9db
[jsinterp] First parser tests
sulyi Dec 8, 2016
70a5e31
[jsinterp] Parentheses fix (test and parser)
sulyi Dec 8, 2016
4999fcc
[jsinterp] More test and str fix
sulyi Dec 8, 2016
651a1e7
[jsinterp] Coding convention fixes
sulyi Dec 8, 2016
dd6a2b5
[jsinterp] Clean up
sulyi Dec 9, 2016
c5c1273
Merge branch 'master' into jsinterp
sulyi Dec 9, 2016
6fa4eb6
[jsinterp] Fixing compatibility
sulyi Dec 9, 2016
a9c7310
[jsinterp] Adding context handling
sulyi Dec 10, 2016
e392f78
[jsinterp] Formatting code
sulyi Dec 10, 2016
88d2a4e
[jsinterp] Unittest2 in reqs
sulyi Dec 10, 2016
200903c
[jsinterp] Fixing py3 zip generator issues in parser tests
sulyi Dec 10, 2016
9d1f756
[jsinterp] Fixing deep copy zip in test_jsinterp_parse
sulyi Dec 10, 2016
f942bb3
[jsinterp] Refactoring getvalue and putvalue
sulyi Dec 10, 2016
9b5e55a
[jsinterp] Mozilla-central test first try
sulyi Dec 10, 2016
aa6e752
[jsinterp] Fixing Reference repr
sulyi Dec 10, 2016
86de1e8
[jsinterp] Adding function declaration and fixing block statement parser
sulyi Dec 10, 2016
4f55fe7
[jsinterp] Adding if parser (test needed)
sulyi Dec 11, 2016
57c8ccb
[jsinterp] Re-prioritising TODOs
sulyi Dec 11, 2016
ad49621
[jsinterp] Adding with and switch parser and fixes (tests needed)
sulyi Dec 11, 2016
c2e6ca5
[jsinterp] Adding code to if and switch test
sulyi Dec 11, 2016
ad288aa
[jsinterp] Parser test code fixes
sulyi Dec 11, 2016
48aaa41
[jsinterp] Finished parser if test
sulyi Dec 11, 2016
dedb6ee
[jsinterp] Added try parser (test needed)
sulyi Dec 11, 2016
bae3166
[jsinterp] Added debugger and throw parser (test needed)
sulyi Dec 11, 2016
96e5068
[jsinterp] Adding parser for label statement and function expression
sulyi Dec 11, 2016
f24cafe
[jsinterp] Adding parser object literal
sulyi Dec 11, 2016
a8a445f
[jsinterp] Fixing TokenStrem pop, label statement, function body
sulyi Dec 11, 2016
253e326
[jsinterp] Adding do and while parser
sulyi Dec 12, 2016
3ba28c6
[jsinterp] Adding for parser
sulyi Dec 12, 2016
cc9cb30
[jsinterp] Reprioritizing TODOs in test_jsinterp_parser.py
sulyi Dec 12, 2016
007f19e
[jsinterp] Adding code to parser tests
sulyi Dec 12, 2016
cf4c9c3
[jsinterp] Adding switch ast to parser test
sulyi Dec 12, 2016
558290d
[jsinterp] Adding object ast to parser test
sulyi Dec 12, 2016
f7993a1
[jsinterp] Refactor
sulyi Dec 12, 2016
2533dc4
[jsinterp] Adding ast to test_function_expression
sulyi Dec 12, 2016
fe141c4
[jsinterp] Refactor _object_literal
sulyi Dec 12, 2016
a2e42ed
[jsinterp] Adding ast to do parser test
sulyi Dec 12, 2016
4b8754c
[jsinterp] Adding ast to while parser test
sulyi Dec 12, 2016
b397ea2
[jsinterp] Adding ast to for parser test
sulyi Dec 12, 2016
cd0bb42
[jsinterp] Adding ast to for empty and for in parser test
sulyi Dec 12, 2016
ab37e2b
[test] Adding jstests test suite
sulyi Dec 14, 2016
c4c2aa2
[test] Adding support for signed values (hopefully)
sulyi Dec 15, 2016
e1444dd
[test] Adding support for signed values
sulyi Dec 15, 2016
0e4dd1a
[test, jsinterp] Adding sign test and refactor and fixing interpretation
sulyi Dec 15, 2016
d7443e1
[jsinterp] Adding interpreter support for pre- and postfix expressions
sulyi Dec 15, 2016
cd2bf30
[test] Adding logging to TestJSInterpreterParse
sulyi Dec 15, 2016
5238ed1
[test] Adding logging to TestJSInterpreter
sulyi Dec 15, 2016
1716801
[jsinterp] Adding interpreter support to get field
sulyi Dec 15, 2016
fce5722
[jsinterp] Adding error handling to global variable init
sulyi Dec 15, 2016
ee3dc29
[jsinterp] Adding interpreter support for set field
sulyi Dec 16, 2016
4e6f689
[jsinterp] Fixing set field
sulyi Dec 16, 2016
dca2e9e
[jsinterp] Fixing compat import
sulyi Dec 16, 2016
3b53669
[jsinterp] Adding function declaration and call
sulyi Dec 17, 2016
3f075d8
[test] jstest fixes
sulyi Dec 27, 2016
3d0252a
[jsinterp] Refactoring jsparser
sulyi Dec 28, 2016
a5e7022
[jstests] Ordering imports in __init__
sulyi Dec 29, 2016
bddf482
[jstests] Doc, dynamic import
sulyi Dec 29, 2016
41596ff
[jsbuilt-ins] jsbuilt_ins mock up
sulyi Dec 28, 2016
6f2ac27
[jsbuilt-ins] Table of content of the book of black magic
sulyi Jan 21, 2017
1725514
[jsinterp] super object in subclasses __init__
sulyi Jan 22, 2017
0eef083
[jsbuilt-ins] a riddle wrapped in mystery inside an enigma
sulyi Jan 23, 2017
484a7d2
[jsbuilt-ins] adding _type and JSObject constructor
sulyi Jan 23, 2017
65e9b0b
[jsbuilt-ins] adding Function and Array constructors
sulyi Jan 24, 2017
2dd9864
[jsbuilt-ins] minor props fix
sulyi Jan 27, 2017
a500c34
[jsbuilt-ins] major props fix
sulyi Jan 27, 2017
598f5f2
[jsbuilt-ins] String mock up Function constructor fix, to_string plac…
sulyi Jan 28, 2017
56cecdd
[jsbuilt-ins] fixing to_string
sulyi Jan 29, 2017
9ead39c
[jsbuilt-ins] fixing numerical stability of to_string
sulyi Jan 30, 2017
8733120
[jsbuilt-ins] implementing Boolean object
sulyi Feb 1, 2017
8729fe6
[jsbuilt-ins] adding type conversions (to number )
sulyi Feb 18, 2017
ec79b14
[jsbuilt-ins] adding Number class and prototype
sulyi Feb 20, 2017
dbedff2
[jsbuilt-ins] global object properties mock up
sulyi Feb 20, 2017
4d386f0
[jsbuilt-ins] major refactor
sulyi Feb 21, 2017
0136be4
[jsbuilt-ins] fixing constructors
sulyi Mar 2, 2017
49dba39
Merge branch 'master' into jsinterp
sulyi May 30, 2018
1126698
[jsinterp] Renaming `jsinterp` to jsinterp2
sulyi May 30, 2018
e44a252
[jsinterp] Using unicode literals
sulyi May 31, 2018
53f8eff
[jsbuilt_ins] Fixing circular imports
sulyi May 31, 2018
61fe8d2
[jsbuilt-ins] premerge
sulyi Jan 22, 2017
b856d55
Merge branch 'jsbuilt-ins' into jsinterp
sulyi Jun 1, 2018
70ac98a
[jsinterp] Fixing missed unicode support (yet again)
sulyi Jun 1, 2018
1f40e3e
[jsinterp] Test suit update
sulyi Jun 2, 2018
db0dc7b
[jsinterp] Fixing typos and code style
sulyi Jun 2, 2018
d977e93
[jsinterp] Fixing test skip messages
sulyi Jun 3, 2018
38b2602
[jsinterp] Complying with PEP 479
sulyi Jun 3, 2018
b9061d6
[jsinterp] Fixing TODOs and comments
sulyi Jun 4, 2018
2ce996c
[jsinterp] Unicode docstring hack
sulyi Jun 4, 2018
70d9194
[jsinterp] Multi level logging in tests
sulyi Jun 4, 2018
1b9d883
[jsinterp] Faking `Logger.getChild` for py2.6
sulyi Jun 4, 2018
327bb2d
[jsinterp] Fixing code style
sulyi Jun 4, 2018
f9f030a
[jsinterp] Implementing String split
sulyi Jun 9, 2018
db44dee
[jsinterp] Renaming tests
sulyi Jun 9, 2018
105faaf
[jsinterp] Revert `youtube-dl/youtube_dl/extractor/youtube.py`
sulyi Jun 9, 2018
b8a1742
[jsinterp] Rename `js2test` to `jstests`
sulyi Jun 10, 2018
848aa79
[jsinterp] Fixing incomplete refactor
sulyi Jun 10, 2018
bbea188
[jsinterp] revert `youtube_dl/extractor/youtube.py` (yet again)
sulyi Jun 10, 2018
37d6306
[jsinterp] Adding `JSArrayPrototype#_slice`
sulyi Jun 10, 2018
8060889
[jsinterp] TODOs in `JSStringPrototype#_split`
sulyi Jun 10, 2018
a8c640e
[jsinterp] Fixing broken Assignment Expression
sulyi Jun 10, 2018
a33b47e
[jsinterp] Adding handling lineterminator
sulyi Jun 10, 2018
93c0bb5
[jsinterp] Fixing types and operators
sulyi Jun 11, 2018
c0ef911
[jsinterp] Adding delete and void operators
sulyi Jun 11, 2018
16 changes: 14 additions & 2 deletions test/test_jsinterp.py
@@ -74,7 +74,7 @@ def test_assignments(self):

def test_comments(self):
'Skipping: Not yet fully implemented'
return
# return
jsi = JSInterpreter('''
function x() {
var x = /* 1 + */ 2;
@@ -111,7 +111,19 @@ def test_call(self):
function z() { return y(3); }
''')
self.assertEqual(jsi.call_function('z'), 5)

jsi = JSInterpreter('function x(a) { return a.split(""); }', objects={'a': 'abc'})
self.assertEqual(jsi.call_function('x'), ["a", "b", "c"])
return
jsi = JSInterpreter('''
function a(x) { return x; }
function b(x) { return x; }
function c() { return [a, b][0](0); }
''')
self.assertEqual(jsi.call_function('c'), 0)

def test_getfield(self):
jsi = JSInterpreter('function c() { return a.var; }', objects={'a': {'var': 3}})
self.assertEqual(jsi.call_function('c'), 3)

if __name__ == '__main__':
unittest.main()
235 changes: 233 additions & 2 deletions youtube_dl/jsinterp.py
@@ -8,23 +8,122 @@
ExtractorError,
)

__DECIMAL_RE = r'(?:[1-9][0-9]*|0)'
__OCTAL_RE = r'0[0-7]+'
__HEXADECIMAL_RE = r'0[xX][0-9a-fA-F]+'
__ESC_UNICODE_RE = r'u[0-9a-fA-F]{4}'
__ESC_HEX_RE = r'x[0-9a-fA-F]{2}'

_OPERATORS = [
('|', operator.or_),
('^', operator.xor),
('&', operator.and_),
('>>', operator.rshift),
('<<', operator.lshift),
('>>>', lambda cur, right: cur >> right if cur >= 0 else (cur + 0x100000000) >> right),
('-', operator.sub),
('+', operator.add),
('%', operator.mod),
('/', operator.truediv),
('*', operator.mul),
('*', operator.mul)
]
_ASSIGN_OPERATORS = [(op + '=', opfunc) for op, opfunc in _OPERATORS]
_ASSIGN_OPERATORS.append(('=', lambda cur, right: right))
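
The `>>>` entry in the table above emulates JavaScript's unsigned right shift on Python's unbounded integers. A minimal standalone sketch of the same idea, assuming the left operand is first reduced to 32 bits and the shift count is masked to its low five bits, as JavaScript does. The helper name `js_urshift` is hypothetical and not part of this PR.

```python
def js_urshift(value, count):
    """Emulate JavaScript's `value >>> count` for integer operands."""
    # Reinterpret the left operand as an unsigned 32-bit integer.
    unsigned = value & 0xFFFFFFFF
    # JavaScript only uses the low five bits of the shift count.
    return unsigned >> (count & 0x1F)


print(js_urshift(-1, 0))
print(js_urshift(-8, 1))
print(js_urshift(16, 2))
```

For negative operands in the 32-bit range this agrees with the `cur + 0x100000000` form used in the table; the mask additionally covers operands outside that range.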

# TODO flow control and others probably
_RESERVED_WORDS = ['function', 'var', 'const', 'return']

_NAME_RE = r'[a-zA-Z_$][a-zA-Z_$0-9]*'

# Non-escape characters can also be escaped, but line continuations and quotes must be.
# XXX unicode and hexadecimal escape sequences should be validated
_SINGLE_QUOTED_RE = r"""'(?:(?:\\'|\n)|[^'\n])*'"""
_DOUBLE_QUOTED_RE = r'''"(?:(?:\\"|\n)|[^"\n])*"'''
_STRING_RE = r'(?:%s)|(?:%s)' % (_SINGLE_QUOTED_RE, _DOUBLE_QUOTED_RE)

_INTEGER_RE = r'(?:%(hex)s)|(?:%(dec)s)|(?:%(oct)s)' % {'hex': __HEXADECIMAL_RE, 'dec': __DECIMAL_RE, 'oct': __OCTAL_RE}
_FLOAT_RE = r'(?:(?:%(dec)s\.[0-9]*)|(?:\.[0-9]+))(?:[eE][+-]?[0-9]+)?' % {'dec': __DECIMAL_RE}

_BOOL_RE = r'true|false'
_NULL_RE = r'null'

# XXX early validation might be needed
# r'''/(?!\*)
# (?:(?:\\(?:[tnvfr0.\\+*?^$\[\]{}()|/]|[0-7]{3}|x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|c[A-Z]|))|[^/\n])*
# /(?:(?![gimy]*(?P<flag>[gimy])[gimy]*(?P=flag))[gimy]{0,4}\b|\s|$)'''
_REGEX_FLAGS_RE = r'(?![gimy]*(?P<reflag>[gimy])[gimy]*(?P=reflag))(?P<reflags>[gimy]{0,4}\b)'
_REGEX_RE = r'/(?!\*)(?P<rebody>(?:[^/\n]|(?:\\/))*)/(?:(?:%s)|(?:\s|$))' % _REGEX_FLAGS_RE

re.compile(_REGEX_RE)

_TOKENS = [
('id', _NAME_RE),
('null', _NULL_RE),
('bool', _BOOL_RE),
('str', _STRING_RE),
('int', _INTEGER_RE),
('float', _FLOAT_RE),
('regex', _REGEX_RE)
]

_RELATIONS = {
'lt': '<',
'gt': '>',
'le': '<=',
'ge': '>=',
'eq': '==',
'ne': '!=',
'seq': '===',
'sne': '!=='
}

_PUNCTUATIONS = {
'copen': '{',
'cclose': '}',
'popen': '(',
'pclose': ')',
'sopen': '[',
'sclose': ']',
'dot': '.',
'end': ';',
'comma': ',',
'inc': '++',
'dec': '--',
'not': '!',
'bnot': '~',
'and': '&&',
'or': '||',
'hook': '?',
'colon': ':'
}

token_ids = dict((token[0], i) for i, token in enumerate(_TOKENS))
op_ids = dict((op[0], i) for i, op in enumerate(_OPERATORS))
aop_ids = dict((aop[0], i) for i, aop in enumerate(_ASSIGN_OPERATORS))

_COMMENT_RE = r'(?P<comment>/\*(?:(?!\*/)(?:\n|.))*\*/)'
_TOKENS_RE = r'|'.join('(?P<%(id)s>%(value)s)' % {'id': name, 'value': value}
for name, value in _TOKENS)
_RESERVED_WORDS_RE = r'(?:(?P<rsv>%s)\b)' % r'|'.join(_RESERVED_WORDS)
_PUNCTUATIONS_RE = r'|'.join(r'(?P<%(id)s>%(value)s)' % {'id': name, 'value': re.escape(value)}
for name, value in _PUNCTUATIONS.items())
_RELATIONS_RE = r'|'.join(r'(?P<%(id)s>%(value)s)' % {'id': name, 'value': re.escape(value)}
for name, value in _RELATIONS.items())
_OPERATORS_RE = r'(?P<op>%s)' % r'|'.join(re.escape(op) for op, opfunc in _OPERATORS)
_ASSIGN_OPERATORS_RE = r'(?P<assign>%s)' % r'|'.join(re.escape(op) for op, opfunc in _ASSIGN_OPERATORS)

input_element = re.compile(r'''\s*(?:%(comment)s|%(rsv)s|%(token)s|%(punct)s|%(assign)s|%(op)s|%(rel)s)\s*''' % {
'comment': _COMMENT_RE,
'rsv': _RESERVED_WORDS_RE,
'token': _TOKENS_RE,
'punct': _PUNCTUATIONS_RE,
'assign': _ASSIGN_OPERATORS_RE,
'op': _OPERATORS_RE,
'rel': _RELATIONS_RE
})

undefined = object()


class JSInterpreter(object):
def __init__(self, code, objects=None):
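
The `input_element` pattern in this hunk joins every token class into one alternation of named groups, so a match's `lastgroup` tells which kind of token was read. A reduced sketch of that technique, using a small hypothetical token table (`TOKEN_SPEC`) and helper (`tokenize`) rather than the PR's full definitions:

```python
import re

# Hypothetical, reduced token table. Order matters: the first
# alternative that matches wins, so 'comment' precedes 'op' and
# 'float' precedes 'int'.
TOKEN_SPEC = [
    ('comment', r'/\*.*?\*/'),
    ('float', r'[0-9]+\.[0-9]+'),
    ('int', r'[0-9]+'),
    ('id', r'[a-zA-Z_$][a-zA-Z_$0-9]*'),
    ('op', r'[-+*/%=]'),
    ('end', r';'),
]
TOKEN_RE = re.compile(
    r'\s*(?:%s)\s*' % '|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC),
    re.DOTALL)


def tokenize(code):
    """Yield (token_id, text) pairs, skipping comments."""
    pos = 0
    while pos < len(code):
        match = TOKEN_RE.match(code, pos)
        if match is None:
            raise ValueError('Unexpected character at %d' % pos)
        pos = match.end()
        token_id = match.lastgroup
        if token_id != 'comment':
            yield token_id, match.group(token_id)


print(list(tokenize('var x = /* 1 + */ 2.5;')))
```

Because the alternatives contain no nested capturing groups, `lastgroup` is always the name of the token class that matched.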
@@ -34,6 +133,138 @@ def __init__(self, code, objects=None):
self._functions = {}
self._objects = objects

@staticmethod
def _next_statement(code, pos=0, stack_size=100):
def next_statement(lookahead, stack_top=100):
# TODO migrate interpretation
statement = []
feed_m = None
while lookahead < len(code):
feed_m = input_element.match(code, lookahead)
if feed_m is not None:
token_id = feed_m.lastgroup
if token_id in ('pclose', 'sclose', 'cclose', 'comma', 'end'):
return statement, lookahead, feed_m.end()
token_value = feed_m.group(token_id)
lookahead = feed_m.end()
if token_id == 'comment':
pass
elif token_id == 'rsv':
# XXX backward compatibility till parser migration
statement.append((token_id, token_value + ' '))
if token_value == 'return':
expressions, lookahead, _ = next_statement(lookahead, stack_top - 1)
statement.extend(expressions)
elif token_id in ('id', 'op', 'dot'):
if token_id == 'id':
# TODO handle label
pass
statement.append((token_id, token_value))
elif token_id in token_ids:
# TODO date
# TODO error handling
if token_id == 'null':
statement.append((token_id, None))
elif token_id == 'bool':
statement.append((token_id, {'true': True, 'false': False}[token_value]))
elif token_id == 'str':
statement.append((token_id, token_value))
elif token_id == 'int':
statement.append((token_id, int(token_value)))
elif token_id == 'float':
statement.append((token_id, float(token_value)))
elif token_id == 'regex':
regex = re.compile(feed_m.group('rebody'))
statement.append((token_id, {'re': regex, 'flags': feed_m.group('reflags')}))
elif token_id in ('assign', 'popen', 'sopen'):
statement.append((token_id, token_value))
while lookahead < len(code):
expressions, lookahead, _ = next_statement(lookahead, stack_top - 1)
statement.extend(expressions)
peek = input_element.match(code, lookahead)
if peek is not None:
peek_id = peek.lastgroup
peek_value = peek.group(peek_id)
if ((token_id == 'popen' and peek_id == 'pclose') or
(token_id == 'sopen' and peek_id == 'sclose')):
statement.append((peek_id, peek_value))
lookahead = peek.end()
break
elif peek_id == 'comma':
statement.append((peek_id, peek_value))
lookahead = peek.end()
elif peek_id == 'end':
break
else:
raise ExtractorError('Unexpected character %s at %d' % (
peek_value, peek.start(peek_id)))
else:
raise ExtractorError("Not yet implemented")
else:
raise ExtractorError("Not yet implemented")
return statement, lookahead, lookahead if feed_m is None else feed_m.end()

while pos < len(code):
stmt, _, pos = next_statement(pos, stack_size)
# XXX backward compatibility till parser migration
yield ''.join(str(value) for _, value in stmt)
return  # PEP 479: a generator must finish with return, not raise StopIteration
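
`_next_statement` is a generator, so how it finishes matters: under PEP 479 (the default behaviour from Python 3.7), a `StopIteration` raised inside a generator body is converted to `RuntimeError`. A small sketch of the difference, assuming Python 3.7 or later; both function names are hypothetical.

```python
def ends_with_raise(items):
    for item in items:
        yield item
    raise StopIteration  # converted to RuntimeError under PEP 479


def ends_with_return(items):
    for item in items:
        yield item
    return  # the portable way to finish a generator


print(list(ends_with_return('ab')))
try:
    list(ends_with_raise('ab'))
except RuntimeError as exc:
    print('RuntimeError:', exc)
```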

@staticmethod
def _interpret_statement(stmt, local_vars, stack_size=100):
while stmt:
token_id, token_value = stmt.pop(0)
if token_id == 'copen':
# TODO block
pass
elif token_id == 'rsv':
if token_value == 'var':
has_another = True
while has_another:
next_token_id, next_token_value = stmt.pop(0)
if next_token_id in ('sopen', 'copen'):
pass
elif next_token_id != 'id':
raise ExtractorError('Missing variable name')
local_vars[next_token_value] = undefined

if stmt[0][0] == 'assign':
pass

if stmt[0][0] != 'comma':
break
elif token_value == 'function':
pass
elif token_value == 'if':
pass
elif token_value in ('break', 'continue'):
pass
elif token_value == 'return':
pass
elif token_value == 'with':
pass
elif token_value == 'switch':
pass
elif token_value == 'throw':
pass
elif token_value == 'try':
pass
elif token_value == 'debugger':
pass
elif token_id == 'label':
pass
elif token_id == 'id':
pass
else:
# lefthand-side_expr -> new_expr | call_expr
# call_expr -> member_expr args | call_expr args | call_expr [ expr ] | call_expr . id_name
# new_expr -> member_expr | new member_expr
# member_expr -> prime_expr | func_expr |
# member_expr [ expr ] | member_expr . id_name | new member_expr args
pass

# empty statement goes straight here

def interpret_statement(self, stmt, local_vars, allow_recursion=100):
if allow_recursion < 0:
raise ExtractorError('Recursion limit reached')
@@ -250,7 +481,7 @@ def call_function(self, funcname, *args):
def build_function(self, argnames, code):
def resf(args):
local_vars = dict(zip(argnames, args))
for stmt in code.split(';'):
for stmt in self._next_statement(code):
res, abort = self.interpret_statement(stmt, local_vars)
if abort:
break
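
This hunk swaps `code.split(';')` for the `_next_statement` generator. A short sketch of why a plain split is fragile, assuming only that string literals may contain `;`. The helper `split_statements` is hypothetical and far simpler than the PR's lexer.

```python
import re

# Hypothetical pattern: a quoted string (with backslash escapes) or
# any single character outside of a string.
_PIECE_RE = re.compile(r'''"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.''', re.DOTALL)


def split_statements(code):
    """Split on ';' but never inside a string literal."""
    statements, current = [], []
    for piece in _PIECE_RE.findall(code):
        if piece == ';':
            statements.append(''.join(current).strip())
            current = []
        else:
            current.append(piece)
    tail = ''.join(current).strip()
    if tail:
        statements.append(tail)
    return statements


code = 'var a = "x;y"; return a'
print(code.split(';'))         # cuts the string literal in two
print(split_statements(code))  # keeps the literal intact
```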