New JSInterpreter Features #11292
Comments
Actual parsing is complicated as JavaScript is not a good language (neither is Python lol). Is there a need? Personally I'm against such changes, as that sounds like reinventing the wheel. |
Yes there is a need. I'd like to do it. |
No. I was referring to SpiderMonkey of Gecko, V8 of Blink, JavaScriptCore of WebKit and ChakraCore of IE/Edge, and maybe other JS engines used in popular browsers.
Could you give some concrete examples (websites, etc.)? I'm not sure what #11272 should cover. Without limited targets, I'll review it as a real, complete JS engine. Well, at least obfuscated code like #8489 (comment) (from openload) and iqiyi's login SDK should be supported:
|
I'll need to look into your suggestions more carefully, @yan12125, but I don't think any of the engines you mentioned is written in Python. The JavaScript interpreters in Python that I found are Jaspyon, at https://bitbucket.org/santagada/jaspyon/, and pynarcissus, at https://github.com/jtolds/pynarcissus. As for the scope, at first I'd aim to support all the code that is supported right now. That should be plenty to see how it goes, and a good base to build on and widen the support if it succeeds. My ultimate goal is indeed a complete JS engine, but saying that out loud does sound a bit overly ambitious at the moment. |
Yes they are implemented with C/C++/Objective-C. If your goal is a complete JS engine, they are excellent choices. Python wrappers for those existing engines are much easier to write and maintain in comparison with a new Python implementation from scratch. Could you share the idea for why a pure Python implementation is necessary? |
At the end of the day, it's up to the maintainers to decide whether this serves their needs or not. As far as platform independence goes, it's better not to have native code or third-party modules. Implementing a JavaScript interpreter seems like a whole lot of fun/challenge/practice/good reference... Also, why not? |
@siddht1 What @sulyi wants is a JavaScript interpreter in youtube-dl that handles scripts on web pages, not YouTube downloaders written in JavaScript. They are different.
That depends. If there are already excellent solutions for a complex task, few people will re-implement them beyond the scope of toy projects. If you need a library not in your target language, a binding/wrapper is the most common way. For example, HTTP and TLS protocols are easy, so there are python-requests and python-tls. On the other hand, GUI systems are complex, so there are PyQt, PyGTK, and CPython's builtin tkinter module, but I have never seen a GUI toolkit in pure Python. At first
Changes in youtube-dl should solve real problems. Of course it's fun, but this is not the place.
I have no plan to reject your pull request. I'm asking about your goal so that I can determine whether a pull request is ready or not. As you've said, your goal is a complete JS engine, so you can continue your work. When it's almost done (for example, when it passes most of SpiderMonkey's test suite), we can come back and start reviewing it. |
@yan That's great, by the way. I just suggested a different approach to the fix. I would be ready to help if required. The engine for Mozilla is not Gecko anymore; the engine could be written in Node.js, as it supports native apps too. |
Thanks, @siddht1, it was actually helpful. After studying narcissus and other engines, and reading the specs, I've realized I can't continue without first designing this. So, after doing that, right now I need to tear almost everything down and redo it. Hopefully it'll go faster the second time. |
@siddht1 Sry, neither can I. |
Js2Py is another option which claims to support ECMA 5
|
Js2Py looks so far so good. The only two issues I found:
|
In my opinion this is just one part of the puzzle. Maybe an engine should be developed first, similar to creating cobbler in java. js2py lacks many components, but other similar projects can help to lessen the gap. What everybody fails to see is that the flow will break:
youtube-dl -----> in js engine -------> out js engine -----> site
site ----> js engine (fails to parse if it gets inbound, where it's supposed to get outbound)
fix:
youtube-dl <-----> js engine #1 (inbound) <----------------------------- site (Send) |
Sorry, the diagram didn't come out as expected; wait, this is what I wanted to suggest |
Funny thing: starting this, I was hoping to discuss how to do this, not why I shouldn't. Yes, existing solutions can be very helpful, but only up to a certain point. Right now I need to implement the parser, using the tokens. Yet I might make some further minor changes to the lexer first, e.g. reserved words can get a collective token id, since value and id would be an injective relation otherwise. Hopefully, after I start working on the parser, it'll start to show some promising signs. |
Can someone enlighten me as to what this means in the specs:
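To illustrate the "collective token id" idea for reserved words, here is a minimal lexer sketch. Everything in it is an assumption made for illustration and is not taken from #11272: the `RESERVED_WORDS` subset, the token ids (`rsv`, `id`, `op`, ...) and the `tokenize` helper are all hypothetical, and the regular expression covers only a small slice of the real lexical grammar.

```python
import re

# Hypothetical subset of reserved words; the ES5 list is longer.
RESERVED_WORDS = frozenset(
    ['var', 'function', 'return', 'if', 'else', 'for', 'in', 'new'])

# Hypothetical token classes; only a small slice of the lexical grammar.
TOKEN_RE = re.compile(r'''
    (?P<float>\d+\.\d+)
  | (?P<int>\d+)
  | (?P<id>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>[{}()\[\];,.])
  | (?P<op>===|!==|==|!=|<=|>=|&&|\|\||[-+*/%<>=!])
  | (?P<ws>\s+)
''', re.VERBOSE)


def tokenize(code):
    """Yield (token_id, value) pairs; every reserved word shares the id 'rsv'."""
    pos = 0
    while pos < len(code):
        m = TOKEN_RE.match(code, pos)
        if m is None:
            raise ValueError('Unexpected character %r at %d' % (code[pos], pos))
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == 'ws':
            continue  # whitespace is dropped, not emitted
        if kind == 'id' and value in RESERVED_WORDS:
            kind = 'rsv'  # one collective id instead of one id per keyword
        yield kind, value
```

With a shared id the parser dispatches on the value of an `rsv` token, so adding a keyword only means extending the set rather than introducing a new token id.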
Is this just trying to say that parentheses are resolved? Because I couldn't find any clue to that. |
Did you mean section 11.14? That describes the order of evaluation of expressions involving a comma operator. |
Yes, among others, like VariableDeclarationNoIn and RelationalExpressionNoIn; the latter has an even more confusing note:
|
I guess NoIn variants are for simpler grammars - you don't need to peek so many tokens when building a lookahead LL parser. |
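For context, the ES5 grammar notes that the NoIn variants exist to avoid confusing the `in` operator of a relational expression with the `in` of a `for` statement header. A hand-written parser might honour that with a flag instead of duplicated methods. The sketch below is only an illustration under that assumption; `parse_relational` and the plain list-of-strings token representation are hypothetical and not taken from #11272.

```python
def parse_relational(tokens, pos=0, no_in=False):
    """Parse `operand (relop operand)*` from a list of token strings.

    Returns (ast, next_pos). With no_in=True the `in` keyword is not
    consumed as a binary operator, mirroring RelationalExpressionNoIn.
    """
    operators = {'<', '>', '<=', '>=', 'instanceof'}
    if not no_in:
        operators = operators | {'in'}
    left = tokens[pos]
    pos += 1
    # Fold left-associatively while a permitted operator follows
    while pos + 1 < len(tokens) and tokens[pos] in operators:
        left = (tokens[pos], left, tokens[pos + 1])
        pos += 2
    return left, pos


tokens = ['key', 'in', 'obj']
print(parse_relational(tokens))              # `in` is a binary operator here
print(parse_relational(tokens, no_in=True))  # stops before `in`
```

In the second call the parser stops at `in` and leaves it for the caller, which is what the initialiser of a `for` header needs in order to recognise `for (var key in obj)`.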
I'm thinking of using the shunting-yard algorithm for assignment/conditional expressions, simply to reduce the number of methods. Looking at other interpreters' source, it seems an AST is preferred. I'd love to hear pros and cons. |
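For readers unfamiliar with it, a minimal shunting-yard sketch is shown below. It is an illustration under stated assumptions, not the code from #11272: the `PRECEDENCE` table and `to_rpn` are hypothetical, tokens are plain strings, and only left-associative binary operators and parentheses are handled. Assignment and the conditional operator are right-associative and would need extra handling.

```python
# Hypothetical precedence table (higher binds tighter);
# every operator listed here is left-associative.
PRECEDENCE = {
    '||': 1, '&&': 2, '==': 3, '!=': 3, '<': 4, '>': 4,
    '+': 5, '-': 5, '*': 6, '/': 6, '%': 6,
}


def to_rpn(tokens):
    """Convert an infix token list to reverse Polish notation."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-associative)
            while (stack and stack[-1] != '('
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':
            while stack and stack[-1] != '(':
                output.append(stack.pop())
            if not stack:
                raise ValueError('Unbalanced parenthesis')
            stack.pop()  # discard the matching '('
        else:
            output.append(tok)  # operand
    while stack:
        op = stack.pop()
        if op == '(':
            raise ValueError('Unbalanced parenthesis')
        output.append(op)
    return output
```

One table replaces a method per precedence level, which is the appeal; the cost is that the output is a flat operator sequence, so building an AST or reporting precise error positions takes additional bookkeeping.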
That sounds fine. The only concern is error reporting for invalid expressions. In youtube-dl it's OK to assume all inputs are valid. |
Grouping will still be handled by expression, and conditional expression will also have its own token. Therefore error handling can be done properly, in my opinion. |
Another thing: I'm thinking about refactoring. A separate grammar/tstream module would be nice in its own package, like:
and in
I believe that would be backward compatible too. |
Refactoring is the way to go. But is there a need to expose TokenStream? I thought it's used internally in JSInterpreter only. |
Sure, that's a valid question, and I don't think I have an answer to it. My thought was that one might want to have a tokenizer for the grammar. The other thing I was thinking about is error handling. I don't see it as a single function, therefore I don't know how to not do it. If that makes any sense. |
With a partial parser in place, there's only the interpreter left to implement in order to get to the milestone I've mentioned before. I might still underestimate the complexity of the remaining work, but I've started thinking about testing and about dynamically adding SpiderMonkey's tests to the test cases. |
Thanks for that! Could you add some more tests first? Now #11272 is big enough and I guess it's fragile to refactoring. |
Probably it could be made much more robust by introducing a Token and/or an ASTree class. There were some test cases (e.g. instantiation, off the top of my head) that, while implementing things, I was wondering how it would perform against. I would love some suggestions, though. There's quite some TODO before it passes the current test cases; I just wanted to put out the idea of dynamic test cases. I'm not sure how to do it, but I hope reading the SpiderMonkey documentation will help. |
I've just added a parser test. Subsequently I got stuck with the interpreter and as a result made a mess.
Reference._parent would be either local_vars (top-level), an array or an object literal, and Reference._key would be an identifier, an index, or a property or method name, respectively. Instead of storing values, they would store Reference instances. |
Any idea how to get comparison of zip objects to work in Python 3 (in a nested list, with unittest)? |
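The description above can be sketched roughly as follows. This is a hypothetical reading of it, not the class from #11272: only `_parent` and `_key` come from the comment, while `_value`, `getvalue` and `putvalue` are names invented for the example.

```python
class Reference(object):
    """Hypothetical sketch: a slot inside a container, not a bare value."""

    def __init__(self, value, parent=None, key=None):
        self._value = value
        # local_vars dict (top-level), array (list) or object literal (dict)
        self._parent = parent
        # identifier, index, or property/method name
        self._key = key

    def getvalue(self):
        return self._value

    def putvalue(self, value):
        if self._parent is None:
            raise ValueError('Trying to set a read-only reference')
        # Containers store Reference instances rather than raw values
        self._parent[self._key] = Reference(value, self._parent, self._key)
        return value


local_vars = {}
local_vars['a'] = Reference([], local_vars, 'a')  # roughly: var a = [];
arr = local_vars['a'].getvalue()
arr.append(Reference(1, arr, 0))                  # roughly: a[0] = 1;
arr[0].putvalue(42)                               # roughly: a[0] = 42;
```

Because every slot knows its container and key, an assignment target such as `a[0]` can be evaluated to a `Reference` first and written through afterwards, without the interpreter needing a separate code path for variables, indices and properties.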
Is there anything wrong in python3's zip? |
I see. As performance is not critical in tests, you may want to just transform everything into |
I think that'd have issue with |
Oops. Just ignore my previous comment |
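As background to the zip question: in Python 3, `zip()` returns a lazy iterator, and two zip objects compare equal only if they are the same object, so `assertEqual` on nested structures containing them fails even when the contents match. One possible workaround, not necessarily the one suggested in the thread, is to materialise the structure before comparing. The `materialize` helper below is hypothetical.

```python
def materialize(value):
    """Recursively turn lazy iterators (e.g. zip objects) into lists."""
    if isinstance(value, (str, bytes)):
        return value  # strings are iterable, but should stay as they are
    if isinstance(value, dict):
        return dict((k, materialize(v)) for k, v in value.items())
    if isinstance(value, tuple):
        return tuple(materialize(v) for v in value)
    try:
        iterator = iter(value)
    except TypeError:
        return value  # not iterable: a plain scalar
    return [materialize(v) for v in iterator]


nested = [zip([1, 2], 'ab'), 3]
print(materialize(nested))
```

Note that this consumes the iterators it walks and turns any other iterable (sets included) into a list, which is usually acceptable in tests but not in production code.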
|
|
Do you want to clone the whole mozilla-central source tree? Please don't do that. There's no need to keep up to date with Firefox (for example, it's going to support ES2017, while there's no need to support that in youtube-dl now), so just copying the files is fine. |
No, I definitely don't want to clone the whole mozilla-central. So, you say I shouldn't add anything dynamically? Just cherry-pick some of them and dump them in test_jsinterp_parser? |
IIRC unlike SVN, mercurial does not allow cloning partial files. What youtube-dl needs are those tests. If you know a way to sync only those test files, go ahead. Note that extra dependencies should affect only tests/, not youtube_dl/ |
How about a linux native spider like |
I guess you want to download Mozilla's test suite each time test_jsinterp is invoked? I don't think it's a good idea, as there are thousands of files. |
I don't know. I'd much rather extract some useful test from the testcases and run those through unittest, but I don't think I can do that. The only other option I see is to add tests to existing ones manually based on mozilla's tests. |
Oops I should leave this comment here: #11272 (comment) |
Dumping a list of links to mozilla testcases in a file on building tests might also be a good practice. |
As long as it does not take too long, it's fine. |
I've
|
Does it take 16 minutes every time when running |
If anybody is interested, I'd like to share some side products I've created while studying the specs. I can't attach them for some reason, so: |
I'm wondering what @phihag thinks about this. |
I've reworked the test suite. It does not support adding tests from any other suites, but hopefully it's pretty straightforward to add new ones and use them to test interpretation, parsing, or both. |
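One way such a suite could be organised is sketched below: each case carries a script plus optional expectations per stage, and test methods are generated for whichever stages a case defines. This is a guess at the general shape, not the layout used in #11272; the `CASES` format and the stand-in `parse` and `interpret` functions are hypothetical.

```python
import unittest

# Hypothetical shared case format: one script, optional expectation per stage.
CASES = [
    {'name': 'addition', 'code': '1 + 2', 'ast': ('+', 1, 2), 'result': 3},
    {'name': 'parse_only', 'code': '4 * 5', 'ast': ('*', 4, 5)},
]


def parse(code):
    """Stand-in parser for `<int> <op> <int>`; a real parser goes here."""
    left, op, right = code.split()
    return (op, int(left), int(right))


def interpret(ast):
    """Stand-in interpreter for the tuples produced by parse()."""
    op, left, right = ast
    return {'+': left + right, '*': left * right}[op]


class TestParser(unittest.TestCase):
    pass


class TestInterpreter(unittest.TestCase):
    pass


def _make_test(case, stage):
    def test(self):
        ast = parse(case['code'])
        if stage == 'ast':
            self.assertEqual(ast, case['ast'])
        else:
            self.assertEqual(interpret(ast), case['result'])
    return test


# Generate a test method only for the stages a case has expectations for
for case in CASES:
    if 'ast' in case:
        setattr(TestParser, 'test_' + case['name'], _make_test(case, 'ast'))
    if 'result' in case:
        setattr(TestInterpreter, 'test_' + case['name'],
                _make_test(case, 'result'))
```

Running the module through `python -m unittest` would pick up the generated methods, and a case without a `result` key is exercised by the parser tests only.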
I'm a little bit stuck with designing built-ins. |
In the parsing stage I guess built-ins are not different than other things? |
Sry, I hadn't seen your post till now, and I also needed a break from it to rehash a bit. I've passed parsing. Parsing is done. It might need some loving later: minor features and possibly refactoring into its own class (and module). I've redone the test suite in order to be able to run parsing and interpreting tests flexibly on the same script code, because I had to move on to implementing the interpreter in the first place. It would have got ugly fast if I hadn't sorted that out before starting to work on it. As for the built-ins, I've been thinking and I'll probably keep the
I've started working on function calls, but I think the context stack is not working properly. Doing |
It's probably quite moot, yet I have to point out that the fix for #11663 and #11664 committed by @dstftw is a bit baffling to me. |
Please follow the guide below
Put an x into all the boxes [ ] relevant to your issue (like that [x])
Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2016.11.22. If it's not, read this FAQ entry and update. Issues with outdated versions will be rejected.
Before submitting an issue make sure you have:
What is the purpose of your issue?
Description of your issue, suggested solution and other information
I think JSInterpreter has rather limited features. I've started to implement some new ones at #11272.
To do so I'm using a syntax grammar and actual parsing.
I wouldn't mind some feedback, help and test cases, or merely having a discussion about it.