New JSInterpreter Features #11292

Closed
4 of 8 tasks
sulyi opened this issue Nov 24, 2016 · 60 comments

Comments

@sulyi

sulyi commented Nov 24, 2016

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like that [x])
  • Use Preview tab to see how your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2016.11.22. If it's not read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2016.11.22

Before submitting an issue make sure you have:

  • At least skimmed through README and most notably FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

Description of your issue, suggested solution and other information

I think JSInterpreter has rather limited features. I've started to implement some new ones at #11272.
To do so I'm using a syntax grammar and actual parsing.
I wouldn't mind some feedback, help, and test cases, or just a discussion about it.

@yan12125
Collaborator

Actual parsing is complicated, as Javascript is not a good language (neither is Python lol). Is there a need?

Personally I'm against such changes, as that sounds like reinventing the wheel.

@sulyi
Author

sulyi commented Nov 25, 2016

Yes there is a need. I'd like to do it.
When you say wheel are you referring to jaspyon or pynarcissus?
I think building an interpreter from scratch has its own benefits and is really not that complicated.
--- edit ---
I'm rather worried about how efficient it can be made. With some clever solutions it'll hopefully be OK.

@yan12125
Collaborator

When you say wheel are you referring to jaspyon or pynarcissus?

No. I was referring to SpiderMonkey of Gecko, V8 of Blink, JavascriptCore of Webkit and ChakraCore of IE/Edge, and maybe other JS engines used in popular browsers.

Yes there is a need

Could you give some concrete examples (websites, etc.)? I'm not sure what the scope of #11272 should be. Without a limited set of targets, I'll review it as a real, complete JS engine.

Well, at least obfuscated codes like #8489 (comment) (from openload) and iqiyi's login SDK should be supported:

$ curl "http://kylin.iqiyi.com/get_token" | jq .sdk
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6309    0  6309    0     0  30814      0 --:--:-- --:--:-- --:--:-- 30775
"eval(function(p,a,c,k,e,r){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--)r[e(c)]=k[c]||e(c);k=[function(e){return r[e]}];e=function(){return'\\\\w+'};c=1};while(c--)if(k[c])p=p.replace(new RegExp('\\\\b'+e(c)+'\\\\b','g'),k[c]);return p}('j J=q(){j r=q(r,n){B r<<n|r>>>32-n};j n=q(r,n){j t,a,o,e,u;o=r&1j;e=n&1j;t=r&1c;a=n&1c;u=(r&1u)+(n&1u);O(t&a){B u^1j^o^e}O(t|a){O(u&1c){B u^2R^o^e}Q{B u^1c^o^e}}Q{B u^o^e}};j t=q(r,n,t){B r&n|~r&t};j a=q(r,n,t){B r&t|n&~t};j o=q(r,n,t){B r^n^t};j e=q(r,n,t){B n^(r|~t)};j u=q(a,o,e,u,i,v,c){a=n(a,n(n(t(o,e,u),i),c));B n(r(a,v),o)};j i=q(t,o,e,u,i,v,c){t=n(t,n(n(a(o,e,u),i),c));B n(r(t,v),o)};j v=q(t,a,e,u,i,v,c){t=n(t,n(n(o(a,e,u),i),c));B n(r(t,v),a)};j c=q(t,a,o,u,i,v,c){t=n(t,n(n(e(a,o,u),i),c));B n(r(t,v),a)};j f=q(r){j n;j t=r.E;j a=t+8;j o=(a-a%1r)/1r;j e=(o+1)*16;j u=1d(e-1);j i=0;j v=0;2E(v<t){n=(v-v%4)/4;i=v%4*8;u[n]=u[n]|r.1n(v)<<i;v++}n=(v-v%4)/4;i=v%4*8;u[n]=u[n]|19<<i;u[e-2]=t<<3;u[e-1]=t>>>29;B u};j d=q(r){j n=\"\",t=\"\",a,o;K(o=0;o<=3;o++){a=r>>>o*8&2M;t=\"0\"+a.1s(16);n=n+t.2W(t.E-2,2)}B n};j g=q(r){r=r.1D(/\\\\1H\\\\2C/g,\"\\\\n\");j n=\"\";K(j t=0;t<r.E;t++){j a=r.1n(t);O(a<19){n+=N.M(a)}Q O(a>2I&&a<2K){n+=N.M(a>>6|2N);n+=N.M(a&1h|19)}Q{n+=N.M(a>>12|2X);n+=N.M(a>>6&1h|19);n+=N.M(a&1h|19)}}B n};B q(r){r+=\"\";j t=1d();j a,o,e,l,p,s,h,m,C;j k=7,z=12,Z=17,b=22;j I=5,S=9,w=14,A=20;j R=4,U=11,y=16,x=23;j 
D=6,L=10,T=15,1a=21;r=g(r);t=f(r);s=2Z;h=30;m=31;C=35;K(a=0;a<t.E;a+=16){o=s;e=h;l=m;p=C;s=u(s,h,m,C,t[a+0],k,39);C=u(C,s,h,m,t[a+1],z,1v);m=u(m,C,s,h,t[a+2],Z,1w);h=u(h,m,C,s,t[a+3],b,1x);s=u(s,h,m,C,t[a+4],k,1y);C=u(C,s,h,m,t[a+5],z,1z);m=u(m,C,s,h,t[a+6],Z,1A);h=u(h,m,C,s,t[a+7],b,1B);s=u(s,h,m,C,t[a+8],k,1C);C=u(C,s,h,m,t[a+9],z,3f);m=u(m,C,s,h,t[a+10],Z,1E);h=u(h,m,C,s,t[a+11],b,1F);s=u(s,h,m,C,t[a+12],k,1G);C=u(C,s,h,m,t[a+13],z,1I);m=u(m,C,s,h,t[a+14],Z,1J);h=u(h,m,C,s,t[a+15],b,1K);s=i(s,h,m,C,t[a+1],I,1L);C=i(C,s,h,m,t[a+6],S,1M);m=i(m,C,s,h,t[a+11],w,1N);h=i(h,m,C,s,t[a+0],A,1O);s=i(s,h,m,C,t[a+5],I,1P);C=i(C,s,h,m,t[a+10],S,1Q);m=i(m,C,s,h,t[a+15],w,1R);h=i(h,m,C,s,t[a+4],A,1S);s=i(s,h,m,C,t[a+9],I,1T);C=i(C,s,h,m,t[a+14],S,1U);m=i(m,C,s,h,t[a+3],w,1V);h=i(h,m,C,s,t[a+8],A,1W);s=i(s,h,m,C,t[a+13],I,1X);C=i(C,s,h,m,t[a+2],S,1Y);m=i(m,C,s,h,t[a+7],w,1Z);h=i(h,m,C,s,t[a+12],A,24);s=v(s,h,m,C,t[a+5],R,25);C=v(C,s,h,m,t[a+8],U,26);m=v(m,C,s,h,t[a+11],y,27);h=v(h,m,C,s,t[a+14],x,28);s=v(s,h,m,C,t[a+1],R,2a);C=v(C,s,h,m,t[a+4],U,2b);m=v(m,C,s,h,t[a+7],y,2c);h=v(h,m,C,s,t[a+10],x,2d);s=v(s,h,m,C,t[a+13],R,2e);C=v(C,s,h,m,t[a+0],U,2f);m=v(m,C,s,h,t[a+3],y,2g);h=v(h,m,C,s,t[a+6],x,2h);s=v(s,h,m,C,t[a+9],R,2i);C=v(C,s,h,m,t[a+12],U,2j);m=v(m,C,s,h,t[a+15],y,2k);h=v(h,m,C,s,t[a+2],x,2l);s=c(s,h,m,C,t[a+0],D,2m);C=c(C,s,h,m,t[a+7],L,2n);m=c(m,C,s,h,t[a+14],T,2o);h=c(h,m,C,s,t[a+5],1a,2p);s=c(s,h,m,C,t[a+12],D,2q);C=c(C,s,h,m,t[a+3],L,2r);m=c(m,C,s,h,t[a+10],T,2s);h=c(h,m,C,s,t[a+1],1a,2t);s=c(s,h,m,C,t[a+8],D,2u);C=c(C,s,h,m,t[a+15],L,2v);m=c(m,C,s,h,t[a+6],T,2w);h=c(h,m,C,s,t[a+13],1a,2x);s=c(s,h,m,C,t[a+4],D,2y);C=c(C,s,h,m,t[a+11],L,2z);m=c(m,C,s,h,t[a+2],T,2A);h=c(h,m,C,s,t[a+9],1a,2B);s=n(s,o);h=n(h,e);m=n(m,l);C=n(C,p)}j X=d(s)+d(h)+d(m)+d(C);B X.2D()}}();j F={G:q(r){B q(r,n){B q(r){B{H:r}}(q(t){j a,o=0;K(j e=r;o<t[\"E\"];o++){j u=n(t,o);a=o===0?u:a^u}B a?e:!e})}(q(n,t,a,o){j e=2G;j u=o(t,a)-n(r,e);B 
2H}(18,2J,q(r){B(\"\"+r)[\"P\"](1,(r+\"\")[\"E\"]-1)}(\"2L\"),q(r,n){B(1p r)[n]()}),q(r,n){j t=18(r[\"1q\"](n),16)[\"1s\"](2);B t[\"1q\"](t[\"E\"]-1)})}(\"2O\")};j 2P=q(r){j n=1p 1d;j t;O(r&&r.E>0){j a=r.W(\"*\");K(t=0;t<a.E-1;t++){2Q(t%3){1e 0:n+=N.M(18(a[t],8));Y;1e 1:n+=N.M(18(a[t],10));Y;1e 2:n+=N.M(18(a[t],16));Y}}B n}Q{B\"\"}};q 1m(r,n){j t=q(){r=J(r)};j a=q(){o(r.E,32)?2S():\"\";j t=F.G.H(\"2T\")?n.W(\".\"):4;j a=q(){r=J(r)};j e=F.G.H(\"2U\")?[]:0;j u=F.G.H(\"2V\")?0:8};j o=q(r,n){B r!=n};r+=\"\";1f{j e=q(){r=J(r)};1l&&1b?e():\"\"}1i(u){r=r+\"2Y\"}o(r.E,32)?t():\"\";j i=F.G.H(\"4\")?n.W(\".\"):\"\";K(j v=0;v<i.E;v++){r+=i[v]%7}B r}V=1m(V,1k);q 1t(r,n){j t=q(){r=J(r)};j a=q(r,n){B r!=n};r+=\"\";j o=q(){c+=r.P(g,r.E)};1f{j e=q(){r=J(r)};1l&&1b?e():\"\"}1i(u){r=r+\"1b\"}a(r.E,32)?t():\"\";j i=F.G.H(\"33\")?4:n.W(\".\");j v=F.G.H(\"34\")?\"1g\":[];j c=F.G.H(\"36\")?\"\":\".\";j f=F.G.H(\"37\")?32:0;K(j d=0;d<i.E;d++){v.38(i[d]%10)}K(j g=0;g<r.E;g+=5){j l=q(r,n){B r<n};O(l(f,4)){c+=r.P(g,g+5)+v[f];f++}Q{c+=r.P(g,r.E);Y}}B c;j p=q(){c+=r.P(g,r.E);j n=q(){r=J(r)};r=J(r)}}V=1t(V,1k);q 1o(r,n){j t=q(){r=J(r)};j a=q(r,n){B r!=n};r+=\"\";1f{j o=q(){r=J(r)};3a&&3b.1g?o():\"\"}1i(e){r=r+\"1g\"}j u=q(){r=r+\"1b\";1l&&1b?3c():\"\";a(r.E,32)?t():\"\"};a(r.E,32)?t():\"\";j i=F.G.H(\"3d\")?\"\":n.W(\".\");j v=F.G.H(\"3e\")?\"\":2F;j c=F.G.H(\"c\")?0:\".\";K(j f=0;f<r.E;f+=4){j d=q(r,n){B r<n};O(d(c,4)){v+=r.P(f,f+4)+i[c];c++}Q{v+=r.P(f,r.E);Y}}B 
v}V=1o(V,1k);',62,202,'|||||||||||||||||||var|||||||function|||||||||||return|||length|k0touZ|z0|p0||md5|for||fromCharCode|String|if|substring|else|||||input|split||break||||||||||parseInt|128|_|navigator|1073741824|Array|case|try|decodeURI|63|catch|2147483648|ip|location|mod7|charCodeAt|split4|new|charAt|64|toString|split5|1073741823|3905402710|606105819|3250441966|4118548399|1200080426|2821735955|4249261313|1770035416|replace|4294925233|2304563134|1804603682|x0d|4254626195|2792965006|1236535329|4129170786|3225465664|643717713|3921069994|3593408605|38016083|3634488961|3889429448|568446438|3275163606|4107603335|1163531501|2850285829|4243563512|1735328473|||||2368359562|4294588738|2272392833|1839030562|4259657740||2763975236|1272893353|4139469664|3200236656|681279174|3936430074|3572445317|76029189|3654602809|3873151461|530742520|3299628645|4096336452|1126891415|2878612391|4237533241|1700485571|2399980690|4293915773|2240044497|1873313359|4264355552|2734768916|1309151649|4149444226|3174756917|718787259|3951481745|x0a|toLowerCase|while|100|785|true|127|Date|2048|_getTime2|255|192|ecg6mf6ar|Decode|switch|3221225472|m1hIQ|b93|947d|e36|substr|224|locationnavigator|1732584193|4023233417|2562383102||c8|f4c3|271733878|7167|7f|push|3614090360|document|window|ZifVJ|253e|8af6|2336552879'.split('|'),0,{}));"

@sulyi
Author

sulyi commented Nov 26, 2016

I'll need to look into your suggestions, @yan12125, more carefully, but I don't think any of the engines you mentioned is written in py.

Jaspyon can be found at https://bitbucket.org/santagada/jaspyon/ and pynarcissus at https://github.com/jtolds/pynarcissus, these are the JavaScript interpreters in py that I found.

As for the scope, I'd say at first I'd aim to support all the code supported right now. That should be plenty to see how it goes, and a good base to build on and widen the support if it succeeds. My ultimate goal is indeed a complete JS engine, but saying that out loud does sound a bit overly ambitious at the moment.

@yan12125
Collaborator

I don't think any of the engines you mentioned is written in py

Yes they are implemented with C/C++/Objective-C. If your goal is a complete JS engine, they are excellent choices. Python wrappers for those existing engines are much easier to write and maintain in comparison with a new Python implementation from scratch. Could you share the idea for why a pure Python implementation is necessary?

@sulyi
Author

sulyi commented Nov 26, 2016

At the end of the day, it's up to the maintainers to decide if this is able to serve their needs or not.

As far as platform independence goes it's better not having native code or third party modules.

Implementing a JavaScript interpreter seems like a whole lot of fun/challenge/practice/good reference...

Also why not?

@yan12125
Collaborator

@siddht1 What @sulyi wants is a Javascript interpreter in youtube-dl that handles scripts on web pages, not YouTube downloaders written in Javascript. They are different.

As far as platform independence goes it's better not having native code or third party modules.

That depends. If there are already excellent solutions for a complex task, few people will re-implement them beyond the scope of toy projects. If you need a library not in your target language, a binding/wrapper is the most common way. For example, HTTP and TLS protocols are easy, so there are python-requests and python-tls. On the other hand, GUI systems are complex, so there are PyQt, PyGTK, and CPython's builtin tkinter module, but I have never seen a GUI toolkit in pure Python. JSInterpreter was originally implemented to handle the signature decryption functions on YouTube, which are naive Javascript code. As that's simple, a pure Python implementation is the best choice. On the other hand, bridging existing JS engines is better if you need all Javascript features, in terms of development and maintenance effort as well as performance.

Implementing a JavaScript interpreter seems like a whole lot of fun/challenge/practice/good reference...

Changes in youtube-dl should solve real problems. Of course it's fun, but this is not the place.

Also why not?

I have no plan to reject your pull request. I'm asking about your goal so that I can determine whether a pull request is ready or not. As you've said, your goal is a complete JS engine, so you can continue your work. When it's almost done (for example, when it passes most of SpiderMonkey's test suite), we can come back and start reviewing it.

@siddht4

siddht4 commented Nov 27, 2016

@yan that's great, by the way. I just suggested a different approach to the fix. I would be ready to help if required. The engine for Mozilla is not Gecko anymore; the engine could be written in Node.js, as it supports native apps too.

@sulyi
Author

sulyi commented Nov 28, 2016

Thanks, @siddht1, it was actually helpful. After studying narcissus and other engines, and reading the specs, I've realized I can't continue without first designing this. So after doing that, I now need to tear almost everything down and redo it. Hopefully it'll go faster the second time.
--- edit ---
Basic concept:
The tokens of the lexical grammar will be a dictionary of regular expression strings; the productions of the syntactic grammar will be described by another dictionary containing lists of token ids. The lexer would use both dictionaries to compile the actual regex used to match tokens, and return them statement by statement as a list. Then the parser would use the second dictionary to decide how these tokens should be interpreted according to the syntactic grammar.
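
A minimal sketch of the lexical half of this concept follows. It is only an illustration, not the code from #11272: the token ids, the regular expressions in `TOKENS`, and the `tokenize` function are all hypothetical, and the second (syntactic) dictionary is left out.

```python
import re

# Hypothetical lexical grammar: token id -> regular expression string.
# Earlier entries take priority (relies on ordered dicts, Python 3.7+).
TOKENS = {
    'float': r'\d+\.\d+',
    'int': r'\d+',
    'id': r'[A-Za-z_$][A-Za-z0-9_$]*',
    'op': r'[-+*/=]',
    'end': r';',
    'skip': r'\s+',
}

# Compile every token pattern into a single regex with named groups.
TOKEN_RE = re.compile('|'.join(
    '(?P<%s>%s)' % (name, pattern) for name, pattern in TOKENS.items()))


def tokenize(code):
    """Return a list of statements, each a list of (token_id, value) pairs."""
    statements, current, pos = [], [], 0
    while pos < len(code):
        match = TOKEN_RE.match(code, pos)
        if match is None:
            raise ValueError('Unexpected character %r at %d' % (code[pos], pos))
        pos = match.end()
        token_id = match.lastgroup  # name of the alternative that matched
        if token_id == 'skip':
            continue
        if token_id == 'end':
            statements.append(current)
            current = []
        else:
            current.append((token_id, match.group()))
    if current:
        statements.append(current)
    return statements
```

Calling `tokenize('a = 1; b = a + 2.5;')` yields two statements, which a parser could then match against lists of token ids.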

@siddht4

siddht4 commented Nov 28, 2016

@yan @sulyi I am unable to view my first comment. If you have it, kindly repost it or at least mail it to me.

@sulyi
Author

sulyi commented Nov 28, 2016

@siddht1 Sry, neither can I.

@mozbugbox
Contributor

Js2Py is another option which claims to support ECMA 5

Pure Python JavaScript Translator/Interpreter

Everything is done in 100% pure Python so it's extremely easy to install and use. Supports Python 2 & 3. Full support for ECMAScript 5.1, ECMA 6 support is still experimental.

@yan12125
Collaborator

yan12125 commented Nov 28, 2016

Js2Py looks so far so good. The only two issues I found:

  1. No support for defining functions like what I can do in SpiderMonkey. pyimport seems to be a way, but I'd like cleaner approaches. It would be good to have for PAC support (how to configure youtebe-dl when using PAC on windows 7 #8278)
  2. It generates Python code and runs it. That sounds like a security weakness

@siddht4

siddht4 commented Nov 28, 2016

In my view this is just one part of the puzzle; maybe an engine should be developed first, similar to creating cobbler in java.

js2py lacks many components, but other similar projects can help to lessen the gap.

What everybody fails to see is that the flow will break:

youtube-dl -----> in js engine -------> out js engine -----> site

site ----> js engine (fails to parse if it gets inbound, where it's supposed to get outbound)

fix

youtube-dl <-----> js engine #1 (inbound) <----------------------------- site (Send)
| /
| /
-------> js engine #2- (outbound)----> site (GET) /

@siddht4

siddht4 commented Nov 28, 2016

Sorry, the diagram didn't come out as expected. Wait, this is what I wanted to suggest.

@sulyi
Author

sulyi commented Nov 29, 2016

Funny thing: starting this, I was hoping to discuss how to do this, not why I shouldn't. Yes, existing solutions can be very helpful, but only to a certain point. Right now I need to implement the parser, using the tokens. Yet I might make some further minor changes to the lexer first, e.g. reserved words can get their collective token id, since value and id would be an injective relation otherwise. Hopefully, after I start working on the parser, it'll start to show some promising signs.

@sulyi
Author

sulyi commented Nov 30, 2016

Can someone enlighten me as to what this means in the specs:

The ExpressionNoIn production is evaluated in the same manner as the Expression production except that the contained ExpressionNoIn and AssignmentExpressionNoIn are evaluated instead of the contained Expression and AssignmentExpression, respectively.

Is this just trying to say that parentheses are resolved? Because I couldn't find any clue to that.

@yan12125
Collaborator

Did you mean section 11.14? That describes the order of evaluation of expressions involving a comma operator.

@sulyi
Author

sulyi commented Nov 30, 2016

Yes, among others, like VariableDeclarationNoIn and RelationalExpressionNoIn; the latter has an even more confusing note:

The “NoIn” variants are needed to avoid confusing the in operator in a relational expression with the in operator in a for statement.

@yan12125
Collaborator

I guess NoIn variants are for simpler grammars - you don't need to peek so many tokens when building a lookahead LL parser.

@sulyi
Author

sulyi commented Dec 3, 2016

I'm thinking of using the shunting-yard algorithm for assignment/conditional expressions, simply to reduce the number of methods. Looking at other interpreters' source, it seems an AST is preferred. I'd love to hear pros and cons.

@yan12125
Collaborator

yan12125 commented Dec 3, 2016

That sounds fine. The only concern is error reporting for invalid expressions. In youtube-dl it's OK to assume all inputs are valid.

@sulyi
Author

sulyi commented Dec 3, 2016

Grouping will still be handled by expression, and conditional expression will also have its own token. Therefore error handling can be done properly, in my opinion.
Having implemented it, I think it's kind of the same; the only difference is that shunting-yard uses a local operator stack instead of the call stack. Interpretation might have to be a little bit different, though. In its current state that's broken and I'm a bit nervous about it, but it's getting there.
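
For reference, the "local operator stack" idea can be sketched as below. This is a generic textbook shunting-yard, not the implementation in #11272: the `PRECEDENCE` table and the `to_rpn` function are hypothetical, only left-associative binary operators are covered, and parentheses are handled inline here rather than by a separate expression production.

```python
# Hypothetical precedence table; higher binds tighter, all left-associative.
PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2, '%': 2}


def to_rpn(tokens):
    """Convert infix tokens (operands, operators, parentheses) to RPN."""
    output, operators = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators that bind at least as tightly as the new one.
            while (operators and operators[-1] != '('
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                output.append(operators.pop())
            operators.append(token)
        elif token == '(':
            operators.append(token)
        elif token == ')':
            while operators and operators[-1] != '(':
                output.append(operators.pop())
            if not operators:
                raise ValueError('Unbalanced parentheses')
            operators.pop()  # discard the matching '('
        else:
            output.append(token)  # operand
    while operators:
        if operators[-1] == '(':
            raise ValueError('Unbalanced parentheses')
        output.append(operators.pop())
    return output
```

A recursive-descent parser would encode the same precedence levels as one method per level, which is where the saving in the number of methods comes from.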

@sulyi
Author

sulyi commented Dec 3, 2016

Another thing: I'm thinking about refactoring. A separate grammar/tstream module would be nice in its own package, like:

youtube_dl
 |
 +-- jsinterp
      |
      +-- __init__.py
      |
      +-- jsinterp.py
      |
      +-- <grammar.py>
      |
      +-- tstream.py 

and in __init__.py:

from .jsinterp import JSInterpreter
from .tstream import TokenStream

__all__ = ['JSInterpreter', 'TokenStream']

I believe that would be backward compatible too.

@yan12125
Collaborator

yan12125 commented Dec 4, 2016

Refactoring is the way to go. But is there a need to expose TokenStream? I thought it's used internally in JSInterpreter only.

@sulyi
Author

sulyi commented Dec 4, 2016

Sure, that's a valid question that I don't think I have an answer to. My thought was that one might want to have a tokenizer for the grammar.

The other thing I was thinking about is error handling. I don't see it as a single function, therefore I don't know how to not do it, if that makes any sense.
Anyway, it might be worth discussing further how error reporting should happen.

@sulyi
Author

sulyi commented Dec 5, 2016

With a partial parser in place, there's only the interpreter left to implement in order to get to the milestone I've mentioned before. I might still be underestimating the complexity of the remaining work, but I've started thinking about testing and dynamically adding SpiderMonkey's tests to the test cases.

@yan12125
Collaborator

yan12125 commented Dec 5, 2016

Thanks for that! Could you add some more tests first? Now #11272 is big enough and I guess it's fragile to refactoring.

@sulyi
Author

sulyi commented Dec 6, 2016

Probably it could be made much more robust by introducing a Token and/or an ASTree class.

There were some test cases (e.g. instantiation, off the top of my head) that I was wondering about while implementing things: how would it perform against them? I would love some suggestions, though.

There's quite a lot left to do before it passes the current test cases; I just wanted to put the idea of dynamic test cases out there. I'm not sure how to do it, but I hope reading the SpiderMonkey documentation will help.

@sulyi
Author

sulyi commented Dec 8, 2016

I've just added a parser test. Subsequently I got stuck with the interpreter and as a result made a mess.
I'm thinking a Reference helper class and a context stack for JSInterpreter would be helpful.

from youtube_dl.utils import ExtractorError

# Placeholder for the JS `undefined` value (not defined in the original snippet).
undefined = object()


class Reference(object):
    def __init__(self, name, value, parent_key=None):
        self._type = name
        self._value = value
        # parent_key is an optional (parent, key) pair locating the value.
        if parent_key is not None:
            self._parent, self._key = parent_key
        else:
            self._parent, self._key = None, None

    def getvalue(self):
        return self._value

    def putvalue(self, value):
        # Only references that know their container can be written to.
        if self._parent is not None and self._key is not None:
            self._value = value
            self._parent[self._key] = value
        else:
            raise ExtractorError('Reference type %s is read-only' % self._type)

    def delete(self):
        self._value = undefined
        self._type = None
        if self._parent is not None and self._key is not None:
            del self._parent[self._key]
        # No need for error report here!

Reference._parent would be either local_vars (top-level) or an array or object literal, and Reference._key would be an identifier, an index, or a property or method name, respectively.

Instead of storing values they would store Reference instances.

@sulyi
Author

sulyi commented Dec 10, 2016

Any idea how to get comparison of zip objects to work in python3 (in a nested list, by unittest)?
[...]/test_jsinterp_parser.py#L106
[...]/test_jsinterp_parser.py#L170
[...]/test_jsinterp_parser.py#L310
[...]/test_jsinterp_parser.py#L371
...

@yan12125
Collaborator

yan12125 commented Dec 10, 2016

Is there anything wrong in python3's zip?

@sulyi
Author

sulyi commented Dec 10, 2016

Solved it, kinda! With traverse. Yet, it's still a generator.
I'm not very keen on the type(o) == zip check, though.

@yan12125
Collaborator

I see. As performance is not critical in tests, you may want to just transform everything into lists.
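
One way to "transform everything into lists" without a `type(o) == zip` check is sketched below. It assumes Python 3 and a nested structure of lists, tuples, dicts, and lazy iterators; the `materialize` helper is hypothetical and is not the traverse function from the pull request. It consumes any iterators it walks over, so it should be applied once and the result reused.

```python
def materialize(node):
    """Recursively replace zip objects and other iterators with lists."""
    if isinstance(node, (str, bytes)):
        return node  # strings are iterable but must stay intact
    if isinstance(node, dict):
        return {key: materialize(value) for key, value in node.items()}
    if isinstance(node, tuple):
        return tuple(materialize(item) for item in node)
    # Anything with __next__ is an iterator (zip, map, generators, ...).
    if isinstance(node, list) or hasattr(node, '__next__'):
        return [materialize(item) for item in node]
    return node
```

Both the expected and the actual value can be passed through the helper before `assertEqual`, so the comparison only ever sees plain lists and tuples.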

@sulyi
Author

sulyi commented Dec 10, 2016

I think that would have issues with operators, and generators still get exhausted, even with itertools.tee.

@yan12125
Collaborator

Oops. Just ignore my previous comment

@sulyi
Author

sulyi commented Dec 10, 2016

copy.deepcopy is the only way to compare a generator twice without converting it to a list beforehand.
I think I like that a lot.

@sulyi
Author

sulyi commented Dec 10, 2016

For getting the SpiderMonkey tests, can I use the hglib package? It seems that it's for local repos. Moreover, hg does not support narrow clones. I think a small spider needs to be implemented to add these tests dynamically.
-- edit --
Looking at the tests, those do not seem very applicable.

@yan12125
Collaborator

Do you want to clone the whole mozilla-central source tree? Please don't do that. There's no need to keep up-to-date with Firefox (for example, it's going to support ES2017 while there's no need to support it in youtube-dl now), so just copying the files is fine.

@sulyi
Author

sulyi commented Dec 10, 2016

No, I definitely don't want to clone the whole mozilla-central. So, you say I shouldn't add anything dynamically? Just cherry-pick some of them and dump them in test_jsinterp_parser?

@yan12125
Collaborator

IIRC unlike SVN, mercurial does not allow cloning partial files. What youtube-dl needs are those tests. If you know a way to sync only those test files, go ahead. Note that extra dependencies should affect only tests/, not youtube_dl/

@sulyi
Author

sulyi commented Dec 10, 2016

How about a linux native spider like curl or wget?
-- edit --
This seems to work fine:
wget -np -r -e robots=off --accept='*.js' https://hg.mozilla.org/mozilla-central/file/tip/js/src/tests/ecma_5/
Adding -nd even dumps it all in a single directory, but there are a lot of shell.js and browser.js files (due to their testing framework), and those get an extra number extension in their names.

@yan12125
Collaborator

I guess you want to download Mozilla's test suite each time test_jsinterp is invoked? I don't think it's a good idea, as there are thousands of files.

@sulyi
Author

sulyi commented Dec 10, 2016

I don't know. I'd much rather extract some useful test from the testcases and run those through unittest, but I don't think I can do that. The only other option I see is to add tests to existing ones manually based on mozilla's tests.

@yan12125
Collaborator

Oops I should leave this comment here: #11272 (comment)

@sulyi
Author

sulyi commented Dec 10, 2016

Dumping a list of links to mozilla testcases in a file on building tests might also be a good practice.

@yan12125
Collaborator

As long as it does not take too long, it's fine.

@sulyi
Author

sulyi commented Dec 10, 2016

I've timed the wget crawl at home:

real	16m52.314s
user	3m38.500s
sys	0m1.940s

@yan12125
Collaborator

Does it take 16 minutes every time when running python test/test_jsinterp.py or there's a local copy and it's necessary only when syncing tests from Mozilla?

@sulyi
Author

sulyi commented Dec 10, 2016

If anybody is interested, I'd like to share a side product I've created while studying the specs.
It is the grammar I've extracted in EBNF notation, plus some syntax diagrams created from it by http://www.bottlecaps.de/rr/ui.

Can't attach them for some reason, so:
https://gist.github.com/sulyi/15674f4802503d81711b015a05faae46

@sulyi
Author

sulyi commented Dec 11, 2016

I'm wondering what @phihag thinks about this.

@sulyi
Author

sulyi commented Dec 15, 2016

I've reworked the test suite. It does not support adding tests from any other suites, but hopefully it's pretty straightforward to add new ones and use them to test interpretation, parsing, or both.
If anybody is planning on adding new tests to help out, which would be nice, please check it out or make a better one; just also integrate the current ones.

@sulyi
Author

sulyi commented Dec 17, 2016

I'm a little bit stuck with designing built-ins.

@yan12125
Collaborator

In the parsing stage I guess built-ins are not different than other things?

@sulyi
Author

sulyi commented Dec 19, 2016

Sorry, I hadn't seen your post until now, and I also needed a break from it to rehash a bit.

Parsing is done. It might need some loving later: minor features and possibly refactoring into its own class (and module).

I've redone the test suite so that parsing and interpreting tests can be run flexibly on the same script code, because I had to move on to implementing the interpreter in the first place. It would have got ugly fast if I hadn't sorted that out before starting to work on it.

As for the built-ins, I've been thinking, and I'll probably keep the Reference class and use it as a "wrapping", recreate the inheritance tree of JavaScript objects in a separate module, and use those as values. There might be a need for a dict to look up JS properties in each class, but first I'll try to do it using hasattr or __dict__.

I've started working on function calls, but I think the context stack is not working properly. Updating globals with local_vars on context_push and removing the difference on context_pop might solve it, but that likely has an issue with shadowing names.
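
One possible way around the shadowing problem is to keep a separate dict per context instead of merging locals into globals. The sketch below is an assumption for illustration only: the `ContextStack` class and its method names are hypothetical and do not match the context_push/context_pop code in the pull request. It also models call contexts only; closures would additionally need each function to remember the context it was defined in.

```python
class ContextStack(object):
    """Hypothetical scope chain: one dict per context, innermost last."""

    def __init__(self, global_vars=None):
        self._contexts = [dict(global_vars or {})]

    def push(self, local_vars=None):
        self._contexts.append(dict(local_vars or {}))

    def pop(self):
        if len(self._contexts) == 1:
            raise RuntimeError('Cannot pop the global context')
        return self._contexts.pop()

    def declare(self, name, value=None):
        # Like `var`: always binds in the innermost context.
        self._contexts[-1][name] = value

    def lookup(self, name):
        # Search from the innermost context outwards.
        for context in reversed(self._contexts):
            if name in context:
                return context[name]
        raise KeyError(name)

    def assign(self, name, value):
        # Update the nearest existing binding, else create a global.
        for context in reversed(self._contexts):
            if name in context:
                context[name] = value
                return
        self._contexts[0][name] = value
```

Because a shadowing local lives in its own dict, popping the context leaves the outer binding of the same name untouched.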

@sulyi
Author

sulyi commented Jan 25, 2017

It's probably quite moot, yet I have to point out that the fix for #11663 and #11664 committed by @dstftw is a bit baffling to me.
Let's take the following three independent JS expressions:
a = 42
this.a = 42
var a = 42
The first two are equivalent. The third differs in that it sets the return value to undefined while the others set it to 42, and in that the [[Configurable]] internal property of a is false instead of true.
The practical consequence of this is that running delete a gives true in the first two cases and false in the third, but not much else.
In all three cases this.a === a is true.
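
The behaviour described above can be mimicked with a toy model. This is only a sketch under stated assumptions: the `GlobalObject` class and its methods are hypothetical, it is not how JSInterpreter stores variables, and only the configurable flag and the result of delete are modelled.

```python
class GlobalObject(object):
    """Toy JS global object; each property stores [value, configurable]."""

    def __init__(self):
        self._props = {}

    def put(self, name, value):
        # Models `a = 42` and `this.a = 42`: a new property is configurable.
        if name in self._props:
            self._props[name][0] = value
        else:
            self._props[name] = [value, True]
        return value

    def declare_var(self, name, value):
        # Models `var a = 42`: a new binding is not configurable.
        if name not in self._props:
            self._props[name] = [None, False]
        self._props[name][0] = value

    def get(self, name):
        return self._props[name][0]

    def delete(self, name):
        # Models `delete a`: succeeds only for configurable properties.
        if name in self._props and not self._props[name][1]:
            return False
        self._props.pop(name, None)
        return True
```

With this model, `put('a', 42)` followed by `delete('a')` returns True, while `declare_var('a', 42)` followed by `delete('a')` returns False and leaves the value in place.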


4 participants