Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we discuss how to update test262? #817

Closed
morfav opened this issue Jan 1, 2021 · 29 comments · Fixed by #930
Closed

Can we discuss how to update test262? #817

morfav opened this issue Jan 1, 2021 · 29 comments · Fixed by #930

Comments

@morfav
Copy link

morfav commented Jan 1, 2021

Dear maintainers,

I thought it may be a good idea to update test262 to its current point.

It's taken me a lot longer (many hours) than I thought it would. The following issues cropped up:

  1. test262 has introduced extra "helper" (harness) files which use syntax that rhino does not understand (3-4 such files). It's necessary to create a custom version of these that rhino would run (or use a previous version of these files if it's an existing one). Do you think it's viable that these custom versions will be maintained with potentially further divergence in future?
  2. There are a lot of new, and some changed tests. Some of these pass, while others fail. It's a VERY time-consuming task to update the test262.properties file. I added all the new tests from git history to be excluded, however there were also quite a few (order of 1000) that pass, so need to be located and removed from the exclusions.
  3. A couple of new tests seem to never complete. To overcome this, I added a new status of "hard" exclude, which prevents them from being run (since currently tests marked to be excluded with an exclamation mark are still run, which causes the tests to hang).

I feel like it may make this whole process easier if we change the approach to running these tests. Namely, keep a version-controlled list of tests that PASS (rather than what is currently done, which is keep a list of failing tests).
When fixing bugs, it would be possible to add to the list of passing tests.
When syncing with the test262 repo, we have a different mode which runs all the non-passing tests to see which ones pass and can be (automatically) included (we still need a hard exclude to prevent running the ones that never complete). This mode can also be triggered to search for tests that are passing following commits to rhino. One potential complication would be that currently we can include whole directories, so would need to see how this fits with the new approach.

Please let me know what you think about the above, and whether there is an easier way to maintain test262 in sync.

Thanks!

@tuchida
Copy link
Contributor

tuchida commented Jan 28, 2021

Could you please show me the rhino branch with the updated test262?
I'm currently implementing this in BigInt for ES2020, but I'm having trouble with a lack of test cases.
(I make it as a hobby, so don't expect it.)
master...tuchida:es2020-bigint

@gbrail
Copy link
Collaborator

gbrail commented Jan 28, 2021 via email

@tuchida
Copy link
Contributor

tuchida commented Feb 12, 2021

I implemented the Exponentiation Operator (**) and test262 is now very slow.
The reason is that the array length limit check test is now enabled.
https://github.com/tc39/test262/blob/041da54c02ae7d17edb8c134ab7691c4f643bafe/test/built-ins/Array/prototype/reverse/length-exceeding-integer-limit-with-object.js#L27

Some of this is due to bugs in Rhino.
For example:

if (doubleLen > NativeNumber.MAX_SAFE_INTEGER) {
if (throwIfTooLarge) {
String msg = ScriptRuntime.getMessageById("msg.arraylength.bad");
throw ScriptRuntime.rangeError(msg);
}
return (int) NativeNumber.MAX_SAFE_INTEGER;

MAX_SAFE_INTEGER (2 ** 53 - 1) is casted as int (max 2 ** 31 - 1).

It is usually useful to have the option to skip these.

@morfav
Copy link
Author

morfav commented Feb 22, 2021

@tuchida I don't have a branch yet, as the task to update to the latest tests version blew up a bit, hence why I decided to post a question here.

@gbrail I found a few harness files (which are part of test262 project and are used within the tests) which don't work:

  1. sta.js
    Due to varargs:
Test262Error.thrower = (...args) => {
  throw new Test262Error(...args);
};
  1. deepEqual.js
    Due to nested literals I think:
      if (typeof Set !== "undefined" && value instanceof Set) {
        return `Set {${Array.from(value).map(value => assert.deepEqual.format(value, seen)).join(', ')}}${usage.used ? ` as #${usage.id}` : ''}`;
      }
  1. compareArray.js
    Due to default method argument value:
assert.compareArray = function(actual, expected, message = '') {
  1. nativeFunctionMatcher.js
    Due to RegExp that I guess rhino does not support
  2. regExpUtils.js
    Due to for const..of loop:
    for (const symbol of string) {

These problems can be overcome by having rhino-specific copies, although this would increase the work-load to support these copies in future (and we may need to test these custom versions?). Also, while 1, 3 and 5 can be overcome quite easily, and perhaps 2, I wasn't quite sure what to do with 4.

So this, coupled with my other points in the original post made me pause to ask what other contributors think. Please let me know any thoughts.

Thanks!

@tuchida
Copy link
Contributor

tuchida commented Mar 9, 2021

How about using Babel or something to transpile /harness/*.js to the ES5 equivalent?
https://babeljs.io/

@rbri
Copy link
Collaborator

rbri commented Apr 13, 2021

In-between the link is moved forward to the head an also the properties file is adjusted. Will have a look at your other points.

@rbri
Copy link
Collaborator

rbri commented Apr 29, 2021

Just saw there are more recent versions available in the repo. Will have a look if i can move this even more.

@p-bakker
Copy link
Collaborator

Instead of moving from exclusion list to a PASS list, what about somehow using (the output of) Test262SuiteTest to automatically update test262.properties to make all the necessary exclusions and remove obsolete exclusions?

@p-bakker
Copy link
Collaborator

p-bakker commented May 21, 2021

As I've created a Template Literal implementation,. I tried running the latest test262 suite, but it hangs on this test: https://github.com/tc39/test262/blob/main/test/built-ins/Array/prototype/concat/arg-length-near-integer-limit.js

Also when I try this code in the Rhino Shell it hangs. Note that in FF it properly throws, in GC it doesn't.

I tried excluding the test in test262.properties, however, the exclusion mechanism through test262 only takes care of not failing the testrun, but it still runs the test (I guess in order to report excluded tests that are passing)

This means I really have to delete the .js file in order to progress in testing. Do we need a stronger exclusion mechanism, that doesn't run the test if excluded?

EDIT: the good news is that it's only 1 .js file in the current test262 master that causes a hang: if deleted, the entire suite runs with 36364 tests completed, 572 failed as result :-)

@gbrail
Copy link
Collaborator

gbrail commented May 21, 2021 via email

@p-bakker
Copy link
Collaborator

Traced that bug a bit, created a case: #906

@p-bakker
Copy link
Collaborator

p-bakker commented May 21, 2021

Another question: if I run the testsuite, I get this failure:

org.mozilla.javascript.tests.MozillaSuiteTest > runMozillaTest[717, js=testsrc\tests\ecma\String\15.5.4.12-3.js, opt=-1] STANDARD_OUT
    In "testsrc\tests\ecma\String\15.5.4.12-3.js":
     FAILED! var s = new String( String.fromCharCode(4304) ); s.toUpperCase().charCodeAt(0) = 7312 expected: 4304
     FAILED! var s = new String( String.fromCharCode(4305) ); s.toUpperCase().charCodeAt(0) = 7313 expected: 4305
     FAILED! var s = new String( String.fromCharCode(4306) ); s.toUpperCase().charCodeAt(0) = 7314 expected: 4306
     FAILED! var s = new String( String.fromCharCode(4307) ); s.toUpperCase().charCodeAt(0) = 7315 expected: 4307
     FAILED! var s = new String( String.fromCharCode(4308) ); s.toUpperCase().charCodeAt(0) = 7316 expected: 4308
     FAILED! var s = new String( String.fromCharCode(4309) ); s.toUpperCase().charCodeAt(0) = 7317 expected: 4309
     FAILED! var s = new String( String.fromCharCode(4310) ); s.toUpperCase().charCodeAt(0) = 7318 expected: 4310
     FAILED! var s = new String( String.fromCharCode(4311) ); s.toUpperCase().charCodeAt(0) = 7319 expected: 4311
     FAILED! var s = new String( String.fromCharCode(4312) ); s.toUpperCase().charCodeAt(0) = 7320 expected: 4312
     FAILED! var s = new String( String.fromCharCode(4313) ); s.toUpperCase().charCodeAt(0) = 7321 expected: 4313
     FAILED! var s = new String( String.fromCharCode(4314) ); s.toUpperCase().charCodeAt(0) = 7322 expected: 4314
     FAILED! var s = new String( String.fromCharCode(4315) ); s.toUpperCase().charCodeAt(0) = 7323 expected: 4315
     FAILED! var s = new String( String.fromCharCode(4316) ); s.toUpperCase().charCodeAt(0) = 7324 expected: 4316
     FAILED! var s = new String( String.fromCharCode(4317) ); s.toUpperCase().charCodeAt(0) = 7325 expected: 4317
     FAILED! var s = new String( String.fromCharCode(4318) ); s.toUpperCase().charCodeAt(0) = 7326 expected: 4318
     FAILED! var s = new String( String.fromCharCode(4319) ); s.toUpperCase().charCodeAt(0) = 7327 expected: 4319
     FAILED! var s = new String( String.fromCharCode(4320) ); s.toUpperCase().charCodeAt(0) = 7328 expected: 4320
     FAILED! var s = new String( String.fromCharCode(4321) ); s.toUpperCase().charCodeAt(0) = 7329 expected: 4321
     FAILED! var s = new String( String.fromCharCode(4322) ); s.toUpperCase().charCodeAt(0) = 7330 expected: 4322
     FAILED! var s = new String( String.fromCharCode(4323) ); s.toUpperCase().charCodeAt(0) = 7331 expected: 4323
     FAILED! var s = new String( String.fromCharCode(4324) ); s.toUpperCase().charCodeAt(0) = 7332 expected: 4324
     FAILED! var s = new String( String.fromCharCode(4325) ); s.toUpperCase().charCodeAt(0) = 7333 expected: 4325
     FAILED! var s = new String( String.fromCharCode(4326) ); s.toUpperCase().charCodeAt(0) = 7334 expected: 4326
     FAILED! var s = new String( String.fromCharCode(4327) ); s.toUpperCase().charCodeAt(0) = 7335 expected: 4327
     FAILED! var s = new String( String.fromCharCode(4328) ); s.toUpperCase().charCodeAt(0) = 7336 expected: 4328
     FAILED! var s = new String( String.fromCharCode(4329) ); s.toUpperCase().charCodeAt(0) = 7337 expected: 4329
     FAILED! var s = new String( String.fromCharCode(4330) ); s.toUpperCase().charCodeAt(0) = 7338 expected: 4330
     FAILED! var s = new String( String.fromCharCode(4331) ); s.toUpperCase().charCodeAt(0) = 7339 expected: 4331
     FAILED! var s = new String( String.fromCharCode(4332) ); s.toUpperCase().charCodeAt(0) = 7340 expected: 4332
     FAILED! var s = new String( String.fromCharCode(4333) ); s.toUpperCase().charCodeAt(0) = 7341 expected: 4333
     FAILED! var s = new String( String.fromCharCode(4334) ); s.toUpperCase().charCodeAt(0) = 7342 expected: 4334
     FAILED! var s = new String( String.fromCharCode(4335) ); s.toUpperCase().charCodeAt(0) = 7343 expected: 4335
     FAILED! var s = new String( String.fromCharCode(4336) ); s.toUpperCase().charCodeAt(0) = 7344 expected: 4336
     FAILED! var s = new String( String.fromCharCode(4337) ); s.toUpperCase().charCodeAt(0) = 7345 expected: 4337
     FAILED! var s = new String( String.fromCharCode(4338) ); s.toUpperCase().charCodeAt(0) = 7346 expected: 4338
     FAILED! var s = new String( String.fromCharCode(4339) ); s.toUpperCase().charCodeAt(0) = 7347 expected: 4339
     FAILED! var s = new String( String.fromCharCode(4340) ); s.toUpperCase().charCodeAt(0) = 7348 expected: 4340
     FAILED! var s = new String( String.fromCharCode(4341) ); s.toUpperCase().charCodeAt(0) = 7349 expected: 4341
     FAILED! var s = new String( String.fromCharCode(4342) ); s.toUpperCase().charCodeAt(0) = 7350 expected: 4342
     FAILED! var s = new String( String.fromCharCode(4343) ); s.toUpperCase().charCodeAt(0) = 7351 expected: 4343
     FAILED! var s = new String( String.fromCharCode(4344) ); s.toUpperCase().charCodeAt(0) = 7352 expected: 4344
     FAILED! var s = new String( String.fromCharCode(4345) ); s.toUpperCase().charCodeAt(0) = 7353 expected: 4345
     FAILED! var s = new String( String.fromCharCode(4346) ); s.toUpperCase().charCodeAt(0) = 7354 expected: 4346
     FAILED! var s = new String( String.fromCharCode(4349) ); s.toUpperCase().charCodeAt(0) = 7357 expected: 4349
     FAILED! var s = new String( String.fromCharCode(4350) ); s.toUpperCase().charCodeAt(0) = 7358 expected: 4350
     FAILED! var s = new String( String.fromCharCode(4351) ); s.toUpperCase().charCodeAt(0) = 7359 expected: 4351


org.mozilla.javascript.tests.MozillaSuiteTest > runMozillaTest[717, js=testsrc\tests\ecma\String\15.5.4.12-3.js, opt=-1] FAILED
    java.lang.AssertionError at MozillaSuiteTest.java:208

If I try the first line of code in Chrome, I get the same result:

> var s = new String( String.fromCharCode(4304) ); s.toUpperCase().charCodeAt(0)
7312
> s
String {"ა"}

Can it be that this is an issue when running the tests on Windows? I'm running the tests on Windows in Git Bash.

The problem btw is the .toUpperCase(): if that is removed, all is fine

@p-bakker
Copy link
Collaborator

With regards to the above comment: I tested in Ubuntu (through WSL on Windows) and then I'm not getting the failures in testsrc\tests\ecma\String\15.5.4.12-3.js, which leads me to believe the test might be invalid

Another strange thing: if I run ./gradlew test -Dquick on Rhino master, I get the following:

    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A3_T1.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A3_T2.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A3_T3.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A3_T4.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A3_T5.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A3_T6.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A3_T7.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A3_T8.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A5_T4.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A7_T1.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A7_T2.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A7_T5.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\apply\S15.3.4.3_A7_T7.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A3_T1.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A3_T2.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A3_T3.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A3_T4.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A3_T5.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A3_T6.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A3_T7.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A3_T8.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A5_T4.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A6_T1.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A6_T2.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A6_T5.js
    Test is marked as failing but it does not: test262\test\built-ins\Function\prototype\call\S15.3.4.4_A6_T7.js
    Test is marked as failing but it does not: test262\test\built-ins\GeneratorPrototype\next\from-state-executing.js
    Test is marked as failing but it does not: test262\test\built-ins\GeneratorPrototype\return\from-state-executing.js
    Test is marked as failing but it does not: test262\test\built-ins\GeneratorPrototype\throw\from-state-executing.js
    Test is marked as failing but it does not: test262\test\language\directive-prologue\14.1-16-s.js
    Test is marked as failing but it does not: test262\test\language\directive-prologue\14.1-17-s.js
    Test is marked as failing but it does not: test262\test\language\directive-prologue\14.1-3-s.js
    Test is marked as failing but it does not: test262\test\language\directive-prologue\14.1-4-s.js
    Test is marked as failing but it does not: test262\test\language\directive-prologue\14.1-5-s.js
    Test is marked as failing but it does not: test262\test\language\directive-prologue\14.1-6-s.js
    Test is marked as failing but it does not: test262\test\language\directive-prologue\14.1-7-s.js
    Test is marked as failing but it does not: test262\test\language\expressions\generators\dstr\ary-name-iter-val.js
    Test is marked as failing but it does not: test262\test\language\expressions\generators\dstr\ary-ptrn-elem-id-iter-done.js
    Test is marked as failing but it does not: test262\test\language\expressions\generators\dstr\ary-ptrn-elem-id-iter-val.js
    Test is marked as failing but it does not: test262\test\language\expressions\generators\dstr\obj-ptrn-id-trailing-comma.js
    Test is marked as failing but it does not: test262\test\language\expressions\generators\dstr\obj-ptrn-prop-ary-trailing-comma.js
    Test is marked as failing but it does not: test262\test\language\expressions\generators\scope-name-var-close.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-86-s.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-86gs.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-87-s.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-87gs.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-90-s.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-90gs.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-91-s.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-91gs.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-92-s.js
    Test is marked as failing but it does not: test262\test\language\function-code\10.4.3-1-92gs.js
    Test is marked as failing but it does not: test262\test\language\statements\generators\dstr\ary-name-iter-val.js
    Test is marked as failing but it does not: test262\test\language\statements\generators\dstr\ary-ptrn-elem-id-iter-done.js
    Test is marked as failing but it does not: test262\test\language\statements\generators\dstr\ary-ptrn-elem-id-iter-val.js
    Test is marked as failing but it does not: test262\test\language\statements\generators\dstr\obj-ptrn-id-trailing-comma.js
    Test is marked as failing but it does not: test262\test\language\statements\generators\dstr\obj-ptrn-prop-ary-trailing-comma.js

However, when I run just ./gradlew test I'm not getting these at all. Exact same source code, just limiting the tests to one optLevel

@rbri
Copy link
Collaborator

rbri commented May 24, 2021

As a guess, these tests are only failing with one opt level - if i'm correct this is a major problem with out implementation.
Will have a look.

@p-bakker
Copy link
Collaborator

Somewhere over the weekend I had the realization that that might be the case, so I quickly ran the tests three times, once with optLevel -1, once with optlevel 1 and once with optLevel 9, but, judging quickly, it didn't seem that all the tests that were reported as not breaking on optLevel -1 were failing on one of the other two optLevels.

But then again: I only quickly glanced at the result, as it was weekend afterall :-)

@rbri
Copy link
Collaborator

rbri commented May 24, 2021

Yes - will provide a first patch that will output details about this tests and the optlevel/strict for these. Then we have to have a look at this.

@rbri
Copy link
Collaborator

rbri commented May 24, 2021

Have opened a new issue to track these kind of tests #909

@p-bakker
Copy link
Collaborator

Any thoughts on the breaking tests on windows mentioned in #817 (comment)?

Also wondering whether it might be Java version related. Am running against Java 15 (Zulu).

Which also brings the question of which Java version(s) Rhino supports and against which versions it is tested. Based on release notes I gather Rhino requires Java 8 and up?

@gbrail
Copy link
Collaborator

gbrail commented May 26, 2021 via email

@p-bakker
Copy link
Collaborator

Tnx @gbrail, I'll create a PR to add to the readme's about the Java version supported by Rhino and about which version to test on

@p-bakker
Copy link
Collaborator

Regarding my previous comment about some tests failing on windows (#817 (comment)):
Looks like windows adheres to a newer version of the Unicode standard: in Unicode 11 the uppercase behavior was changed for certain characters, which are the characters I see failing tests for on Windows (search the page I linked for version 11 for Casing Issues).

The EcmaScript spec has this to say about it:

A conforming implementation of ECMAScript must interpret source text input in conformance with the latest version of the Unicode Standard and ISO/IEC 10646.

So, this leads me to conclude that the testcode is faulty. Shall I go ahead and remove the tests that are failing on Windows?

@p-bakker
Copy link
Collaborator

Come to think of it: isn't the whole testsrc\tests\ecma\String\15.5.4.12-3.js file a bit pointless?

As what it is testing is basically if Java on the OS on which it's running is using Unicode correctly.

@p-bakker
Copy link
Collaborator

p-bakker commented Jun 1, 2021

Created #920 for the comments above

@tonygermano
Copy link
Contributor

@gbrail What is gained by building with Java 9 and targeting Java 8? I tried to search around, but couldn't find much. I did find this, but I'm not sure if this is what you were referring to https://github.com/bsideup/jabel

@gbrail
Copy link
Collaborator

gbrail commented Jun 1, 2021 via email

@p-bakker
Copy link
Collaborator

FYI now that Template Literals are in, we can update to the latest commit on master for the test262 submodule.

Just to let all know: am working on updating test262.properties accordingly, hopefully making the process of doing to easier for the future

@p-bakker
Copy link
Collaborator

p-bakker commented Jun 14, 2021

Sigh... still cant update to the latest commit on test262's main branch: as of last September the harness code depends on rest parameters...

Anyone already tried to take a stab at that? Doesn't look like it: #652

@rbri
Copy link
Collaborator

rbri commented Jun 15, 2021

Having rest parameter support will be great.

@p-bakker
Copy link
Collaborator

p-bakker commented Jun 16, 2021

Having rest parameter support will be great.

:-) for sure, but which missing feature wouldn't?

Did have a look, the semantics of rest params aren't that complicated, just trying to figure out where to convert the last + excess argument values on a function call to a rest param array. Tried to do this in

as that would be the most generic place imho, but it would require modifying the stack array and not sure if that is the way to go

The other place that might be a possibility would be

where the actual function call is made. Not sure if ALL function calls go through there though

Any thoughts on which would be best (or if none of those: how else to go about it)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants