Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL: test the host parser #4504

Closed
wants to merge 8 commits into from
Closed

URL: test the host parser #4504

wants to merge 8 commits into from

Conversation

annevk
Copy link
Member

@annevk annevk commented Jan 4, 2017

See whatwg/url#159 for context.

@domenic
Copy link
Member

domenic commented Jan 4, 2017

Most of these fail in jsdom/whatwg-url too. Is this with a spec update that we haven't applied?

It would be ideal if these could be added to urltestdata.json and setters_tests.json, as then other testing environments (whatwg-url, Node.js's parser) would easily be able to track them.

The fact that Chrome and Firefox generally fail most of these tests is a bit worrying.

@annevk
Copy link
Member Author

annevk commented Jan 4, 2017

This is the specification as-is. I also wanted to test various schemes, which would *n the number of tests. I guess I could attempt to output JSON, but seems weird without source script.

@domenic
Copy link
Member

domenic commented Jan 4, 2017

Including a source script that outputs JSON would work great for us.

It sounds like then we might have some serious bugs in the host parser in whatwg-url. I'll try to diagnose some time.

@annevk
Copy link
Member Author

annevk commented Jan 4, 2017

Might have bugs, but Safari TP passes close to all.

@domenic
Copy link
Member

domenic commented Jan 5, 2017

So here is an example. For (new URL("https://\u001Fx")).hostname, jsdom/whatwg-url gives a host of "xn--\u001fx-" instead of the apparently correct "\u001fx".

Apparently our interpretation of

Let asciiDomain be the result of running domain to ASCII on domain.

When applied to the string "\u001fx" gives "xn--\u001fx-". Is that not correct? Just trying to nail down where the problem is at this point :)

/cc @Sebmaster

@annevk
Copy link
Member Author

annevk commented Jan 6, 2017

No idea, but http://www.unicode.org/reports/tr46/ has the answer I suppose. We need to use the mapping table. http://www.unicode.org/reports/tr46/#IDNA_Mapping_Table. Per http://www.unicode.org/Public/idna/latest/IdnaMappingTable.txt U+001F is disallowed_STD3_valid. So it's valid for our purposes since we pass UseSTD3ASCIIRules=false. Then we see in step 3 of http://www.unicode.org/reports/tr46/#ToASCII that applying Punycode processing is not appropriate here.

mathiasbynens/punycode.js#12 might be part of the issue here?

@domenic
Copy link
Member

domenic commented Jan 6, 2017

OK, tracked things down in jsdom/whatwg-url. After jsdom/tr46#7, the failures are:

 1) Host parser tests -:
     TypeError: Invalid URL

  2) Host parser tests host/hostname setter for -:

      AssertionError: 'x' === '-'
      + expected - actual

      -x
      +-

  3) Host parser tests host/hostname setter for .:

      AssertionError: '0.0.0.0' === '.'
      + expected - actual

      -0.0.0.0
      +.

(here "expected" = by this PR, "actual" = produced by jsdom/whatwg-url)

Mind giving those three a closer look to see where the bug is, @annevk?

@annevk
Copy link
Member Author

annevk commented Jan 9, 2017

The answer to the "-" question by the way is that apparently user agents (except Firefox) only enforce that requirement for ToASCII when the input contains non-ASCII code points.

I'm not sure about the "." issue. As far as I can tell the IPv4 parser should return input in step 6.1.

@annevk
Copy link
Member Author

annevk commented Jan 9, 2017

There's over 3000 IDNA tests, which times 9 makes for well over 30000 tests. They can load and execute in a browser (many will fail), but maybe there's a better way to do this?

@annevk
Copy link
Member Author

annevk commented Jan 10, 2017

@jgraham I'm getting "The log length has exceeded the limit of 4 MB (this usually means that the test suite is raising the same exception over and over)." I guess I could split this up and instead run ~3000 tests per resource (instead of doing all APIs in one resource). Would that fit?

@annevk
Copy link
Member Author

annevk commented Feb 9, 2017

So host-parser-reduced is somewhat useful in that it gives faster results, but there's so many IDNA tests (and browsers fail so many!) that it makes it hard to analyze things.

@annevk
Copy link
Member Author

annevk commented Feb 9, 2017

One thing I did found out is that Safari still doesn't implement Nontransitional_Processing.

@GPHemsley
Copy link
Contributor

@annevk @domenic This PR has been sitting a while. Is merging still blocked on something? Has this been abandoned?

@annevk
Copy link
Member Author

annevk commented Jun 4, 2018

This needs to be updated to use a newer IdnaTest.txt, it should use @zcorpan's work on splitting the test so we don't run too many tests at once, and the results need to be verified since it's unclear if IdnaTest.txt is a 100% accurate.

Since I've created a bunch of simpler IDNA tests and user agents aren't even fixing those this simply hasn't become a priority.

@gsnedders gsnedders closed this Jan 24, 2020
@gsnedders gsnedders deleted the annevk/url-host-parser branch January 24, 2020 18:01
@gsnedders gsnedders restored the annevk/url-host-parser branch January 24, 2020 18:44
@Hexcles Hexcles reopened this Jan 24, 2020
@strager
Copy link

strager commented May 9, 2020

I am interested in picking these patches up and pushing them over the finish line. In particular, I noticed inconsistencies in how browsers parse domains, and I want these inconsistencies documented. For example:

new URL('http://hello^/')  // throws in Gecko and Blink; succeeds in EdgeHTML
new URL('http://hello</')  // throws in Gecko; succeeds in EdgeHTML and Blink
  • How should I credit @annevk? Do I preserve the Git authorship metadata?
  • Should I commit tests that fail in every browser? Or should these be disabled?
  • Should I commit tests that fail in any browser? Or should these be disabled?
  • Should I make a separate pull request, or hijack this PR?

@wpt-pr-bot wpt-pr-bot requested a deployment to wpt-preview-4504 May 9, 2020 04:58 Abandoned
@web-platform-tests web-platform-tests deleted a comment May 9, 2020
@annevk
Copy link
Member Author

annevk commented May 9, 2020

@strager that's great, a new PR would be best I think. whatwg/url#459 is likely of interest given your examples. If you could start with writing tests for that PR that'd be terrific.

And yeah, it doesn't matter if the tests fail, but when that happens we do ought to evaluate whether a change to https://url.spec.whatwg.org/ is warranted.

@annevk
Copy link
Member Author

annevk commented Jan 20, 2023

I finally got around to creating a successor for IdnaTextV2.txt: #38080.

@annevk annevk closed this Jan 20, 2023
@annevk annevk deleted the annevk/url-host-parser branch January 20, 2023 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants