Relaxed parsing mode #483
Comments
The spec for parsing hosts says:
The spec for the domain to ASCII algorithm says:
The spec for Unicode ToASCII refers to the validity criteria section which says:
But as mentioned earlier, the Unicode ToASCII algorithm is to be invoked with CheckHyphens set to Seems like a |
Lines 253 to 262 in a1d8c88
|
Ouch. |
I suggest we add a type We can then reimplement
pub fn to_ascii(domain: &str, flags: Flags) -> Result<String, Errors> {
    Config::from(flags).to_ascii(domain)
}
|
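To make the suggestion above concrete, here is a minimal sketch of how a flags-based `to_ascii` could delegate to a builder-style configuration type. Everything in it is an assumption for illustration: the `Flags` and `Errors` definitions are stand-ins rather than the crate's real types, the `check_hyphens` and `verify_dns_length` setters are hypothetical names, and the body of `Config::to_ascii` is a toy that only handles ASCII input. It does no Unicode mapping or Punycode handling, so it would wrongly reject `xn--` labels when hyphen checking is on.

```rust
/// Stand-in for the legacy flags struct; the field set is an assumption.
#[derive(Clone, Copy, Debug, Default)]
pub struct Flags {
    pub verify_dns_length: bool,
}

/// Stand-in error type: one message per violated rule.
#[derive(Debug, PartialEq, Eq)]
pub struct Errors(pub Vec<String>);

/// Hypothetical builder-style configuration.
#[derive(Clone, Copy, Debug)]
pub struct Config {
    verify_dns_length: bool,
    check_hyphens: bool,
}

impl Default for Config {
    fn default() -> Self {
        // This sketch leaves every optional check off by default.
        Config { verify_dns_length: false, check_hyphens: false }
    }
}

impl From<Flags> for Config {
    fn from(flags: Flags) -> Self {
        // Options the old flags cannot express keep their defaults.
        Config { verify_dns_length: flags.verify_dns_length, ..Config::default() }
    }
}

impl Config {
    pub fn check_hyphens(mut self, value: bool) -> Self {
        self.check_hyphens = value;
        self
    }

    pub fn verify_dns_length(mut self, value: bool) -> Self {
        self.verify_dns_length = value;
        self
    }

    /// Toy conversion: validates ASCII labels and lowercases the domain.
    pub fn to_ascii(self, domain: &str) -> Result<String, Errors> {
        let mut errors = Vec::new();
        if !domain.is_ascii() {
            errors.push("non-ASCII input is out of scope for this sketch".to_string());
        }
        for label in domain.split('.') {
            let bytes = label.as_bytes();
            if self.check_hyphens {
                // UTS #46 CheckHyphens: no leading or trailing hyphen...
                if label.starts_with('-') || label.ends_with('-') {
                    errors.push(format!("label {:?} begins or ends with a hyphen", label));
                }
                // ...and no hyphens in both the third and fourth positions.
                if bytes.len() >= 4 && bytes[2] == b'-' && bytes[3] == b'-' {
                    errors.push(format!("label {:?} has hyphens in positions 3 and 4", label));
                }
            }
            // Simplified length rule: only the 63-byte label limit is checked.
            if self.verify_dns_length && label.len() > 63 {
                errors.push(format!("label {:?} is longer than 63 bytes", label));
            }
        }
        if errors.is_empty() {
            Ok(domain.to_ascii_lowercase())
        } else {
            Err(Errors(errors))
        }
    }
}

/// The legacy entry point becomes a thin wrapper, as suggested above.
pub fn to_ascii(domain: &str, flags: Flags) -> Result<String, Errors> {
    Config::from(flags).to_ascii(domain)
}
```

With this shape, the host from the original report, `canada-region-70-24-.static-apple-com.center`, passes through `to_ascii(host, Flags::default())` unchanged, while `Config::default().check_hyphens(true).to_ascii(host)` returns an error for the first label. Existing callers keep the old signature, and new options can be added to the builder without another breaking change.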
It is increasingly time for |
Hi, the changes fixed the reported issue, all our unit tests are passing. Thank you very much for your extremely quick feedback! Do you have any ETA for release of the crate?
|
In general, I think we probably shouldn’t have parsing modes. If we ever do, they should be precisely specified. Just calling it “relaxed” doesn’t say what exactly is accepted or not. This library is intended to be used (among others) in a browser implementation, so if its behavior differs from interoperable behavior in other browsers, that’s a bug either in this implementation or in the specification https://url.spec.whatwg.org/. In this case, it looks like the specification has changed and we hadn’t been keeping up. #484 fixes this. |
Hello
I understand the library tries to follow the standard as closely as possible.
However, there are URLs in the wild that exist, work, and are rejected by this library as invalid. As an example, http://canada-region-70-24-.static-apple-com.center/ (rejected because of the trailing -): if you point a browser there, you'll get content (not very useful content, granted).
Would it be possible to have some relaxed parsing mode (if there is one, I haven't found it in the documentation) where I could still use the methods to get the host, path, and query out of the URL and canonicalize it, while allowing certain violations of what a good URL looks like? I have little choice over the URLs that I need to handle: basically, whatever happens in the wild might come my way, and I have to do something meaningful with it.