MOLD of URL containing unicode chars is invalid #2379
fixes: metaeducation/rebol-issues#2379 (This was actually a lexer error)
I don't observe the same behaviour in Ren-C (Web or Mac Terminal). See also: Ren-C Pull 655.
Good for Ren-C then... I have a slightly older Ren-C version.
@rgchris regarding the mentioned Ren-C's pull request, I prefer my version (compatible with Red), where it is like:
versus Ren-C's:
@rgchris Also I prefer:
versus Ren-C's:
Good point to raise... But, I do think we're on the right track following along on @rgchris's take that you should be able to round-trip URLs that are copied to and from the address bar of your browser. To the extent it's what's in the viewer's consciousness that is the "source code". I think that's what makes the URL type valuable, more than any automatic escaping does. Browser rules are weird, though. There's some writing about it in The URL Standard:
(See also URL Escape Guidelines) For this to work, Chrome has to assume any percents that appear are escaping-percents. So I presume that READ or other URL operations would do the same. This doesn't give easy or obvious answers to building up URLs programmatically from strings, when those strings aren't escaped. I've been thinking that URLs would be immutable, so you couldn't end up in situations like:
With immutability and being forced to use JOIN, there's a moment you can check for badly formed URLs. And I'd suggest that noticing % that weren't %-escapes, or stray spaces would be disallowed:
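As a rough sketch of that check, written in Python rather than Rebol: the helper names `check_url_text` and `join_url` are hypothetical, and the two rules (no `%` outside a `%XX` escape, no whitespace) are only the ones named in the comment above, not a full URL grammar.

```python
import re

# A percent sign passes only when it is followed by exactly two hex digits.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def check_url_text(text):
    """Return a list of problems found; an empty list means the text passed."""
    problems = []
    if _STRAY_PERCENT.search(text):
        problems.append("percent sign that is not a %XX escape")
    if any(ch.isspace() for ch in text):
        problems.append("unescaped whitespace")
    return problems


def join_url(base, *parts):
    """Hypothetical JOIN-like helper: concatenate first, then validate once."""
    result = base + "".join(parts)
    problems = check_url_text(result)
    if problems:
        raise ValueError("badly formed URL %r: %s" % (result, ", ".join(problems)))
    return result
```

Because the only way to build a new value is through `join_url`, the validation runs at a single, predictable moment, which is the point of pairing immutability with JOIN.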
It implies that when you're building up a URL out of string components that are arbitrary text (not known-good characters for a browser-ready URL) and using URL-ENCODE on those bits, you might do more escaping than necessary. Hence you might need some kind of CANONIZE-URL to bring it in line with what Chrome does in the address bar. That probably should not be automatic. Curiously, this URL shows with quotes in Chrome's address bar, but you get %22 when you copy it to the clipboard:
Lots to think about here.
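To make the over-escaping point concrete, here is a small Python sketch. `quote` is the real `urllib.parse` function; `canonize_url_text` is a hypothetical stand-in for the CANONIZE-URL idea, and the set of characters it relaxes (`!'()*` plus the RFC 3986 unreserved set) is an assumption for illustration, not a description of what Chrome actually does.

```python
import re
from urllib.parse import quote

# RFC 3986 "unreserved" characters, which never need escaping.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def canonize_url_text(text, also_literal="!'()*"):
    """Turn %XX escapes back into literal characters when that looks harmless.

    Escapes for anything outside the allowed set (spaces, quotes, bytes of
    multi-byte UTF-8 sequences, and so on) are left exactly as they were.
    """
    allowed = UNRESERVED | set(also_literal)

    def relax(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in allowed else match.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", relax, text)


# Escaping an arbitrary string as aggressively as possible...
segment = quote("it's (ok)!", safe="")
print(segment)                     # it%27s%20%28ok%29%21

# ...and then relaxing the escapes that were not strictly needed.
print(canonize_url_text(segment))  # it's%20(ok)!
```

Keeping the relaxing step as an explicit call mirrors the suggestion that canonicalization probably should not be automatic.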
@rgchris Actually, it looks like the copy-to-clipboard in Chrome escapes this as well:
I was previously under the impression it did not. That influences my thinking on this a bit, in light of the space issue. (It actually seems to only do the escaping if you have the whole URL selected when you copy, not just part.)

Can you give an updated outline of your philosophy here? The main thing, I guess, is that if URL! is going to be a generically useful type in the system with custom schemes, it seems a waste to force them all through the very ugly percent-encoding, which seems very much an archaic legacy-type thing.

But maybe it's still acceptable to say that the URL rule is that the only percents you can have are for the purposes of hex-byte-character encodings. That encoding is apparently not just in the URL encoding standard but for any URI.
Regarding this:
I think that result should be:
I used to think along these lines, that escaping and generality of forms was important. But there is Freedom To and Freedom From. "Freedom To" store arbitrary strings and flavor them as URL! is robbing you of your "Freedom From" being passed a URL that has no scheme and is not URL-like whatsoever. You effectively know nothing about its form. Also obviously you wind up with these not-very-appealing construction syntaxes.

I feel like the part could have more value by giving it a few more guarantees. If those guarantees don't suit you then you always can convert to a string and work with that. And if you find yourself wanting to save URL!s in a file that aren't LOAD-able, you should be the one coming up with the notation for that...because it's going to be you who's responsible for building that valid URL later.

Of course this is new and so there's a lot of testing and figuring. The closest historically would be the rules on WORD! and how their immutability and creation-at-one-moment lets you impose rules on what letters are allowed. (That's another idea I've changed my feelings on, that we don't necessarily make the total world simpler by letting you have escaped forms.)

https://forum.rebol.info/t/any-word-and-any-string-the-limits-of-unification/1127
If people are sharing the incorrect version (i.e. not correctly escaped) then it would be preferable to support it within reason. Chrome (and Firefox) does indeed appear to escape when copying, Safari does not. Also, you can put the unescaped version in a link and browsers will do the translation. There's a bit of a human element to this too. If you're composing a URL in a text file, which is more natural to write?
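For a concrete pair to compare, here is a Python sketch using `urllib.parse`. The URL is made up for illustration (it is not the one from the original comment), and the `safe` set passed to `quote` is an assumption chosen so that only the non-ASCII characters get escaped.

```python
from urllib.parse import quote, unquote

# What a person would naturally type into a text file (hypothetical URL).
readable = "http://example.com/wiki/Šumava?q=čaj"

# What goes over the wire: non-ASCII characters as UTF-8 percent-escapes.
escaped = quote(readable, safe=":/?=&")
print(escaped)  # http://example.com/wiki/%C5%A0umava?q=%C4%8Daj

# The two forms carry the same information and convert back and forth.
assert unquote(escaped) == readable
```

Both spellings name the same resource; the question is only which one the source text should show.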
Fun—Github's markdown automatically escaped the linked URL to |
Apparently this is only technically legal since the RFCs related to HTML5.
But in this context we have another problem: HTML escaping. If a URL contains quotes, then how to put it in quotes, etc. The example in the answer at the bottom of the above SO question shows some of the complexities: Unescaped:
"Legit URL"
The variation that is suitable to put into an
And Rebol can't LOAD the last one as a URL, because it has semicolons in it, so they are cut off as comments. :-/ So that's a good point on why you can't copy and paste an arbitrary

If I had to pick my own moral-of-the-story, it would be that text is a terrible medium for building structured documents. A tree/graph data structure represented unambiguously via a binary format would be so...much...better!

@rgchris points out that Safari doesn't escape the URL when you copy/paste. Maybe that lines up with the evolution of the RFCs allowing the non-ASCII characters in href...that the long tail is going to be that browsers aim to give you a readable link, and everything else is under the hood. Wouldn't Rebol scripts want to be showing what you see on the screen in your source?

The alternative is that Rebol push back and become the biggest W3C stickler in the world, as a way to "sort out the mess". But I think my general feeling is like @rgchris's--that it is swimming upstream. The strict format is more likely to frustrate people trying to use the URL! type how they want to in their source and dialects, as opposed to being appreciated for its limitations.
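The semicolon problem is easy to reproduce outside Rebol. The sketch below uses Python's standard `html.escape` and `html.unescape` on a made-up URL (an assumption for illustration, not the one from the Stack Overflow answer) to show how the attribute-safe form picks up `;` characters that the plain URL never had.

```python
from html import escape, unescape

# A hypothetical URL containing both quotes and an ampersand.
url = 'http://example.com/search?q="rebol"&lang=en'

# The form that is safe to place inside a double-quoted HTML attribute.
attr = escape(url, quote=True)
print(attr)  # http://example.com/search?q=&quot;rebol&quot;&amp;lang=en

# The entity references introduce semicolons that were not in the URL itself.
assert ";" not in url
assert ";" in attr

# Reversing the HTML escaping recovers the original text.
assert unescape(attr) == url
```

Any loader that treats `;` as the start of a comment would stop reading `attr` at the first `&quot;`, which is the behaviour described for LOAD above.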
For some definition of "handle"... the browser knows to turn the In any case, it's not able to LOAD it (unless you use the |
I don't say that
that was fixed and I have my script working as I wanted. The rest is unrelated. If you want to handle |
Check this:
In comparison, Ren-C is also wrong (in different way):
Rebol2 and Red are OK: