Spaces removed from links (all URLs) #72
Comments
https://tools.ietf.org/html/rfc3986 Actually, the URI spec says that those spaces should be ignored. Do those Read the Docs pages work well in browsers?
I see your point. I will change the behaviour of But I still don't understand the spec of URLs and HTML actually. The both
<html>
<body>
<a href="/a b.html">foo</a>
<a href="/a%20b.html">bar</a>
</body>
</html>
<html>
<body>
<a href="/">back</a>
</body>
</html>
Am I missing something? Anyway, thanks for the report!
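For comparison, here is a minimal sketch of how Go's net/url treats the two href values from the HTML above. The helper name encodedHref is hypothetical and is not part of muffet; the sketch only assumes that url.Parse accepts a literal space in a path and that URL.String percent-encodes it on output.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodedHref is a hypothetical helper: it parses an href as written in
// the HTML and returns its percent-encoded form.
func encodedHref(href string) (string, error) {
	u, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	// String re-encodes the decoded path, so a literal space in the
	// path comes back as %20.
	return u.String(), nil
}

func main() {
	for _, href := range []string{"/a b.html", "/a%20b.html"} {
		encoded, err := encodedHref(href)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%q -> %q\n", href, encoded)
	}
}
```

Under these assumptions both links end up as the same request path, which would explain why the two anchors appear to behave identically in a browser.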
Ah, I think I know where the misunderstanding comes from:
Leading and trailing spaces should most certainly be stripped; internal spaces, however, should not be removed and should instead be URI-encoded (percent-encoded). So,
I did
No worries. It works well as is; I'm just trying to be helpful as I learn more Go myself!
Related to #44. However, I've now come across a site (published on Read the Docs) that has many spaces in its filenames. The IDs are fine, but the spaces get removed from the path, which breaks the links and results in many false 404s.
Perhaps instead of removing spaces, use only the net/url Parse function? Or remove spaces only from the URL.Fragment?
muffet/scraper.go
Lines 48 to 54 in 4998c9b
muffet/scraper.go
Lines 82 to 90 in 4998c9b
https://play.golang.org/p/42kUw1Rg23m