-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Escape special HTML chars coming from SITEURL in the internal link replacer #2260
base: main
Are you sure you want to change the base?
Conversation
So it can be used from outside.
33ae2ac
to
4462d84
Compare
Didn't find any utility that could do both HTML escape and unescape in Pelican codebase, so it's using either |
My admittedly cursory scan of this pull request didn't yield any glaring problems. Any @getpelican/reviewers care to comment before this is merged? |
@avaris: Any thoughts on this one? |
I'm a little confused here. Why wouldn't you set SITEURL = 'http://my.site/?app="blog"&path=' |
Oh, sorry for the late reply, the notification got somehow drowned among all others :/ Encoding the In general, for consistency, I think the settings should not require the user to pre-encode any of the values (unless it's already a HTML snippet or something). Pelican and the theme should transparently handle all encoding to avoid double encoding issues like this. |
Well, there is |
Okay, um, then the behavior is a matter of opinion, I guess:
I'm personally in favor of the second option, it looks cleaner and more consistent to me. |
Well, then there is the inconsistency of If we're to do escaping on |
But that's because the internal replacements work on the final HTML output, no? That's expected.
I'm aware of that. OTOH, I don't think there are many sites out there with special characters in the URL that would break on this. It's more about the consistency. I have a test like this for my theme, where basically everything coming from the settings is expected to be a plain text and the theme does proper escaping on everything that's not already HTML. I'm sure that, when this PR is in, a theme can be authored so that there are no issues with double escaping or missing escaping -- and none of the settings has to be explicitly pre-escaped. Doesn't that sound reasonable? The other extreme is to require all the settings to be already escaped by the user (so, to be consistent, even |
Now, why would you escape I'm not against this change. I'm just pointing out potential issues (like the possibility of backward incompatibility). Sure, you can author a theme that will work with this. You can also author themes that doesn't need this and expects pre-escaped settings. You can author plugins that will work on escaped In general I'm OK with this, but at the very least this also requires some documentation about how P.S.: I'm also somewhat puzzled about why you'd need such |
I'll happily provide extensive documentation for this, but not sure what's the right place. Or does some such documentation exist already? Point me to where I can add it :)
Ha, yeah :) It's not really |
I imagine the following is the most appropriate place for @mosra / @avaris: Aside from documentation, what else is left to do regarding this endeavor? Just wanted to bring it up since we're getting close to release time. We can, of course, always defer to the next release if needed. 😊 |
@justinmayer I don't think this is so important to have in for 3.8, so I'm okay with defering until later. At the moment I don't have any released functionality depending on this behavior, just some proofs of concept. OTOH, #2439 is important for 3.8 ;) |
When
SITEURL
(or any other URL such asPAGE_URL
,ARTICLE_URL
etc.) contains special HTML characters such as&
, they were not properly escaped when going through the link replacer.A convoluted example: when
SITEURL = 'http://my.site/?app="blog"&path='
and there's the following input:The following gets rendered:
Which is wrong and should be instead
While the Jinja2 template itself usually takes care of escaping
SITEURL
where it is used in the template, this link replacement is done after that and thus there's no way around it.The patch takes care of properly unescaping the other URL parts and then escaping them again so they get preserved. There's a test case checking that as well.
Similarly to #2196, this is based on 1f30306, which just makes the internal link replacer function public. See #2164 for the original reason behind this change.
Thanks in advance! :)