Backslash escape not unescaped in ID attributes #864
Comments
Yep, that's a bug. Thanks for the report. To handle escaped characters, we convert the escaped character to its Unicode code point surrounded by "START OF TEXT" (STX) and "END OF TEXT" (ETX) Unicode characters. STX ( Apparently we aren't doing those replacements for the content of |
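The placeholder scheme described in the comment above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the library's actual code: the helper names `escape_char` and `unescape` are hypothetical, and the only things taken from the discussion are the use of the STX and ETX control characters around the decimal code point.

```python
import re

STX = "\u0002"  # START OF TEXT control character
ETX = "\u0003"  # END OF TEXT control character

# Matches a placeholder: STX, one or more digits (the code point), ETX.
PLACEHOLDER_RE = re.compile(f"{STX}(\\d+){ETX}")


def escape_char(ch):
    """Hypothetical helper: wrap the code point of `ch` in STX/ETX."""
    return f"{STX}{ord(ch)}{ETX}"


def unescape(text):
    """Hypothetical helper: turn every placeholder back into its character."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


# A backslash-escaped underscore in "select\_related" would be stored as
# a placeholder while the document is processed, then restored at the end.
stored = "select" + escape_char("_") + "related"
restored = unescape(stored)
```

Here `stored` holds the code point 95 between the two control characters, and `restored` is the plain string `select_related`.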
@waylan Thanks for taking a look, and your hard work on this! |
@waylan - just to make sure my issue report is complete: at least the way |
@waylan - anywhere specific I should look to start poking in the code and expedite this process? |
The text used for the ID attribute is extracted at markdown/extensions/toc.py#L244. I would do the unescaping there before the text is processed any further. Normally, the unescaping is done by the markdown.postprocessors.UnescapePostprocessor class. Looking at this, I was wondering why the postprocessor wasn't addressing the issue. After all, postprocessors run on the HTML as text, so there is no distinction between attributes and any other text of the HTML. Then I realized that the STX and ETX are being removed by the |
The slugify function will strip the STX and ETX characters from placeholders for backslash-escaped characters. Therefore, we need to unescape any text before passing it to slugify. Fixes Python-Markdown#864.
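To see why the order of operations matters, here is a rough sketch under stated assumptions. `simple_slugify` is a simplified stand-in written for this example, not the library's own slugify function, and `unescape` is a hypothetical helper; only the idea (control characters are dropped by slugification, so unescape first) comes from the commit message above.

```python
import re
import unicodedata

STX, ETX = "\u0002", "\u0003"
PLACEHOLDER_RE = re.compile(f"{STX}(\\d+){ETX}")


def unescape(text):
    """Hypothetical helper: replace STX<code point>ETX with the character."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


def simple_slugify(value, separator="-"):
    """Simplified stand-in for a slugify function (an assumption)."""
    # Drop anything that cannot be represented in ASCII.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove everything except word characters, whitespace and hyphens.
    # STX and ETX are removed here, but the digits between them survive.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of separators and whitespace into one separator.
    return re.sub(r"[{}\s]+".format(re.escape(separator)), separator, value)


heading = f"select{STX}95{ETX}related"  # placeholder for an escaped "_"

broken_id = simple_slugify(heading)           # slugify first: digits leak
fixed_id = simple_slugify(unescape(heading))  # unescape first: "_" survives
```

In this sketch `broken_id` comes out as `select95related`, because only the control characters are stripped and the code point digits remain, while `fixed_id` is `select_related`.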
I'm using the `TocExtension` to generate a table of contents for my `.md` files (which could be relevant/irrelevant to the issue below).

In my original text, I'm escaping the single underscore `_` with a backslash:

### select\_related

After running the text through `markdown`, this is the HTML result:

And the generated TOC points to the ID above as well. I'm wondering if this is the expected behaviour? I would expect `markdown` to simply render:

At least according to one online converter, the result should be:

<...id="select_related"...>

Am I missing something?