toc: Unusual characters in heading ids not well supported #1493
Comments
Might be related to #864.
I'm not sure I follow. Could you please provide sample input, actual output and expected output as 3 separate code blocks?
That is possible. As explained in that discussion, the various parts of the parser rely on each other and require that the parts run in a certain order. I know you are often recalling parts of the parser in an unusual order, which could result in some unescaping being missed or some similar issue. That's why I ask for input, actual output and expected output.
Of course, here are some sample inputs/outputs 🙂
The reason why I'm escaping <h2 id="foo-idfoo"><code>*Foo*</code> { id="<em>Foo</em>" }<a class="headerlink" href="#foo-idfoo" title="Permanent link">¶</a></h2>
This issue might actually not arise from inside mkdocstrings, since we output HTML directly (so
I can confirm that this is indeed a bug.
>>> import markdown
>>> md = markdown.Markdown(extensions=['toc', 'attr_list'])
>>> md.convert('## `*Foo*` { id="\*Foo\*" }')
'<h2 id="*Foo*"><code>*Foo*</code></h2>'
>>> md.toc_tokens
[{'level': 2, 'id': '\x0242\x03Foo\x0242\x03', 'name': '*Foo*', 'html': '<code>*Foo*</code>', 'data-toc-label': '', 'children': []}]
>>> md.toc
'<div class="toc">\n<ul>\n<li><a href="#\x0242\x03Foo\x0242\x03">*Foo*</a></li>\n</ul>\n</div>\n'
Presumably, this requires a similar fix to #864 when building the
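The placeholder visible in toc_tokens above has the shape STX (\x02), the decimal code point of the escaped character, then ETX (\x03). As a minimal sketch of what "unescaping" means here, the following reverses that encoding with a regular expression. The names escape_char and unescape_placeholders are hypothetical and written only for illustration; this does not call Python-Markdown's own unescaping code, it only mirrors the pattern seen in the output.

```python
import re

# Delimiters as they appear in the toc_tokens output above.
STX = "\x02"
ETX = "\x03"

# Matches STX + one or more decimal digits + ETX.
PLACEHOLDER_RE = re.compile(f"{STX}(\\d+){ETX}")


def escape_char(ch: str) -> str:
    """Hypothetical helper: build the placeholder for a single character."""
    return f"{STX}{ord(ch)}{ETX}"


def unescape_placeholders(text: str) -> str:
    """Hypothetical helper: turn every placeholder back into its character."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


raw_id = "\x0242\x03Foo\x0242\x03"    # the id reported in toc_tokens
print(unescape_placeholders(raw_id))  # *Foo*
```

Applying a step like this to the id before it is stored in the TOC would make the TOC link match the heading's id attribute, which is the discrepancy reported in this issue.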
So, here's how we render HTML headings within mkdocstrings:
{# just an example #}
{% filter heading(
heading_level + 1,
role="module",
id="*foo*",
class="doc doc-heading",
toc_label="*foo*",
) %}
Hello from Foo!
{% endfilter %}
And here's the final output (heading then TOC):
<h2 id="*foo*" class="doc doc-heading"> Hello from Foo!
<a href="#*foo*" class="headerlink" title="Permanent link">¤</a></h2>
<a href="#*foo*" class="md-nav__link">
<span class="md-ellipsis">
*foo*
</span>
</a>
(TOC styled by Material for MkDocs)
As we can see, mkdocstrings doesn't actually suffer from this, since we bypass
Anyway, glad you were able to reproduce the issue! Fixing it will at least make it much easier for mkdocs-autorefs to test its upcoming features 👍
So, it appears this is an issue with attr_lists. If the id is defined outside of an attr_list, then it is fine.
>>> md.convert('## \*Foo\*)')
'<h2 id="foo">*Foo*)</h2>'
>>> md.toc_tokens
[{'level': 2, 'id': 'foo', 'name': '*Foo*)', 'html': '*Foo*)', 'data-toc-label': '', 'children': []}]
In fact, we already have a test for that. There are 2 ways we could fix this.
Option 1 fixes this instance of the issue while option 2 ensures any future issues are addressed as well. However, option 2 would be redundant in the existing test. My inclination is to go with option 1 and expect each individual extension to properly unescape its own output.
Option 1 sounds reasonable. I don't expect a lot of extensions to be affected by this.
Actually, I couldn't create a test that failed on the
I noticed that toc encodes characters like * as \x0242\x03, 42 being the index of * in the ASCII table. This causes a discrepancy between the permalink of a heading and the link in the table of contents.
index file:
mkdocs config:
Serve and observe the behavior described in the index page.
I'm not saying this is a bug. I'm just curious if this is expected, and whether there would be a way to improve support for headings with such "unusual" ids. This would help for the work I'm doing with mkdocstrings, where we try to expand our language support, and some languages might use uncommon characters in object identifiers. Not only would toc have to work, but also mkdocs-autorefs, which picks up ids from the table of contents when registering URLs and anchors to objects.
I believe HTML5 supports any kind of characters in ids. Some of them just cause a bit of pain, like . or #, because they then need to be escaped in CSS selectors.
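To illustrate the CSS point, here is a rough sketch of backslash-escaping an id so it can be used in a selector. The function escape_css_identifier is a hypothetical name, and this is a simplified approximation rather than the full CSS.escape algorithm from the CSSOM specification; for example, it does not handle an id that starts with a hyphen followed by a digit.

```python
def escape_css_identifier(ident: str) -> str:
    """Simplified escaping of an id for use in a CSS selector (illustrative only)."""
    out = []
    for i, ch in enumerate(ident):
        code = ord(ch)
        if ch.isascii() and (ch.isalnum() or ch in "-_"):
            if i == 0 and ch.isdigit():
                # A leading digit must be written as a hex escape plus a space.
                out.append(f"\\{code:x} ")
            else:
                out.append(ch)
        elif code >= 0x80:
            # Non-ASCII characters are allowed in identifiers as-is.
            out.append(ch)
        elif code < 0x20 or code == 0x7F:
            # Control characters are written as hex escapes.
            out.append(f"\\{code:x} ")
        else:
            # Remaining ASCII punctuation such as '.', '#' or '*'.
            out.append("\\" + ch)
    return "".join(out)


print("#" + escape_css_identifier("pkg.mod#func"))  # #pkg\.mod\#func
print("#" + escape_css_identifier("*foo*"))         # #\*foo\*
```

The id attribute itself stays unescaped in the HTML; only the selector text needs the backslashes.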