md_in_html: broken code span
#1068
Comments
Interestingly, this seems to work correctly on
>>> import markdown
>>> markdown.markdown('<div markdown="1">\n`<h1>escaped</h1>`\n</div>', extensions=["markdown.extensions.md_in_html"])
'<div>\n<p><code>&lt;h1&gt;escaped&lt;/h1&gt;</code></p>\n</div>' |
Btw I super appreciate the work you guys are doing here ❤️ I'm using Python-Markdown for realpython.com (also your amazing PyMdown extensions @facelessuser) and it's a pleasure using this library. Let me know if I can provide additional info here to help track this down 🙂 |
This is because a new HTML parser was introduced in 3.3. Though the first wave of bugs was the kind I expected, I'm starting to get a little concerned about the new parser. The handling of block elements in inline code is a little troubling, coupled with some of the recent bugs. As far as the code block part goes, that is one thing the old parser took into consideration. It understood that it didn't have all the context it needed, as Python Markdown actually takes multiple passes while some other parsers tokenize everything in one pass. Since we do not tokenize everything in one pass, I really do think block HTML logic should only come into play when the block tag is at the start of a line. We should only process inline tags once we've processed a Markdown block with the code step. It may be that we pulled the trigger too soon on the HTML parser, but I understand why we did: at the time it was passing all the known tests. We are now running into scenarios that we didn't have tests for and probably should have. I'm curious about @waylan's opinion on the recent issues, and how we should move forward with the latest HTML parser. |
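The "only at the start of a line" rule suggested above can be sketched with the standard library's `html.parser.HTMLParser`. This is a rough illustration, not Python-Markdown's actual parser: `LineStartParser`, `BLOCK_TAGS`, `raw_blocks`, and `left_alone` are hypothetical names, and the tag set is a small subset chosen for the example.

```python
from html.parser import HTMLParser

# Small illustrative subset; a real list of block-level tags is much longer.
BLOCK_TAGS = {"div", "h1", "p", "pre", "blockquote"}


class LineStartParser(HTMLParser):
    """Sorts block-level start tags by whether they begin a line."""

    def __init__(self, source):
        super().__init__()
        self.lines = source.split("\n")
        self.raw_blocks = []   # block tags that start a line
        self.left_alone = []   # block tags found mid-line (e.g. in a code span)
        self.feed(source)
        self.close()

    def handle_starttag(self, tag, attrs):
        if tag not in BLOCK_TAGS:
            return
        line, col = self.getpos()  # position of the tag's "<"
        prefix = self.lines[line - 1][:col]
        # Markdown allows up to three spaces of indentation before a block.
        if col <= 3 and prefix.strip() == "":
            self.raw_blocks.append(tag)
        else:
            self.left_alone.append(tag)


parser = LineStartParser('<div markdown="1">\n`<h1>escaped</h1>`\n</div>')
print(parser.raw_blocks)   # ['div']
print(parser.left_alone)   # ['h1']
```

With the input from this issue, the `<h1>` is preceded by a backtick on its line, so under this rule it would be left for the inline code step rather than treated as block HTML.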
@facelessuser, I agree with and share your concerns and assessment. I thought we had good test coverage. However, it is looking more and more like that is not the case.
While that would be an ideal approach, we can't tell the HTML parser to only parse this and ignore that. It parses everything we pass it. What we have done is then check each token and if it is not a block-level tag, handle it differently. Apparently, there are some cases we aren't covering. If we really want to not parse non-block level HTML at all at this stage, then we need to abandon use of |
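As a minimal sketch of the point that the parser reports everything it is fed, the snippet below runs the standard library's `html.parser.HTMLParser` over the input from this issue. `TagLogger` is a hypothetical class for illustration only; it is not the class used by Python-Markdown.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records every start and end tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = TagLogger()
logger.feed('<div markdown="1">\n`<h1>escaped</h1>`\n</div>')
logger.close()
print(logger.events)
# [('start', 'div'), ('start', 'h1'), ('end', 'h1'), ('end', 'div')]
```

The `h1` inside the backticks is reported like any other tag, so any decision to skip it has to be made inside the handlers after the fact.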
BTW, I'm not seeing the reported behavior. Instead I get this output:
And without the
which is correct. My guess is that the extension fails to replicate the logic in the core which accounts for the tag not being at the start of the line. |
Okay, this is really weird. I'm getting different behavior from a string literal than from a normal string.
>>> markdown.markdown('<div markdown="1">\n`<h1>escaped</h1>`\n</div>', extensions=["md_in_html"])
'<p><h1><div markdown="block">\n`escaped</h1>`\n</div></p>'
>>> src = """
... <div markdown="1">
... `<h1>escaped</h1>`
... </div>
... """
>>> markdown.markdown(src, extensions=["md_in_html"])
'<div>\n<p>`</p>\n<h1>escaped</h1>\n<p>`</p>\n</div>'
Turns out the newline before the opening
>>> markdown.markdown('\n<div markdown="1">\n`<h1>escaped</h1>`\n</div>', extensions=["md_in_html"])
'<div>\n<p>`</p>\n<h1>escaped</h1>\n<p>`</p>\n</div>' |
The bug which is mixing up the order of the elements was introduced in 2766698. Without that commit, we consistently get the output:
|
Ugh, did I break it? 😞 I'll have to take a look then and see where things went wrong. |
So this is directly related to the issue: markdown/markdown/extensions/md_in_html.py Lines 89 to 95 in 2766698
That if statement on lines 93 & 94 prevents the issue from happening in the case where a newline precedes the |
if not value and self.cleandoc and self.cleandoc[-1].endswith('\n'):
I never understood why you added that line. And thinking about it now, it still doesn't make sense. For example, suppose a starttag is at the beginning of the document. Then
|
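To make the quoted condition easier to reason about, here is a small standalone sketch that evaluates the same expression with plain arguments. It assumes `cleandoc` is a list of string chunks (which the `[-1].endswith(...)` indexing suggests); what `value` holds is not shown in this thread, so it is treated as an opaque argument, and `guard` is a hypothetical helper name.

```python
def guard(value, cleandoc):
    """Same boolean expression as the quoted line, without `self`."""
    return bool(not value and cleandoc and cleandoc[-1].endswith("\n"))


print(guard(None, []))               # False: the list is empty
print(guard(None, ["<p>a</p>\n"]))   # True: last chunk ends with a newline
print(guard(None, ["some text"]))    # False: last chunk has no trailing newline
print(guard("x", ["<p>a</p>\n"]))    # False: `value` is truthy
```

Under these assumptions the expression is always false while the list is still empty, because the `cleandoc` term short-circuits before the `endswith` check runs.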
I'll take a look and reevaluate. I'll have to refresh myself on the issue I was trying to avoid. |
While I don't have data right now, I do know I was seeing an issue. I think |
I worked out the issue. You were trying to account for tails. Rather than following the method used in the core, you devised a different approach. I have addressed both that and the present issue in #1069. Although, at present, there are still a few failing tests. |
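For readers unfamiliar with the term, the sketch below shows what a "tail" is in the `xml.etree.ElementTree` sense, assuming that is the meaning intended here: the text that follows an element's closing tag. It only illustrates the concept and says nothing about how #1069 handles it.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<div><h1>title</h1>trailing text</div>")
h1 = root.find("h1")

print(repr(h1.text))  # 'title'
print(repr(h1.tail))  # 'trailing text'
```

The trailing text belongs to neither the `h1` content nor a separate node; it hangs off the `h1` element as its `tail`, which is why it is easy to lose when elements are moved or rebuilt.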
Awesome. Yeah, there are still some things I wasn't sure about with the new parser. |
markdown.extensions.md_in_html fails to escape HTML tags in monospace text placed inside a markdown="1" div wrapper:
The inner <code> element should look like this: <code>&lt;h1&gt;escaped&lt;/h1&gt;</code>, but instead the h1 inside the monospace text appears as an actual <h1> tag in the output.
Versions