Bad parsing of code block inside quotes followed by a code block #325

Open
unix-world opened this issue Dec 2, 2024 · 13 comments

@unix-world

unix-world commented Dec 2, 2024

Test

Blockquotes are very handy in email to emulate reply text.
This line is part of the same quote.

H6

one
two
three

ok

let a = 2;
const b = 3;
var a = 4;
class A {
  constructor() {
  }
}

ok

``` go
func fact(n int) int {
	if n <= 1 {
		return n
	}
	return n * fact(n-1)
}
```

Quote break.

This is a very long line that will still be quoted properly when it wraps. Oh boy let's keep writing to make sure this is long enough to actually wrap for everyone. Oh, you can put Markdown into a blockquote.

This is 2nd level

and this is 3rd

Preformatted blocks are useful for ASCII art:

             ,-.
    ,     ,-.   ,-.
   / \   (   )-(   )
   \ |  ,.>-(   )-<
    \|,' (   )-(   )
     Y ___`-'   `-'
     |/__/   `-'
     |
     |
     |    -hrr-
  ___|_____________

END

@unix-world
Author

Please see the attached file
wrong-markdown.md

@L-four

L-four commented Dec 4, 2024

I think my issue is related. A code block is causing an infinite loop.

This is a minimal reproduction of the issue:

https://goplay.tools/snippet/JQkkXK22Y2S

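package main

import (
	"fmt"

	"github.com/gomarkdown/markdown"
)

func main() {
	// An unterminated fence followed by ": " is the input reported to loop forever.
	html := markdown.ToHTML([]byte("```\n: "), nil, nil)
	fmt.Println(string(html))
}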

GuyVivedus added a commit to GuyVivedus/markdown that referenced this issue Dec 4, 2024
@kjk
Contributor

kjk commented Dec 4, 2024

@L-four I fixed infinite loop at 87d094f

@kjk
Contributor

kjk commented Dec 4, 2024

@unix-world can you check using latest master?

If it still doesn't work, please:

@unix-world
Author

unix-world commented Dec 4, 2024

Still does not work.
I am using these options:

extensions := mkparser.CommonExtensions | mkparser.HardLineBreak | mkparser.Attributes | mkparser.SuperSubscript
htmlFlags := mkhtml.SkipHTML | mkhtml.LazyLoadImages

This is how gomarkdown parses it, which is wrong:
[Screenshot 2024-12-04 at 20-20-17 Markdown]

This is how it should look, as parsed by goldmark:
[image]

@unix-world
Author

unix-world commented Dec 4, 2024

I will attach the test Markdown I have. I cannot use the online playground because the Markdown contains both " and `, which are too complicated to escape, so the content is in a file.

PS: As the screenshots in my previous message show, the problem involves a fenced code block inside a blockquote.
goldmark handles it correctly. gomarkdown seems to have an issue: the lines of the fenced code block still carry the > prefix.

markdown-test.md

@kjk
Contributor

kjk commented Dec 4, 2024

It's way too big.

If you can't reduce the repro to fewer than 5 lines, I'm not going to wade through hundreds of lines trying to guess which part you think is parsed wrong.

I need:

  • markdown, share it as https://babelmark.github.io/ link (after you press convert, the url contains the code)
  • html that we render
  • html that you expect to render

@unix-world
Author

unix-world commented Dec 4, 2024

I have isolated the case. This part of the Markdown is rendered incorrectly, as you can see in the screenshots I attached to my previous message.
See the attached md files; I have provided 2 test cases.
One works, the other has the described issue.

It looks like the issue occurs when a fenced code block sits inside a blockquote and another fenced code block appears somewhere below it.

@unix-world
Author

See the attachments...
markdown-wrong.md
markdown-ok.md

@unix-world
Author

I found a fix for this issue in the file parser/block.go.
If I comment out the code marked below, it works.
But now there is another issue: a new line does not end the block.
I don't know what the commented-out code does or why you compensate with -1 :-) but at least it fixes the bug I reported here. Please see below:


// parse a blockquote fragment
func (p *Parser) quote(data []byte) int {
	var raw bytes.Buffer
	beg, end := 0, 0
	for beg < len(data) {
		end = beg
		// Step over whole lines, collecting them. While doing that, check for
		// fenced code and if one's found, incorporate it altogether,
		// irregardless of any contents inside it
		for end < len(data) && data[end] != '\n' {
			//-- fix: this portion of the code breaks a fenced code block inside a blockquote when another fenced code block follows the blockquote
		//	if p.extensions&FencedCode != 0 {
		//		if i := p.fencedCodeBlock(data[end:], false); i > 0 {
		//			// -1 to compensate for the extra end++ after the loop:
		//			end += i - 1
		//			break
		//		}
		//	}
			//-- #end fix: https://github.com/gomarkdown/markdown/issues/325#issuecomment-2516332756
			end++
		}
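
On the -1 question, here is a minimal sketch of the offset arithmetic, assuming (as the source comment says) that the loop is followed by one more step over the newline. `nextLineStart` and `fencedLen` are hypothetical names for illustration, not the library's functions, and `fencedLen` only recognizes `~~~` fences.

```go
package main

import (
	"bytes"
	"fmt"
)

// fencedLen is a hypothetical stand-in for a fenced-block parser. If data
// starts with "~~~", it returns the number of bytes up to and including the
// newline that ends the closing "~~~" line; otherwise it returns 0.
func fencedLen(data []byte) int {
	fence := []byte("~~~")
	if !bytes.HasPrefix(data, fence) {
		return 0
	}
	pos := bytes.IndexByte(data, '\n') + 1 // start of the line after the opener
	for pos > 0 && pos < len(data) {
		nl := bytes.IndexByte(data[pos:], '\n')
		if nl < 0 {
			return 0 // no newline-terminated closing line
		}
		line := data[pos : pos+nl]
		pos += nl + 1
		if bytes.HasPrefix(line, fence) {
			return pos
		}
	}
	return 0
}

// nextLineStart mirrors the shape of the loop quoted above: walk to the end
// of the line, but swallow a whole fenced block if one starts here.
func nextLineStart(data []byte, beg int) int {
	end := beg
	for end < len(data) && data[end] != '\n' {
		if i := fencedLen(data[end:]); i > 0 {
			// end+i would already be the next line. Stop one byte short, on
			// the block's final '\n', because the step below adds one.
			end += i - 1
			break
		}
		end++
	}
	if end < len(data) {
		end++ // step over the '\n' that ends the line or the block
	}
	return end
}

func main() {
	data := []byte("~~~\ncode\n~~~\nnext\n")
	fmt.Printf("%q\n", data[nextLineStart(data, 0):]) // "next\n"
}
```

Without the `- 1`, the extra step would skip the first byte of the following line.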

@kjk
Contributor

kjk commented Dec 4, 2024

Can you confirm that this is the minimized repro:

> ~~~ javascript
> let a = 2;
> ~~~

	code block indent

~~~ go
// another code block
~~~

go playground
babelmark repro

It generates a single code block inside a blockquote.

It should generate 3 code blocks, one inside blockquote.

Other cases:

> ~~~ javascript
> let a = 2;
> ~~~

~~~ go
// another code block
~~~

The simplest case is really:

> ~~~
> // comment
> ~~~

~~~

The last ~~~ is interpreted as the end of the code block instead of > ~~~.

kjk changed the title from "Bug - some code inside blockquote does not terminate well and is breaking the parser" to "Bad parsing of code block inside quotes followed by a code block" on Dec 4, 2024
kjk added a commit that referenced this issue Dec 4, 2024
@kjk
Contributor

kjk commented Dec 4, 2024

Just removing the fenced code logic inside Parser.quote() breaks other test cases.

@kjk
Contributor

kjk commented Dec 4, 2024

The problem is that fencedCodeBlock() called from quote() doesn't see > ~~~ as the end of the code block. We would probably have to pass an insideQuote flag to fencedCodeBlock() and to isFencedLine().

Another case:

> ~~~ go
// comment
~~~

On babelmark, other implementations generate a blockquote with a code block, a paragraph for "// comment", and another code block.

We have a very uncommon interpretation of this: a single code block inside the blockquote. But this logic breaks when there's a code block after the blockquote.

We probably could fix this case without breaking those other test cases.
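
The insideQuote idea can be sketched on the simplest case from earlier in the thread. This is only an illustration working on whole lines, not a patch: `looksLikeFence` and `findClosingFence` are hypothetical helpers, and the real fencedCodeBlock() works on bytes and handles fence length, info strings and indentation that this ignores.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeFence reports whether line starts with a ~~~ or ``` fence.
// When insideQuote is true, one leading blockquote marker (">" plus an
// optional space) is ignored first. Unquoted fences are still accepted,
// which keeps the lenient behaviour described above.
func looksLikeFence(line string, insideQuote bool) bool {
	if insideQuote && strings.HasPrefix(line, ">") {
		line = strings.TrimPrefix(line[1:], " ")
	}
	return strings.HasPrefix(line, "~~~") || strings.HasPrefix(line, "```")
}

// findClosingFence returns the index of the first line after lines[0]
// that looks like a fence, or -1. lines[0] is assumed to be the opener.
func findClosingFence(lines []string, insideQuote bool) int {
	for i := 1; i < len(lines); i++ {
		if looksLikeFence(lines[i], insideQuote) {
			return i
		}
	}
	return -1
}

func main() {
	lines := []string{"> ~~~", "> // comment", "> ~~~", "", "~~~"}
	fmt.Println(findClosingFence(lines, false)) // 4: the unquoted ~~~ ends the block
	fmt.Println(findClosingFence(lines, true))  // 2: the quoted > ~~~ ends the block
}
```

With the flag set, the block closes on the quoted fence, so the trailing ~~~ is left over to be parsed on its own.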
