Turning on TrackTrivia prevents EmphasisInline elements being created #561

Closed
nikkilocke opened this issue Jul 9, 2021 · 6 comments

@nikkilocke

I am trying to create a converter which will convert Markdown to a form suitable for posting Telegram messages. I have got quite far, to the extent that I can parse some markdown, and turn it into a text string, with MessageEntity objects which show the offset, length and attributes (e.g. a Url for a link) - which is how Telegram does formatting.

Unfortunately the text string has the "insignificant" white space removed - for instance, my first test markdown is:

Test stuff
- **bold text**
- _italic text_
- ~~strikethrough text~~
- https://google.com?search=autolink
- [Full link](https://google.com)
- [**Bold full link**](https://google.com)
- **[Bold full link](https://google.com)**

I ran this through the Roundtrip renderer, and it came out as Test stuff-bold text-italic text-~~strikethrough text~~-https://google.com?search=autolink-Full link-Bold full link-Bold full link

My telegram renderer (which removes the markdown furniture) shows the same.

My renderer is a subclass of RoundtripRenderer, which extracts the bold, italic and url elements, and finds these Entities:

Type:Offset:Length:Text:Url
Bold:10:10:-bold text:
Italic:20:12:-italic text:
Url:90:10:-Full link:https://google.com
Url:100:15:-Bold full link:https://google.com
Bold:100:15:-Bold full link:
Bold:115:15:-Bold full link:
Url:115:15:-Bold full link:https://google.com

I need to preserve the white space, so I tried setting EnableTrackTrivia in the parser.

Unfortunately the document then has no EmphasisInline elements in it. The roundtrip output is (correctly):

Test stuff
- **bold text**
- _italic text_
- ~~strikethrough text~~
- https://google.com?search=autolink
- [Full link](https://google.com)
- [**Bold full link**](https://google.com)
- **[Bold full link](https://google.com)**

My Telegram renderer (which removes the markdown furniture for items it recognises) shows:

Test stuff
- **bold text**
- _italic text_
- ~~strikethrough text~~
- https://google.com?search=autolink
- Full link
- Bold full link
- **Bold full link**

but most of the inline emphasis entities are missing:

Type:Offset:Length:Text:Url
Url:112:9:Full link:https://google.com
Url:125:14:Bold full link:https://google.com
Bold:125:14:Bold full link:
Url:145:14:Bold full link:https://google.com

Should TrackTrivia turn off recognising inline emphasis? If so, is there another way to retain the newlines and spaces in the original markdown?

@nikkilocke
Author

Just FYI, I have looked carefully at the code from NormalizeRenderer, and modified all my renderers to do what that does, and the output is now acceptable, although I would prefer it to match the input more exactly if possible.

So the problem is no longer serious for me, but you might find it intriguing, and worth investigating, as it may be a bug in the parser.

@xoofx added the bug label Aug 6, 2021
@xoofx
Owner

xoofx commented Aug 27, 2021

Note that NormalizeRenderer should not be used with TrackTrivia; use RoundtripRenderer instead. NormalizeRenderer might be deprecated at some point, as normalization would be better done as a modification of the AST that can then be fed into the RoundtripRenderer.

@generateui

Can you provide a minimal test that fails on your input and includes an assertion? That would go a long way toward fixing this.

@jo3w4rd

jo3w4rd commented Mar 9, 2022

I notice this as well, just using Markdown.ToHtml().

The code:

        public static void TrackTrivia()
        {
            string filePath = "D:\\Repos\\MarkdownTest.md";
            string markdown = File.ReadAllText(filePath);
            Console.WriteLine(markdown);
            var pipeline = new MarkdownPipelineBuilder()
                .EnableTrackTrivia()
                .Build();
            Console.WriteLine(Markdown.ToHtml(markdown, pipeline));
        }

produces output:

# Look at emphasis

**bold** __bold, too__

*italic* _also italic_

`code`

Plain

<h1>Look at emphasis</h1>
<p>**bold** __bold, too__
</p>
<p>*italic* _also italic_
</p>
<p><code>code</code>
</p>
<p>Plain
</p>

If you remove the EnableTrackTrivia() call, the output is correct:

<h1>Look at emphasis</h1>
<p><strong>bold</strong> <strong>bold, too</strong></p>
<p><em>italic</em> <em>also italic</em></p>
<p><code>code</code></p>
<p>Plain</p>
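
The same difference can also be checked on the parsed document rather than on the rendered HTML. The following is only a minimal sketch, assuming the Markdig NuGet package is referenced; `EmphasisProbe` and `CountEmphasis` are hypothetical names introduced for illustration and are not part of Markdig.

```csharp
using System.Linq;
using Markdig;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

public static class EmphasisProbe
{
    // Hypothetical helper: parses the input and counts the EmphasisInline
    // nodes (both emphasis and strong emphasis) found in the resulting AST.
    public static int CountEmphasis(string markdown, bool trackTrivia)
    {
        var builder = new MarkdownPipelineBuilder();
        if (trackTrivia)
        {
            // Same switch as in the snippet above; it configures the builder in place.
            builder.EnableTrackTrivia();
        }

        MarkdownDocument document = Markdown.Parse(markdown, builder.Build());

        // Walk every node in the document and keep only the emphasis inlines.
        return document.Descendants().OfType<EmphasisInline>().Count();
    }
}
```

Comparing `EmphasisProbe.CountEmphasis(text, trackTrivia: false)` with `EmphasisProbe.CountEmphasis(text, trackTrivia: true)` for the same input would presumably show the mismatch on an affected Markdig version, and the two counts would be expected to agree once the problem is fixed.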

@xoofx
Owner

xoofx commented Mar 9, 2022

Yeah, if EnableTrackTrivia() is making such changes, then it's definitely a serious bug.

@xoofx
Owner

xoofx commented Mar 11, 2022

I believe this should be fixed by 983187e and available in 0.28.0

Please note that I have opened a new issue #604

I would strongly suggest not using EnableTrackTrivia() for rendering to HTML. EnableTrackTrivia() was mainly introduced for roundtrip rendering, and I have seen other rendering issues with it.

Otherwise I'm curious about the use case for using EnableTrackTrivia() with rendering to HTML?

@xoofx closed this as completed Mar 11, 2022