Generated Markdown is missing line breaks between heading and image #59

kiejo · 2023-02-02T09:49:50Z

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Affected packages and versions

mdast-util-to-markdown 1.5.0

Link to runnable example

No response

Steps to reproduce

import {toMarkdown} from 'mdast-util-to-markdown'

const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Test' }] },
    {
      type: 'image',
      url: 'https://example.com/image.png',
    },
  ]
}

console.log(toMarkdown(tree))

Expected behavior

It should log the following output:

# Test

![](https://example.com/image.png)

Actual behavior

It logs the following output:

# Test![](https://example.com/image.png)

It looks like this issue was introduced with this commit: 122101f

Affected runtime and version

[email protected]

Affected package manager and version

No response

Affected OS and version

No response

Build and bundle tools

No response

wooorm · 2023-02-02T10:23:59Z

Hi! Please follow the mdast specification, and wrap your images in paragraphs. They may not appear on their own like that

Duplicate of #58
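
As an illustration of the advice above, here is a minimal sketch of a helper that wraps stray top-level phrasing nodes in paragraphs before the tree is serialized. `wrapTopLevelPhrasing` and `phrasingTypes` are hypothetical names, not part of mdast-util-to-markdown, and the list of phrasing types is deliberately partial.

```javascript
// Hypothetical helper, not part of mdast-util-to-markdown.
// Partial list of mdast phrasing node types (an assumption for this sketch).
const phrasingTypes = new Set([
  'break', 'emphasis', 'image', 'inlineCode', 'link', 'strong', 'text'
])

// Returns a copy of `root` in which runs of top-level phrasing nodes
// are grouped into paragraph nodes.
function wrapTopLevelPhrasing(root) {
  const children = []
  let paragraph = null // paragraph currently collecting stray phrasing nodes

  for (const node of root.children) {
    if (phrasingTypes.has(node.type)) {
      if (!paragraph) {
        paragraph = {type: 'paragraph', children: []}
        children.push(paragraph)
      }
      paragraph.children.push(node)
    } else {
      paragraph = null // a flow node ends the current run
      children.push(node)
    }
  }

  return {...root, children}
}

const tree = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Test'}]},
    {type: 'image', url: 'https://example.com/image.png'}
  ]
}

const fixed = wrapTopLevelPhrasing(tree)
console.log(JSON.stringify(fixed, null, 2))
```

Passing `fixed` instead of `tree` to `toMarkdown` should then serialize the heading and the image as separate blocks.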

github-actions · 2023-02-02T10:24:26Z

Hi! Thanks for taking the time to contribute!

Because we treat issues as our backlog, we close duplicates to focus our work and not have to touch the same chunk of code for the same reason multiple times. This is also why we may mark something as duplicate that isn’t an exact duplicate but is closely related.

Thanks,
— bb

github-actions · 2023-02-02T10:24:57Z

Hi team! It seems this post is a duplicate, but hasn’t been marked as such. Please post a comment w/ Duplicate of #123 (no final .) to do so. See GH docs for more info.

Thanks,
— bb

wooorm · 2023-02-02T10:26:13Z

Duplicate of #58

kiejo · 2023-02-02T12:11:27Z

Thanks for the quick response! I can confirm that wrapping the image in a paragraph solves the issue, but I noticed that this causes the image alt text to get escaped in a way I did not expect. I would have expected the escaping of the image alt text to be treated in a similar way to the title text.

While this sounds like a separate issue, I thought it would make sense to bring it up in this context as I did not encounter this issue when not wrapping the image inside a paragraph (which I now understand is not spec compliant).

Here is an example for reference:

import {toMarkdown} from 'mdast-util-to-markdown'

const tree = {
  type: 'root',
  children: [
    { type: 'paragraph', children: [
      {
        type: 'image',
        url: 'https://example.com/image.png',
        alt: '*hello* [world]',
        title: '*hello* "world"',
      }
    ]},
  ]
}

console.log(toMarkdown(tree))

Output:

![\*hello\* \[world\]](https://example.com/image.png "*hello* \"world\"")

Expected:

![*hello* [world\]](https://example.com/image.png "*hello* \"world\"")

Please let me know if the current image alt text escaping is by design or if I should open a separate issue for this.

wooorm · 2023-02-02T12:16:38Z

Your expected markdown doesn’t work. Try pasting here:

![\*hello\* \[world\]](https://example.com/image.png "*hello* \"world\"")

![*hello* [world\]](https://example.com/image.png "*hello* \"world\"")

![hello world]

“Please let me know if the current image alt text escaping is by design or if I should open a separate issue for this.”

Escaping is definitely by design. That being said, as this case shows, escaping is incredibly hard. If it can be improved, that’s of course great, but maybe it can’t.
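
To illustrate what escaping in alt text amounts to, here is a deliberately naive sketch. `escapeAlt` is a hypothetical function and not the library's algorithm: it escapes every occurrence of a few characters, whereas a real serializer would likely need to take the surrounding context into account.

```javascript
// Hypothetical, naive escaper: prefixes backslashes, asterisks,
// underscores, and square brackets with a backslash.
function escapeAlt(value) {
  return value.replace(/[\\*_\[\]]/g, '\\$&')
}

const escaped = escapeAlt('*hello* [world]')
console.log(escaped) // \*hello\* \[world\]
```

For this input the result matches the escaped alt text in the serializer output quoted earlier in the thread.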

wooorm · 2023-02-02T12:19:33Z

If this is your reduced test case:

![*alpha*](xxx) and ![\*bravo\*](yyy)

you should check out the alt text of the resulting HTML:

[two rendered images: the first has the alt text “alpha”, the second “*bravo*”]

Note that the *s disappear. That’s because they actually are interpreted in images. Markdown “works” in there. But, the tags (<em> and </em> in this case) are ignored.

kiejo · 2023-02-02T14:19:18Z

Thanks for the additional details and examples. I did not expect image alt text to be interpreted as Markdown as the Alternative mixin only specifies a simple string attribute just like the title attribute is defined as part of the Resource mixin.

While the alt and title attributes seem to be using the same definition, they are interpreted in different ways:

import {fromMarkdown} from 'mdast-util-from-markdown'
console.log(JSON.stringify(fromMarkdown('![*alt*](xxx "*title*")'), null, 2))

Output of image node:

{
  "type": "image",
  "title": "*title*",
  "url": "xxx",
  "alt": "alt"
}

I am wondering if it actually makes sense to interpret the image alt text as Markdown if the information gets discarded and is not even part of the generated AST. The only use case I could come up with so far would be Markdown syntax highlighting, where the alt text of an image could be rendered in a stylized way. But when trying this in Visual Studio Code, for example, the image alt text does not seem to be interpreted as Markdown but as simple text (which also matches my initial expectation):

As you can see in the screenshot, the alt text of the image does not use italics or bold while the content of the link does.

At the same time it looks like the current logic matches GitHub's behavior, which might be reason enough to keep it as is.

wooorm · 2023-02-02T14:43:02Z

This isn’t about mdast, which is a specification for representing markdown as nodes. And, yeah, there these fields both use string as types.

This is about how markdown works. Markdown prescribes this. CommonMark-compliant markdown processors do this.

On the parsing side, this is because links, images, but also emphasis, are not parsed the way you think.
They are parsed in chunks: * is seen. Or ![ is seen. And ** is seen. And then ](#whatever).

If such a valid closing is found (e.g., ]), then the parser looks back to see if there is a corresponding opening (e.g., ![).
Meanwhile, emphasis/strong (what I call attention) is parsed as separate runs around all that (*, **, ***).
And then “normal” things are parsed as whole units from left to right, such as &, or <https://example.com>, or `asd`.

So you see, this parsing is already done. Only afterwards do we know that these things occurred in an image.
And then CommonMark says: well, <em> in there doesn’t make sense, drop those tags.
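
The "find a closing, then look back for an opening" idea described above can be sketched with a toy scanner. This is a heavily simplified illustration and not how micromark or markdown-rs are implemented: `findImages` is a hypothetical function, it ignores titles, nested brackets, references, and attention, and it returns the raw source of each description.

```javascript
// Toy scanner (hypothetical): remembers `![` openers and, when it sees
// `](...)`, looks back for the most recent unmatched opener.
function findImages(source) {
  const openers = [] // indexes of `![` not yet matched to a closing
  const images = []
  let index = 0

  while (index < source.length) {
    const char = source[index]

    if (char === '\\') {
      index += 2 // an escaped character can never open or close a label
    } else if (char === '!' && source[index + 1] === '[') {
      openers.push(index)
      index += 2
    } else if (char === ']' && source[index + 1] === '(') {
      const end = source.indexOf(')', index + 2)
      if (end !== -1 && openers.length > 0) {
        const start = openers.pop() // look back for the opening
        images.push({
          description: source.slice(start + 2, index), // raw, unparsed source
          url: source.slice(index + 2, end)
        })
        index = end + 1
      } else {
        index += 1
      }
    } else {
      index += 1
    }
  }

  return images
}

const images = findImages('![*alpha*](xxx) and ![\\*bravo\\*](yyy)')
console.log(images)
```

The description is only known to belong to an image once the closing has been seen, which is why its contents have to be treated like any other inline content up to that point.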

wooorm · 2023-02-02T14:43:33Z

More on how parsing works in https://github.com/wooorm/markdown-rs/blob/main/src/construct/label_start_image.rs and https://github.com/wooorm/markdown-rs/blob/main/src/construct/label_end.rs.
A related issue on CommonMark is this one: commonmark/commonmark-spec#716.

kiejo · 2023-02-02T16:42:23Z

Thanks for the additional explanation and the links. I wasn't aware of how much of the image description logic is already defined in the CommonMark spec. That's very helpful! I am finally able to understand why escaping in the image description is absolutely required as it supports much more than just simple text.

Reading the following in the CommonMark spec:

“Syntax for images is like the syntax for links, with one difference. Instead of link text, we have an image description. The rules for this are the same as for link text, except that (a) an image description starts with ![ rather than [, and (b) an image description may contain links. An image description has inline elements as its contents. When an image is rendered to HTML, this is standardly used as the image’s alt attribute.”

I think trying to more closely reflect this spec in the Image AST node could look like this:

type ImageDescriptionContent =
  Break | Emphasis | HTML | Link | LinkReference | InlineCode | Strong | Text

interface Image <: Node {
  type: "image",
  description: [ImageDescriptionContent]
}

Image includes Resource

I think this AST would be closer to the spec, and it would be up to the renderer to decide how best to render the spec-compliant image description as HTML. The renderer could follow the recommendation and use an alt attribute, rendering only the plain string content without formatting (but it would also have the option of rendering it in a different way, since this does not seem to be part of the spec). It would also make it more explicit that the Markdown image description represents much more than the traditional alt attribute of an HTML img tag.

I assume that this approach has already been evaluated and I can see how a simpler AST might be the better overall trade-off if the majority of real-world cases involve the image description being set to a simple text anyways. But only exposing a simple string field for the image description in the AST feels like it somewhat defeats the purpose of being able to put more than just text in the image description in the first place.
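
To make the "plain string content without formatting" idea concrete, here is a minimal sketch of how a renderer might flatten such a description into an alt string. The `description` field is the one proposed in this comment and does not exist in mdast, and `descriptionToAlt` is a hypothetical name.

```javascript
// Hypothetical: flattens the proposed `description` children into a plain
// string by keeping literal values and dropping all formatting.
function descriptionToAlt(nodes) {
  let result = ''

  for (const node of nodes) {
    if (node.type === 'text' || node.type === 'inlineCode') {
      result += node.value
    } else if (Array.isArray(node.children)) {
      result += descriptionToAlt(node.children) // e.g. emphasis, strong, link
    }
    // Other nodes (such as breaks) contribute nothing in this sketch.
  }

  return result
}

const image = {
  type: 'image',
  url: 'xxx',
  description: [
    {type: 'emphasis', children: [{type: 'text', value: 'hello'}]},
    {type: 'text', value: ' '},
    {type: 'inlineCode', value: 'world'}
  ]
}

const alt = descriptionToAlt(image.description)
console.log(alt) // hello world
```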

wooorm · 2023-02-02T16:52:36Z

:)

A bit about the spec: it’s also a bit of a vague document. It’s nice that it provides many examples, more than we had before, but there are still many edge cases. So we still need to look at what parsers actually do: how does GFM for example actually work?

Yeah, I’ve thought about adding children to images. Which would include stuff like code in there and such. The downside is that you then have two sources of truth. A plugin would edit one, but would likely forget about the other, or at least it’ll get out of sync. And that’s what an AST is for, to remove such complexity, to make it simple to change things.

One alternative is to remove alt completely, and only serialize this stuff when going to html. But that’s rather breaking.

Perhaps tho, you have interesting needs, for such rich content. By hearing them I might be able to advise better? E.g., perhaps you can use directives, which do allow rich content, and compile to some other element that has a figcaption or so?

kiejo · 2023-02-02T18:16:58Z

I agree with the downside of having two sources of truth and I like the simplicity of the current AST.

My needs are actually quite simple, which is why I would personally prefer the image description to simply work like the alt attribute of an HTML img tag without characters like "*" or "_" getting dropped/interpreted or needing to escape them. It was quite unexpected to me that the image description supports all these inline elements in the spec as I have always thought of it as the alt attribute of an HTML image. With this mindset, using directives for more advanced needs makes a lot of sense, just like you suggested.

I guess the goal of the current image description spec was to make it consistent with the way links work as the syntax is so similar. So I think I understand the trade-off that was being made in the spec even if there might not be that many good use cases for all these inline elements in the image description. Or maybe there are common use cases that I just haven't encountered yet.

After having thought through several use cases and now having a better understanding of the CommonMark spec, I think you picked really good trade-offs for this library. The result is a spec-compliant system, which hides some of the flexibility/complexity of CommonMark image descriptions while making it very easy to use for the 99% of cases where a simple alt attribute is all the user needs and expects. Great work!

Thanks so much for taking the time to answer all my questions and helping me better understand the design decisions of this library :)

github-actions bot added the 👋 phase/new and 🤞 phase/open labels and removed the 👋 phase/new label on Feb 2, 2023

wooorm closed this as not planned on Feb 2, 2023

wooorm added the 👯 no/duplicate label on Feb 2, 2023

github-actions bot added the 👎 phase/no label and removed the 🤞 phase/open label on Feb 2, 2023

wooorm marked this as a duplicate of #58 Feb 2, 2023
