Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated Markdown is missing line breaks between heading and image #59

Closed
4 tasks done
kiejo opened this issue Feb 2, 2023 · 14 comments
Closed
4 tasks done

Generated Markdown is missing line breaks between heading and image #59

kiejo opened this issue Feb 2, 2023 · 14 comments
Labels
👯 no/duplicate Déjà vu 👎 phase/no Post cannot or will not be acted on

Comments

@kiejo
Copy link

kiejo commented Feb 2, 2023

Initial checklist

Affected packages and versions

mdast-util-to-markdown 1.5.0

Link to runnable example

No response

Steps to reproduce

import {toMarkdown} from 'mdast-util-to-markdown'

const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Test' }] },
    {
      type: 'image',
      url: 'https://example.com/image.png',
    },
  ]
}

console.log(toMarkdown(tree))

Expected behavior

It should log the following output:

# Test

![](https://example.com/image.png)

Actual behavior

It logs the following output:

# Test![](https://example.com/image.png)

It looks like this issue was introduced with this commit: 122101f

Affected runtime and version

[email protected]

Affected package manager and version

No response

Affected OS and version

No response

Build and bundle tools

No response

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Feb 2, 2023
@wooorm
Copy link
Member

wooorm commented Feb 2, 2023

Hi! Please follow the mdast specification, and wrap your images in paragraphs. They may not appear on their own like that

Duplicate of #58

@wooorm wooorm closed this as not planned Won't fix, can't repro, duplicate, stale Feb 2, 2023
@wooorm wooorm added the 👯 no/duplicate Déjà vu label Feb 2, 2023
@github-actions

This comment has been minimized.

@github-actions
Copy link

github-actions bot commented Feb 2, 2023

Hi! Thanks for taking the time to contribute!

Because we treat issues as our backlog, we close duplicates to focus our work and not have to touch the same chunk of code for the same reason multiple times. This is also why we may mark something as duplicate that isn’t an exact duplicate but is closely related.

Thanks,
— bb

@github-actions
Copy link

github-actions bot commented Feb 2, 2023

Hi team! It seems this post is a duplicate, but hasn’t been marked as such. Please post a comment w/ Duplicate of #123(no final .) to do so. See GH docs for more info.

Thanks,
— bb

@github-actions github-actions bot added 👎 phase/no Post cannot or will not be acted on and removed 🤞 phase/open Post is being triaged manually labels Feb 2, 2023
@wooorm
Copy link
Member

wooorm commented Feb 2, 2023

Duplicate of #58

@wooorm wooorm marked this as a duplicate of #58 Feb 2, 2023
@kiejo
Copy link
Author

kiejo commented Feb 2, 2023

Thanks for the quick response! I can confirm that wrapping the image in a paragraph solves the issue, but I noticed that this causes the image alt text to get escaped in a way I did not expect. I would have expected the escaping of the image alt text to be treated in a similar way to the title text.

While this sounds like a separate issue, I thought it would make sense to bring it up in this context as I did not encounter this issue when not wrapping the image inside a paragraph (which I now understand is not spec compliant).

Here is an example for reference:

import {toMarkdown} from 'mdast-util-to-markdown'

const tree = {
  type: 'root',
  children: [
    { type: 'paragraph', children: [
      {
        type: 'image',
        url: 'https://example.com/image.png',
        alt: '*hello* [world]',
        title: '*hello* "world"',
      }
    ]},
  ]
}

console.log(toMarkdown(tree))

Output:

![\*hello\* \[world\]](https://example.com/image.png "*hello* \"world\"")

Expected:

![*hello* [world\]](https://example.com/image.png "*hello* \"world\"")

Please let me know if the current image alt text escaping is by design or if I should open a separate issue for this.

@wooorm
Copy link
Member

wooorm commented Feb 2, 2023

Your expected markdown doesn’t work. Try pasting here:

![\*hello\* \[world\]](https://example.com/image.png "*hello* \"world\"")

![*hello* [world\]](https://example.com/image.png "*hello* \"world\"")

*hello* [world]

![hello world]


Please let me know if the current image alt text escaping is by design or if I should open a separate issue for this.

Escaping is definitely by design. That being sad, as this case shows, escaping is incredibly hard. If it can be improved, that’s of course great, but maybe it can’t.

@wooorm
Copy link
Member

wooorm commented Feb 2, 2023

If this is your reduced test case:

![*alpha*](xxx) and ![\*bravo\*](yyy)

you should check out the alt text of the resulting HTML:

alpha and *bravo*


Note that the *s disappear. That’s because they actually are interpreted in images. Markdown “works” in there. But, the tags (<em> and </em> in this case) are ignored.

@kiejo
Copy link
Author

kiejo commented Feb 2, 2023

Thanks for the additional details and examples. I did not expect image alt text to be interpreted as Markdown as the Alternative mixin only specifies a simple string attribute just like the title attribute is defined as part of the Resource mixin.

While the alt and title attributes seem to be using the same definition, they are interpreted in different ways:

import {fromMarkdown} from 'mdast-util-from-markdown'
console.log(JSON.stringify(fromMarkdown('![*alt*](xxx "*title*")'), null, 2))

Output of image node:

{
  "type": "image",
  "title": "*title*",
  "url": "xxx",
  "alt": "alt"
}

I am wondering if it actually makes sense to interpret the image alt text as Markdown if the information gets discarded and is not even part of the generated AST. The only use case I could come up with so far, would be for Markdown syntax highlighting where the alt text of an image could be rendered in a stylized way. But when for example trying this in Visual Studio Code, the image alt text does not seem to be interpreted as Markdown and instead as simple text (which also matches my initial expectation):
img-alt-md

As you can see in the screenshot, the alt text of the image does not use italics or bold while the content of the link does.

At the same time it looks like the current logic matches GitHub's behavior, which might be reason enough to keep it as is.

@wooorm
Copy link
Member

wooorm commented Feb 2, 2023

This isn’t about mdast, which is a specification for representing markdown as nodes. And, yeah, there these fields both use string as types.

This is about how markdown works. Markdown prescribes this. CommonMark-compliant markdown processors do this.

On the parsing side, this is because links, images, but also emphasis, are not parsed the way you think.
They are parsed in chunks: * is seen. Or ![ is seen. And ** is seen. And then ](#whatever).

If such a valid closing is found (e.g., ]). Then the parser looks back to see if there is a corresponding opening (e.g., ![).
Meanwhile, emphasis/strong (what I call attention) is parsed as separate runs around all that (*, **, ***).
And then “normal” things are parsed as whole units from left to right, such as &amp;, or <https://example.com>, or `asd`.

So you see, this parsing is already done. Only afterwards do we know that these things occurred in an image.
And then CommonMark says: well, <em> in there doesn’t make sense, drop those tags.

@wooorm
Copy link
Member

wooorm commented Feb 2, 2023

@kiejo
Copy link
Author

kiejo commented Feb 2, 2023

Thanks for the additional explanation and the links. I wasn't aware of how much of the image description logic is already defined in the CommonMark spec. That's very helpful! I am finally able to understand why escaping in the image description is absolutely required as it supports much more than just simple text.

Reading the following in the CommonMark spec:

Syntax for images is like the syntax for links, with one difference. Instead of link text, we have an image description. The rules for this are the same as for link text, except that (a) an image description starts with ![ rather than [, and (b) an image description may contain links. An image description has inline elements as its contents. When an image is rendered to HTML, this is standardly used as the image’s alt attribute.

I think trying to more closely reflect this spec in the Image AST node could look like this:

type ImageDescriptionContent =
  Break | Emphasis | HTML | Link | LinkReference | InlineCode | Strong | Text

interface Image <: Node {
  type: "image",
  description: [ImageDescriptionContent]
}

Image includes Resource

I think this AST would be closer to the spec and it would be up to the renderer to decide how to best render the spec compliant image description as HTML. The renderer could follow the recommendation and use an alt tag and only render the plain string content without formatting (but it would also have the option of rendering it in a different way since this does not seem to be part of the spec). It would also make it more explicit and clear that the Markdown image description represents much more than the traditional alt attribute of an HTML img tag.

I assume that this approach has already been evaluated and I can see how a simpler AST might be the better overall trade-off if the majority of real-world cases involve the image description being set to a simple text anyways. But only exposing a simple string field for the image description in the AST feels like it somewhat defeats the purpose of being able to put more than just text in the image description in the first place.

@wooorm
Copy link
Member

wooorm commented Feb 2, 2023

:)

A bit about the spec: it’s also a bit of a vague document. It’s nice that it provides many examples, more than we had before, but there are still many edge cases. So we still need to look at what parsers actually do: how does GFM for example actually work?


Yeah, I’ve thought about adding children to images. Which would include stuff like code in there and such. The downside is that you then have two sources of truths. A plugin would edit one, but would likely forget about the other, or at least it’ll get out of sync. And that’s what an AST is for, to remove such complexity, to make it simple to change things.

One alternative is to remove alt completely, and only serialize this stuff when going to html. But that’s rather breaking.


Perhaps tho, you have interesting needs, for such rich content. By hearing them I might be able to advise better? E.g., perhaps you can use directives, which do allow rich content, and compile to some other element that has a figcaption or so?

@kiejo
Copy link
Author

kiejo commented Feb 2, 2023

I agree with the downside of having two sources of truths and I like the simplicity of the current AST.

My needs are actually quite simple, which is why I would personally prefer the image description to simply work like the alt attribute of an HTML img tag without characters like "*" or "_" getting dropped/interpreted or needing to escape them. It was quite unexpected to me that the image description supports all these inline elements in the spec as I have always thought of it as the alt attribute of an HTML image. With this mindset, using directives for more advanced needs makes a lot of sense, just like you suggested.

I guess the goal of the current image description spec was to make it consistent with the way links work as the syntax is so similar. So I think I understand the trade-off that was being made in the spec even if there might not be that many good use cases for all these inline elements in the image description. Or maybe there are common use cases that I just haven't encountered yet.

After having thought through several use cases and now having a better understanding of the CommonMark spec, I think you picked really good trade-offs for this library. The result is a spec compliant system, which hides some of the flexibility/complexity of CommonMark image descriptions while making it very easy to use for the 99% of cases where a simple alt attribute is all the user needs and expects. Great work!

Thanks so much for taking the time to answer all my questions and helping me better understand the design decisions of this library :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
👯 no/duplicate Déjà vu 👎 phase/no Post cannot or will not be acted on
Development

No branches or pull requests

2 participants