Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update org.jetbrains.markdown from 0.3.1 to 0.5.2 #3231

Merged
merged 3 commits into from
Oct 31, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion gradle/libs.versions.toml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ korlibs-template = "4.0.10"
kotlinx-html = "0.9.1"

## Markdown
jetbrains-markdown = "0.3.1"
jetbrains-markdown = "0.5.2"

## JSON
jackson = "2.12.7" # jackson 2.13.X does not support kotlin language version 1.4, check before updating
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,28 @@ public open class MarkdownParser(
).flatMap { it.children }
)

/**
* Handler for [MarkdownTokenTypes.ATX_CONTENT], which is the content of the header
* elements like [MarkdownElementTypes.ATX_1], [MarkdownElementTypes.ATX_2] and so on.
*
* For example, a header line like `# Header text` is expected to be parsed into:
* - One [MarkdownTokenTypes.ATX_HEADER] with startOffset = 0, endOffset = 1 (only the `#` symbol)
* - Composite [MarkdownTokenTypes.ATX_CONTENT] with four children: WHITE_SPACE, TEXT, WHITE_SPACE, TEXT.
*/
private fun headerContentHandler(node: ASTNode): List<DocTag> {
// ATX_CONTENT contains everything after the `#` symbol, so if there's a space
// in-between the `#` symbol and the text (like `# header`), it will be present here too.
// However, we don't need the first space between the `#` symbol and the text,
// so we just skip it (otherwise the header text will be parsed as `<whitespace>header` instead of `header`).
val textStartsWithWhitespace = node.children.firstOrNull()?.type == MarkdownTokenTypes.WHITE_SPACE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should trim not only the beginning.

Leading and trailing spaces or tabs are ignored in parsing inline content:

See https://spec.commonmark.org/0.30/#atx-headings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thought, fixed

val children = if (textStartsWithWhitespace) node.children.subList(1, node.children.size) else node.children
Copy link
Member Author

@IgnatBeresnev IgnatBeresnev Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests started failing because they expected Header text, but got Header text (notice the whitespace in the beginning).

Quick debugging showed that before the update # header used to return the following nodes:

  • ATX_HEADER
  • ATX_CONTENT
    • TEXT

After the update, # header started returning

  • ATX_HEADER
  • ATX_CONTENT
    • WHITE_SPACE
    • TEXT
Debugger screenshots

Before

image

After

image

Maybe there's a nicer or a better way to fix it - I honestly didn't spend much time on it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just interesting, how it worked before/after if there are multiple spaces in the beginning? f.e. # header - may be we need to filter all leading white space tokens here?

Copy link
Member Author

@IgnatBeresnev IgnatBeresnev Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent thought, I'll check 👍

From what I've seen, my guess would be that there will still be a single WHITE_SPACE element, but it will have a different offset (so from 0 to 3 instead of 0 to 1), but we'll see :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's how I expected - there's only one WHITE_SPACE element, but it's wider, so there's no need to do anything additionally.

I've added a test and a comment about it though. Also created #3233 after seeing a single test file with 1.6k lines of code 😅

2023-10-18_12-18-27


return DocTagsFromIElementFactory.getInstance(
MarkdownElementTypes.PARAGRAPH, // PARAGRAPH instead of TEXT to preserve compatibility with prev. versions
children = children.evaluateChildren()
)
}

private fun horizontalRulesHandler() =
DocTagsFromIElementFactory.getInstance(MarkdownTokenTypes.HORIZONTAL_RULE)

Expand Down Expand Up @@ -365,6 +387,7 @@ public open class MarkdownParser(
MarkdownElementTypes.ATX_5,
MarkdownElementTypes.ATX_6,
-> headersHandler(node)
MarkdownTokenTypes.ATX_CONTENT -> headerContentHandler(node)
MarkdownTokenTypes.HORIZONTAL_RULE -> horizontalRulesHandler()
MarkdownElementTypes.STRONG -> strongHandler(node)
MarkdownElementTypes.EMPH -> emphasisHandler(node)
Expand Down