Treat HTML as inline element
benbrandt committed Feb 10, 2024
1 parent e1befa1 commit 0b03460
Showing 10 changed files with 114 additions and 84 deletions.
25 changes: 21 additions & 4 deletions src/unstable_markdown.rs
@@ -159,10 +159,10 @@ enum SemanticLevel {
     SoftBreak,
     /// An inline element that is within a larger element such as a paragraph, but
     /// more specific than a sentence.
-    /// Falls back to [`Self::Sentence`]
+    /// Falls back to [`Self::SoftBreak`]
     InlineElement(SemanticSplitPosition),
     /// Hard line break (two newlines), which signifies a new element in Markdown
-    /// Falls back to [`Self::SoftBreak`]
+    /// Falls back to [`Self::InlineElement`]
     HardBreak,
     /// thematic break/horizontal rule
     Rule,
@@ -207,8 +207,8 @@ impl SemanticSplit for Markdown {
         let ranges = Parser::new_ext(text, Options::all())
             .into_offset_iter()
             .filter_map(|(event, range)| match dbg!(event) {
-                Event::Start(_) | Event::End(_) | Event::Html(_) | Event::Text(_) => None,
-                Event::Code(_) => Some((
+                Event::Start(_) | Event::End(_) | Event::Text(_) => None,
+                Event::Code(_) | Event::Html(_) => Some((
                     SemanticLevel::InlineElement(SemanticSplitPosition::Own),
                     range,
                 )),
@@ -529,6 +529,23 @@ mod tests {
         );
     }

+    #[test]
+    fn test_html() {
+        let markdown = Markdown::new("<div>Some text</div>");
+
+        assert_eq!(
+            vec![&(
+                SemanticLevel::InlineElement(SemanticSplitPosition::Own),
+                0..20
+            ),],
+            markdown.ranges().collect::<Vec<_>>()
+        );
+        assert_eq!(
+            SemanticLevel::InlineElement(SemanticSplitPosition::Own),
+            markdown.max_level()
+        );
+    }
+
     #[test]
     fn test_softbreak() {
         let markdown = Markdown::new("Some text\nwith a softbreak");
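
The doc-comment changes in this file reorder the fallback chain: `HardBreak` now falls back to `InlineElement`, which falls back to `SoftBreak`. The following is a minimal standalone sketch of that ordering, not the crate's actual code. `Level`, `fallback`, and `max_level` are hypothetical stand-ins, the `SemanticSplitPosition` payload is omitted, and only the three levels touched by this diff are modeled.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for `SemanticLevel`.
/// Variants are declared from lowest to highest so that the derived
/// `Ord` matches the fallback order described in the doc comments.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    SoftBreak,
    InlineElement,
    HardBreak,
}

impl Level {
    /// The next lower level to try when splitting at this level is not enough.
    /// The real enum has levels below `SoftBreak`; this sketch stops there.
    fn fallback(self) -> Option<Level> {
        match self {
            Level::HardBreak => Some(Level::InlineElement),
            Level::InlineElement => Some(Level::SoftBreak),
            Level::SoftBreak => None,
        }
    }
}

/// Highest level present among the parsed ranges, if any.
fn max_level(ranges: &[(Level, Range<usize>)]) -> Option<Level> {
    ranges.iter().map(|(level, _)| *level).max()
}
```

Under these assumptions, a single range `(Level::InlineElement, 0..20)` gives `max_level` of `Some(Level::InlineElement)`, which mirrors what `test_html` asserts for `<div>Some text</div>`.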
@@ -9,9 +9,9 @@ input_file: tests/inputs/markdown/markdown_basics.md
 - " <li><a href=\"/projects/markdown/syntax\" title=\"Markdown Syntax Documentation\">Syntax</a></li>\n"
 - " <li><a href=\"/projects/markdown/license\" title=\"Pricing and License Information\">License</a></li"
 - ">\n <li><a href=\"/projects/markdown/dingus\" title=\"Online Markdown Web Form\">Dingus</a></li>\n"
-- "</ul>\n\n\nGetting the Gist of Markdown's Formatting Syntax\n"
-- "------------------------------------------------\n\n"
-- "This page offers a brief overview of what it's like to use Markdown.\n"
+- "</ul>\n"
+- "\n\nGetting the Gist of Markdown's Formatting Syntax\n------------------------------------------------\n"
+- "\nThis page offers a brief overview of what it's like to use Markdown.\n"
 - "The [syntax page] [s] provides complete, detailed documentation for\n"
 - "every feature, but Markdown should be very easy to pick up simply by\n"
 - "looking at a few examples of it in action. The examples on this page\n"
@@ -3,9 +3,9 @@ source: tests/text_splitter_snapshots.rs
 expression: chunks
 input_file: tests/inputs/markdown/markdown_basics.md
 ---
-- "Markdown: Basics\n================\n\n<ul id=\"ProjectSubmenu\">\n <li><a href=\"/projects/markdown/\" title=\"Markdown Project Page\">Main</a></li>\n <li><a class=\"selected\" title=\"Markdown Basics\">Basics</a></li>\n <li><a href=\"/projects/markdown/syntax\" title=\"Markdown Syntax Documentation\">Syntax</a></li>\n <li><a href=\"/projects/markdown/license\" title=\"Pricing and License Information\">License</a></li>\n <li><a href=\"/projects/markdown/dingus\" title=\"Online Markdown Web Form\">Dingus</a></li>\n</ul>\n\n\nGetting the Gist of Markdown's Formatting Syntax\n------------------------------------------------\n\nThis page offers a brief overview of what it's like to use Markdown.\nThe [syntax page] [s] provides complete, detailed documentation for\nevery feature, but Markdown should be very easy to pick up simply by\nlooking at a few examples of it in action. The examples on this page\nare written in a before/after style, showing example syntax and the\n"
-- "HTML output produced by Markdown.\n\nIt's also helpful to simply try Markdown out; the [Dingus] [d] is a\nweb application that allows you type your own Markdown-formatted text\nand translate it to XHTML.\n\n**Note:** This document is itself written using Markdown; you\ncan [see the source for it by adding '.text' to the URL] [src].\n\n [s]: /projects/markdown/syntax \"Markdown Syntax\"\n [d]: /projects/markdown/dingus \"Markdown Dingus\"\n [src]: /projects/markdown/basics.text\n\n\n## Paragraphs, Headers, Blockquotes ##\n\nA paragraph is simply one or more consecutive lines of text, separated\nby one or more blank lines. (A blank line is any line that looks like\na blank line -- a line containing nothing but spaces or tabs is\nconsidered blank.) Normal paragraphs should not be indented with\nspaces or tabs.\n\nMarkdown offers two styles of headers: *Setext* and *atx*.\nSetext-style headers for `<h1>` and `<h2>` are created by\n\"underlining\" with equal signs (`=`) and hyphens (`-`"
-- "), respectively.\nTo create an atx-style header, you put 1-6 hash marks (`#`) at the\nbeginning of the line -- the number of hashes equals the resulting\nHTML header level.\n\nBlockquotes are indicated using email-style '`>`"
+- "Markdown: Basics\n================\n\n<ul id=\"ProjectSubmenu\">\n <li><a href=\"/projects/markdown/\" title=\"Markdown Project Page\">Main</a></li>\n <li><a class=\"selected\" title=\"Markdown Basics\">Basics</a></li>\n <li><a href=\"/projects/markdown/syntax\" title=\"Markdown Syntax Documentation\">Syntax</a></li>\n <li><a href=\"/projects/markdown/license\" title=\"Pricing and License Information\">License</a></li>\n <li><a href=\"/projects/markdown/dingus\" title=\"Online Markdown Web Form\">Dingus</a></li>\n</ul>\n"
+- "\n\nGetting the Gist of Markdown's Formatting Syntax\n------------------------------------------------\n\nThis page offers a brief overview of what it's like to use Markdown.\nThe [syntax page] [s] provides complete, detailed documentation for\nevery feature, but Markdown should be very easy to pick up simply by\nlooking at a few examples of it in action. The examples on this page\nare written in a before/after style, showing example syntax and the\nHTML output produced by Markdown.\n\nIt's also helpful to simply try Markdown out; the [Dingus] [d] is a\nweb application that allows you type your own Markdown-formatted text\nand translate it to XHTML.\n\n**Note:** This document is itself written using Markdown; you\n"
+- "can [see the source for it by adding '.text' to the URL] [src].\n\n [s]: /projects/markdown/syntax \"Markdown Syntax\"\n [d]: /projects/markdown/dingus \"Markdown Dingus\"\n [src]: /projects/markdown/basics.text\n\n\n## Paragraphs, Headers, Blockquotes ##\n\nA paragraph is simply one or more consecutive lines of text, separated\nby one or more blank lines. (A blank line is any line that looks like\na blank line -- a line containing nothing but spaces or tabs is\nconsidered blank.) Normal paragraphs should not be indented with\nspaces or tabs.\n\nMarkdown offers two styles of headers: *Setext* and *atx*.\nSetext-style headers for `<h1>` and `<h2>` are created by\n\"underlining\" with equal signs (`=`) and hyphens (`-`), respectively.\nTo create an atx-style header, you put 1-6 hash marks (`#`) at the\nbeginning of the line -- the number of hashes equals the resulting\nHTML header level.\n\nBlockquotes are indicated using email-style '`>`"
 - "' angle brackets.\n\nMarkdown:\n\n A First Level Header\n ====================\n\n A Second Level Header\n ---------------------\n\n Now is the time for all good men to come to\n the aid of their country. This is just a\n regular paragraph.\n\n The quick brown fox jumped over the lazy\n dog's back.\n\n ### Header 3\n\n > This is a blockquote.\n >\n > This is the second paragraph in the blockquote.\n >\n > ## This is an H2 in a blockquote\n\n\nOutput:\n\n <h1>A First Level Header</h1>\n\n <h2>A Second Level Header</h2>\n\n <p>Now is the time for all good men to come to\n the aid of their country. This is just a\n regular paragraph.</p>\n\n <p>The quick brown fox jumped over the lazy\n dog's back.</p>\n\n <h3>Header 3</h3>\n\n <blockquote>\n <p>This is a blockquote.</p>\n\n <p>This is the second paragraph in the blockquote.</p>\n\n <h2>This is an H2 in a blockquote</h2>\n </blockquote>\n\n\n\n### Phrase Emphasis ###\n\n"
 - "Markdown uses asterisks and underscores to indicate spans of emphasis.\n\nMarkdown:\n\n Some of these words *are emphasized*.\n Some of these words _are emphasized also_.\n\n Use two asterisks for **strong emphasis**.\n Or, if you prefer, __use two underscores instead__.\n\nOutput:\n\n <p>Some of these words <em>are emphasized</em>.\n Some of these words <em>are emphasized also</em>.</p>\n\n <p>Use two asterisks for <strong>strong emphasis</strong>.\n Or, if you prefer, <strong>use two underscores instead</strong>.</p>\n\n\n\n## Lists ##\n\nUnordered (bulleted) lists use asterisks, pluses, and hyphens (`*`,\n`+`, and `-`"
 - ") as list markers. These three markers are\ninterchangable; this:\n\n * Candy.\n * Gum.\n * Booze.\n\nthis:\n\n + Candy.\n + Gum.\n + Booze.\n\nand this:\n\n - Candy.\n - Gum.\n - Booze.\n\nall produce the same output:\n\n <ul>\n <li>Candy.</li>\n <li>Gum.</li>\n <li>Booze.</li>\n </ul>\n\nOrdered (numbered) lists use regular numbers, followed by periods, as\nlist markers:\n\n 1. Red\n 2. Green\n 3. Blue\n\nOutput:\n\n <ol>\n <li>Red</li>\n <li>Green</li>\n <li>Blue</li>\n </ol>\n\nIf you put blank lines between items, you'll get `<p>`"
@@ -62,7 +62,8 @@ input_file: tests/inputs/markdown/markdown_basics.md
 - "Web Form\">"
 - Dingus</a>
 - "</li>\n"
-- "</ul>\n\n\n"
+- "</ul>\n"
+- "\n\n"
 - "Getting "
 - "the Gist "
 - "of "
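
In the snapshot changes, the chunk that used to read `"</ul>\n\n\n"` now ends at `"</ul>\n"`, and the blank lines start the next chunk. One way to picture this is that an element with position `Own` becomes a section by itself, and the text before and after it becomes separate sections. The sketch below illustrates only that boundary effect. `split_own` is a hypothetical helper, not the crate's API, and it assumes the HTML block's range includes its trailing newline, as the snapshots suggest.

```rust
use std::ops::Range;

/// Hypothetical helper: split `text` so that each range in `own` becomes a
/// section by itself, with the text between ranges emitted as separate sections.
/// Assumes `own` is sorted, non-overlapping, and aligned to char boundaries.
fn split_own<'a>(text: &'a str, own: &[Range<usize>]) -> Vec<&'a str> {
    let mut sections = Vec::new();
    let mut cursor = 0;
    for range in own {
        // Text preceding the element, if any, is its own section.
        if range.start > cursor {
            sections.push(&text[cursor..range.start]);
        }
        // The element itself is never merged with its neighbours here.
        sections.push(&text[range.clone()]);
        cursor = range.end;
    }
    // Whatever follows the last element forms the final section.
    if cursor < text.len() {
        sections.push(&text[cursor..]);
    }
    sections
}
```

For an input such as `"Title\n\n<ul>\n</ul>\n\n\nGetting the Gist\n"` with the `<ul>` block as the only `Own` range, this yields three sections, the middle one ending in `"</ul>\n"`. The actual splitter then combines sections up to the chunk capacity, which this sketch does not attempt.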