Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser aggressively breaks paragraphs up by punctuation. #59

Open
yoshikischmitz opened this issue Jan 4, 2019 · 7 comments
Open

Parser aggressively breaks paragraphs up by punctuation. #59

yoshikischmitz opened this issue Jan 4, 2019 · 7 comments

Comments

@yoshikischmitz
Copy link

If I parse a sentence like this:

md.defaultBlockParse('Lorem Ipsum Dolor. Sit, Amet')

the output I would expect is:

[
    {
        "content": [
            {
                "content": "Lorem Ipsum Dolor. Sit, Amet",
                "type": "text"
            },

        ],
        "type": "paragraph"
    }
]

Instead what I get is:

[
    {
        "content": [
            {
                "content": "Lorem Ipsum Dolor",
                "type": "text"
            },
            {
                "content": ". Sit",
                "type": "text"
            },
            {
                "content": ", Amet",
                "type": "text"
            }
        ],
        "type": "paragraph"
    }
]

For longer documents this creates quite a lot of redundant nodes which leads to bloated output and potential performance issues. I think it'd be great if either the library didn't break text up like this in the first place or had a streamlining step at the end to merge them into a single node.

@yoshikischmitz yoshikischmitz changed the title Paragraphs get aggressively broken up by punctuation Parser aggressively breaks paragraphs up by punctuation. Jan 4, 2019
@ariabuckles
Copy link
Owner

Hi @yoshikischmitz !

What you're reporting definitely happens, and I'd like to see if we can address it.

Could you explain a little more about your use case? Are you parsing markdown and then saving the parse tree / AST to a database, and concerned about that json being overly-verbose? Or are you parsing markdown and then immediately outputting it to react/html/something else? Also, what version of simple-markdown are you using?

  1. If you're outputting to react-dom or html, the output() step should be doing the type of text-node-joining that you suggest:

    or had a streamlining step at the end to merge them into a single node.

  2. If you're using a different outputter (such as for react-native?), then as-of [email protected] there's a slightly improved outputter API that allows you to customize how adjacent text nodes are treated, and should do the right thing by default most of the time. The code to use that API looks like:

    var output = SimpleMarkdown.outputFor(rules, 'react');

    If this is still giving you multiple text nodes next to each other in the output, you can customize the Array output rule. See simple-markdown.js:808 for the default react rule for combining array elements.

  3. If you're saving the output of parse() to a database or file or something and specifically need text nodes joined in the parse tree, you can do that by writing an outputter that only handles arrays specially. Unfortunately, the syntax to do so right now is a bit tricky (and long term can maybe be simplified).

If you could elaborate on your use case, and whether it matches one of the scenarios above, and/or if you have some sample code (doesn't need to be runnable, but just so I can see a bit more of what you're doing), then I'd love to help you get one of the above solutions working. (Or if those aren't relevant, see what I can do to help in a different way).

@yoshikischmitz
Copy link
Author

@ariabuckles thanks a ton for the super detailed response! You guessed correctly, I am saving the output from the parser to the file system, serving it over the network to a client, which is then rendering it(I think this is a pretty cool use case of this library for mobile apps because it allows for rich text views that can be updated server side without relying on web views).

@ariabuckles
Copy link
Owner

Hi again @yoshikischmitz !

Thanks, that makes sense.

I'd recommend writing a custom outputter that does this AST transformation to join the nodes. This is somewhat clunky right now, but quite functional.

I've taken a shot at that here: https://gist.github.com/ariabuckles/039442f9e50525fada464ce34953725d (join-smd.js exports a function which takes an ast and outputs an ast with text nodes combined, which is demonstrated in test.js and test-readme.js). I did some basic testing, but it's possible I've missed testing some edge cases or wrote the table rules poorly; if anything comes up I'd be happy to fix it.

I'd love to make this API a bit easier to use in the future, but I've still got some edge cases and figuring things out to even see how a cleaner way might work.

Would something like that work for you?

@ariabuckles
Copy link
Owner

Closing this issue for inactivity, but please feel free to re-open or comment if the above solution doesn't work for you (or anyone reading this!)

@niksy
Copy link

niksy commented Feb 27, 2020

Is there any possiblity of having this kind of output as option inside this module? I’m working with Vue component which could benefit from generic joined output. Currently, React output is fine for everything, but it’s tightly coupled to implementation.

@ariabuckles
Copy link
Owner

@niksy I think so! I haven't written good docs for this yet (sorry >_<) but something similar is possible by providing an Array: rule, it just happens during output time rather than parse time.

Conceptually, the Array: rule specifies how to output an array found in the AST, like most rules specify how to output a node of a specific type found in the AST.

The default Array rule for react loops through the array elements, and when it finds text concatenates all the text nodes into one text node.

If you just need to customize how this joining works for Vue, you can provide your own Array: rule customized from react's (or technically provide any function that takes an array of parsed nodes and returns an array of nodes to be output).

If you need to do that before parse time, unfortunately something like the smd-join.js gist from above is the best way to do that for now.

If you have some example code for how you're using this with vue, I'd be happy to try to figure out how best to accomplish what you're going for!

P.S. I just started using vue at work; I'm excited to hear you're using simple-markdown with vue :). If there are some things I can do to make working with this in vue easier, I'm open to it!

@ariabuckles ariabuckles reopened this Feb 29, 2020
@niksy
Copy link

niksy commented Sep 9, 2020

Just wanted to chime in regarding Vue/React dilemma—we’re using React implementation with Vue components and it works great! It’s still coupled to React-specific implementation, but we’ve managed to create Vue components easily.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants