Create function for transforming JSON state to text representation #1428

Closed
vpontis opened this issue Jun 7, 2021 · 16 comments
Labels
sponsor 💖 This issue or pull request was created by a Tiptap sponsor Type: Feature The issue or pullrequest is a new feature

Comments

@vpontis

vpontis commented Jun 7, 2021

The problem I am facing
I let my users send newsletters to their audience (think Substack).

We store their custom rich text body as JSON and then when we want to send the email, we need to convert the message into HTML and text.

The solution I would like
I would like a function generateText that converts a JSON state to text.

This could be done in two ways:

  • the function generateText looks at nodes and converts them to text
  • we add a toText option on every Node / Mark in the schema and call that in order to convert a node to text
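The second option could look roughly like the sketch below. It assumes a simplified JSON node shape (`type`, `text`, `content`) and a hypothetical `toTextByType` registry standing in for a `toText` option on each schema node; none of these names are existing Tiptap or ProseMirror API.

```typescript
// Simplified shape of a stored JSON node (assumption: only these fields matter here).
interface JsonNode {
  type: string;
  text?: string;
  content?: JsonNode[];
}

// Hypothetical per-node serializer: gets the node and its already-serialized children.
type ToText = (node: JsonNode, children: string[]) => string;

// Hypothetical registry standing in for a `toText` option on every schema node.
const toTextByType: Record<string, ToText> = {
  doc: (_node, children) => children.join("\n"),
  paragraph: (_node, children) => children.join(""),
  text: (node) => node.text ?? "",
};

function jsonToText(
  node: JsonNode,
  registry: Record<string, ToText> = toTextByType,
): string {
  const children = (node.content ?? []).map((child) => jsonToText(child, registry));
  const serialize = registry[node.type];
  // Node types without a serializer fall back to concatenating their children.
  return serialize ? serialize(node, children) : children.join("");
}

const exampleDoc: JsonNode = {
  type: "doc",
  content: [
    { type: "paragraph", content: [{ type: "text", text: "Hello subscribers," }] },
    { type: "paragraph", content: [{ type: "text", text: "here is this week's update." }] },
  ],
};

const exampleText = jsonToText(exampleDoc);
```

Because the registry is passed in as a parameter, the same walker could run on a server that only has the stored JSON, provided the serializers themselves do not depend on the browser.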

Alternatives I have considered

I am not using Tiptap yet, but I wrote my own wrapper around ProseMirror and created this file to serialize our JSON state to HTML / text: https://gist.github.com/vpontis/e5e373ea9950b413c18c867a5aedf352

@vpontis vpontis added Type: Feature The issue or pullrequest is a new feature v2 labels Jun 7, 2021
@github-actions github-actions bot added the sponsor 💖 This issue or pull request was created by a Tiptap sponsor label Jun 7, 2021
@tobiasfuhlroth

Looking for the same option. This would be really handy…

@BrianHung
Contributor

There's a generateHTML from json method: https://github.com/ueberdosis/tiptap/blob/main/packages/html/src/generateHTML.ts if that's what you're looking for?

@tobiasfuhlroth

There's a generateHTML from json method: https://github.com/ueberdosis/tiptap/blob/main/packages/html/src/generateHTML.ts if that's what you're looking for?

@BrianHung No, the requirement is to generate text similar to ProseMirror's textContent.

@BrianHung
Contributor

@tobiasfuhlroth What about https://prosemirror.net/docs/ref/#model.Node^fromJSON? You can use that server-side, just need the schema.

@tobiasfuhlroth

@BrianHung Sorry, but I don't see how that is related to the feature request described above. Can you please elaborate?

@BrianHung
Contributor

@tobiasfuhlroth A rough sketch:

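import { Node } from "prosemirror-model";
const schema = editor.state.schema;
// Placeholder: storedJson is the serialized document, e.g. loaded from a database.
const jsonState = JSON.parse(storedJson);
// Rebuild a ProseMirror document node from the JSON using the schema.
const doc = Node.fromJSON(schema, jsonState);
const textContent = doc.textContent;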

@tobiasfuhlroth

tobiasfuhlroth commented Jun 9, 2021

@BrianHung Ah OK, I see what you are getting at. The problem only starts with the last line.

Please see how ProseMirror's textContent works:

Currently there is no way in Tiptap (at least that I know of) to specify the text representation of a created node (other than nodes of the text type).

This is actually what this feature request is about.
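To illustrate the limitation, here is a simplified model of that behavior on plain JSON. It is not ProseMirror's source: it assumes that only text nodes contribute and that nothing is inserted between blocks, which is how textContent behaved as far as this thread describes it. The `mention` node and its `label` attribute are hypothetical.

```typescript
interface JsonNode {
  type: string;
  text?: string;
  attrs?: Record<string, unknown>;
  content?: JsonNode[];
}

// Simplified stand-in for textContent: concatenate every text node, ignore all else.
function plainTextContent(node: JsonNode): string {
  if (node.type === "text") return node.text ?? "";
  return (node.content ?? []).map(plainTextContent).join("");
}

const docWithMention: JsonNode = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      content: [
        { type: "text", text: "Thanks " },
        // Hypothetical custom leaf node: it carries no text node, so it vanishes.
        { type: "mention", attrs: { label: "Ada" } },
      ],
    },
    { type: "paragraph", content: [{ type: "text", text: "See you soon." }] },
  ],
};

const flattened = plainTextContent(docWithMention);
// flattened is "Thanks See you soon." (mention dropped, paragraphs run together)
```

The custom node disappears and the paragraph boundary is lost, which is why a per-node text representation is needed.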

@BrianHung
Contributor

Ah, you want to customize the behavior for each specific node and/or mark; that is something novel. While it's unsupported, you could try writing something like the fragment.textBetween method that operates on a ProseMirror node and takes advantage of node specs and methods. That approach would be easier than operating on the JSON representation alone, as done in vpontis's gist.

@vpontis
Author

vpontis commented Jun 9, 2021

@BrianHung thanks for the suggestions!

Here is another caveat: the browser's textContent is also pretty bad, and it would be much nicer to get something similar to innerText.

MDN has some explanation of the difference between textContent and innerText.

Another thing that would be nice about defining your own text representation is that you could prefix bullets with a dash or expand linked text.
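As a rough sketch of those two ideas, the walker below prefixes list items with a dash and appends a link's URL after its text. It assumes the stored JSON uses `bulletList`, `listItem`, and a `link` mark with an `href` attribute; the function names are made up for illustration.

```typescript
interface JsonMark {
  type: string;
  attrs?: { href?: string };
}

interface JsonNode {
  type: string;
  text?: string;
  marks?: JsonMark[];
  content?: JsonNode[];
}

// Inline content: expand linked text to "text (url)".
function renderInline(node: JsonNode): string {
  if (node.type !== "text") {
    return (node.content ?? []).map(renderInline).join("");
  }
  const text = node.text ?? "";
  const link = (node.marks ?? []).find((mark) => mark.type === "link");
  const href = link?.attrs?.href;
  return href ? `${text} (${href})` : text;
}

// Block content: blank line between top-level blocks, one dash per list item.
function renderBlock(node: JsonNode): string {
  const children = node.content ?? [];
  switch (node.type) {
    case "doc":
      return children.map(renderBlock).join("\n\n");
    case "bulletList":
      return children.map(renderBlock).join("\n");
    case "listItem":
      return "- " + children.map(renderBlock).join("\n  ");
    default:
      return renderInline(node);
  }
}

const newsletter: JsonNode = {
  type: "doc",
  content: [
    { type: "paragraph", content: [{ type: "text", text: "Links this week:" }] },
    {
      type: "bulletList",
      content: [
        {
          type: "listItem",
          content: [
            {
              type: "paragraph",
              content: [
                {
                  type: "text",
                  text: "Tiptap",
                  marks: [{ type: "link", attrs: { href: "https://tiptap.dev" } }],
                },
              ],
            },
          ],
        },
        {
          type: "listItem",
          content: [{ type: "paragraph", content: [{ type: "text", text: "No link here" }] }],
        },
      ],
    },
  ],
};

const newsletterText = renderBlock(newsletter);
```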

@tobiasfuhlroth

@vpontis The implementation of textContent in ProseMirror has nothing to do with the DOM implementation of textContent. It is a custom implementation. See the source in the two links in #1428 (comment)

@BrianHung
Contributor

Another thing that would be nice about defining your own text representation is that you could prefix bullets with a dash or expand linked text

This also sounds pretty close to the to_markdown functionality in prosemirror-markdown: you could use that as inspiration as well.

@hivokas
Contributor

hivokas commented Jun 16, 2021

I also needed such a function, and since Tiptap lacks one, I implemented my own based on this piece of code:

function generateText(editor) {
  let text = '';
  // True while no block separator is pending, so the output never starts with one.
  let separated = true;
  const from = 0;
  const to = editor.state.doc.content.size;
  const blockSeparator = '\n';
  // Optional placeholder for leaf nodes without a serializer; null disables it.
  const leafText = null;

  editor.state.doc.nodesBetween(from, to, (node, pos) => {
    // Custom serializer registered for this node type, if any.
    const textSerializer = editor.extensionManager.textSerializers[node.type.name];

    if (textSerializer) {
      text += textSerializer({ node });
      separated = !blockSeparator;
    } else if (node.isText) {
      // Clip the text node to the requested range.
      text += node.text.slice(Math.max(from, pos) - pos, to - pos);
      separated = !blockSeparator;
    } else if (node.isLeaf && leafText) {
      text += leafText;
      separated = !blockSeparator;
    } else if (!separated && node.isBlock) {
      // A new block starts after some output: emit the separator once.
      text += blockSeparator;
      separated = true;
    }
  }, 0);

  return text;
}

For custom nodes to be converted to text correctly, renderText() must be defined on them.

@philippkuehn @hanspagel it would be great if the textBetween() function were extracted to src/utilities so that it can be used to convert the editor's content to its text representation.

@hivokas
Contributor

hivokas commented Jun 16, 2021

Well, I've just decided to submit a PR :) #1482

@philippkuehn
Contributor

@vpontis I’m currently building a getText and getTextBetween method and would like to tackle this one as well. Can you show me an example of a document as HTML and your expected text format? What would you expect as the default line separator? (I lean toward \n\n for blocks and \n for a hard break.)
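For discussion, here is a small sketch of how those defaults might behave on stored JSON. It assumes the hard break node is named `hardBreak` and that every direct child of the document is a block; `SeparatorOptions` and `textWithSeparators` are hypothetical names, not the eventual API.

```typescript
interface JsonNode {
  type: string;
  text?: string;
  content?: JsonNode[];
}

// Hypothetical options; the defaults mirror the proposal (\n\n for blocks, \n for hard breaks).
interface SeparatorOptions {
  blockSeparator: string;
  hardBreak: string;
}

const defaultSeparators: SeparatorOptions = { blockSeparator: "\n\n", hardBreak: "\n" };

function inlineText(node: JsonNode, options: SeparatorOptions): string {
  if (node.type === "text") return node.text ?? "";
  if (node.type === "hardBreak") return options.hardBreak;
  return (node.content ?? []).map((child) => inlineText(child, options)).join("");
}

function textWithSeparators(
  doc: JsonNode,
  options: Partial<SeparatorOptions> = {},
): string {
  const resolved: SeparatorOptions = { ...defaultSeparators, ...options };
  // Assumption: each direct child of the document is one block.
  return (doc.content ?? [])
    .map((block) => inlineText(block, resolved))
    .join(resolved.blockSeparator);
}

const sampleDoc: JsonNode = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      content: [
        { type: "text", text: "Line one" },
        { type: "hardBreak" },
        { type: "text", text: "Line two" },
      ],
    },
    { type: "paragraph", content: [{ type: "text", text: "Next paragraph" }] },
  ],
};

const withDefaults = textWithSeparators(sampleDoc);
const singleNewline = textWithSeparators(sampleDoc, { blockSeparator: "\n" });
```

With the defaults, a hard break and a paragraph boundary stay distinguishable in the output; with a single newline for blocks they collapse into the same thing.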

You also want to do this server side, right?

@philippkuehn
Contributor

I’ve implemented a first version of getText and generateText. More details here: #1482
Feel free to let me know what you think about it.

@vpontis
Author

vpontis commented Sep 14, 2021

@philippkuehn thanks! It looks great :)

I think you are right about those line breaks!

For right now, we can't use this on the server side since our extensions are only on the client side.

5 participants