sentence-splitter cannot be configured to treat whitespace (such as `\n`) as a sentence separator. As a workaround, you can parse the Markdown first and then split each Paragraph node into sentences. Example:

```js
import { parse } from '@textlint/markdown-to-ast';
import { splitAST, SentenceSplitterSyntax } from 'sentence-splitter';
import { traverse } from '@textlint/ast-traverse';

const onChange = () => {
  const input = document.querySelector('#input');
  // parse the Markdown into a textlint AST
  const AST = parse(input.value);
  // collect Paragraph nodes (this includes the paragraph inside each list item)
  const paragraphNodes = [];
  traverse(AST, {
    enter(node) {
      if (node.type === 'Paragraph') {
        paragraphNodes.push(node);
      }
    },
  });
  // split each paragraph into sentences and keep only the Sentence nodes
  const allSentences = paragraphNodes.flatMap((pNode) => {
    const sentenceAST = splitAST(pNode);
    return sentenceAST.children.filter(
      (node) => node.type === SentenceSplitterSyntax.Sentence
    );
  });
  // print the raw text of each sentence, separated by "---"
  document.querySelector('#output').textContent = allSentences
    .map((sentenceNode) => sentenceNode.raw)
    .join('\n---\n');
};

document.querySelector('#input').addEventListener('input', onChange);
onChange();
```
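For the streaming use case described in the question (sending LLM output to a text-to-speech API sentence by sentence), the following is a minimal, library-free sketch. It is not part of sentence-splitter or this thread; it assumes that every newline (for example, a break between Markdown list items) should end a chunk and that ".", "!" or "?" followed by whitespace ends a sentence. `SentenceChunker`, `push`, `flush`, and `speak` are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper (not part of sentence-splitter): buffers streamed text
// and emits a chunk whenever it sees a newline, or ".", "!" or "?" followed
// by whitespace. Whatever is left over is emitted by flush() at end of stream.
class SentenceChunker {
  constructor(onSentence) {
    this.onSentence = onSentence;
    this.buffer = "";
  }

  push(textChunk) {
    this.buffer += textChunk;
    // Requiring whitespace after the punctuation means a trailing "." is held
    // back until the next chunk arrives, so "3." + "14" is not cut in half.
    const boundary = /\n|[.!?](?=\s)/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      this.emit(this.buffer.slice(start, end));
      start = end;
    }
    // Keep only the text after the last boundary for the next push().
    this.buffer = this.buffer.slice(start);
  }

  flush() {
    this.emit(this.buffer);
    this.buffer = "";
  }

  emit(text) {
    const sentence = text.trim();
    if (sentence.length > 0) this.onSentence(sentence);
  }
}

// Usage: feed chunks as they arrive from the model; `speak` is a placeholder
// for whatever text-to-speech call you actually use.
const speak = (sentence) => console.log("TTS:", sentence);
const chunker = new SentenceChunker(speak);
for (const piece of ["Here are two", " options:\n- First", " item\n- Second item. It has", " two sentences."]) {
  chunker.push(piece);
}
chunker.flush();
```

This is much cruder than sentence-splitter: for example, it will also cut after an abbreviation such as "e.g." when it is followed by a space. One compromise is to use a newline-based buffer like this only to decide *when* a chunk is complete, and then pass each completed line to sentence-splitter.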
Hi. First, thank you for making this project!
I have tried passing "\n" in the array of characters that should be treated as sentence separators, but this doesn't work. Basically, I am trying to get two sentences returned in a case like this one from your tests.
This is my code:
However, this does not have the expected effect: I still get only one sentence (with its children split at the whitespace).
My use case is sending the output of an LLM to a text-to-speech API sentence by sentence in order to reduce latency. The output is often Markdown with lists such as
Thanks for any pointers!