-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async marked #2474
Async marked #2474
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
@calculuschild @styfle @joshbruce @davisjam what do you think about this approach? Is it too much to maintain? We could probably move more of the duplicate code to helpers. |
@jimmywarting - Saw you had a PR about async and curious if you have thoughts. @UziTech - Is the extra maintenance from this approach because there's the non-async and then the async here? Is this a fairly standard approach for the JS community? (Making sure the Marked community can easily understand and update.) |
it's a pity that we need to have two version of marked... one being async and the other sync. either way, i think i might prefer to only use the async b/c i'm not that concerned about the performances b/c i would only use it on the client side and process one small chunk of markdown that would resolve pretty quickly, but i could understand if this would have to run on a server and be pretty fast to process lots of markdown data at once The way i did it in my PR was in my opinion more stream friendly with iterators that yields chunks bits at a time. for (let chunk of iterable) {
if (typeof chunk !== string) chunk = await chunk
res.write(chunk)
}
res.end() iterable have the posibility to be turned into a stream as well using with my PR we could keep much of the logic the same. for await (let chunk of iterable) res.write(chunk)
res.end() |
@joshbruce ya the reason for the extra maintenance is because we have to maintain files that are almost the same just not quite. For instance anytime we fix a bug in the tokenizer we will have to fix it in two files. The async code is a little under half the speed of the sync version (bench sync 5000ms, bench async 11000ms) @jimmywarting the iterable approach doesn't work for renderers that return false to fall back to original renderers. If that feature didn't exist the async code could be much easier. |
That sounds like it could be prone to error, especially for someone who is a new contributor to the project.
That sounds desirable since it would help reduce the chance of needing to modify two files to fix a single bug.
I'm still a little skeptical that this change is needed at all. It seems like the "fetch image from CMS" use case can be solved with the current lexer and parser. Something like this: const tokens = marked.lexer(md);
for (const token of tokens) {
if (token.type === 'image') {
// mutate token
}
}
const html = marked.parser(tokens); |
I hadn't thought of that approach. That actually seems better. Perhaps all that we have to make async is the |
after playing around with this a little bit it looks like we could just return the values from walkTokens and await those to do just about anything async. The async code would look something like: const tokens = marked.lexer(md);
await Promise.all(marked.walkTokens(tokens, (token) => {
if (token.type === 'image') {
return fetch(...)
.then(res => res.json())
.then(data => {
token.href = ...
});
}
}));
const html = marked.parser(tokens); We could do this inside marked to just allow an option. This will only slow down marked if they specify const html = await marked(md, {
async: true,
walkTokens() {
if (token.type === 'image') {
return fetch(...)
.then(res => res.json())
.then(data => {
token.href = ...
});
}
},
}) |
@@ -316,8 +316,9 @@ export class Lexer { | |||
return tokens; | |||
} | |||
|
|||
inline(src, tokens) { |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
updates? |
@vixalien Thanks for the push! 😉👍I had this almost done for a while. I changed this to use the suggestion by @styfle in #2474 (comment). I made the The way I see this working is anyone who wants to do anything async will do it by awaiting Something like: const tokens = marked.lexer(markdown)
await Promise.all(marked.walkTokens(tokens, asynFunction))
const html = marked.parser(tokens) To make the const walkTokens = asyncFunction
marked.use({walkTokens, async: true})
const html = await marked.parse(markdown) or in an extension: const importUrl = {
extensions: [{
name: 'importUrl',
level: 'block',
start(src) { return src.indexOf('\n:'); },
tokenizer(src) {
const rule = /^:(https?:\/\/.+?):/;
const match = rule.exec(src);
if (match) {
return {
type: 'importUrl',
raw: match[0],
url: match[1],
html: '' // will be replaced in walkTokens
};
}
},
renderer(token) {
return token.html;
}
}],
async: true, // needed to tell marked to return a promise
async walkTokens(token) {
if (token.type === 'importUrl') {
const res = await fetch(token.url);
token.html = await res.text();
}
}
};
marked.use(importUrl);
const markdown = `
# example.com
:https://example.com:
`;
const html = await marked.parse(markdown); TODO
|
@UziTech I tried out your promising work and found a potential issue. It seems it breaks link (
I have the code at https://github.com/gustwindjs/gustwind/blob/bug/async-links/site/transforms/markdown.ts . The interesting point is that when you remove that |
@UziTech I managed to solve it. I had the following piece of code as per your example: start(src: string) {
return src.indexOf(":");
}, After removing, it fixed the links. Probably what happened is that it was trying to apply the extension on links since they often have My apologies for the noise. 😄 |
@UziTech Is the In any case, I think it's a great addition to the API as it allows a lot of imaginative uses. It could be even worth mentioning some use cases at the docs to inspire people. |
@bebraw the start is needed to make sure the paragraph and text tokenizers doesn't consume too much.
For example, without the start function the above markdown would be seen as one paragraph since there is no blank line between the lines and it doesn't know to stop at |
@styfle @joshbruce @calculuschild @davisjam Any objections to this? I think it will help allow a whole new type of async extensions for marked without any breaking changes. The only down side is an added option ( see the added documentation at https://marked-website-git-fork-uzitech-async-marked-markedjs.vercel.app/using_pro#async |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks fine to me. I don't really have any feedback other than what @styfle has already mentioned.
Co-authored-by: Steven <[email protected]>
Co-authored-by: Steven <[email protected]>
What do you think about adding a I know Jasmine ran into an issue when adding async expect where people would forget to await the We could do something similar. If the |
Sounds good. Also the PR description no longer matches the content so we should probably update to avoid confusion to readers in the future. |
After working on the For now I think this is good to go once you approve @styfle. 😁👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
# [4.1.0](v4.0.19...v4.1.0) (2022-08-30) ### Features * add async option ([#2474](#2474)) ([994b2e6](994b2e6))
🎉 This PR is included in version 4.1.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
yay! |
Thanks for the work on this. I was thinking of using Marked in my Express backend which streaming OpenAI responses. Those responses can be markdown but I'm not sure if this feature would support my use case. My goal is to simulate what OpenAI ChatGPT is doing in their client. I think I have two options as each OpenAI stream arrives.
So as I say this... I'm thinking #1 is my best option which means I do not need Async Marked, but thought I would put that question out for folks who may have wandered here like I did. Again, thanks for the work! |
Description
The
walkTokens
function returns the values of the function so someone can await any Promises.The way I see this working is anyone who wants to do anything async will do it by awaiting
marked.walkTokens(tokens, callback)
Something like:
To make the walkTokens option be able to be async we have to introduce an
async
option so users can instead write:or in an extension:
Contributor
Committer
In most cases, this should be a different person than the contributor.