-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add preprocess and postprocess hooks #2730
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
static passThroughHooks = new Set([ | ||
'preprocess', | ||
'postprocess' | ||
]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently both preprocess
and postprocess
are "passthrough" hooks, which means the return value of one function gets passed to the next hook.
In the future I could see us adding some hooks that would not pass the return to the next hook. For example the way we call renderer or tokenizer methods is not pass through. The first tokenizer is called and if it returns something other than false
we don't call the next tokenizer of the same name at all.
const hooks = marked.defaults.hooks || new Hooks(); | ||
for (const prop in pack.hooks) { | ||
const prevHook = hooks[prop]; | ||
if (Hooks.passThroughHooks.has(prop)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is where we determine what to do for the next hook based on whether the hook is a "passthrough" hook. Either we pass the return value to the next function or we stop on false
like tokenizers and renderers.
src/marked.js
Outdated
opt = merge({}, marked.defaults, opt || {}); | ||
const origOpt = { ...opt }; | ||
opt = { ...marked.defaults, ...origOpt }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the merge
helper in favor of the spread operator. This actually sped up marked's benchmarks by about 300ms (~5%).
docs/USING_PRO.md
Outdated
@@ -255,7 +255,7 @@ smartypants('"this ... string"') | |||
|
|||
<h2 id="walk-tokens">Walk Tokens : <code>walkTokens</code></h2> | |||
|
|||
The walkTokens function gets called with every token. Child tokens are called before moving on to sibling tokens. Each token is passed by reference so updates are persisted when passed to the parser. The return value of the function is ignored. | |||
The walkTokens function gets called with every token. Child tokens are called before moving on to sibling tokens. Each token is passed by reference so updates are persisted when passed to the parser. When [`async`](#async) mode is enabled, the return value is awaited. Otherwise the return value is ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not be a bad idea to create multiple PRs for this change:
I added them all here to get the complete picture and make sure my idea actually worked. |
@calculuschild @styfle thoughts? Are these type of extension hooks something we want to maintain? Pros
Cons
The alternative is for these types of extensions just wrapping marked and including it as a dependency. But there is no easy way to compose multiple extensions that way. |
I read the PR description but I don’t understand the purpose of this feature. Since the input of preprocess/postprocess is just the markdown/html, it seems like it would be equivalent to the current usage: const md = preprocess(markdown)
const html = marked(md)
const output = postprocess(html) Is there a use case that hooks cover that isn’t covered by this snippet? |
No, that is exactly what this does. The benefit of this PR is that extensions can do that so they can be composable. The examples that I give are frontmatter and dompurify. If someone creates an npm package that wraps marked to add front matter parsing and another package that wraps marked to do sanitization with dompurify a user wouldn't be able to use them together. With this PR the user would be able to include both or just one with |
My initial impression is this is moving a bit outside the scope of being a Markdown parser, and my inclination is that we should keep Marked.js that way. If I were a developer who wanted to have some kind of frontmatter parsing and a post-sanitizer, I would probably opt to just do it the way @styfle shows, passing the result from process to process rather than embedding everything inside of Marked.js.
I may be misunderstanding, but why would a package designer want to wrap marked.js like this anyway? Why not just use the base DOMPurify package as an external postprocessing step as we have always done? If a user wants to have both a pre- and post-processing step it seems more practical to just use the packages directly rather than having to build a specialized Marked.js-extension plugin that does the same thing? A lot of time has passed since #1878.... Now that custom extensions exist I wonder if its worth re-evaluating whether/which hooks are still necessary? |
The dompurify and front-matter example are fairly trivial examples, but if someone wants to create an extension like Table of Contents or footnotes that have multiple parts (a tokenizer/renderer and something that modifies the output html) then anyone who wants to use that extension would need to implement multiple parts instead of just being able to write
Ya I don't think all of those hooks are necessary which is why I started a new PR instead of building on that one. That was more of a POC to define which hooks would be needed to get rid of the current options so we could just have extensions. I would still like to deprecate a lot of the current options and find ways to move them to extensions. I agree this is somewhat out of scope for marked. @huaichaow do you have any input on this? I know you are trying to write an extension that would take advantage of these hooks. Is there any use cases we should consider that can't be worked around without these hooks? |
Hi @UziTech @styfle @calculuschild the idea is to have a place to clean up the side effects created in extensions, e.g., the Table-of-Contents extension I've been working on https://github.com/huaichaow/marked-toc-extension/blob/main/src/index.ts. all the
currently, it's not possible as the settings are cumulative, and there's no method to reset the
and most importantly, lifecycle hooks will make it more flexible and easier to create Extensions - more stages are available for use in extensions |
@styfle @calculuschild what is the final decision? Are we saying these type of hooks are out of scope and should be handled separate from the custom extension? For example in @huaichaow's example he could export a function that users of his extension should call after each marked run instead of being able to handle cleanup in the extension, like: import { marked } from 'marked';
import { tocExtension, tocCleanup } from 'marked-toc-extension';
marked.use(tocExtension);
marked.parse(someMarkdown);
tocCleanup();
marked.parse(someOtherMarkdown);
tocCleanup(); I would say I am still on the fence. On the one hand this is slightly out of scope. On the other hand this does seem to complement the custom extensions really well. I can think of many extensions that would need these hooks to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm +0 on this one.
It seems it might be useful for some of those cases mentioned above.
Marked version: 4.2.12
Description
This is the start of implementing #1878
Add
preprocess
andpostprocess
hooks so extensions can process the markdown and options before sending it to marked and they can process the html after marked is done with it.A couple use cases come to mind. We can have a sanitize extension that works on the final output html. We can have an extension that parses front-matter for the options marked should use during the parsing.
Docs examples:
Example: Set options based on front-matter
Output:
Example: Sanitize HTML with isomorphic-dompurify
Output:
Contributor
Committer
In most cases, this should be a different person than the contributor.