Next stage in the plugin architecture #631

Open · Michael-F-Bryan opened this issue Feb 24, 2018 · 4 comments
Labels: A-API Area: API

@Michael-F-Bryan (Contributor)

We merged a deliberately minimal Preprocessor phase in #532, so it might be time to start thinking of how the concept of a preprocessor can be extended.

After looking through @Byron's example preprocessor (#629) it's pretty obvious that the API for accessing a book's contents is less than ideal, and making the Book API more ergonomic will probably go a long way towards allowing third parties to create their own plugins.


This is more of a discussion thread so we can throw around ideas and opinions. @Byron, @sorin-davidoi, @azerupi and anyone else who's interested, feel free to chime in with ideas or suggestions 😁

To get the ball rolling, I think a couple of things we could do are:

  • Expose the sections item inside a Book so people can manually walk through a Book's contents (see the sketch after this list)
  • Allow people to provide their own preprocessor programs, similar to how alternate backends work
  • Possibly consider going from a tree-based structure for representing book chapters to a graph-based structure
    • The idea is that a nested chapter can have links back to its parent (and maybe siblings?), making things like section numbers and breadcrumbs a lot easier to implement.
    • Parsing the SUMMARY.md file may become easier because we can avoid the ugly recursion needed to append items to a chapter.
    • To deal with ownership we'll probably want to use something like petgraph instead of rolling our own
    • Ensuring the entire thing is consistent after doing mutations may be interesting.
    • This breaks all current alternate backends (due to different JSON representations) and will probably make serializing/deserializing more tricky
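
To make the first bullet concrete, here is a rough sketch of what walking an exposed sections field could look like. The field names (sections on Book, sub_items on Chapter) are assumptions for illustration, not a confirmed public API:

use mdbook::book::{Book, BookItem, Chapter};

/// Visit every chapter in the book depth-first. Assumes `Book` exposes a
/// public `sections: Vec<BookItem>` and `Chapter` a `sub_items: Vec<BookItem>`
/// (hypothetical names for this sketch).
fn walk_chapters(items: &[BookItem], visit: &mut dyn FnMut(&Chapter)) {
  for item in items {
    if let BookItem::Chapter(ref ch) = *item {
      visit(ch);
      walk_chapters(&ch.sub_items, visit);
    }
  }
}

fn count_chapters(book: &Book) -> usize {
  let mut count = 0;
  walk_chapters(&book.sections, &mut |_| count += 1);
  count
}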
@Byron (Member) commented Feb 24, 2018

Thanks @Michael-F-Bryan for getting started with this, and for involving us :)!

For me the top priority would be to allow preprocessors to be standalone programs, using a plugin mechanism similar to what renderers can already do.
Otherwise, people would have to provide their own clones of basic mdbook functionality, like termbook did, just because there is no other way to get a preprocessor in.

Right after that, and to my mind a very low-hanging, delicious fruit, is to allow iterating over sections more comfortably, while supporting 'standard' error handling. This should read as 'not what this preprocessor example has to do'.

And here is where I would make the cut, just because this would already solve all the problems I had while working on termbook.


Besides that, I do like the simplicity of it, which also means I like the tree form of chapters, clearly reflecting the SUMMARY.md they come from.
I can imagine that certain preprocessors require more, and maybe a petgraph-powered data structure would be just their thing.
For renderers, I think it would suffice if it was easy to convert the current tree into a graph, for example, which they can then use to their heart's content to generate their output.
Mutable access to a graph, as possibly required by preprocessors, is something I would want to postpone to another round, until there is demand, just because the whole thing gets a lot more complicated then. (I am assuming here that the current tree structure is kept as the foundation in any case.)


My general driving forces are:

  • keep it simple
  • wait until there is actual demand with a clear use case
  • let's not get ahead of ourselves (keep it simple, phrased differently :D)

@Michael-F-Bryan (Contributor, Author)

I'm going to copy across the conversation from #626 (comment). It's quite relevant to this issue.


@Michael-F-Bryan #626 (comment)

When the work for this issue is done, would you prefer a squashed commit? Or do you want to keep the history?

I usually merge PRs with the "squash and merge" button because it leads to a much cleaner git history. GitHub will do the squashing for us, so you shouldn't need to do anything.

Mathjax is only relevant for the HTML renderer and a mathjax_support option can be found in the HtmlConfig.

This is one of the things I was referring to in #629 (comment) when I said:

... I'm stumped on a couple things, for instance some preprocessors will only be relevant when combined with a specific backend ...

I was thinking a user would add the mathjax preprocessor in their book.toml (or we'd add it as part of the default list if the user doesn't specify any preprocessors) and then for each backend we'll run the entire preprocessor pipeline independently. You can then tell each preprocessor which renderer it's being run for (the Renderer trait requires a name() method), and it then becomes the preprocessor's responsibility to change its behaviour (e.g. run or return early) based on that.

Or expressed as code:

// when generating the book

for renderer in &self.renderers {
  let mut book = self.book.clone();
  let name = renderer.name();
  
  for preprocessor in &self.preprocessors {
    let ctx = /* ... */;
    preprocessor.run(ctx, name, &mut book)?;
  }

  renderer.render(...)?;
}

// in the MathJax preprocessor

const RENDERER_WHITELIST: &[&str] = &["html"];

impl Preprocessor for MathJax {
  fn run(&self, ctx: &PreprocessorContext, renderer: &'static str, book: &mut Book) -> Result<()> {
    if !RENDERER_WHITELIST.contains(&renderer) {
      return Ok(());  // there's nothing to do. Bail early.
    }

    /* do usual preprocessing */
    Ok(())
  }
}

This is probably going to be the most scalable approach because we aren't using an ad-hoc side-channel for communicating whether the preprocessor should run. It does allow you to couple preprocessors to specific renderers, but in this case I'd say the MathJax preprocessor is already innately coupled to the HTML renderer so it's not necessarily a problem. If a preprocessor doesn't really care which renderer it's being run for (e.g. the current link rewriters), it can always ignore the renderer argument.

@dvberkel sorry for the wall of text! It's a non-trivial problem and I can imagine as we start using preprocessors for more stuff we'll need to find a more generic solution. I'm kinda hesitant to add custom logic to the determine_preprocessor() function (your second point) because that couples it to the HTML renderer and won't work for 3rd party preprocessors.


@dvberkel #626 (comment)

@Michael-F-Bryan Don't worry about the wall of text. It conveys your point very well. I would rather read more, think, and do a well-informed job than rush to a mediocre solution.

Is what you describe already in place? Or is it something that needs to be done?


@Michael-F-Bryan #626 (comment)

Is what you describe already in place? Or is it something that needs to be done?

It's one of the ideas I've been thinking about, but nothing is implemented yet. I started #631 so we have a place to discuss these sorts of things.

In itself, introducing this renderer argument to Preprocessor::run() is a fairly trivial change to make. I just want to hear more people's opinions before implementing anything.

For example what do we do if a user wants to say "run the MathJax preprocessor when building an EPUB"? With this system the preprocessor's whitelist would be hard-coded and wouldn't allow that. A preprocessor could add a setting under its table in book.toml (e.g. renderers = ["html", "epub"]), but then we're just swapping one ad-hoc side-channel for another.

Another option is to let mdbook decide whether to run a preprocessor using the renderers = ... setting mentioned above, falling back to some sort of hint from the preprocessor if that setting isn't provided.

e.g. the user could manually specify which renderer to run the MathJax preprocessor for:

[preprocessor.mathjax]
renderers = ["html", "epub"]

And if the renderers key isn't provided, we use a fallback, probably by adding a fn hint_renderer_is_supported(&self, render: &str) -> bool method to the Preprocessor trait.
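
Sketched out, the decision logic could look something like this. Everything here (renderers_for, the default hint, the method name) is a hypothetical shape for discussion, not existing mdbook code:

// Hypothetical dispatch logic, not existing mdbook code.

trait Preprocessor {
  fn name(&self) -> &str;

  /// Fallback used when the user hasn't configured `renderers = [...]`.
  /// Defaults to running for every renderer.
  fn hint_renderer_is_supported(&self, _renderer: &str) -> bool {
    true
  }
}

/// The user's `renderers = [...]` list from `[preprocessor.<name>]` in
/// book.toml, if one was given (config lookup elided in this sketch).
fn renderers_for(_preprocessor_name: &str) -> Option<Vec<String>> {
  unimplemented!()
}

fn should_run(preprocessor: &dyn Preprocessor, renderer: &str) -> bool {
  match renderers_for(preprocessor.name()) {
    // An explicit renderers list in book.toml always wins...
    Some(list) => list.iter().any(|r| r == renderer),
    // ...otherwise fall back to the preprocessor's own hint.
    None => preprocessor.hint_renderer_is_supported(renderer),
  }
}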

@FreeMasen

I am curious what the appetite would be for breaking up the mdbook library into a few smaller crates to allow for easier plugin development?

Currently, if you want to build a preprocessor you are required to bring in all the filesystem portions of mdbook, even though your application is only concerned with reading a Book from stdin and sending it back out via stdout (see the sketch below).

It seems silly to have to compile notify, walkdir and tempfile even though my preprocessor is never going to touch the file system.

In my head, the overall structure might look something like this:

  • mdbook-core
    • would contain the contents of src/book/book.rs except for create_missing and load_book
  • mdbook-preprocessor
    • would contain Preprocessor, PreprocessorContext and CmdPreprocessor
    • would re-export mdbook-core
  • mdbook-renderer
    • would contain Renderer, RendererContext and CmdRenderer
    • would re-export mdbook-core

By moving these items into their own library crates, you would eliminate the need for these extra dependencies.
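
To illustrate, here is a minimal sketch of what a standalone preprocessor binary could look like against such a split: only std (plus, in practice, serde for the actual types), with no filesystem dependencies. The stdin/stdout wire format and the mdbook-core types are assumptions of this proposal, not an existing API:

use std::io::{self, Read, Write};

fn main() -> io::Result<()> {
  // Read the serialized context + book that mdbook would write to stdin.
  let mut input = String::new();
  io::stdin().read_to_string(&mut input)?;

  // With the proposed mdbook-core / mdbook-preprocessor split, this is where
  // we'd deserialize into a Book, mutate it, and re-serialize it. Shown as a
  // pass-through here because those types are hypothetical in this sketch.
  let output = input;

  // Hand the (possibly modified) book back to mdbook on stdout.
  io::stdout().write_all(output.as_bytes())?;
  Ok(())
}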

Sorry if this isn't the appropriate place to make this comment. I would be happy to participate in a similar conversation elsewhere or open a new issue to discuss this further.

@FreeMasen

For a blog post I am working on, I have set up what I described above as a fork of the current master. You can find it here:

https://github.com/freemasen/mdbook
