Add extension to allow Critic Markup pass through #5430
Comments
Our earlier discussions about supporting CM at the AST level were thinking that the CM would be parsed into Inserted and Deleted elements, containing (presumably) inlines. There are some issues with that. A very lightweight change would be to add a This seems like a half-measure, but given the conceptual difficulties supporting CM at the AST level (discussed in the other issues), it might be useful. I'd be interested in hearing comments from other CM users. |
Another approach would be to have the On this approach, the CM delimiters would appear verbatim in all formats. |
Or |
I really like the idea of having a way to pass through all the CM syntax as RawInline Markdown. This would make my publishing workflow a lot easier. Regarding your second approach, wouldn't that mean other inline markup would not get parsed if it happens to fall inside CM? E.g.:
What would happen to the emphasis markup on |
Just to be clear, on both approaches the idea was to make the delimiters like |
To be more precise, CriticMarkup is a preprocessor for Markdown syntaxes. The reference implementation of CriticMarkup is actually a preprocessor (i.e. the Markdown parser doesn't "see" the CriticMarkup). All existing implementations of CriticMarkup (that I know of) work at the preprocessor level. I have a tool at https://github.com/ickc/pancritic; it takes the reference implementation of CriticMarkup (which is no longer maintained, so I cleaned it up and improved it a bit) and wraps pandoc inside it. So you could use pancritic as if it were pandoc (with a pandoc-like CLI interface) if There's another issue here from a LaTeX package maintainer who has a nicer LaTeX output for CriticMarkup. I'm interested in implementing it but haven't had time yet. Obviously there are a few issues there too. PRs are appreciated; otherwise I might take a look at them this weekend (don't hold your breath though). Edit: to be clear, I mean that since CriticMarkup happens at the preprocessor level, any "cleaning up" of the round-trip Markdown should also happen at the preprocessor level. This is very easy to do and you have a couple of options. Edit 2: the "issue" mentioned above is actually in pandoc-discuss: https://groups.google.com/d/msg/pandoc-discuss/sHoQhJsxEXw/9bN7cAwqCQAJ |
Actually I disagree here. It was originally conceived that way and the toolkit on the concept site acts that way, but I think this is both an oversight and a missed opportunity on their part. I would go so far as to say their own documentation is contradictory on this point. Their toolkit covers usage not related to pre-processing as well. Your own attempt to wrap a full-blown version of Pandoc inside a "pre" processor, and your suggestion that any round-trip cleanup would also happen "pre", highlight that this is more than a pre-processor issue. It's both a pre and a post issue, which is why wrapping Pandoc makes sense at all. Right now I'm also both pre- and post-processing content to get it to round-trip. Hence the thought that Pandoc ought to be taught to treat the syntax as part of the document format. The fact is that anything that exists in Markdown source does so "at rest": it is part of the file and hence part of the syntax. Assuming that the only thing you would want to do with the syntax is remove it is selling the idea short. In my use case, copy editing whole books, the life cycle of such edits (comments, suggestions, etc.) is much longer than a single pipeline. I don't just use Pandoc for final output where a preprocessor would have stripped the CM out. I want to actually do something with it on the output side, and I want to use Pandoc to normalize the source as I go along (part of the project linter is making sure the book source round-trips safely). |
Are you claiming CriticMarkup should/can be implemented as part of the AST, or are you proposing that pandoc have a built-in pre/post-processor for CriticMarkup? If it’s the latter, you ain’t disagreeing with me. |
@ickc I'm not sure how to answer that because, as much as I've reviewed the related issues and discussions, I'm not entirely clear on what the difference would be, particularly as an end user. For sure at least the former would be a boon to my workflow(s), but I can't get my head around why the latter wouldn't be better. There seem to be two main issues:
|
I have to say, CriticMarkup seems a bit of a mess in its present form. For example, if you try it on
you get this result: <code>code <del></code> this is deleted </del> which isn't even well-formed HTML. Oddly, their toolchain seems not to be just a preprocessor which does a markdown -> markdown translation prior to converting to HTML. That would make a lot more sense, and it would be easy to implement (10 line script). You'd have problems if your document contained code that had the CM delimiters in it, because they'd be treated as delimiters rather than literal text, but the present system has this problem too. One could imagine a CM-like system (perhaps using the same symbols) that created nodes in the pandoc AST instead of acting as a preprocessor. With this system, you wouldn't be able to put CM delimiters inside literal contexts, like code blocks or spans, and there would be some limits to the kinds of edits you could notate. But the advantage would be that, in principle, one could convert a document with the CM marks into, say, a Word document with track changes, or a LaTeX document using the changes package. This is something a preprocessor couldn't give you. |
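For illustration, the kind of short Markdown-to-Markdown preprocessor described in the comment above might look like the following Python sketch. It is not the reference toolchain and not pancritic; the function name `resolve` and the `accept` flag are hypothetical, and the patterns assume the five standard CriticMarkup constructs with no nesting.

```python
import re

# One pattern per CriticMarkup construct. DOTALL lets a mark span lines.
ADD = re.compile(r"\{\+\+(.*?)\+\+\}", re.DOTALL)
DEL = re.compile(r"\{--(.*?)--\}", re.DOTALL)
SUB = re.compile(r"\{~~(.*?)~>(.*?)~~\}", re.DOTALL)
COMMENT = re.compile(r"\{>>(.*?)<<\}", re.DOTALL)
HIGHLIGHT = re.compile(r"\{==(.*?)==\}", re.DOTALL)


def resolve(text: str, accept: bool = True) -> str:
    """Return plain Markdown with every edit accepted or rejected."""
    text = ADD.sub(r"\1" if accept else "", text)      # keep or drop additions
    text = DEL.sub("" if accept else r"\1", text)      # drop or keep deletions
    text = SUB.sub(r"\2" if accept else r"\1", text)   # new or old wording
    text = COMMENT.sub("", text)                       # comments never survive
    text = HIGHLIGHT.sub(r"\1", text)                  # keep highlighted text
    return text
```

Because the sketch works on raw text, it treats delimiters inside code spans as delimiters too, which is the literal-text limitation mentioned above.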
@jgm The original toolchain isn't worth fiddlesticks. No offense to anybody involved, but it was more of a proof of concept than a reference implementation, and it suffers from a litany of ailments. I highly recommend ignoring it in this discussion. Maybe the syntax highlighters were useful for some editors, but that's about it. Any serious use I have seen in the wild involves other systems, either home brews or tools like @ickc's. The latter thing you describe would be much more useful (even with its limitations) than what we have currently. I'd much rather have limits on what could be marked up this way and be able to move between document formats than not have anything at all. Not being able to use CM markup inside code blocks is trivial considering the primary use for this is prose. |
Why do you think there has been no improvement since the inception of CriticMarkup? It is because the whole concept of making it a Markdown syntax (i.e. happening at the AST level) is flawed. CriticMarkup is about tracking changes at the source level, and by definition that can cross any markup boundary, making it impossible to have a spec (unless you really enumerate all the ways it can cross boundaries, but is that really tractable?). Because of that, it was decided historically that pandoc would not adopt it in the AST. And since pandoc is not in the business of pre/post-processing, once it was decided that it is not part of the AST, it was essentially decided that supporting it would be a third-party effort. (Do one thing, do it well, and make it composable.) Of course I’m not opposed to making it part of the AST if @jgm agrees, even if that means a more restrictive CriticMarkup. From time to time he has changed his mind. |
I'm guessing there's also relatively little interest in CriticMarkup because a lot of people that use markdown (and pandoc), also use a version control system like git, that comes with diff tools. For example, for prose I use:
|
But CriticMarkup is different. It is more like a collaboration tool than a personal diff tool. For this reason, like @jgm said, if it gains native support and can be converted back and forth to Word’s track changes then it’s going to be very helpful. In pancritic I implemented output to a LaTeX diff using the changes package. There’s an issue over there requesting conversion from docx track changes to CriticMarkup; while I might have an idea how to do that, having native pandoc support is much better (all two-way streets will be much easier). About CriticMarkup in the AST, I wonder if it is possible to solve the boundary-crossing problem by normalizing the syntax (i.e. the syntax is closed and opened again whenever a CriticMarkup boundary is crossed). |
It’s a different thing: it takes two inputs and produces a diff, with CriticMarkup as one of its output formats. Pancritic instead takes the diff as part of the document (e.g. authored with track changes, or CriticMarkup written in the same document). And does the README not mention it, or what? Because it doesn’t seem to do what you said it does. It mentions both as possible output formats, but not converting one to the other. |
Skimming through that thread, towards the end the discussion really becomes more CriticMarkup related. Merge the two issues? However, what I said up there is slightly different. I think the kind of native support one would want for CriticMarkup in the AST is really dedicated AST elements for it. I think @jgm might mean this up there, but I could be wrong. |
@ickc see the last example at the bottom of the README ( |
Interesting. Then I probably should not reinvent the wheel but just close that issue by referring to this (although the 2 languages are different. But they are composable.) They got to mention that in the readme though... do you know if it has a CriticMarkup reader? (Pancritic is essentially a CriticMarkup reader and pandiff from the readme is a CriticMarkup writer (which takes a different kinds of inputs.)) |
It does now. This can also serve as a preprocessor for normalising CriticMarkup syntax as you suggested (see |
@mb21 I hear you loud and clear. In fact my own personal workflow is strongly with you, but CriticMarkup serves a purpose that other tooling does not serve well, and it serves best when it is an integral part of both the input and output formats; in other words, when it can be kept intact through the whole pipeline. |
@alerque, I think @mb21 is just trying to explain why CriticMarkup hasn't gained much interest from the pandoc community. Discussions like this happened a long time ago, repeatedly. That's why so many people tried to DIY when this was needed. Pandoc's development isn't based on needs. To a certain extent it isn't even based on volunteering. What I mean is that even if someone spends the time to do the hard work, it might not be merged if the community doesn't agree on that feature. (This doesn't happen often, and these are just my observations.) The pandoc community seems to like to spend the time needed to decide on the right feature to add, or the right way to implement something. In the case of CriticMarkup, historically there have been enough flaws to keep it from being included in the AST (I mean having AST elements that make it possible in Markdown and other formats). And among those opinions, @jgm's carries the most weight. But from my experience, once the "philosophical" problems are solved, the implementation often comes very quickly. So I think the fastest way to move this forward is to think of a design that is convincing (i.e. doable, not too ugly/hacky, and not too much of a compromise). In the past people have failed to make a convincing case. But recent developments and usage patterns (e.g. Word's track changes), or maybe new ideas, could be changing that. I'm counting on you to convince us ;) (Just my 2 cents though.) |
Seems pretty good. I'll try it out later when I got time. Also, did you advertise it in pandoc-discuss and the wiki? I might have missed it. There's a feature request in mine that yours seems to already be supporting so after trying it out I might just direct people needing that to use yours. |
There is a wonderful [1] https://gist.github.com/noamross/12e67a8d8d1fb71c4669cd1ceb9bbcf9 |
I have reviewed issue #2873 regarding supporting Critic Markup. Personally I seriously think that needs to be revisited (see also #1560), but this is a different issue.
Critic Markup is an extension to Markdown syntax. It is currently quite an ordeal to mix and match the use of CM in a workflow involving Pandoc. The arguments made in the other issue about how CM should be handled only cover use cases where you are either a) outputting to some special non-published format for review or b) resolving the status of an edit. The use case is for copy-editing books and translations of books at a publishing company. As such our markup has a much longer lifespan than this, and often we want to pass it through our publishing pipeline with the markup intact.
One of the things we do is normalize our Markdown by passing it through Pandoc periodically. We are also using downstream tools that know what to do with CM in the output.
Pandoc is currently escaping several aspects of CM syntax in Markdown formats. For example:
This leaves me in a situation where I have to preprocess the text before Pandoc sees it, replacing all the CM with tokens it won't care about, then convert the tokens back to markup on the other end.
I think there should be an extension to allow known instances of CM through:
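As a rough sketch of that workaround (not the actual script in use), the token round trip could be written in Python along these lines. The names `protect` and `restore` and the `CMTOKEN…X` placeholder format are made up for illustration; the placeholder is assumed to be a plain alphanumeric word that Pandoc's Markdown writer leaves alone and that never occurs in the real text.

```python
import re

# Any of the five CriticMarkup constructs, matched non-greedily.
CM = re.compile(
    r"\{\+\+.*?\+\+\}|\{--.*?--\}|\{~~.*?~>.*?~~\}|\{>>.*?<<\}|\{==.*?==\}",
    re.DOTALL,
)


def protect(text):
    """Swap each CM span for a placeholder; return new text and the spans."""
    marks = []

    def stash(match):
        marks.append(match.group(0))
        # Hypothetical placeholder format, numbered by position in `marks`.
        return f"CMTOKEN{len(marks) - 1}X"

    return CM.sub(stash, text), marks


def restore(text, marks):
    """Put the original CM spans back in place of their placeholders."""
    return re.sub(r"CMTOKEN(\d+)X", lambda m: marks[int(m.group(1))], text)
```

The Pandoc run (for example `pandoc -f markdown -t markdown`) would sit between `protect` and `restore`. One consequence of this design is that any Markdown inside a protected span bypasses Pandoc entirely, so it is not normalized.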
This would allow Pandoc to be used as a pre-processor on Markdown files that include Critic Markup without molesting the source.
Note the {++add++}, {--remove--}, and {==highlight==} syntaxes don't have any characters that get escaped in normal usage; only the strike, change, and comment syntaxes are a problem.