Enhancement: Hide metadata header in markdown #7183

Open
sdbbs opened this issue Mar 28, 2021 · 12 comments

@sdbbs

sdbbs commented Mar 28, 2021

I would like to propose, as an enhancement, the same approach taken in "Hide metadata header in markdown" (Issue #527 · mwouts/jupytext):

Starting from Jupytext 1.6.0, the metadata header in Jupytext Markdown notebooks will look like this:

<!--
jupyter:
  jupytext:
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.2'
      jupytext_version: 1.4.2
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
-->

and thus the metadata will be hidden on GitHub and also when the .md file is rendered as HTML.

In other words: allow the opening (<!--) and closing (-->) tags of an HTML comment to be used instead of the default opening and closing three-dash (---) lines that mark the start and end of a YAML header block in Pandoc Markdown. That way, Pandoc would still interpret the header, while it would be fully hidden from typical automatic online Markdown-to-HTML renderers (such as GitHub's).

Alternatively, allow the very first line of a Pandoc Markdown document to open an HTML comment, with the opening --- of the YAML header on the second line (and the closing --> on the line after the closing ---). That way, most of the code that parses the YAML header (including the opening and closing ---) could probably be kept, while the header would still be hidden from online Markdown renderers.
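The second variant can be emulated today as a preprocessing step that removes the comment wrapper before pandoc reads the file. A minimal Python sketch, assuming exactly the layout described above (<!-- alone on line 1, the header's opening --- on line 2, and --> alone on the line right after the header's closing --- or ...). The function name is hypothetical, and this is not something pandoc itself does:

```python
def unwrap_commented_yaml_header(text: str) -> str:
    """Remove an HTML-comment wrapper around a leading YAML header.

    Expected layout (an assumption of this sketch):
        <!--
        ---
        key: value
        ---          (pandoc also accepts "..." as the closing line)
        -->
    Input that does not match this layout is returned unchanged.
    """
    lines = text.splitlines(keepends=True)
    if len(lines) < 4 or lines[0].strip() != "<!--" or lines[1].rstrip() != "---":
        return text
    # The first "---" or "..." after line 2 closes the YAML header.
    for i in range(2, len(lines) - 1):
        if lines[i].rstrip() in ("---", "..."):
            if lines[i + 1].strip() == "-->":
                # Keep the YAML block itself; drop only the comment opener and closer.
                return "".join(lines[1:i + 1] + lines[i + 2:])
            return text
    return text
```

The output is an ordinary pandoc Markdown document with a standard YAML metadata block, so it can be converted as usual.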

@jgm
Owner

jgm commented Mar 29, 2021

You can do -t gfm-yaml_metadata_block and the metadata block will be omitted.

@jgm
Owner

jgm commented Mar 29, 2021

Or, if you want the metadata in an HTML comment, here's another trick you can already do: create a template ipynb.markdown as follows:

<!--
$meta-json$
-->
$body$

Then

pandoc my.ipynb --template ipynb.markdown -t gfm-yaml_metadata_block
<!--
{"jupyter":{"jupytext":{"text_representation":{"format_version":"1.2","jupytext_version":"1.4.2","extension":".md","format_name":"markdown"}},"kernelspec":{"display_name":"Python 3","name":"python3","language":"python"}}}
-->
my doc
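With this template, the metadata ends up as a single JSON object inside the leading comment, so a tool that needs it back can use an ordinary JSON parser. A Python sketch using only the standard library. The function name is hypothetical, and it assumes the document starts with the comment exactly as the template above emits it:

```python
import json
import re

# A leading HTML comment; the captured text is expected to be the
# $meta-json$ output. The non-greedy match stops at the first "-->".
_LEADING_COMMENT = re.compile(r"\A<!--\s*(.*?)\s*-->[ \t]*\n?", re.DOTALL)


def split_meta_comment(text: str):
    """Return (metadata, body). If the document does not start with a comment
    holding a JSON object, return ({}, text) unchanged."""
    m = _LEADING_COMMENT.match(text)
    if not m:
        return {}, text
    try:
        meta = json.loads(m.group(1))
    except json.JSONDecodeError:
        return {}, text  # an ordinary comment, not metadata
    if not isinstance(meta, dict):
        return {}, text
    return meta, text[m.end():]
```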

@jgm
Owner

jgm commented Mar 29, 2021

By the way, I kind of like the idea of putting metadata inside an HTML comment. I suggested exactly this in 2011 on the markdown-discuss mailing list.

In principle, we could create a new extension, yaml_metadata_in_html_comment, that enables this (for both input and output). But I'm reluctant to add to the gratuitous proliferation of syntax extensions.
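On the output side, such an extension could emit the header in the jupytext style quoted at the top of this issue. A Python sketch of that rewrite, assuming a pandoc-style header at the very start of the file (opening --- on line 1, closing --- or ...). The function name is hypothetical, and the YAML is copied verbatim rather than parsed:

```python
def yaml_header_to_html_comment(text: str) -> str:
    """Rewrite a leading ---/--- YAML header as <!-- ... -->.

    The YAML lines are copied unchanged; only the delimiters are replaced.
    Text without a leading, properly closed header is returned unchanged.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        return text
    for i in range(1, len(lines)):
        if lines[i].rstrip() in ("---", "..."):  # closing delimiter
            yaml_lines = "".join(lines[1:i])  # each of these ends with "\n"
            return "<!--\n" + yaml_lines + "-->\n" + "".join(lines[i + 1:])
    return text
```

A real implementation would also have to reject (or escape) YAML that contains -->, since that would end the comment early.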

@sdbbs
Author

sdbbs commented Apr 12, 2021

Hi @jgm,

Many thanks for the feedback - and sorry I could not respond earlier!

You can do -t gfm-yaml_metadata_block and the metadata block will be omitted.

I was not aware of that option. However, I think it only helps when pandoc is the one creating the HTML; what I want instead is to use a Markdown file that is otherwise intended for pandoc with an automatic online Markdown-to-HTML renderer, such as GitHub's.

Here is an example: I have an .md file that is intended as a source for pandoc, with PDF (via LaTeX) as the intended output. However, I also keep this file in git, and for my online repository I use https://github.com/gitbucket/gitbucket as the web interface to my git repositories.

When I open this .md file in GitBucket, I get something like this:

[Screenshot: gitbucket_pandoc_md]

In other words, GitBucket's Markdown-to-HTML parser did not recognize the YAML header block and interpreted everything inside it as Markdown. Specifically, I have this line in the header:

# lines starting with # are YAML-level comments!

... and pandoc correctly treats it as a comment inside the YAML header; GitBucket's Markdown parser, however, interpreted it as plain Markdown, that is, as a heading.

So, if we could alternatively use, say, <!--- and ---> (note: three dashes!) to open and close a YAML header block in a Markdown file for pandoc, then:

  • pandoc would interpret that section as a YAML header block, as intended;
  • other Markdown-to-HTML parsers (like GitBucket's) would see the <!-- as the opening of an HTML comment, ignore everything inside it, and see the --> as the closing of the comment. They would therefore not print any text from the YAML header block, and the file would be easier to read online, with no metadata text (and the odd formatting it produces) getting in the way.

Or, if you want the metadata in an HTML comment, here's another trick you can already do: create a template ipynb.markdown as follows:

Thanks, but that seems to be specific to Jupyter notebooks. I haven't really tried it, but it does not look like it would help with my use case (I want to keep a YAML header block in an .md file while hiding it from other Markdown parsers).

In principle, we could create a new extension, yaml_metadata_in_html_comment, that enables this (for both input and output). But I'm reluctant to add to the gratuitous proliferation of syntax extensions.

I guess a new extension would help my use case personally; however, I see your point about "gratuitous proliferation", and I agree with it. So maybe my suggestion above is worth considering:

  • <!--- (three dashes) could be an alternative syntax for opening a YAML block when pandoc interprets Markdown;
  • ---> (three dashes) could be an alternative syntax for closing a YAML block when pandoc interprets Markdown

... and all of this built into pandoc (i.e. without enabling an extension), while all other Markdown parsers would see an HTML comment here instead, and thus not process the text content of the YAML block.

@alerque
Contributor

alerque commented Apr 17, 2021

With all due respect, I think the onus should be on your other parser to support YAML metadata, not on Pandoc to hide it. If it doesn't need to do anything with the metadata, all it needs to do is spot the standard YAML separators and discard the block. This is a very standard extension to Markdown, used by many, many parsers. If you need to support something less featured, then some kind of build step that exports the variant you need should be considered par for the course.

Hidden behind a non-default option flag, I couldn't actually object to this being a "feature", but both the proliferation of options and the proliferation of format variants seem like a bad thing to me.
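For a renderer that has no use for the metadata, "spot the standard YAML separators and discard the block" can be quite small. A Python sketch, loosely following pandoc's rules for a header at the start of the file (the opening --- must not be followed by a blank line; the block closes with --- or ...). The function name is hypothetical, it is not taken from any particular renderer, and it ignores the metadata blocks that pandoc also accepts later in a document:

```python
import re

# "---" on the first line, not followed by a blank line, then the YAML lines,
# then a closing "---" or "..." on a line of its own.
_FRONT_MATTER = re.compile(
    r"\A---[ \t]*\n(?![ \t]*\n).*?\n(?:---|\.\.\.)[ \t]*(?:\n|\Z)",
    re.DOTALL,
)


def strip_front_matter(markdown: str) -> str:
    """Drop a leading YAML metadata block so it is not rendered as Markdown."""
    return _FRONT_MATTER.sub("", markdown, count=1)
```

With this step in front of the renderer, a YAML comment such as "# lines starting with # are YAML-level comments!" never reaches the Markdown parser, so it cannot turn into a heading.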

@tarleb
Collaborator

tarleb commented Apr 17, 2021

Playing my broken "Lua filter" record again: if all else fails, here's a filter to make pandoc work with the syntax proposed by @sdbbs:

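-- file: yaml-in-html-comments.lua
-- Reads a YAML header written as <!--- ... ---> and uses it as document metadata.
local meta

function RawBlock (raw)
  -- only raw HTML blocks that start with <!--- (three dashes)
  if raw.format == 'html' and raw.text:match '^%<%!%-%-%-' then
    -- turn <!--- ... ---> into a regular --- ... --- YAML block and parse it
    local yaml = raw.text:gsub('^<!%-%-%-', '---'):gsub('%-%-%->$', '---')
    meta = pandoc.read(yaml, 'markdown+yaml_metadata_block').meta
  end
end

-- set as document's metadata; could also do a merge instead (if necessary).
-- Returning nil (no such block found) leaves the metadata unchanged.
function Meta (_) return meta end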

Use with pandoc --lua-filter=yaml-in-html-comments.lua ...
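Where a Lua filter is not an option, the same two substitutions the filter performs (<!--- becomes ---, ---> becomes ---) can be applied to the source text before pandoc reads it. A Python sketch, assuming the commented header starts on the very first line and ---> sits on a line of its own. The function name is hypothetical:

```python
import re

# <!--- alone on line 1, then the YAML lines, then ---> alone on a line.
_TRIPLE_DASH_HEADER = re.compile(
    r"\A<!---[ \t]*\n(.*?\n)--->[ \t]*(?:\n|\Z)", re.DOTALL
)


def triple_dash_comment_to_yaml(text: str) -> str:
    """Turn a leading <!--- ... ---> block into a standard --- ... --- block."""
    # A function replacement avoids re's backslash handling of the YAML text.
    return _TRIPLE_DASH_HEADER.sub(
        lambda m: "---\n" + m.group(1) + "---\n", text, count=1
    )
```

The result can be written to a temporary .md file, or piped to pandoc on standard input, and converted as usual.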

@tarleb
Collaborator

tarleb commented Apr 17, 2021

I think the Lua filter solution should work well enough, so I'm closing this. Please reopen if the proposed solution proves to be insufficient.

tarleb closed this as completed Apr 17, 2021
jgm reopened this Apr 17, 2021
@jgm
Owner

jgm commented Apr 17, 2021

I'd like to keep this open for further consideration.

@sdbbs
Author

sdbbs commented Apr 19, 2021

Thanks all for the comments:

With all due respect, I think the onus should be on your other parser to support YAML metadata, not on Pandoc to hide it.

Yes, I should have mentioned that I didn't decide lightly to post this, because it would obviously increase the work/support load on the pandoc project, which, as an otherwise happy user, I'd like to avoid.

This is a very standard extension to Markdown, used by many, many parsers.

OK, I was not aware of this, thanks for mentioning it.

However, GitBucket's parser at least does not support it (yet). And my thinking was: if other platforms simply advertise "Markdown" and I asked them for this enhancement (i.e. adding code to their parsers to ignore YAML headers), they could always point to the original Markdown spec https://daringfireball.net/projects/markdown/ and say that it does not mention --- or YAML headers.

Hidden behind a non-default option flag, I couldn't actually object to this being a "feature", but both the proliferation of options and the proliferation of format variants seem like a bad thing to me.

Fully agree there.

But now that I have seen the Lua filter in #7183 (comment), I think I could actually live with it, since I use Lua filters in my workflow anyway; so I guess that particular Lua filter solves my problem.

@alerque
Contributor

alerque commented Apr 19, 2021

These days the CommonMark project is a much better place to point projects toward if you want them to have interoperable Markdown than the original Daring Fireball post, but you do have a point: as widespread as YAML metadata is (used by many publishing platforms, static site generators, even Markdown note-taking applications!), it is still an extension to Markdown, not part of Markdown itself. Even CommonMark thinks of it that way. The Pandoc flavor includes it by default, but a way to wrap the extra data so that any CommonMark-compatible parser would not break would be an interesting extension.

@sdbbs
Author

sdbbs commented Apr 20, 2021

Thanks, @alerque:

These days the CommonMark project is a much better place to point projects toward if you want them to have interoperable Markdown

Thanks, good to know this!

By the way, I just found something that argues against my suggestion of <!--- (triple dash) as an alternative opening tag for YAML:

https://stackoverflow.com/questions/4823468/comments-in-markdown

I use standard HTML tags, like

<!---
your comment goes here
and here
-->

Note the triple dash. The advantage is that it works with pandoc when generating TeX or HTML output. More information is available on the pandoc-discuss group.

Not sure if this still applies, though: I tried <!--- vs. <!-- around multiline text (i.e. text with \n line breaks) in my document with pandoc 2.13, and both seemed to work fine. In any case, there is historical precedent for using <!--- for something else.

@alerque
Contributor

alerque commented Apr 20, 2021

Triple dashes being treated differently was probably a bug. HTML comments are a nightmare to parse. Did you know -- is a field separator in comments? Yes, comments have fields. And they get parsed for other things too: some browsers overload them, some servers use them as preprocessing hints, and so on. They are minefields. In any case, I don't think triple dashes are a good way to overload comments.
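Because -- inside comments is exactly where things get fragile, a tool that writes YAML into an HTML comment should at least check the text first. A Python sketch (function name hypothetical): by default it flags the substrings that the current HTML standard disallows in comment text, where --> and --!> would end the comment early; with strict=True it also flags any --, which HTML 4 advised authors to avoid inside comments. Note that the three-dash header delimiters themselves trip the strict check:

```python
# Substrings the HTML Living Standard does not allow inside comment text.
# "-->" and "--!>" would close the comment early in an HTML parser.
_DISALLOWED = ("<!--", "-->", "--!>")


def comment_problems(yaml_text: str, strict: bool = False) -> list:
    """Return the reasons yaml_text cannot safely be wrapped in an HTML comment
    (assuming it is wrapped with newlines after <!-- and before -->)."""
    problems = [s for s in _DISALLOWED if s in yaml_text]
    if strict and "--" in yaml_text:
        # HTML 4 advised against any "--" in comments (in SGML it delimits comments).
        problems.append("--")
    return problems
```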

norwd added a commit to jirastopwatch/jirastopwatch.github.io that referenced this issue Feb 12, 2023

This was suggested in jgm/pandoc#7183 but I expect this might not be what I want

Signed-off-by: Y. Meyer-Norwood <106889957+norwd@users.noreply.github.com>