Composing Jupytext with Jupyter Nbconvert should preserve Markdown files #321

Closed
mwouts opened this issue Sep 1, 2019 · 9 comments
@mwouts
Owner

mwouts commented Sep 1, 2019

This problem was first revealed by jupyter-book/jupyter-book#283

There, we start from a .md file, convert it to a Jupyter Notebook, and then convert it back to a .md file using jupyter nbconvert.

Following this operation, the code cells in the Markdown document that don't have an explicit language are converted to raw cells by Jupytext, and therefore the formatting of those code extracts is lost when jupyter nbconvert converts the document back to Markdown.

mwouts added this to the 1.3.0 milestone Sep 7, 2019
mwouts added a commit that referenced this issue Sep 21, 2019
Jupytext Markdown format in version 1.2 - #321
Raw cells are encoded using HTML comments (``<!-- #raw -->`` and ``<!-- #endraw -->``) in Markdown files.
Code blocks from Markdown files, when they don't have an explicit language, are displayed as Markdown cells in Jupyter
@mwouts
Owner Author

mwouts commented Sep 21, 2019

@choldgraf, an updated Jupytext Markdown format is available on branch 1.3.0. Now, when you open a Markdown file with Jupytext and export it to Markdown again using nbconvert, you get something close(r) to the original.

To see exactly what I mean by 'close', you can have a look at this test:

def test_markdown_jupytext_nbconvert_is_identity(md_file):
    """Test that a Markdown file, converted to a notebook, then
    exported back to Markdown with nbconvert, yields the original file"""
    with open(md_file) as fp:
        md_org = fp.read()
    nb = jupytext.reads(md_org, 'md')
    import nbconvert
    md_nbconvert, _ = nbconvert.export(nbconvert.MarkdownExporter, nb)
    # Our expectations
    md_expected = md_org.splitlines()
    # #region and #endregion comments are removed
    md_expected = [line for line in md_expected if line not in ['<!-- #region -->', '<!-- #endregion -->']]
    # language is not inserted by nbconvert
    md_expected = ['```' if line.startswith('```') else line for line in md_expected]
    # nbconvert inserts no empty line after the YAML header (which is in a Raw cell)
    md_expected = '\n'.join(md_expected).replace('---\n\n', '---\n') + '\n'
    # an extra blank line is inserted before code cells
    md_nbconvert = md_nbconvert.replace('\n\n```', '\n```')
    jupytext.compare.compare(md_nbconvert, md_expected)

You can also see the Markdown file being tested here: f2167f1 .
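
To make the test's notion of 'close' concrete, here is a minimal, standard-library-only sketch of the adjustments it applies to the original file before comparing. The helper name `expected_after_nbconvert` and the sample document are illustrative assumptions, not part of Jupytext; only the three transformations are taken from the test.

```python
FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
REGION_MARKERS = {"<!-- #region -->", "<!-- #endregion -->"}


def expected_after_nbconvert(md_org: str) -> str:
    """Apply the adjustments listed in the test to the original Markdown text."""
    lines = md_org.splitlines()
    # #region and #endregion comments are removed
    lines = [line for line in lines if line not in REGION_MARKERS]
    # the language on the opening fence is not written back by nbconvert
    lines = [FENCE if line.startswith(FENCE) else line for line in lines]
    # no empty line is kept after the YAML header
    return "\n".join(lines).replace("---\n\n", "---\n") + "\n"


# Hypothetical sample document, for illustration only
sample = "\n".join([
    "---",
    "title: demo",
    "---",
    "",
    "<!-- #region -->",
    "Some text",
    "<!-- #endregion -->",
    "",
    FENCE + "python",
    "1 + 1",
    FENCE,
]) + "\n"

result = expected_after_nbconvert(sample)
print(result)
```

In this sample the region markers disappear, the `python` label is dropped from the fence, and the blank line after the YAML header is removed; everything else is left untouched.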

Please let me know if you'd like to play with that version. If that makes it simpler for you, I could publish a dev version on PyPI.

mwouts closed this as completed Sep 22, 2019
@choldgraf
Contributor

Nice! It looks like in this case, code fences make the markdown file break into different cells, but those cells are still markdown cells so the formatting shouldn't change much, yeah?

This seems reasonable to me - so in this case, how would one mark in the md file that a code fence is executable?

@mwouts
Owner Author

mwouts commented Sep 23, 2019

Thanks @choldgraf.

That is correct: with the implementation on 1.3.0-dev, code fences without a language, or with a language that is not understood by Jupytext, are mapped to isolated markdown cells (where the fences are preserved).

Code cells that do have a language are still mapped to executable code cells in Jupyter. Now, if you open every markdown file with 1.3.0-dev as a Notebook in Jupyter Book, you still have to decide when a particular Markdown file should be executed, or not. Maybe if it has a 'jupyter' entry in the front-matter YAML? (i.e. jupytext.notebook_metadata_filter != '-all')
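
The mapping described above can be sketched as follows. This is a simplified model written for illustration, not Jupytext's actual parser: `split_cells` and `KNOWN_LANGUAGES` are hypothetical names, the language set is a stand-in, and region markers, cell metadata and nested fences are ignored.

```python
from typing import List, Tuple

FENCE = "`" * 3  # a Markdown code fence
# Assumption: a stand-in for the languages that the converter understands
KNOWN_LANGUAGES = {"python", "r", "julia", "bash"}


def split_cells(md_text: str) -> List[Tuple[str, str]]:
    """Return (cell_type, source) pairs for a Markdown document."""
    cells: List[Tuple[str, str]] = []
    pending: List[str] = []  # plain Markdown lines not yet emitted as a cell
    lines = md_text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line.startswith(FENCE):
            pending.append(line)
            i += 1
            continue
        info = line[len(FENCE):].strip()
        language = info.split()[0].lower() if info else ""
        # Look for the closing fence (or stop at the end of the document)
        j = i + 1
        while j < len(lines) and lines[j].strip() != FENCE:
            j += 1
        # Emit the text that preceded the fence, if any
        if any(text.strip() for text in pending):
            cells.append(("markdown", "\n".join(pending).strip("\n")))
        pending = []
        if language in KNOWN_LANGUAGES:
            # Fence with a known language: executable code cell, fences dropped
            cells.append(("code", "\n".join(lines[i + 1:j])))
        else:
            # No language, or an unknown one: isolated Markdown cell, fences kept
            cells.append(("markdown", "\n".join(lines[i:j + 1])))
        i = j + 1
    if any(text.strip() for text in pending):
        cells.append(("markdown", "\n".join(pending).strip("\n")))
    return cells


doc = "\n".join([
    "Intro text",
    "",
    FENCE + "python",
    "print(1 + 1)",
    FENCE,
    "",
    FENCE,
    "$ some shell transcript",
    FENCE,
])
cells = split_cells(doc)
for cell_type, source in cells:
    print(cell_type, repr(source))
```

Keeping the fences inside the Markdown cell is what lets a later Markdown export reproduce the block unchanged.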

@choldgraf
Contributor

Ah gotcha - thanks for the clarification.

I'm still a little iffy on whether we can assume that just because a code fence has a language attached to it, that it can be executed, however I don't think this'll be an issue in Jupyter Book. Currently I'm doing what you suggest - read in a markdown file, check if there was jupytext configuration in there. If not, then dump the notebook back to a markdown string and create a new notebook w/ one cell in it.
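
A rough sketch of that workflow, using only the standard library, might look like the following. Both function names are hypothetical, the header check is a line-based heuristic rather than a real YAML parse, and the notebook is a plain dict following the nbformat v4 layout; none of this is Jupyter Book's actual code.

```python
import json
import re
from typing import Any, Dict


def has_jupytext_header(md_text: str) -> bool:
    """Heuristic: is there a 'jupytext:' entry inside the YAML front matter?"""
    lines = md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    header = []
    for line in lines[1:]:
        if line.strip() == "---":  # end of the front matter
            return any(re.match(r"\s*jupytext:", entry) for entry in header)
        header.append(line)
    return False  # front matter was never closed


def single_cell_notebook(md_text: str) -> Dict[str, Any]:
    """Wrap the whole Markdown text in one Markdown cell (nbformat v4 layout)."""
    return {
        "cells": [{"cell_type": "markdown", "metadata": {}, "source": md_text}],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }


# Hypothetical inputs, for illustration only
with_header = "---\njupyter:\n  jupytext:\n    formats: md\n---\n\n# Title\n"
without_header = "# Title\n\nSome text\n"

for text in (with_header, without_header):
    if has_jupytext_header(text):
        print("treat as a Jupytext notebook")
    else:
        notebook = single_cell_notebook(text)
        print(json.dumps(notebook)[:60])
```

With a single Markdown cell there is nothing to execute, so fenced blocks that merely have a language label are never run.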

mwouts added a commit that referenced this issue Oct 12, 2019
Jupytext Markdown format in version 1.2 - #321
Raw cells are encoded using HTML comments (``<!-- #raw -->`` and ``<!-- #endraw -->``) in Markdown files.
Code blocks from Markdown files, when they don't have an explicit language, are displayed as Markdown cells in Jupyter
@mwouts
Owner Author

mwouts commented Oct 13, 2019

@choldgraf, a release candidate that includes the fix for the raw cells in the Markdown format is available on PyPI:

pip install jupytext==1.3.0rc0

I'm not sure you will want to open every Markdown file as a notebook in Jupyter Book again, but in theory this becomes possible with that version. Let me know if you can give it a try!

@choldgraf
Contributor

Nice! I just opened it and it looks quite good. I still can't open markdown files in JB w/o jupytext metadata, as (at least in the jupyter book docs) it runs all of the jupytext pages, but there are lots of code cells that have syntax highlighting (e.g., python cells) but that aren't runnable code :-/

maybe there is some way to mark a code cell as not runnable?

@mwouts
Owner Author

mwouts commented Oct 13, 2019

Thanks @choldgraf for having a look. Good news!

> maybe there is some way to mark a code cell as not runnable?

Well, I'm not sure I see yet why you would want this. Do you want to execute only a fraction of the cells, is that it? Anyway, here is what I am thinking of:

  1. encapsulate the inactive cell in a Markdown region, or
  2. add an active="md" cell metadata, or an active-md cell tag to the cell, or a {"runtools": {"frozen": true}} cell metadata, to mean that it should not be active in the ipynb representation. Unfortunately, a quick test seems to suggest that this may not be working properly on Markdown files... I will confirm.

Is something like option 2 what you are looking for? Maybe in the context of a Markdown file a tag like inactive-ipynb would sound more natural?
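
To illustrate option 2, the sketch below shows how a consumer of the notebook could decide which cells to skip based on those three markers. It is a literal reading of the proposal in this comment, not Jupytext's own logic; `is_inactive_in_ipynb` is a hypothetical helper and the cells are plain dicts in the nbformat layout.

```python
from typing import Any, Dict


def is_inactive_in_ipynb(cell: Dict[str, Any]) -> bool:
    """Return True when the cell metadata marks it as not active in ipynb."""
    metadata = cell.get("metadata", {})

    # 'active' metadata: comma-separated list of formats where the cell is active
    active = metadata.get("active")
    if active is not None:
        formats = [fmt.strip() for fmt in active.split(",")]
        if "ipynb" not in formats:
            return True

    # 'active-...' tag: same idea, expressed as a cell tag such as 'active-md'
    for tag in metadata.get("tags", []):
        if tag.startswith("active-") and "ipynb" not in tag.split("-")[1:]:
            return True

    # 'runtools.frozen' metadata, as suggested in the comment
    return metadata.get("runtools", {}).get("frozen") is True


# Hypothetical cells, for illustration only
cells = [
    {"cell_type": "code", "metadata": {}, "source": "1 + 1"},
    {"cell_type": "code", "metadata": {"active": "md"}, "source": "not_runnable()"},
    {"cell_type": "code", "metadata": {"tags": ["active-md"]}, "source": "not_runnable()"},
    {"cell_type": "code", "metadata": {"runtools": {"frozen": True}}, "source": "not_runnable()"},
]
runnable = [cell for cell in cells if not is_inactive_in_ipynb(cell)]
print([cell["source"] for cell in runnable])
```

Whether the check belongs in the converter or in an execution step downstream is exactly the design question raised in this thread.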

@mwouts
Owner Author

mwouts commented Oct 13, 2019

I confirm that the tests on active/inactive cells do not cover the .md format, see https://github.com/mwouts/jupytext/blob/bf6049b9107fef171647345f84364cdb4c5aaf5e/tests/test_active_cells.py .

Also, how do you expect that Jupytext could make the cell not runnable? Should it turn it into a Markdown cell - probably the most reasonable way to make sure the cell is not runnable in Jupyter, without breaking the nbconvert compatibility? Or should we just add a tag to the cell, and let the execution preprocessor know that it should not execute those cells?

@mwouts
Owner Author

mwouts commented Oct 16, 2019

The missing tests were added in 55ad5a9.

With that commit, the active metadata is no longer lost in the Markdown format. But it still has no effect on the Markdown file: this is because, unlike in scripts or R Markdown, cells are never expected to be active in Markdown! We'll see how to answer that specific question at #347.
