Pandoc format for Jupyter notebook #208

mwouts · 2019-03-27T06:06:59Z

Pandoc allows users to create Jupyter notebooks from Markdown files, cf. the documentation. We'll try to plug pandoc into Jupytext and see if that is usable.

mwouts · 2019-03-27T06:22:12Z

Interesting! Pandoc reformats the Markdown contents. A round-trip on a notebook with a cell

# Header

converts it to

Header
======

mwouts · 2019-03-27T06:24:37Z

The representations of the test notebooks in the pandoc format are here: https://github.com/mwouts/jupytext/tree/1.1.0_pandoc_with_mirror_tests/tests/notebooks/mirror/ipynb_to_pandoc

See for instance a notebook with cell metadata.

mwouts · 2019-03-27T22:09:16Z

On pandoc-discuss, @jgm suggested using the --atx-headers option for pandoc. If in addition we ignore line breaks in Markdown cells, we do have identical contents for Markdown cells on round trips.
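A comparison of Markdown cells that ignores line breaks could be sketched as follows. This is only an illustration, not Jupytext's actual test code: both helper names are hypothetical, and the sketch collapses every run of whitespace, which is a slightly coarser rule than ignoring line breaks alone.

```python
def normalize_markdown(text):
    """Collapse every run of whitespace (line breaks included) into one space."""
    return " ".join(text.split())


def same_markdown_modulo_line_breaks(first, second):
    """Return True when two Markdown sources differ only in whitespace."""
    return normalize_markdown(first) == normalize_markdown(second)


# A re-wrapped paragraph compares equal...
print(same_markdown_modulo_line_breaks("A long\nparagraph", "A long paragraph"))
# ...but an atx header rewritten as a setext header does not
print(same_markdown_modulo_line_breaks("# Header", "Header\n======"))
```

With such a comparison, re-wrapping by pandoc is tolerated, while a change of header style still shows up as a difference, which is why --atx-headers is needed as well.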

However, the round trip on the notebook itself is still not satisfactory. None of the 18 test notebooks passes the round-trip test. John, would you like me to report this at jgm/pandoc?

For instance, the round-trip of jupytext/tests/notebooks/ipynb_py/Notebook with function and cell metadata 164.ipynb replaces the first code cell

{
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "10"
    }
   },
   "outputs": [],
   "source": [
    "def f(x):\n",
    "    return x"
   ]
  },

with a markdown cell:

 {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "::: {.cell .code execution\\_count=â€œ2â€�\n",
    "attributes=â€œ{â€�nâ€œ:â€�10â€œ,â€�idâ€œ:â€�â€œ,â€�classesâ€œ:\\[\\]}â€�}\n",
    "\n",
    "``` python\n",
    "def f(x):\n",
    "    return x\n",
    "```\n",
    "\n",
    ":::\n",
    "\n",
    "::: {.cell .code execution\\_count=â€œ3â€�\n",
    "attributes=â€œ{â€�nâ€œ:â€�10â€œ,â€�idâ€œ:â€�â€œ,â€�classesâ€œ:\\[\\]}â€�}\n",
    "\n",
    "``` python\n",
    "f(5)\n",
    "```\n",
    "\n",
    "    5\n",
    "\n",
    ":::"
   ]
  },

jgm · 2019-03-27T23:34:59Z

I don't think this is an issue with pandoc. Saving your notebook as x.ipynb, I tried

pandoc -s x.ipynb -o x2.ipynb --wrap=preserve

and then used json-diff to see the differences. There were just three, all quite minor:

[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/metadata/toc/base_numbering","value":"1.0"},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]

That is:

nbformat_minor changes to 5 (since we target 4.5 in the ipynb writer).
base_numbering in toc in the metadata changed from 1 to "1.0".
codemirror_mode / version changed from 3 to "3.0".

The numbering changes definitely look like bugs and could be reported. [EDIT: I've fixed them already.] The nbformat_minor change is more debatable. Currently pandoc's ipynb writer targets a single version, but it could be trained to respect the minor version number of the input if it's present in metadata. [EDIT: got this one too.]

EDIT: After the code changes noted above, I tried all the ipynb files in that directory with pandoc --wrap=preserve -s -f ipynb -t ipynb --atx-headers and tested them all with json-diff. Discarding insignificant differences due to newlines in the encoded images, I got just these:

Notebook_with_many_hash_signs.ipynb
[{"op":"replace","path":"/cells/0/source/0","value":"################################################################## \n"},{"op":"add","path":"/cells/0/source/1","value":"\n"},{"op":"replace","path":"/cells/0/source/4","value":"\n"},{"op":"add","path":"/cells/0/source/5","value":"################################################################## "},{"op":"replace","path":"/cells/2/source/0","value":"################################################################## \n"},{"op":"add","path":"/cells/2/source/1","value":"\n"},{"op":"replace","path":"/cells/2/source/4","value":"\n"},{"op":"add","path":"/cells/2/source/5","value":"################################################################## "}]

jupyter_with_raw_cell_in_body.ipynb
[{"op":"add","path":"/cells/1/metadata/format","value":""}]

jupyter_with_raw_cell_on_top.ipynb
[{"op":"add","path":"/cells/0/metadata/format","value":""}]

In Notebook_with_many_hash_signs.ipynb, pandoc is adding a space at the end of the line of #s (which it takes to be a header) and inserting a newline after it and before the next one.

In the other two files, pandoc adds a metadata field "format": "".
(Should there be some other default format when no format is specified on a raw cell?)
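For readers without json-diff at hand, the kind of comparison used above can be approximated in Python. The sketch below is an assumption-laden stand-in, not the json-diff tool: it reports differing leaves as (path, old, new) tuples rather than JSON Patch operations, and `json_differences` is a hypothetical name. It treats `1` and `1.0` as different, since that type change is exactly what was observed here.

```python
_MISSING = object()  # marks a key or list index present on one side only


def json_differences(old, new, path=""):
    """Yield (path, old_value, new_value) for every leaf that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            yield from json_differences(
                old.get(key, _MISSING), new.get(key, _MISSING), f"{path}/{key}"
            )
    elif isinstance(old, list) and isinstance(new, list):
        for index in range(max(len(old), len(new))):
            left = old[index] if index < len(old) else _MISSING
            right = new[index] if index < len(new) else _MISSING
            yield from json_differences(left, right, f"{path}/{index}")
    elif old != new or type(old) is not type(new):
        yield path, old, new


# Illustrative values only: the original nbformat_minor is not given in the thread
before = {"nbformat_minor": 2, "metadata": {"toc": {"base_numbering": 1}}}
after = {"nbformat_minor": 5, "metadata": {"toc": {"base_numbering": "1.0"}}}
for difference in json_differences(before, after):
    print(difference)
```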

jgm · 2019-03-28T01:10:28Z

Sorry, I realize you were probably referring to ipynb -> markdown -> ipynb rather than direct ipynb -> pandoc AST -> ipynb. I ran tests with that, using:

% for x in *.ipynb; do pandoc -s -f ipynb -t markdown --wrap=preserve --atx-headers --extract-media new "$x" -o new/"$x".markdown; pandoc -s -f markdown -t ipynb --wrap=preserve --atx-headers new/"$x".markdown -o new/"$x"; echo $x; json-diff "$x" new/"$x";  done | grep -v VBOR # this grep removes the big diffs for images which just have to do with newline placement

A few more issues appeared, but nothing like what you're reporting above.

Notebook_with_R_magic.ipynb
Notebook_with_function_and_cell_metadata_164.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/metadata/toc/nav_menu","value":""},{"op":"replace","path":"/metadata/toc/base_numbering","value":"1.0"},{"op":"replace","path":"/metadata/toc/toc_position","value":""},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
Notebook_with_html_and_latex_cells.ipynb
[WARNING] Duplicate identifier 'section' at line 50 column 1
[WARNING] Duplicate identifier 'section-1' at line 54 column 1
Notebook_with_many_hash_signs.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/cells/0/source/0","value":"################################################################## \n"},{"op":"add","path":"/cells/0/source/1","value":"\n"},{"op":"replace","path":"/cells/0/source/4","value":"\n"},{"op":"add","path":"/cells/0/source/5","value":"################################################################## "},{"op":"replace","path":"/cells/2/source/0","value":"################################################################## \n"},{"op":"add","path":"/cells/2/source/1","value":"\n"},{"op":"replace","path":"/cells/2/source/4","value":"\n"},{"op":"add","path":"/cells/2/source/5","value":"################################################################## "},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
Notebook_with_more_R_magic_111.ipynb
pandoc: Stack space overflow: current size 33624 bytes.
pandoc: Use `+RTS -Ksize -RTS' to increase it.
World_population.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/metadata/language_info"},{"op":"remove","path":"/metadata/kernelspec"}]
convert_to_py_then_test_with_update83.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/cells/0/outputs/0/text/2","value":"Wall time: 188 µs"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
frozen_cell.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/cells/0/outputs/0/text/0","value":"I'm a regular cell so I run and print!"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter_again.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/cells/1/outputs/0/text/6","value":"title: Quick ioslides"},{"op":"remove","path":"/cells/1/outputs/0/text/7"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter_with_raw_cell_in_body.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"remove","path":"/cells/1"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter_with_raw_cell_on_top.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"remove","path":"/cells/0"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
notebook_with_complex_metadata.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"},{"op":"replace","path":"/metadata/widgets/state/a65a11f142ca44eebc913788d256adcb/views/0/cell_index","value":"92.0"}]
nteract_with_parameter.ipynb
sample_rise_notebook_66.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]

mwouts · 2019-03-28T05:44:36Z

Hello @jgm , thanks for your detailed feedback. I agree that the small modifications you see in the round trip are nothing to be afraid of.

So possibly the issue that I encounter is specific to my pandoc install: pandoc 2.7.1 installed from conda-forge on Windows 10, and using a pipe to feed the Markdown into pandoc again, like here:

pandoc x.ipynb --from ipynb --to markdown | pandoc --from markdown --to ipynb

PS: the --wrap=preserve option will make it easier to test the round-trip, thanks for mentioning that.

jgm · 2019-03-28T06:02:27Z

Hm. The pandoc version should be okay. You should use the `-s` option to get a standalone markdown document with YAML metadata header. But otherwise, the command line you give looks just like what I did...

You are using a pipe on Windows, and I know I've seen some kooky problems with piping things in a Windows console, having to do with encodings. See for example jgm/pandoc#3208. That could be the issue here. Try creating a markdown file

pandoc x.ipynb -s -o x.md --wrap=preserve --atx-headers

and then use that

pandoc x.md -s -o x.ipynb --wrap=preserve --atx-headers
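When driving pandoc from Python, the file-based round trip suggested here might look like the sketch below. The function names are hypothetical, and writing the result to a separate x2.ipynb (rather than overwriting x.ipynb) is a choice made for this example so that the original stays available for comparison. As far as I know, more recent pandoc releases deprecate --atx-headers in favour of --markdown-headings=atx, so the option list may need adjusting.

```python
import subprocess

# Options used in this thread (pandoc 2.7.x)
COMMON_OPTIONS = ["-s", "--wrap=preserve", "--atx-headers"]


def pandoc_command(source, target):
    """Build a pandoc call that reads and writes files; formats follow the extensions."""
    return ["pandoc", source, "-o", target] + COMMON_OPTIONS


def round_trip(notebook="x.ipynb", markdown="x.md", result="x2.ipynb"):
    """Convert ipynb -> markdown -> ipynb through files, with no pipe involved."""
    for source, target in [(notebook, markdown), (markdown, result)]:
        # check=True raises CalledProcessError if pandoc exits with an error
        subprocess.run(pandoc_command(source, target), check=True)


print(pandoc_command("x.ipynb", "x.md"))
```

Passing the arguments as a list avoids the shell entirely, so no console encoding is involved between the two pandoc calls.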

mwouts · 2019-03-28T07:54:07Z

Thanks @jgm . These are two useful suggestions. I will give the -s option a try, and run pandoc on files rather than piping utf-8 text into it.

Regarding jgm/pandoc#3208, I am not so sure that the issue with piping lies with the Windows console, as it also occurs when I pipe using the Python subprocess library, cf. here.

mwouts · 2019-03-29T09:43:47Z

@jgm , this is definitely looking better. With the -s and --wrap=preserve options I get 8 out of the 18 notebooks preserved in the round trip, and this is going to improve with the next version of pandoc.

I have two additional questions:

Could you have a look at the julia notebooks? There seem to be indentation changes on julia_benchmark_plotly_barchart.ipynb, cell 3.
Do you have recommendations on how to install pandoc on travis? Should I simply download and extract the binary file there?

jgm · 2019-03-29T16:42:56Z

Could you have a look at the julia notebooks? There seem to be indentation changes on julia_benchmark_plotly_barchart.ipynb, cell 3.

Try --preserve-tabs; I didn't see any issues with master other than tab/space differences.

Do you have recommendations on how to install pandoc on travis? Should I simply download and extract the binary file there?

That should work.

mwouts · 2019-03-30T01:44:34Z

Try --preserve-tabs; I didn't see any issues with master other than tab/space differences.

Indeed, that one was an issue with tabs, thanks.

With pandoc from master (on Windows again), I now get little to no differences on most test notebooks. But don't you think there is still an issue with the two notebooks that have a raw cell on top/in the body? My tests seem to say that the raw cell on top becomes a code cell, while that in the body becomes a markdown cell...

jgm · 2019-03-30T04:16:07Z

Can you point me to the specific files you mean, so I can test?
PS. Was this the one with a raw cell starting with ---\n? I fixed a bad interaction between setext header syntax and fenced div syntax just recently. I think it was having this result. Try pulling the latest master.

mwouts · 2019-03-30T10:50:46Z

Sure!

The two files are https://github.com/mwouts/jupytext/blob/1.1.0_pandoc_with_mirror_tests/tests/notebooks/ipynb_py/jupyter_with_raw_cell_in_body.ipynb and https://github.com/mwouts/jupytext/blob/1.1.0_pandoc_with_mirror_tests/tests/notebooks/ipynb_py/jupyter_with_raw_cell_on_top.ipynb.

My pandoc was up-to-date with master when I ran the test.

jgm · 2019-03-30T16:22:42Z

Here's what I get after converting those to markdown, then back to ipynb:

% json-diff jupyter_with_raw_cell_in_body.ipynb jupyter_with_raw_cell_in_body.md.ipynb 
 {
   cells: [
     ...
-    {
-      cell_type: "raw"
-      metadata: {
-      }
-      source: [
-        "This is a raw cell"
-      ]
-    }
     ...
   ]
 }
% json-diff jupyter_with_raw_cell_on_top.ipynb jupyter_with_raw_cell_on_top.md.ipynb 
 {
   cells: [
-    {
-      cell_type: "raw"
-      metadata: {
-      }
-      source: [
-        "---\n"
-        "title: \"Quick test\"\n"
-        "output:\n"
-        "  ioslides_presentation:\n"
-        "    widescreen: true\n"
-        "    smaller: true\n"
-        "editor_options:\n"
-        "     chunk_output_type console\n"
-        "---"
-      ]
-    }
     ...
     ...
   ]
 }

So I'm not seeing either raw cell become another type of cell (I believe that was due to a bug fixed yesterday; you probably had a version from before the fix). Instead, the raw cell disappears. I need to look into why that is happening.

jgm · 2019-03-30T16:32:35Z

OK, I see. As the pandoc documentation suggests, when translating between ipynb and markdown you should use the format markdown-raw_html-raw_tex+raw_attribute; this will force explicit raw blocks and avoid issues of this kind. With that change I get a perfect round-trip except for one thing: format of text/html is added to the raw cells. That's because pandoc doesn't have a concept of a "wildcard" raw block; every raw block has to have a specific format indicated. (I'm not really sure how to handle this mismatch between nbformat and pandoc.) (EDIT: OK, I think I've got a decent solution to this, which at least allows lossless roundtrips between markdown and ipynb.)
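The format string mentioned here follows pandoc's general convention of appending `+EXTENSION` or `-EXTENSION` to a format name. A small sketch of composing such a string in Python is given below; `pandoc_format` is a hypothetical helper, not part of pandoc or Jupytext.

```python
def pandoc_format(base, enable=(), disable=()):
    """Compose a pandoc format name such as 'markdown-raw_html+raw_attribute'."""
    disabled = "".join("-" + extension for extension in disable)
    enabled = "".join("+" + extension for extension in enable)
    return base + disabled + enabled


# The Markdown flavour recommended above for ipynb <-> markdown conversions
MARKDOWN_FOR_IPYNB = pandoc_format(
    "markdown", enable=["raw_attribute"], disable=["raw_html", "raw_tex"]
)
print(MARKDOWN_FOR_IPYNB)  # markdown-raw_html-raw_tex+raw_attribute
```

The resulting string would be passed to pandoc's --from or --to option, depending on the direction of the conversion.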

mwouts · 2019-04-01T21:58:09Z

Thanks @jgm . I confirm that with your latest commit to pandoc, there is no issue any more with the raw cells.

Now we should think about which command line we want to make the default for pandoc in Jupytext. Currently Jupytext uses pandoc --from markdown --to ipynb -s --atx-headers --wrap=preserve. We could add --preserve-tabs. If jgm/pandoc#5408 is implemented, then I think it should become the default.

Then, I would like the pandoc command line to be configurable, per notebook. Personally, I would have a use case for the automated Markdown reformatting (that of making python scripts PEP8 by avoiding long lines in Markdown cells). Also, as the pandoc format can store output cells, we should have an option to preserve them in that format.

mwouts · 2019-04-09T06:39:47Z

As the latest commit states, I have hijacked travis to finally use conda as an additional configuration. The good point there is that we will always be testing the latest pandoc from conda-forge.

choldgraf · 2019-04-09T15:48:03Z

@mwouts seems like latest pandoc on conda forge is a decent model to start with, note that this might not behave the same on non *nix platforms (but that's probably too large of a challenge for jupytext to tackle itself).

Lemme know if/when there's a version you'd like me to demo :-) sorry for the slow replies, we have been on a major grant-writing effort to try finding some more funding!

mwouts · 2019-04-09T17:36:30Z

seems like latest pandoc on conda forge is a decent model to start with

I do agree. What I like with this is that we will automatically test with the latest pandoc, that's a good way of detecting troubles right when they happen!

note that this might not behave the same on non *nix platforms (but that's probably too large of a challenge for jupytext to tackle itself).

Well, it turned out that I developed most of Jupytext on Windows! So the non-*nix platforms are also covered by the tests... Indeed, I ran into a series of issues (different names for exceptions, utf-8 support when piping into pandoc, etc.), but I would rather run into those myself than leave them to the end user...

Lemme know if/when there's a version you'd like me to demo

Sure! Version 1.1.0-rc1 will be out soon, I will let you know when it is ready for testing.
Good luck with the funding, by the way!

mwouts · 2019-04-09T21:12:59Z

@choldgraf , the new rc is available:

pip install jupytext==1.1.0rc1

Can you give it a try? The corresponding section in the README is here - basically you should use md:pandoc instead of md in jupytext.formats.

choldgraf · 2019-04-09T23:19:40Z

cool! will take a look when I have a moment...does this depend on latest pandoc, I assume?

mwouts · 2019-04-10T06:54:02Z

Yes, you will need the latest pandoc. Probably the simplest way to test the latest rc with pandoc is:

conda create -n jupytext-pandoc notebook mock testfixtures pyyaml -y
conda activate jupytext-pandoc
conda install pandoc -c conda-forge -y
pip install jupytext==1.1.0rc1

# And start a notebook server
jupyter notebook

To save a notebook in pandoc format (with no outputs at the moment): edit the notebook metadata and insert

"jupytext": {"formats":"ipynb,md:pandoc"},

Or, if you already have Markdown pandoc notebooks: simply click on them in the tree view, they should open as Jupyter notebooks.
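For those who prefer to script the metadata edit described above rather than use the notebook editor, a minimal sketch follows. It works on the notebook as a plain dict (as returned by `json.load`), and `pair_with_pandoc_markdown` is a hypothetical helper, not a Jupytext function.

```python
import copy


def pair_with_pandoc_markdown(notebook):
    """Return a copy of the notebook dict paired with the md:pandoc format."""
    paired = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    jupytext_metadata = paired.setdefault("metadata", {}).setdefault("jupytext", {})
    jupytext_metadata["formats"] = "ipynb,md:pandoc"
    return paired


# Minimal notebook used for illustration only
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
print(pair_with_pandoc_markdown(notebook)["metadata"])
```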

choldgraf · 2019-04-10T16:34:32Z

hmmm, I tried opening up the notebook after following your instructions and got a gigantic recursion error trace...need to figure out what's up (I'm on WSL for what it's worth, so maybe that's messing things up)

mwouts · 2019-04-10T16:41:23Z

Interesting! Can you describe how you did that, and what's the error message? And if possible, what was the notebook? Note that until now I have only used WSL occasionally, but encountered no issue with it.

choldgraf · 2019-04-10T21:08:42Z

Hmmm, I'm first getting a "filenotfound" error:

Error while saving file: Untitled.md [Errno 2] No such file or directory: '/home/choldgraf/.~Untitled.md' -> '/home/choldgraf/Untitled.md'
    Traceback (most recent call last):

and then a neverending traceback of

 File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/pandoc.py", line 70, in notebook_to_md
        tmp_file.write(nbformat.writes(notebook).encode('utf-8'))
      File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/contentsmanager.py", line 60, in _writes
        return writes(nbk, fmt, version=version, **kwargs)
      File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/jupytext.py", line 263, in writes
        return writer.writes(notebook, metadata)
      File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/jupytext.py", line 107, in writes
        return notebook_to_md(new_notebook(metadata=metadata, cells=cells))
      File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/pandoc.py", line 70, in notebook_to_md
        tmp_file.write(nbformat.writes(notebook).encode('utf-8'))

I wouldn't be surprised if this is a WSL problem

mwouts · 2019-04-11T07:30:52Z

Thanks @choldgraf for spotting this. I can reproduce the problem with conda on both Windows and WSL, so this is a major issue that probably also affects plain Linux. I will reproduce it with a test, fix it, and then release another RC...

choldgraf · 2019-04-11T15:48:53Z

glad it wasn't just me :-)

mwouts · 2019-04-11T20:28:45Z

That was a rather funny bug... The contents manager uses mock to replace nbformat.writes with Jupytext's one, but for pandoc we have to use the original nbformat.writes to write the notebook before calling pandoc... causing the infinite loop.
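The mechanism behind this bug can be reproduced in a few lines. The sketch below is not Jupytext's code: `fake_nbformat` is a stand-in namespace so that the example has no dependencies, and all function names are hypothetical. It shows why a reference to the original function has to be taken before the patch is applied.

```python
import types
from unittest import mock

# Stand-in for the nbformat module
fake_nbformat = types.SimpleNamespace(writes=lambda notebook: "ipynb:" + notebook)

# Keep a reference to the original function before anything patches the module
original_writes = fake_nbformat.writes


def recursive_text_writes(notebook):
    """Buggy writer: looks the function up on the module, so it finds the patch."""
    return "markdown from " + fake_nbformat.writes(notebook)


def safe_text_writes(notebook):
    """Fixed writer: uses the saved original, whatever the module currently holds."""
    return "markdown from " + original_writes(notebook)


with mock.patch.object(fake_nbformat, "writes", safe_text_writes):
    print(fake_nbformat.writes("nb"))  # markdown from ipynb:nb
```

Patching with `recursive_text_writes` instead makes the function call itself until Python raises a RecursionError, which matches the never-ending traceback reported above.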

Anyway - now we have both a test and a fix for that in the latest RC:

pip install jupytext==1.1.0rc2

choldgraf · 2019-04-11T20:52:41Z

guess that's why we have release candidates :-)

mwouts · 2019-04-14T18:00:37Z

I have just released the version 1.1.0, which includes the md:pandoc format. Please let me know @choldgraf if you have more comments or suggestions. Thanks!

choldgraf · 2019-04-15T03:01:14Z

I'll give it a whirl this week!

choldgraf · 2019-04-16T18:26:02Z

@mwouts it works quite nicely! Working on a little blog post to demo how to blend jupytext/pandoc in an authoring context :-)

mwouts · 2019-04-16T20:18:10Z

Working on a little blog post to demo how to blend jupytext/pandoc in an authoring context :-)

Great news! I am looking forward to seeing how you use it!

ickc · 2019-05-21T11:17:27Z

Seems like I’m pretty late to the party.

Don’t know if it would be too late to mention this, but if you want to set up CI to use the latest pandoc master, you could try https://github.com/pandoc-extras/pandoc-nightly/releases.

In my experience, round-trip idempotency in pandoc is not easy, not to mention round-trip identity. It’s good to see all this work going into that.

mwouts · 2019-05-21T13:32:09Z

Hello @ickc , thanks for joining the conversation.

Don’t know if it would be too late to mention this, but if you want to set up CI to use the latest pandoc master, you could try https://github.com/pandoc-extras/pandoc-nightly/releases.

We do have a test of jupytext+pandoc in the CI, and it is done using pandoc from conda-forge, cf. these lines. I think this should be good enough: at least, I think I saw that the conda-forge versions of pandoc were well up to date.

In my experience, round-trip idempotency in pandoc is not easy, not to mention round-trip identity. It’s good to see all this work going into that.

Thanks! Well, here we just shared Jupytext's experience with round trips, and applied our test framework to pandoc's format. Jupytext does not have a pandoc encoder; it directly uses pandoc to convert between pandoc's Markdown and Jupyter notebooks. Still, in collaboration with @jgm , we did identify a series of flags that help preserve identity in the round trip. The actual command line is available here.

ickc · 2019-05-22T02:39:26Z

The actual command line is available here.

Is there any reason not to use an existing Python wrapper, such as pypandoc or panflute (from my experience pypandoc is more robust as long as you don't need to access the AST)? At the very least you don't need to write to a temp file and read it again. Without any output files pandoc will write to stdout, and you can use PIPE to capture the stdout text directly.

And will you consider providing an option not to require identity (better yet, allow users to specify the pandoc args used, e.g. personally I'd enable --atx-headers)? In my git repos, when I have Markdown I often use pandoc to "normalize" it before committing, so that could be a good thing. (Even if identity is not required, idempotency should still be required, i.e. running it again should not change the output further. This is hard if the Markdown is very general, though: pandoc actually breaks idempotency in a lot of artificial examples. But again, idempotency should hold in reasonably simple cases.)
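The distinction between identity and idempotency can be stated as two small checks. The sketch below uses a toy normalizer that only strips trailing whitespace, as a stand-in for a real pandoc call; all names are hypothetical.

```python
def is_identity(transform, text):
    """Identity: the transformation leaves the text unchanged."""
    return transform(text) == text


def is_idempotent(transform, text):
    """Idempotency: applying the transformation twice equals applying it once."""
    once = transform(text)
    return transform(once) == once


def normalize(text):
    """Toy normalizer standing in for pandoc: strip trailing whitespace per line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


sample = "# Header  \n\nBody text "
print(is_identity(normalize, sample))    # False: the text is modified
print(is_idempotent(normalize, sample))  # True: a second pass changes nothing
```

A normalizing workflow only needs the second property, while the mirror tests discussed in this thread check the first one.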

it is done using pandoc from conda-forge, cf. these lines.

Seems like you're only using conda to install pandoc but not needing other things from it? If so, you could take a look at https://github.com/ickc/pantable/blob/master/.travis.yml#L40-L46, also see a bit higher to see how multiple pandoc versions can be tested against. This is essentially also how I set up Travis CI in panflute.

I think that currently being able to have round-trip identity is partly luck, because the test cases are not complicated enough. But once jgm/pandoc#5408 is implemented it can be guaranteed (almost, except perhaps for the raw-to-HTML case mentioned there). And the reason it is achievable should probably be credited to the design of ipynb. In the beginning I really hoped that the IPython notebook would take the R Markdown approach. But being JSON has a lot of nice features, including making round-trip identity possible here. That's also why I'm interested in this project: basically, it provides the best of both worlds.

So I have a tangential question for you: since this project is very young, I think it is reasonable to ask about the longevity of these formats (e.g. I'm considering turning all the ipynb files in my git repos into this format to get better git diffs). For example (pardon me, as I haven't used Jupytext and haven't even finished reading the README):

There seem to be a lot of output format options. Have you considered eventually giving some of them (or just one) a spec, i.e. something like R Markdown that can be standardized and universally recognized? It seems to me that there are now .md and .py options, which can be wildly different. Have you considered something like R Markdown, that is, Markdown that can be run as a Python script (literate programming)? Or is it like how nbconvert can be used to run an ipynb, except that the source is the pandoc Markdown in Jupytext?
It seems like the only doc is the README at the moment. It is quite long and a bit difficult to navigate. One minimal thing you can do is to add a ToC (I described how to do that using pandoc in https://github.com/jgm/pandoc/wiki/Pandoc-Tricks#toc-generation). Or maybe just use pandoc to generate HTML and use gh-pages to deliver it as a web page. I see that you have a "content" line, but a tree-like structure including sub-headings makes it easier to scan the document and find what one is looking for. Also, there are currently a lot of examples, but it is quite difficult to really get a sense of what Jupytext is and can do (probably because it can do a lot of things).

Actually, you might have sparked an idea of writing a pandoc filter that does something like this for me. I'll think about it and see if there's any way to collaborate at some point.

mwouts · 2019-05-22T14:58:41Z

Hello @ickc, I am afraid you have too many questions!! I'll try to answer a few of them

(...) Why not use pypandoc, or even pipe into pandoc?

Pandoc is not required for most Jupytext formats, so I prefer not to take a dependency on pypandoc. Also, I started the pandoc plugin with pipes, but that did not work on Windows. If you are interested in fixing that, please let me know.

Testing multiple versions of pandoc

At the moment there's only one version of pandoc that can be used with Jupytext. And I prefer to be testing only the latest pandoc, as this is not a pandoc-centered project.

And will you consider providing an option not to require identity (better yet, allow users to specify the pandoc args used, e.g. personally I'd enable --atx-headers)?

Yes! If someone asks for it, I'd be happy to. That someone can be you - please open an issue for that.

So I have a tangential question for you: since this project is very young, I think it is reasonable to ask about the longevity of these formats.

Certainly. The formats have evolved already. You can see the changes by looking at the history of files in the demo and test folders.

There seem to be a lot of output format options.

The most popular formats are probably

Scripts with %% cells (supported by many IDEs)
Scripts with few cell markers (the light format)
Markdown
R Markdown

Please read about them in the README.

It seems like the only doc is the README at the moment.

That is correct. I do not have much experience with Python docs, but maybe you can help turn the README into real Python documentation? Please open an issue for that as well, and we can discuss it there.

choldgraf · 2019-05-22T15:11:31Z

@mwouts I'd be happy to help you turn the README into a Sphinx site if you'd be interested in this. Would you be open to a PR?

mwouts · 2019-05-22T15:29:28Z

Hi Chris, sure, that would be very helpful! Please go ahead, a PR on this would be just great!

mwouts added a commit that referenced this issue Mar 27, 2019

Always use linux line breaks

e343610

#208

mwouts added a commit that referenced this issue Mar 27, 2019

Activate mirror tests on pandoc format

9efb67e

#208

mwouts mentioned this issue Mar 27, 2019

Maintain cell metadata for RISE #66

Closed

mwouts added a commit that referenced this issue Mar 27, 2019

Use pandoc's --atx-headers option

c3fce8a

As our test notebooks have atx headers #208

mwouts added a commit that referenced this issue Mar 27, 2019

Ignore line breaks in markdown cells when using pandoc

cfe613e

#208

jgm mentioned this issue Mar 28, 2019

ipynb round trip details jgm/pandoc#5398

Closed

mwouts added a commit that referenced this issue Mar 28, 2019

Use temp files rather than pipes

18e47b9

#208

mwouts added a commit that referenced this issue Mar 28, 2019

Filter metadata and cell outputs before pandoc export

b4aa2e7

#208

mwouts added a commit that referenced this issue Mar 28, 2019

Mirror files updated (outputs removed, metadata filtered)

c0a500a

#208

mwouts added a commit that referenced this issue Mar 29, 2019

Skip pandoc format and tests when pandoc>=2.7.1 is not available

0ab6000

#208

mwouts added a commit that referenced this issue Apr 1, 2019

raw cells preserved in round trip #208

c26760f

mwouts added this to the 1.1.0 milestone Apr 6, 2019

mwouts added a commit that referenced this issue Apr 6, 2019

Pandoc 2.7.2

63014a9

#208

mwouts added a commit that referenced this issue Apr 6, 2019

Install pandoc 2.7.2 on travis

455417e

#208

mwouts added a commit that referenced this issue Apr 8, 2019

Pandoc from conda, with python 3.6

c81b032

#208

mwouts added a commit that referenced this issue Apr 9, 2019

Travis: Hijack python 2.6 into conda with Python 3

ba99788

#208

mwouts added a commit that referenced this issue Apr 9, 2019

Make sure that md:pandoc is preserved in jupytext formats

2e65cfb

#208

mwouts mentioned this issue Apr 9, 2019

1.1.0 rc1 #214

Merged

mwouts added a commit that referenced this issue Apr 11, 2019

Reproduce and fix infinite recursion

b720bff

Jupytext's contents manager patches nbformat.reads/writes, so we need to make copies of them #208

mwouts closed this as completed Apr 14, 2019

mwouts mentioned this issue Jun 21, 2019

Links to sample md files do not work in read the docs #255

Closed
