Pandoc format for Jupyter notebook #208

Closed
mwouts opened this issue Mar 27, 2019 · 40 comments

@mwouts
Owner

mwouts commented Mar 27, 2019

Pandoc allows users to create Jupyter notebooks from Markdown files, cf. the documentation. We'll try to plug pandoc into Jupytext and see if that is usable.

mwouts added a commit that referenced this issue Mar 27, 2019
mwouts added a commit that referenced this issue Mar 27, 2019
@mwouts
Owner Author

mwouts commented Mar 27, 2019

Interesting! Pandoc reformats the Markdown contents. A round-trip on a notebook with a cell

# Header

converts it to

Header
======

@mwouts
Owner Author

mwouts commented Mar 27, 2019

The representations of the test notebooks in the pandoc format are here: https://github.com/mwouts/jupytext/tree/1.1.0_pandoc_with_mirror_tests/tests/notebooks/mirror/ipynb_to_pandoc

See for instance a notebook with cell metadata.

mwouts added a commit that referenced this issue Mar 27, 2019
As our test notebooks have atx headers #208
@mwouts
Owner Author

mwouts commented Mar 27, 2019

On pandoc-discuss, @jgm suggested using the --atx-headers option for pandoc. If, in addition, we ignore line breaks in Markdown cells, we get identical contents for Markdown cells on round trips.
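The comparison described above can be sketched as follows. This is only an illustration, not the helper used in Jupytext's test suite: the function names `normalize_markdown` and `same_markdown` are hypothetical, and collapsing all whitespace is deliberately lax (it would also hide differences inside indented code blocks).

```python
def normalize_markdown(source):
    """Collapse every run of whitespace (line breaks included) into one space."""
    return " ".join(source.split())


def same_markdown(original, round_tripped):
    """Compare two Markdown cell sources while ignoring where lines break."""
    return normalize_markdown(original) == normalize_markdown(round_tripped)


# Hypothetical cell contents: same words, wrapped at different columns
original = "A long paragraph that pandoc\nmay re-wrap at a different column."
round_tripped = "A long paragraph that pandoc may re-wrap\nat a different column."

print(same_markdown(original, round_tripped))
```

A change of header style (atx to setext) is still reported as a difference by this comparison, which is why the --atx-headers option is needed as well.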

However, the round trip on the notebook itself is still not satisfactory. None of the 18 test notebooks passes the round-trip test. John, would you like me to report this at jgm/pandoc?

For instance, the round-trip of jupytext/tests/notebooks/ipynb_py/Notebook with function and cell metadata 164.ipynb replaces the first code cell

{
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "10"
    }
   },
   "outputs": [],
   "source": [
    "def f(x):\n",
    "    return x"
   ]
  },

with a markdown cell:

 {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "::: {.cell .code execution\\_count=“2�\n",
    "attributes=“{�n“:�10“,�id“:�“,�classes“:\\[\\]}�}\n",
    "\n",
    "``` python\n",
    "def f(x):\n",
    "    return x\n",
    "```\n",
    "\n",
    ":::\n",
    "\n",
    "::: {.cell .code execution\\_count=“3�\n",
    "attributes=“{�n“:�10“,�id“:�“,�classes“:\\[\\]}�}\n",
    "\n",
    "``` python\n",
    "f(5)\n",
    "```\n",
    "\n",
    "    5\n",
    "\n",
    ":::"
   ]
  },

@jgm

jgm commented Mar 27, 2019

I don't think this is an issue with pandoc. Saving your notebook as x.ipynb, I tried

pandoc -s x.ipynb -o x2.ipynb --wrap=preserve

and then used json-diff to see the differences. There were just three, all quite minor:

[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/metadata/toc/base_numbering","value":"1.0"},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]

That is:

  • nbformat_minor changes to 5 (since we target 4.5 in the ipynb writer).
  • base_numbering in toc in the metadata changed from 1 to "1.0".
  • codemirror_mode / version changed from 3 to "3.0".

The numbering changes definitely look like bugs and could be reported. [EDIT: I've fixed them already.] The nbformat_minor change is more debatable. Currently pandoc's ipynb writer targets a single version, but it could be trained to respect the minor version number of the input if it's present in metadata. [EDIT: got this one too.]
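For readers who want to triage such a diff automatically, here is a minimal Python sketch. It assumes the diff is available as a JSON list of operations shaped like the three shown above; the name `significant_ops` and the choice of which paths to ignore are illustrative assumptions, not part of json-diff or pandoc.

```python
import json

# The three operations reported above, copied as JSON text
patch_text = """[
 {"op":"replace","path":"/nbformat_minor","value":5},
 {"op":"replace","path":"/metadata/toc/base_numbering","value":"1.0"},
 {"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}
]"""

# Assumption: the minor version bump is considered harmless for the round trip
IGNORED_PATHS = {"/nbformat_minor"}


def significant_ops(patch, ignored_paths=IGNORED_PATHS):
    """Keep only the operations whose path is not explicitly ignored."""
    return [op for op in patch if op["path"] not in ignored_paths]


patch = json.loads(patch_text)
for op in significant_ops(patch):
    print(op["op"], op["path"], repr(op["value"]))
```

With this filter, only the two numbering changes (an integer turned into a string) remain to be looked at.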

EDIT: After the code changes noted above, I tried all the ipynb files in that directory with pandoc --wrap=preserve -s -f ipynb -t ipynb --atx-headers and tested them all with json-diff. Discarding insignificant differences due to newlines in the encoded images, I got just these:

Notebook_with_many_hash_signs.ipynb
[{"op":"replace","path":"/cells/0/source/0","value":"################################################################## \n"},{"op":"add","path":"/cells/0/source/1","value":"\n"},{"op":"replace","path":"/cells/0/source/4","value":"\n"},{"op":"add","path":"/cells/0/source/5","value":"################################################################## "},{"op":"replace","path":"/cells/2/source/0","value":"################################################################## \n"},{"op":"add","path":"/cells/2/source/1","value":"\n"},{"op":"replace","path":"/cells/2/source/4","value":"\n"},{"op":"add","path":"/cells/2/source/5","value":"################################################################## "}]

jupyter_with_raw_cell_in_body.ipynb
[{"op":"add","path":"/cells/1/metadata/format","value":""}]

jupyter_with_raw_cell_on_top.ipynb
[{"op":"add","path":"/cells/0/metadata/format","value":""}]

In Notebook_with_many_hash_signs.ipynb, pandoc is adding a space at the end of the line of #s (which it takes to be a header) and inserting a newline after it and before the next one.

In the other two files, pandoc adds a metadata field "format": "".
(Should there be some other default format when no format is specified on a raw cell?)

@jgm

jgm commented Mar 28, 2019

Sorry, I realize you were probably referring to ipynb -> markdown -> ipynb rather than direct ipynb -> pandoc AST -> ipynb. I ran tests with that, using:

for x in *.ipynb; do
  pandoc -s -f ipynb -t markdown --wrap=preserve --atx-headers --extract-media new "$x" -o new/"$x".markdown
  pandoc -s -f markdown -t ipynb --wrap=preserve --atx-headers new/"$x".markdown -o new/"$x"
  echo "$x"
  json-diff "$x" new/"$x"
done | grep -v VBOR  # this grep removes the big diffs for images, which just have to do with newline placement

A few more issues appeared, but nothing like what you're reporting above.

Notebook_with_R_magic.ipynb
Notebook_with_function_and_cell_metadata_164.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/metadata/toc/nav_menu","value":""},{"op":"replace","path":"/metadata/toc/base_numbering","value":"1.0"},{"op":"replace","path":"/metadata/toc/toc_position","value":""},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
Notebook_with_html_and_latex_cells.ipynb
[WARNING] Duplicate identifier 'section' at line 50 column 1
[WARNING] Duplicate identifier 'section-1' at line 54 column 1
Notebook_with_many_hash_signs.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/cells/0/source/0","value":"################################################################## \n"},{"op":"add","path":"/cells/0/source/1","value":"\n"},{"op":"replace","path":"/cells/0/source/4","value":"\n"},{"op":"add","path":"/cells/0/source/5","value":"################################################################## "},{"op":"replace","path":"/cells/2/source/0","value":"################################################################## \n"},{"op":"add","path":"/cells/2/source/1","value":"\n"},{"op":"replace","path":"/cells/2/source/4","value":"\n"},{"op":"add","path":"/cells/2/source/5","value":"################################################################## "},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
Notebook_with_more_R_magic_111.ipynb
pandoc: Stack space overflow: current size 33624 bytes.
pandoc: Use `+RTS -Ksize -RTS' to increase it.
World_population.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/metadata/language_info"},{"op":"remove","path":"/metadata/kernelspec"}]
convert_to_py_then_test_with_update83.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/cells/0/outputs/0/text/2","value":"Wall time: 188 µs"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
frozen_cell.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/cells/0/outputs/0/text/0","value":"I'm a regular cell so I run and print!"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter_again.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/cells/1/outputs/0/text/6","value":"title: Quick ioslides"},{"op":"remove","path":"/cells/1/outputs/0/text/7"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter_with_raw_cell_in_body.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"remove","path":"/cells/1"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter_with_raw_cell_on_top.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"remove","path":"/cells/0"},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
notebook_with_complex_metadata.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"},{"op":"replace","path":"/metadata/widgets/state/a65a11f142ca44eebc913788d256adcb/views/0/cell_index","value":"92.0"}]
nteract_with_parameter.ipynb
sample_rise_notebook_66.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]

@mwouts
Owner Author

mwouts commented Mar 28, 2019

Hello @jgm , thanks for your detailed feedback. I agree that the small modifications you see in the round trip are nothing to be afraid of.

So possibly the issue that I encounter is specific to my pandoc install: pandoc 2.7.1 installed from conda-forge on Windows 10, with the Markdown output piped into a second pandoc process, like here:

pandoc x.ipynb --from ipynb --to markdown | pandoc --from markdown --to ipynb

PS: the --wrap=preserve option will make it easier to test the round-trip, thanks for mentioning that.

@jgm

jgm commented Mar 28, 2019 via email

@mwouts
Owner Author

mwouts commented Mar 28, 2019

Thanks @jgm. These are two useful suggestions. I will give the -s option a try, and run pandoc on files rather than piping UTF-8 text into it.

Regarding jgm/pandoc#3208, I am not so sure that the issue with piping comes from the Windows console, as it also occurs when I pipe using the Python subprocess library, cf. here.
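Running pandoc on files rather than on piped text could look roughly like the sketch below. It is an illustration under stated assumptions, not Jupytext's actual implementation: `convert_with_files` is a hypothetical name, and the `run` parameter exists only so that the call can be replaced in tests on a machine without pandoc. The only pandoc options relied upon are `-o` and the positional input file.

```python
import os
import subprocess
import tempfile


def convert_with_files(text, args, run=subprocess.check_call):
    """Run `pandoc <args> -o <output> <input>` on UTF-8 temporary files.

    Writing and reading the files with an explicit encoding avoids depending
    on the console or pipe encoding of the platform.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        in_path = os.path.join(tmp_dir, "input")
        out_path = os.path.join(tmp_dir, "output")
        with open(in_path, "w", encoding="utf-8") as fp:
            fp.write(text)
        # The temporary files have no extension, so args must name the formats
        run(["pandoc", *args, "-o", out_path, in_path])
        with open(out_path, encoding="utf-8") as fp:
            return fp.read()


# Usage (requires pandoc on the PATH):
# ipynb_text = convert_with_files(markdown_text, ["--from", "markdown", "--to", "ipynb", "-s"])
```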

@mwouts
Owner Author

mwouts commented Mar 29, 2019

@jgm , this is definitely looking better. With the -s and --wrap=preserve options I get 8 out of the 18 notebooks preserved in the round trip, and this is going to improve with the next version of pandoc.

I have two additional questions:

  • Could you have a look at the julia notebooks? There seem to be indentation changes on julia_benchmark_plotly_barchart.ipynb, cell 3.
  • Do you have recommendations on how to install pandoc on travis? Should I simply download and extract the binary file there?

@jgm

jgm commented Mar 29, 2019

Could you have a look at the julia notebooks? There seem to be indentation changes on julia_benchmark_plotly_barchart.ipynb, cell 3.

Try --preserve-tabs; I didn't see any issues with master other than tab/space differences.

Do you have recommendations on how to install pandoc on travis? Should I simply download and extract the binary file there?

That should work.

@mwouts
Owner Author

mwouts commented Mar 30, 2019

Try --preserve-tabs; I didn't see any issues with master other than tab/space differences.

Indeed, that one was an issue with tabs, thanks.

With pandoc from master (on Windows again), I now get little to no differences on most test notebooks. But don't you think there is still an issue with the two notebooks that have a raw cell on top/in the body? My tests seem to say that the raw cell on top becomes a code cell, while that in the body becomes a markdown cell...

@jgm

jgm commented Mar 30, 2019

Can you point me to the specific files you mean, so I can test?
PS. Was this the one with a raw cell starting with ---\n? I fixed a bad interaction between setext header syntax and fenced div syntax just recently. I think it was having this result. Try pulling the latest master.

@jgm

jgm commented Mar 30, 2019

Here's what I get after converting those to markdown, then back to ipynb:

% json-diff jupyter_with_raw_cell_in_body.ipynb jupyter_with_raw_cell_in_body.md.ipynb 
 {
   cells: [
     ...
-    {
-      cell_type: "raw"
-      metadata: {
-      }
-      source: [
-        "This is a raw cell"
-      ]
-    }
     ...
   ]
 }
% json-diff jupyter_with_raw_cell_on_top.ipynb jupyter_with_raw_cell_on_top.md.ipynb 
 {
   cells: [
-    {
-      cell_type: "raw"
-      metadata: {
-      }
-      source: [
-        "---\n"
-        "title: \"Quick test\"\n"
-        "output:\n"
-        "  ioslides_presentation:\n"
-        "    widescreen: true\n"
-        "    smaller: true\n"
-        "editor_options:\n"
-        "     chunk_output_type console\n"
-        "---"
-      ]
-    }
     ...
     ...
   ]
 }

So I'm not seeing either raw cell become another type of cell (I believe that was due to a bug fixed yesterday; you probably had a version from before the fix). Instead, the raw cell disappears. I need to look into why that is happening.

@jgm

jgm commented Mar 30, 2019

OK, I see. As the pandoc documentation suggests, when translating between ipynb and markdown you should use the format markdown-raw_html-raw_tex+raw_attribute; this will force explicit raw blocks and avoid issues of this kind. With that change I get a perfect round-trip except for one thing: format of text/html is added to the raw cells. That's because pandoc doesn't have a concept of a "wildcard" raw block; every raw block has to have a specific format indicated. (I'm not really sure how to handle this mismatch between nbformat and pandoc.) (EDIT: OK, I think I've got a decent solution to this, which at least allows lossless roundtrips between markdown and ipynb.)

@mwouts
Owner Author

mwouts commented Apr 1, 2019

Thanks @jgm . I confirm that with your latest commit to pandoc, there is no issue any more with the raw cells.

Now we should think about which command line we want to make the default for pandoc in Jupytext. Currently Jupytext uses pandoc --from markdown --to ipynb -s --atx-headers --wrap=preserve. We could add --preserve-tabs. If jgm/pandoc#5408 is implemented, then I think it should become the default.

Then, I would like the pandoc command line to be configurable, per notebook. Personally, I would have a use case for the automated Markdown reformatting (that of making python scripts PEP8 by avoiding long lines in Markdown cells). Also, as the pandoc format can store output cells, we should have an option to preserve them in that format.
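A configurable command line could be assembled along these lines. This is a sketch only: `pandoc_args` and `DEFAULT_FLAGS` are hypothetical names, the flags are the ones quoted in this thread for the pandoc version discussed here, and nothing below reflects how Jupytext finally exposes the option.

```python
# Flags quoted in this thread as the current Jupytext defaults
DEFAULT_FLAGS = ["-s", "--atx-headers", "--wrap=preserve"]


def pandoc_args(source, target, preserve_tabs=False, extra=()):
    """Build the argument list for one pandoc conversion.

    `extra` stands for per-notebook options supplied by the user (assumption).
    """
    args = ["--from", source, "--to", target, *DEFAULT_FLAGS]
    if preserve_tabs:
        args.append("--preserve-tabs")
    args.extend(extra)
    return args


print(" ".join(["pandoc", *pandoc_args("markdown", "ipynb", preserve_tabs=True)]))
```

Keeping the defaults in one list makes it easy to swap them out if a later pandoc release provides a better option for lossless round trips.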

mwouts added a commit that referenced this issue Apr 1, 2019
@mwouts mwouts added this to the 1.1.0 milestone Apr 6, 2019
mwouts added a commit that referenced this issue Apr 6, 2019
mwouts added a commit that referenced this issue Apr 6, 2019
@mwouts
Owner Author

mwouts commented Apr 9, 2019

As the latest commit states, I have hijacked travis to finally use conda as an additional configuration. The good point there is that we will always be testing the latest pandoc from conda-forge.

@choldgraf
Contributor

@mwouts seems like latest pandoc on conda forge is a decent model to start with, note that this might not behave the same on non *nix platforms (but that's probably too large of a challenge for jupytext to tackle itself).

Lemme know if/when there's a version you'd like me to demo :-) sorry for the slow replies, we have been on a major grant-writing effort to try finding some more funding!

@mwouts
Owner Author

mwouts commented Apr 9, 2019

seems like latest pandoc on conda forge is a decent model to start with

I do agree. What I like with this is that we will automatically test with the latest pandoc, that's a good way of detecting troubles right when they happen!

note that this might not behave the same on non *nix platforms (but that's probably too large of a challenge for jupytext to tackle itself).

Well, it turns out that I developed most of Jupytext on Windows! So the non-*nix platforms are also covered by the tests... Indeed, I ran into a series of issues (different names for exceptions, UTF-8 support when piping into pandoc, etc.), but I prefer to run into those myself rather than leave them to the end user...

Lemme know if/when there's a version you'd like me to demo

Sure! Version 1.1.0-rc1 will be out soon, I will let you know when it is ready for testing.
Good luck with the funding, by the way

@mwouts
Owner Author

mwouts commented Apr 9, 2019

@choldgraf , the new rc is available:

pip install jupytext==1.1.0rc1

Can you give it a try? The corresponding section in the README is here - basically you should use md:pandoc instead of md in jupytext.formats.

@choldgraf
Contributor

cool! will take a look when I have a moment...does this depend on latest pandoc, I assume?

@mwouts
Owner Author

mwouts commented Apr 10, 2019

Yes, you will need the latest pandoc. Probably the simplest way to test the latest rc with pandoc is:

conda create -n jupytext-pandoc notebook mock testfixtures pyyaml -y
conda activate jupytext-pandoc
conda install pandoc -c conda-forge -y
pip install jupytext==1.1.0rc1

# And start a notebook server
jupyter notebook

To save a notebook in pandoc format (with no outputs at the moment): edit the notebook metadata and insert

"jupytext": {"formats":"ipynb,md:pandoc"},

Or, if you already have Markdown pandoc notebooks: simply click on them in the tree view, they should open as Jupyter notebooks.
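If you prefer to set the notebook metadata from a script rather than by hand, a sketch using only the standard library could look like this. The helper name `pair_with_pandoc_markdown` and the sample notebook are made up for the example; the only fact taken from the instructions above is the `"jupytext": {"formats": "ipynb,md:pandoc"}` entry.

```python
import json


def pair_with_pandoc_markdown(notebook, formats="ipynb,md:pandoc"):
    """Set the jupytext.formats entry in the notebook metadata, in place."""
    jupytext = notebook.setdefault("metadata", {}).setdefault("jupytext", {})
    jupytext["formats"] = formats
    return notebook


# A made-up, minimal notebook in its JSON (dict) form
notebook = {
    "cells": [],
    "metadata": {"kernelspec": {"name": "python3"}},
    "nbformat": 4,
    "nbformat_minor": 2,
}

pair_with_pandoc_markdown(notebook)
print(json.dumps(notebook["metadata"], indent=1))
```

For a real notebook, the dict would be loaded with `json.load` from the .ipynb file and written back with `json.dump`.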

@choldgraf
Contributor

choldgraf commented Apr 10, 2019

hmmm, I tried opening up the notebook after following your instructions and got a gigantic recursion error trace...need to figure out what's up (I'm on WSL for what it's worth, so maybe that's messing things up)

@mwouts
Owner Author

mwouts commented Apr 10, 2019

Interesting! Can you describe how you did that, and what's the error message? And if possible, what was the notebook? Note that until now I have only used WSL occasionally, but encountered no issue with it.

@choldgraf
Contributor

Hmmm, I'm first getting a "filenotfound" error:

Error while saving file: Untitled.md [Errno 2] No such file or directory: '/home/choldgraf/.~Untitled.md' -> '/home/choldgraf/Untitled.md'
    Traceback (most recent call last):

and then a neverending traceback of

  File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/pandoc.py", line 70, in notebook_to_md
    tmp_file.write(nbformat.writes(notebook).encode('utf-8'))
  File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/contentsmanager.py", line 60, in _writes
    return writes(nbk, fmt, version=version, **kwargs)
  File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/jupytext.py", line 263, in writes
    return writer.writes(notebook, metadata)
  File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/jupytext.py", line 107, in writes
    return notebook_to_md(new_notebook(metadata=metadata, cells=cells))
  File "/home/choldgraf/anaconda/envs/jupytext-pandoc/lib/python3.7/site-packages/jupytext/pandoc.py", line 70, in notebook_to_md
    tmp_file.write(nbformat.writes(notebook).encode('utf-8'))

I wouldn't be surprised if this is a WSL problem

@mwouts
Owner Author

mwouts commented Apr 11, 2019

Thanks @choldgraf for spotting this. I can reproduce the problem on both conda windows and conda WSL, so this is a major issue that probably also affects plain linux. I will reproduce it with a test, fix it, and then release another RC...

@choldgraf
Contributor

glad it wasn't just me :-)

mwouts added a commit that referenced this issue Apr 11, 2019
Jupytext's contents manager patches nbformat.reads/writes, so we need to make copies of them
#208
@mwouts
Owner Author

mwouts commented Apr 11, 2019

That was a rather funny bug... The contents manager uses mock to replace nbformat.writes with Jupytext's own writes function, but for pandoc we have to call the original nbformat.writes to write the notebook before calling pandoc. Under the patch, that call resolved to Jupytext's function again, causing the infinite loop.
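The mechanism can be reproduced in a few lines. The sketch below does not use nbformat or Jupytext at all: `fake_nbformat` is a stand-in namespace and the function bodies are simplified assumptions. It shows the shape of the fix, namely keeping a reference to the original function before anything gets patched.

```python
import types
from unittest import mock

# Stand-in for the nbformat module (assumption: one `writes` function)
fake_nbformat = types.SimpleNamespace(writes=lambda notebook: "JSON:" + notebook)

# The fix: keep a copy of the original function at import time
_original_writes = fake_nbformat.writes


def notebook_to_md(notebook):
    # Calling fake_nbformat.writes here would resolve to jupytext_writes while
    # the patch is active, and recurse forever. The saved copy does not.
    return "MD:" + _original_writes(notebook)


def jupytext_writes(notebook):
    return notebook_to_md(notebook)


# What the contents manager does, schematically: patch writes with our own
with mock.patch.object(fake_nbformat, "writes", jupytext_writes):
    result = fake_nbformat.writes("nb")

print(result)
```

Replacing `_original_writes` with `fake_nbformat.writes` inside `notebook_to_md` brings back the RecursionError.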

Anyway - now we have both a test and a fix for that in the latest RC:

pip install jupytext==1.1.0rc2

@choldgraf
Contributor

guess that's why we have release candidates :-)

@mwouts
Owner Author

mwouts commented Apr 14, 2019

I have just released version 1.1.0, which includes the md:pandoc format. Please let me know, @choldgraf, if you have more comments or suggestions. Thanks!

@mwouts mwouts closed this as completed Apr 14, 2019
@choldgraf
Contributor

I'll give it a whirl this week!

@choldgraf
Contributor

@mwouts it works quite nicely! Working on a little blog post to demo how to blend jupytext/pandoc in an authoring context :-)

@mwouts
Owner Author

mwouts commented Apr 16, 2019

Working on a little blog post to demo how to blend jupytext/pandoc in an authoring context :-)

Great news! I am looking forward to seeing how you use it!

@ickc

ickc commented May 21, 2019

Seems like I’m pretty late to the party.

I don’t know if it is too late to mention this, but if you want to set up CI to use the latest pandoc master, you could try https://github.com/pandoc-extras/pandoc-nightly/releases.

In my experience, round-trip idempotency in pandoc is not easy, not to mention round-trip identity. It’s good to see all this work going into that.

@mwouts
Owner Author

mwouts commented May 21, 2019

Hello @ickc , thanks for joining the conversation.

I don’t know if it is too late to mention this, but if you want to set up CI to use the latest pandoc master, you could try https://github.com/pandoc-extras/pandoc-nightly/releases.

We do have a test of jupytext+pandoc in the CI, and it is done using pandoc from conda-forge, cf. these lines. I think this should be good enough - at least I think I saw that conda forge versions of pandoc were well up-to-date.

In my experience, round-trip idempotency in pandoc is not easy, not to mention round-trip identity. It’s good to see all this work going into that.

Thanks! Well, here we just shared Jupytext's experience with round trips, and applied our test framework to pandoc's format. Jupytext does not have a pandoc encoder; it directly uses pandoc to convert between pandoc's Markdown and Jupyter notebooks. Still, in collaboration with @jgm, we did identify a series of flags that help preserve identity in the round trip. The actual command line is available here.

@ickc

ickc commented May 22, 2019

The actual command line is available here.

Is there any reason not to use an existing Python wrapper, such as pypandoc or panflute (from my experience pypandoc is more robust as long as you don't need to access the AST)? At the very least you don't need to write to a temp file and read it again. Without any output files pandoc will write to stdout, and you can use PIPE to capture the stdout text directly.

And will you consider providing an option not to require identity (better yet, allow users to specify the pandoc args used, e.g. personally I'd enable --atx-headers)? In my git repos, when I have Markdown, I often use pandoc to "normalize" it before committing. So that could be a good thing. (Even if identity is not required, idempotency should still be required, i.e. running it again should not change the result further. This is hard if the Markdown is very general, though. pandoc actually breaks idempotency in a lot of artificial examples. But again, idempotency should hold in reasonably simple cases.)
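To make the distinction between identity and idempotency concrete, here is a small Python sketch. The `normalize` function is a toy stand-in for a Markdown normalizer, not pandoc; all names are made up for the example.

```python
def normalize(markdown):
    """Toy normalizer: strip trailing spaces and collapse runs of blank lines."""
    result = []
    for line in (raw.rstrip() for raw in markdown.split("\n")):
        if line == "" and result and result[-1] == "":
            continue  # skip a blank line that follows another blank line
        result.append(line)
    return "\n".join(result)


def is_identity(f, text):
    """Identity: the text comes back unchanged."""
    return f(text) == text


def is_idempotent(f, text):
    """Idempotency: a second pass changes nothing more than the first one did."""
    once = f(text)
    return f(once) == once


text = "# Title  \n\n\nSome text"
print(is_identity(normalize, text), is_idempotent(normalize, text))
```

On this input the normalizer is not an identity (it removes the trailing spaces and one blank line) but it is idempotent, which is the weaker property a "normalize before commit" workflow needs.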

it is done using pandoc from conda-forge, cf. these lines.

Seems like you're only using conda to install pandoc but not needing other things from it? If so, you could take a look at https://github.com/ickc/pantable/blob/master/.travis.yml#L40-L46, also see a bit higher to see how multiple pandoc versions can be tested against. This is essentially also how I set up Travis CI in panflute.

I think that currently being able to have a round-trip identity is partly luck, because the test cases are not complicated enough. But once jgm/pandoc#5408 is implemented, it can be guaranteed (almost, except perhaps for the raw-to-html case mentioned there). And the reason it is achievable should probably be credited to the design of ipynb. In the beginning I really hoped that the IPython notebook would take the Rmarkdown approach. But being JSON has a lot of nice features, including making round-trip identity possible here. That's also why I'm interested in this project: basically, it provides the best of both worlds.

So I have a tangential question for you: since this project is very young, I think it is reasonable to ask about the longevity of the format(s), e.g. I'm considering turning all the ipynb files in my git repos into this to get better git diffs. (Pardon me, as I haven't used jupytext and haven't even finished reading the README.)

  • There seem to be a lot of output format options. Have you considered having some (or just one) eventually get a spec, i.e. something like Rmarkdown that can be standardized and universally recognized? It seems to me that there are now .md and .py options, which can be wildly different. Have you considered something like Rmarkdown, which is essentially Markdown that can be run as a Python script (i.e. literate programming)? Or is it like how nbconvert can be used to run an ipynb, but with the pandoc Markdown in jupytext as the source instead?

  • It seems like the only doc is the README at the moment. It is quite long and a bit difficult to navigate. One minimal thing you can do is add a ToC (I described how to do that using pandoc in https://github.com/jgm/pandoc/wiki/Pandoc-Tricks#toc-generation). Or maybe just use pandoc to generate HTML and use gh-pages to deliver it as a web page. I see that you have a "content" line, but a tree-like structure including sub-headings makes it easier to scan the document and find what one is looking for. Also, there are currently a lot of examples, but it is quite difficult to really get a sense of what jupytext is and can do (probably because it can do a lot of things).

Actually, you might have sparked an idea for writing a pandoc filter that does something like this for me. I'll think about it and see if there's any way to collaborate at some point.

@mwouts
Owner Author

mwouts commented May 22, 2019

Hello @ickc, I am afraid you have too many questions!! I'll try to answer a few of them

(...) Why not use pypandoc, or even pipe into pandoc?

Pandoc is not required for most of Jupytext formats, so I prefer not to take a dependency on pypandoc. Also, I started the pandoc plugin with pipes, but that did not work on Windows. If you are interested in fixing that, please let me know.

Testing multiple versions of pandoc

At the moment there's only one version of pandoc that can be used with Jupytext. And I prefer to be testing only the latest pandoc, as this is not a pandoc-centered project.

And will you consider providing an option not to require identity (better yet, allow users to specify the pandoc args used, e.g. personally I'd enable --atx-headers)?

Yes! If someone asks for it, I'd be happy to. That someone can be you - please open an issue for that.

So I have a tangential question for you: since this project is very young, I think it is reasonable to ask about the longevity of the format(s).

Certainly. The formats have evolved already. You can see the changes by looking at the history of files in the demo and test folders.

There seem to be a lot of output format options.

The most popular formats are probably

  • Scripts with %% cells (supported by many IDEs)
  • Scripts with few cell markers (the light format)
  • Markdown
  • R Markdown

Please read about them in the README.

It seems like the only doc is the README at the moment.

That is correct. I do not have much experience with Python docs, but maybe you can help turn the README into real Python documentation? Please open an issue for that as well; we can discuss it there.

@choldgraf
Contributor

@mwouts I'd be happy to help you turn the README into a Sphinx site if you'd be interested in this. Would you be open to a PR?

@mwouts
Owner Author

mwouts commented May 22, 2019

Hi Chris, sure, that would be very helpful! Please go ahead, a PR on this would be just great!

4 participants