Pandoc format for Jupyter notebook #208
Interesting! Pandoc reformats the Markdown contents. A round-trip on a notebook with a cell
converts it to
The representations of the test notebooks in the pandoc format are here: https://github.com/mwouts/jupytext/tree/1.1.0_pandoc_with_mirror_tests/tests/notebooks/mirror/ipynb_to_pandoc See, for instance, a notebook with cell metadata.
As our test notebooks have atx headers #208
On pandoc-discuss, @jgm suggested using the However, the round trip on the notebook itself is still not satisfactory: none of the 18 test notebooks passes the round-trip test. John, would you like me to report this at jgm/pandoc? For instance, the round trip of jupytext/tests/notebooks/ipynb_py/Notebook with function and cell metadata 164.ipynb replaces the first code cell
with a Markdown cell:
I don't think this is an issue with pandoc. Saving your notebook as x.ipynb, I tried
and then used
That is:
The numbering changes definitely look like bugs and could be reported. [EDIT: I've fixed them already.] The EDIT: After the code changes noted above, I tried all the ipynb files in that directory with
In In the other two files, pandoc adds a metadata field
Sorry, I realize you were probably referring to
A few more issues appeared, but nothing like what you're reporting above.
Hello @jgm, thanks for your detailed feedback. I agree that the small modifications you see in the round trip are nothing to worry about. So perhaps the issue I encounter is specific to my pandoc install: pandoc 2.7.1 installed from conda-forge on Windows 10, with the Markdown piped back into pandoc, like here:
PS: the
Hm. The pandoc version should be okay. You should use the `-s` option to get a standalone Markdown document with a YAML metadata header. But otherwise, the command line you give looks just like what I did...
You are using a pipe on Windows, and I know I've seen some kooky problems with piping things in a Windows console, having to do with encodings. See, for example, jgm/pandoc#3208.
That could be the issue here. Try creating a Markdown file

    pandoc x.ipynb -s -o x.md --wrap=preserve --atx-headers

and then use that:

    pandoc x.md -s -o x.ipynb --wrap=preserve --atx-headers
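For anyone scripting jgm's file-based workaround, here is a minimal sketch that runs the same two pandoc commands from Python using argument lists, so no shell pipe is involved. Its assumptions: `pandoc` is on the PATH, and `roundtrip_commands` and `run_roundtrip` are hypothetical helper names. One deliberate difference from the commands above is that the second step writes to `x.roundtrip.ipynb` rather than overwriting `x.ipynb`, so the original notebook stays available for comparison. Later pandoc releases also deprecate `--atx-headers` in favour of `--markdown-headings=atx`.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# Flags from the command lines above (--atx-headers is deprecated in later pandoc releases).
FLAGS = ["--wrap=preserve", "--atx-headers"]


def roundtrip_commands(stem: str) -> Tuple[List[str], List[str]]:
    """Build the ipynb -> md and md -> ipynb pandoc commands for `stem`."""
    to_md = ["pandoc", f"{stem}.ipynb", "-s", "-o", f"{stem}.md", *FLAGS]
    # Write the result next to the original instead of overwriting it (a design choice).
    to_ipynb = ["pandoc", f"{stem}.md", "-s", "-o", f"{stem}.roundtrip.ipynb", *FLAGS]
    return to_md, to_ipynb


def run_roundtrip(stem: str) -> Path:
    """Run both conversions through files on disk (no pipes); requires pandoc on PATH."""
    for cmd in roundtrip_commands(stem):
        subprocess.run(cmd, check=True)  # raises CalledProcessError if pandoc fails
    return Path(f"{stem}.roundtrip.ipynb")
```

Because each command is passed as a list, `subprocess.run` starts pandoc directly without a shell. pandoc reads and writes the files itself, so the console encoding should not come into play.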
Thanks @jgm. These are two useful suggestions. I will try the Regarding jgm/pandoc#3208, I am not so sure that the piping issue comes from the Windows console, as it also occurs when I pipe using the Python
@jgm, this is definitely looking better. With the I have two additional questions:
Try
That should work.
Indeed, that one was an issue with tabs, thanks. With pandoc from master (on Windows again), I now get little to no difference on most test notebooks. But don't you think there is still an issue with the two notebooks that have a raw cell on top or in the body? My tests suggest that the raw cell on top becomes a code cell, while the one in the body becomes a Markdown cell...
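The kind of check mwouts describes can be sketched directly on the notebook JSON, without nbformat: compare the `cell_type` of each cell before and after the round trip. This is a minimal sketch, not Jupytext's actual test code. `cell_differences` and `load_notebook` are hypothetical helpers, and cells are matched purely by position, so a cell that disappears shows up as a "missing" entry at the end of the list.

```python
import json
from typing import Any, Dict, List, Tuple

Notebook = Dict[str, Any]


def load_notebook(path: str) -> Notebook:
    """Read an .ipynb file as plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def cell_differences(before: Notebook, after: Notebook) -> List[Tuple[int, str, str]]:
    """Return (index, type_before, type_after) for every position whose cell type changed.

    Cells are compared by position; if one notebook has fewer cells, the absent
    side is reported with the placeholder type "missing".
    """
    cells_a, cells_b = before.get("cells", []), after.get("cells", [])
    diffs = []
    for i in range(max(len(cells_a), len(cells_b))):
        type_a = cells_a[i]["cell_type"] if i < len(cells_a) else "missing"
        type_b = cells_b[i]["cell_type"] if i < len(cells_b) else "missing"
        if type_a != type_b:
            diffs.append((i, type_a, type_b))
    return diffs
```

For example, a raw cell on top that comes back as a code cell is reported as `(0, "raw", "code")`.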
Can you point me to the specific files you mean, so I can test?
Sure! The two files are https://github.com/mwouts/jupytext/blob/1.1.0_pandoc_with_mirror_tests/tests/notebooks/ipynb_py/jupyter_with_raw_cell_in_body.ipynb and https://github.com/mwouts/jupytext/blob/1.1.0_pandoc_with_mirror_tests/tests/notebooks/ipynb_py/jupyter_with_raw_cell_on_top.ipynb. My pandoc was up to date with master when I ran the test.
Here's what I get after converting those to markdown, then back to ipynb:
So I'm not seeing either raw cell become another type of cell (I believe that was due to a bug fixed yesterday; you probably had a version from before the fix). Instead, the raw cell disappears. I need to look into why that is happening.
OK, I see. As the pandoc documentation suggests, when translating between ipynb and Markdown you should use the format
Thanks @jgm. I can confirm that with your latest commit to pandoc, the raw cells no longer cause any issue. Now we should think about which command line we want to make the default for pandoc in Jupytext. Currently Jupytext uses Then, I would like the pandoc command line to be configurable per notebook. Personally, I would have a use case for automated Markdown reformatting (making Python scripts PEP 8 compliant by avoiding long lines in Markdown cells). Also, as the pandoc format can store outputs, we should have an option to preserve them in that format.
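A per-notebook pandoc command line could be resolved roughly as follows: read an override from the notebook metadata and fall back to a default. This is a minimal sketch under stated assumptions. The metadata key `pandoc_args` under `jupytext` is hypothetical, and `DEFAULT_PANDOC_ARGS` is an illustrative default, not necessarily what Jupytext ships. Only the pandoc flags themselves (`--wrap`, `--columns`, `--atx-headers`) are real pandoc options.

```python
import shlex
from typing import Any, Dict, List

# Illustrative default for the md -> ipynb direction; an assumption, not Jupytext's setting.
DEFAULT_PANDOC_ARGS = "--from markdown --to ipynb -s --atx-headers --wrap=preserve"


def pandoc_args(notebook_metadata: Dict[str, Any]) -> List[str]:
    """Return the pandoc arguments for one notebook.

    Looks for a hypothetical `pandoc_args` string under the `jupytext` metadata
    key, and falls back to DEFAULT_PANDOC_ARGS when it is absent or empty.
    """
    custom = notebook_metadata.get("jupytext", {}).get("pandoc_args")
    # shlex.split handles quoting the same way a POSIX shell would.
    return shlex.split(custom if custom else DEFAULT_PANDOC_ARGS)
```

A user who wants pandoc to rewrap long Markdown lines (the PEP 8 use case above) might set the override to something like `--from markdown --to ipynb -s --wrap=auto --columns=79`, while other notebooks keep `--wrap=preserve`.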
As the latest commit states, I have hijacked Travis to finally use conda as an additional configuration. The upside is that we will always be testing the latest
@mwouts seems like the latest pandoc on conda-forge is a decent model to start with. Note that this might not behave the same on non-*nix platforms (but that's probably too large a challenge for jupytext to tackle itself). Lemme know if/when there's a version you'd like me to demo :-) Sorry for the slow replies, we have been on a major grant-writing effort to try to find some more funding!
I do agree. What I like with this is that we will automatically test with the latest pandoc, that's a good way of detecting troubles right when they happen!
Well, as it turns out, I developed most of Jupytext on Windows! So non-*nix platforms are also covered by the tests... Indeed, I ran into a series of issues (different exception names, UTF-8 support when piping into pandoc, etc.), but I would rather run into those myself than have the end user hit them...
Sure! Version 1.1.0-rc1 will be out soon; I will let you know when it is ready for testing.
@choldgraf, the new RC is available:
Can you give it a try? The corresponding section in the README is here; basically, you should use
cool! Will take a look when I have a moment... I assume this depends on the latest pandoc?
Yes, you will need the latest pandoc. Probably the simplest way to test the latest RC with pandoc is:
To save a notebook in pandoc format (with no outputs at the moment): edit the notebook metadata and insert
Or, if you already have pandoc Markdown notebooks, simply click on them in the tree view; they should open as Jupyter notebooks.
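Since the RC only works with a recent pandoc, a script or test suite may want to check the installed version before trying the pandoc format. The sketch below parses the first line of `pandoc --version`. Its assumptions: the minimum `(2, 7, 2)` is illustrative only, the thread does not state one; some Windows builds may print `pandoc.exe` rather than `pandoc`; and the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Illustrative minimum version; an assumption, not a documented Jupytext requirement.
MIN_PANDOC = (2, 7, 2)


def parse_pandoc_version(first_line: str) -> Tuple[int, ...]:
    """Parse the first line of `pandoc --version`, e.g. 'pandoc 2.7.2'."""
    # Some Windows builds may report the executable name as 'pandoc.exe'.
    match = re.match(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", first_line.strip())
    if match is None:
        raise ValueError(f"unexpected version line: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_pandoc_version() -> Optional[Tuple[int, ...]]:
    """Return the installed pandoc version, or None if pandoc is not on the PATH."""
    exe = shutil.which("pandoc")
    if exe is None:
        return None
    out = subprocess.run([exe, "--version"], stdout=subprocess.PIPE, check=True).stdout
    return parse_pandoc_version(out.decode("utf-8").splitlines()[0])


def pandoc_is_recent_enough(version: Optional[Tuple[int, ...]],
                            minimum: Tuple[int, ...] = MIN_PANDOC) -> bool:
    """Tuples compare element by element, so (2, 8) >= (2, 7, 2) is True."""
    return version is not None and version >= minimum
```

Returning `None` when pandoc is missing keeps pandoc an optional tool. Callers can then skip the pandoc format instead of failing.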
hmmm, I tried opening the notebook after following your instructions and got a gigantic recursion error trace... need to figure out what's up (I'm on WSL, for what it's worth, so maybe that's messing things up)
Interesting! Can you describe how you did that, and what the error message is? And if possible, which notebook was it? Note that so far I have only used WSL occasionally, but have not encountered any issues with it.
Hmmm, I'm first getting a "filenotfound" error:
and then a never-ending traceback of
I wouldn't be surprised if this is a WSL problem.
Thanks @choldgraf for spotting this. I can reproduce the problem with conda on both Windows and WSL, so this is a major issue that probably also affects plain Linux. I will reproduce it with a test, fix it, and then release another RC...
glad it wasn't just me :-)
Jupytext's contents manager patches nbformat.reads/writes, so we need to make copies of them #208
That was a rather funny bug... The contents manager uses Anyway, we now have both a test and a fix for that in the latest RC:
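The commit title above gives the cause: Jupytext's contents manager patches `nbformat.reads`/`writes`, so it must keep copies of the originals. One way such a patch produces the endless recursion choldgraf saw is when the replacement looks up the patched name at call time and ends up calling itself. The illustration below is self-contained and is not Jupytext's actual code. It uses a stand-in module built with `types.SimpleNamespace`, and `make_fake_nbformat`, `install_buggy_patch` and `install_fixed_patch` are hypothetical names.

```python
import types


def make_fake_nbformat():
    """A stand-in for nbformat that only exposes a `reads` function."""
    return types.SimpleNamespace(reads=lambda text: {"source": text})


def install_buggy_patch(module):
    def patched_reads(text):
        # Bug: module.reads is resolved at call time, and by then it is
        # patched_reads itself, so every call recurses until RecursionError.
        notebook = module.reads(text)
        notebook["patched"] = True
        return notebook

    module.reads = patched_reads


def install_fixed_patch(module):
    # Fix: keep a reference to the original function before replacing it.
    original_reads = module.reads

    def patched_reads(text):
        notebook = original_reads(text)
        notebook["patched"] = True
        return notebook

    module.reads = patched_reads
```

With the buggy patch, `module.reads("x")` raises `RecursionError`. With the fixed patch, it returns `{"source": "x", "patched": True}`.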
guess that's why we have release candidates :-)
I have just released version 1.1.0, which includes the
I'll give it a whirl this week!
@mwouts it works quite nicely! Working on a little blog post to demo how to blend jupytext/pandoc in an authoring context :-)
Great news! I am looking forward to seeing how you use it!
Seems like I'm pretty late to the party. In case it's not too late to mention it: if you want CI to use the latest pandoc master, you could try https://github.com/pandoc-extras/pandoc-nightly/releases. Achieving round-trip idempotency in pandoc is not easy, let alone round-trip identity. It's good to see all this work going into that.
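To make the distinction concrete, call the two conversions `write` (notebook to text) and `read` (text to notebook). Round-trip identity means `read(write(nb)) == nb`. Idempotency is weaker: it only requires the text to be stable after one pass, `write(read(write(nb))) == write(nb)`. The toy converters below are made up for illustration. They strip trailing spaces, which makes them idempotent but not an identity.

```python
from typing import Callable, List

Cells = List[str]


def to_md(cells: Cells) -> str:
    """Toy writer: one block per cell, trailing spaces stripped (a lossy normalization)."""
    return "\n\n".join(
        "\n".join(line.rstrip() for line in cell.split("\n")) for cell in cells
    )


def to_nb(text: str) -> Cells:
    """Toy reader: every blank-line-separated block becomes a cell."""
    return text.split("\n\n")


def round_trip_is_identity(cells: Cells, write: Callable[[Cells], str],
                           read: Callable[[str], Cells]) -> bool:
    """Identity: notebook -> text -> notebook gives back exactly the same cells."""
    return read(write(cells)) == cells


def round_trip_is_idempotent(cells: Cells, write: Callable[[Cells], str],
                             read: Callable[[str], Cells]) -> bool:
    """Idempotency: a second pass through the text format changes nothing more."""
    text = write(cells)
    return write(read(text)) == text
```

For `["# Title  ", "x = 1"]`, the round trip loses the trailing spaces, so identity fails while idempotency holds. A cell list that is already normalized passes both checks.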
Hello @ickc, thanks for joining the conversation.
We do have a test of jupytext+pandoc in the CI, and it uses pandoc from conda-forge, cf. these lines. I think this should be good enough; at least, I believe the conda-forge builds of pandoc are kept well up to date.
Thanks! Well, here we just shared Jupytext's experience with round trips and applied our test framework to pandoc's format. Jupytext does not have its own pandoc encoder; it directly uses
Is there any reason not to use an existing Python wrapper, such as pypandoc or panflute (in my experience pypandoc is more robust, as long as you don't need to access the AST)? At the very least, you wouldn't need to write to a temp file and read it back: without an output file, pandoc writes to stdout, and you can use PIPE to capture the stdout text directly. And would you consider providing an option not to require identity (better yet, letting users specify the pandoc args used, e.g. personally I'd enable
It seems you're only using conda to install pandoc and don't need anything else from it? If so, you could take a look at https://github.com/ickc/pantable/blob/master/.travis.yml#L40-L46; the lines just above show how to test against multiple pandoc versions. This is essentially how I set up Travis CI in panflute as well.
I think round-trip identity currently holds partly by luck, because the test cases are not complicated enough. But once jgm/pandoc#5408 is implemented, it can be (almost) guaranteed, except perhaps for the raw-to-HTML case mentioned there. That it is achievable at all should probably be credited to the design of ipynb. In the beginning I really hoped that IPython notebooks would take the R Markdown approach, but being JSON has a lot of nice features, including making round-trip identity possible here. That's also why I'm interested in this project: it basically provides the best of both worlds.
So I have a somewhat tangential question for you: since this project is very young, I think it is reasonable to ask about the longevity of its format(s). (For example, I'm considering converting all the ipynb files in my git repos to get better git diffs.) e.g. (pardon me, as I haven't used jupytext and haven't even finished reading the README.)
Actually, you might have sparked an idea of writing a pandoc filter that does something like this for me. I'll think about it and see if there's any way to collaborate at some point.
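ickc's suggestion of skipping the temporary file and capturing pandoc's stdout through a pipe can be sketched with the standard library alone. pandoc reads from stdin when no input file is given and writes to stdout when `-o` is omitted. Its documentation states that it uses UTF-8 for both input and output, so the sketch passes bytes encoded and decoded explicitly as UTF-8 instead of relying on text mode. That should keep the console or locale encoding out of the picture, which may be where the Windows piping problems discussed earlier come from. `convert_text` is a hypothetical helper name, not pypandoc's API.

```python
import subprocess
from typing import Sequence


def convert_text(text: str, args: Sequence[str], executable: str = "pandoc") -> str:
    """Pipe `text` into `executable args...` and return its stdout.

    Input and output are exchanged as UTF-8 bytes, so neither the locale nor
    the console code page is involved in the conversion.
    """
    result = subprocess.run(
        [executable, *args],
        input=text.encode("utf-8"),  # bytes on stdin, no temporary file
        stdout=subprocess.PIPE,      # capture stdout instead of writing a file
        stderr=subprocess.PIPE,      # keep pandoc warnings out of the console
        check=True,                  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.decode("utf-8")
```

A call such as `convert_text(notebook_json, ["--from", "ipynb", "--to", "markdown", "-s"])` would return the Markdown text directly. Any executable that reads stdin and writes stdout can stand in for pandoc when testing the helper.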
Hello @ickc, I'm afraid you have too many questions! I'll try to answer a few of them.
Pandoc is not required for most Jupytext formats, so I prefer not to take a dependency on
At the moment, only one version of pandoc can be used with Jupytext. And I prefer to test only against the latest pandoc, as this is not a pandoc-centered project.
Yes! If someone asks for it, I'd be happy to. That someone can be you - please open an issue for that.
Certainly. The formats have evolved already. You can see the changes by looking at the history of files in the demo and test folders.
The most popular formats are probably
Please read about them in the README.
That is correct. I do not have much experience with Python docs, but maybe you could help turn the README into real Python documentation? Please open an issue for that as well; we can discuss it there.
@mwouts I'd be happy to help you turn the README into a Sphinx site if you're interested. Would you be open to a PR?
Hi Chris, sure, that would be very helpful! Please go ahead; a PR on this would be great!
Pandoc allows users to create Jupyter notebooks from Markdown files, cf. the documentation. We'll try to plug pandoc into Jupytext and see if that is usable.