ipynb round trip details #5398

Closed
jgm opened this issue Mar 28, 2019 · 7 comments


@jgm
Owner

jgm commented Mar 28, 2019

From mwouts/jupytext#208

Using files in https://github.com/mwouts/jupytext/blob/1.1.0_pandoc_with_mirror_tests/tests/notebooks/ipynb_py.

Command:

% for x in *.ipynb; do pandoc -s -f ipynb -t markdown --wrap=preserve --atx-headers --extract-media new "$x" -o new/"$x".markdown; pandoc -s -f markdown -t ipynb --wrap=preserve --atx-headers new/"$x".markdown -o new/"$x"; echo $x; json-diff "$x" new/"$x";  done | grep -v VBOR # this grep removes the big diffs for images which just have to do with newline placement

Result:

Notebook_with_R_magic.ipynb
Notebook_with_function_and_cell_metadata_164.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/metadata/toc/nav_menu","value":""},
{"op":"replace","path":"/metadata/toc/base_numbering","value":"1.0"},
{"op":"replace","path":"/metadata/toc/toc_position","value":""},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
Notebook_with_html_and_latex_cells.ipynb
[WARNING] Duplicate identifier 'section' at line 50 column 1
[WARNING] Duplicate identifier 'section-1' at line 54 column 1
Notebook_with_many_hash_signs.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/cells/0/source/0","value":"################################################################## \n"},
{"op":"add","path":"/cells/0/source/1","value":"\n"},
{"op":"replace","path":"/cells/0/source/4","value":"\n"},
{"op":"add","path":"/cells/0/source/5","value":"################################################################## "},
{"op":"replace","path":"/cells/2/source/0","value":"################################################################## \n"},
{"op":"add","path":"/cells/2/source/1","value":"\n"},
{"op":"replace","path":"/cells/2/source/4","value":"\n"},
{"op":"add","path":"/cells/2/source/5","value":"################################################################## "},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
Notebook_with_more_R_magic_111.ipynb
pandoc: Stack space overflow: current size 33624 bytes.
pandoc: Use `+RTS -Ksize -RTS' to increase it.
World_population.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/cells/0"},{"op":"remove","path":"/cells/0"},
{"op":"remove","path":"/metadata/language_info"},
{"op":"remove","path":"/metadata/kernelspec"}]
convert_to_py_then_test_with_update83.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/cells/0/outputs/0/text/2","value":"Wall time: 188 µs"},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
frozen_cell.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/cells/0/outputs/0/text/0","value":"I'm a regular cell so I run and print!"},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter_again.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/cells/1/outputs/0/text/6","value":"title: Quick ioslides"},
{"op":"remove","path":"/cells/1/outputs/0/text/7"},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter_with_raw_cell_in_body.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"remove","path":"/cells/1"},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
jupyter_with_raw_cell_on_top.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},{"op":"remove","path":"/cells/0"},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]
notebook_with_complex_metadata.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"},
{"op":"replace","path":"/metadata/widgets/state/a65a11f142ca44eebc913788d256adcb/views/0/cell_index","value":"92.0"}]
nteract_with_parameter.ipynb
sample_rise_notebook_66.ipynb
[{"op":"replace","path":"/nbformat_minor","value":5},
{"op":"replace","path":"/metadata/language_info/codemirror_mode/version","value":"3.0"}]

Most of these differences arise because integer-valued JSON numbers, such as nbformat (the integer 4), get rendered in the markdown YAML metadata as "4.0". One difference comes from the markdown YAML writer rendering an empty map as the empty string instead of {}.
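To make the two metadata problems concrete, here is a minimal Python sketch. It is not pandoc's code: `naive_render` and `careful_render` are hypothetical helpers, and the assumption that numbers arrive as doubles is simulated with `parse_int=float`. It only reproduces the symptoms seen in the diffs above ("4.0" for 4, empty string for an empty map) and one way to avoid them.

```python
import json

def naive_render(value):
    """Hypothetical renderer that reproduces the two symptoms."""
    if isinstance(value, dict):
        # An empty map joins zero entries and so becomes the empty string.
        return ", ".join(f"{k}: {naive_render(v)}" for k, v in value.items())
    return str(value)  # an integral double prints as "4.0"

def careful_render(value):
    """Hypothetical renderer that keeps integers and empty maps intact."""
    if isinstance(value, dict):
        if not value:
            return "{}"  # explicit YAML flow syntax for an empty map
        inner = ", ".join(f"{k}: {careful_render(v)}" for k, v in value.items())
        return "{" + inner + "}"
    if isinstance(value, float) and value.is_integer():
        return str(int(value))  # 4.0 -> "4"
    return str(value)

# Assumption: the reader stores every JSON number as a double.
meta = json.loads('{"nbformat": 4, "nav_menu": {}}', parse_int=float)

for key, value in meta.items():
    print(key, repr(naive_render(value)), repr(careful_render(value)))
```

With the careful variant, a value that was an integer in the source JSON is written back as an integer, so json-diff would no longer report a replace for it.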

There's also a stack overflow, which should definitely be looked at.

@jgm
Owner Author

jgm commented Mar 28, 2019

The file that gives the stack overflow contains a raw HTML data cell that is over 2 million characters long. Note also that no stack overflow occurs with output to HTML.

@mwouts

mwouts commented Mar 28, 2019

Thanks @jgm for following up on this.

I should have said that I mostly care about metadata and input cells. Indeed, Jupytext never stores output cells in the text formats (instead, outputs are preserved in a paired .ipynb file). Consequently, my plan is to run pandoc on notebooks with the outputs removed.
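For reference, removing outputs before the round trip could look roughly like the sketch below. It is not Jupytext's implementation: `strip_outputs` is a hypothetical helper, and it assumes the nbformat 4 layout in which each code cell carries `outputs` and `execution_count` fields.

```python
import copy

def strip_outputs(notebook):
    """Return a copy of an nbformat-4 notebook dict without code-cell outputs."""
    stripped = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped

# Tiny illustrative notebook, not one of the test files from the thread.
example = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
            "source": ["print('hi')"],
        },
    ],
}

cleaned = strip_outputs(example)
```

Comparing `cleaned` before and after the pandoc round trip would then only exercise metadata and input cells, which is the part that matters here.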

@jgm
Owner Author

jgm commented Mar 28, 2019

Moving stack overflow issue to #5401 (it's not specific to ipynb).
Remaining issues here:

  • should we be creating arbitrary header levels, such as level 65, for #####...? CommonMark limits header levels to 6.
  • avoid having nbformat render in pandoc YAML markdown as "4.0".
  • improve YAML markdown rendering of an empty map (should be {}, not the empty string).
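On the first point, a simplified sketch of the CommonMark-style limit may help. This is an illustration only, not pandoc's parser: `parse_atx_heading` is a hypothetical helper, and it ignores optional closing `#` sequences. A run of more than six `#` characters is simply not a heading, so the line would stay ordinary text.

```python
import re

# One to six '#' characters, then either end of line or whitespace plus content.
ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*?))?[ \t]*$")

def parse_atx_heading(line):
    """Return (level, text) for an ATX heading line, or None if it is not one."""
    match = ATX_HEADING.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2) or ""

print(parse_atx_heading("## Section"))    # a level-2 heading
print(parse_atx_heading("#" * 65 + " "))  # too many '#': not a heading
```

Under this rule the long rows of hash signs in Notebook_with_many_hash_signs.ipynb would be treated as paragraph text rather than as a level-65 header.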

jgm added a commit that referenced this issue Mar 28, 2019
Should be `{}` not empty string.

Partially addresses #5398.
@jgm jgm closed this as completed in 4c9a68e Mar 28, 2019
@jgm jgm reopened this Mar 28, 2019
@jgm
Owner Author

jgm commented Mar 29, 2019

I've fixed most of these issues, but I am still getting one bad result, with jupyter_with_raw_cell_on_top.ipynb. Pandoc takes the first (raw) cell

  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "title: \"Quick test\"\n",
    "output:\n",
    "  ioslides_presentation:\n",
    "    widescreen: true\n",
    "    smaller: true\n",
    "editor_options:\n",
    "     chunk_output_type console\n",
    "---"
   ]
  },

and produces the markdown

::: {.cell .raw}
---
title: "Quick test"
output:
  ioslides_presentation:
    widescreen: true
    smaller: true
editor_options:
     chunk_output_type console
---  
:::  

which then becomes the ipynb

  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## :::\n",
    "\n",
    "title: “Quick test”\n",
    "output:\n",
    "ioslides\\_presentation:\n",
    "widescreen: true\n",
    "smaller: true\n",
    "editor\\_options:\n",
    "chunk\\_output\\_type console\n",
    "—\n",
    ":::"
   ]
  }

@jgm
Owner Author

jgm commented Mar 29, 2019

Ah. This looks like a bad interaction between the setext header syntax and the fenced div syntax.
::: {.cell .raw}\n--- gets parsed as a setext header with attributes.
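The clash can be sketched in a few lines of Python. This is not pandoc's parser, and the thread does not say how the fix was made; `is_setext_header` and its `guard_fenced_div` flag are hypothetical, and the guard shown (refusing a setext header when the text line starts with `:::`) is just one plausible way to resolve the ambiguity.

```python
import re

# A setext underline: a run of '=' or '-' with optional surrounding whitespace.
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)\s*$")

def is_setext_header(text_line, next_line, guard_fenced_div=True):
    """Decide whether text_line followed by next_line forms a setext header."""
    if not text_line.strip():
        return False
    if SETEXT_UNDERLINE.match(next_line) is None:
        return False
    # Assumed guard: a fenced div opener is never the text of a setext header.
    if guard_fenced_div and text_line.lstrip().startswith(":::"):
        return False
    return True

opener, underline = "::: {.cell .raw}", "---"
print(is_setext_header(opener, underline, guard_fenced_div=False))  # the bad parse
print(is_setext_header(opener, underline))                          # with the guard
```

Without the guard, the div opener is consumed as header text and the `---` as its underline, which matches the `## :::` seen in the round-tripped cell.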

@jgm
Owner Author

jgm commented Mar 29, 2019

Got it.
OK, all the round-trip obstacles removed!

@jgm jgm closed this as completed Mar 29, 2019
@mwouts

mwouts commented Mar 29, 2019

Excellent! Thanks @jgm
