Capture YAML header key/values that aren't in jupyter:? #336

Closed
choldgraf opened this issue Sep 20, 2019 · 4 comments

@choldgraf
Contributor

I'm playing around with using Jupytext to read in content in Jupyter Book, and one challenge I've run into is that it seems only the keys/values in the jupyter: field for the front-matter YAML are retained when you read in a text file with Jupytext. Do you know if:

  • There's a way to embed any front-matter YAML in the resulting NotebookNode object, regardless of its parent key?
  • There's a function within Jupytext that will read in the front-matter YAML that I can just use as a utility function? (we have one in Jupyter Book but I'm always a fan of deleting functions if they're implemented elsewhere :-) )

What I'm considering doing is something like the following:

For any book content, read in the file with jupytext. If it's originally a text file, check the resulting NotebookNode for any jupytext metadata. If that metadata exists, then do nothing. If that metadata does not exist, then replace all the cells with a single markdown cell which has the contents of the original text file (this is because jupytext changes the behavior of "default markdown" documents which I don't want).

The trick here is that I'd like to be able to then retain the other frontmatter that might have existed in the file, but right now it seems Jupytext gets rid of it. Any ideas here?

@mwouts
Owner

mwouts commented Sep 21, 2019

Thanks @choldgraf for asking. First, I should say that I was obsessed with round-trip identity when I developed Jupytext. In other words, no part of the YAML header is lost. The part that is not under the jupyter section goes to a raw cell that is prepended to the notebook. See for instance this test:

from jupytext.header import header_to_metadata_and_cell


def test_header_to_metadata_and_cell_metadata():
    text = """---
title: Sample header
jupyter:
  mainlanguage: python
---
"""
    lines = text.splitlines()
    # The 'jupyter' section becomes notebook metadata; the rest goes to a raw cell
    metadata, _, cell, pos = header_to_metadata_and_cell(lines, '')
    assert metadata == {'mainlanguage': 'python'}
    assert cell.cell_type == 'raw'
    assert cell.source == """---
title: Sample header
---"""

Unfortunately I don't think any of the Jupytext functions related to the front-matter YAML are generic enough to be exposed outside of the Jupytext package. Currently the corresponding code is in the function jupytext.header.header_to_metadata_and_cell used in the test above. But again, I don't think you will want to use it...
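For anyone who only needs to separate the front matter from the body, a small standalone helper may be enough. The following is a minimal sketch, not part of Jupytext: the name split_front_matter is hypothetical, and it assumes the header is delimited by lines containing only ---. It returns the header as raw text; turning it into a dictionary would need a YAML parser (for example PyYAML's yaml.safe_load), which is left out here.

```python
def split_front_matter(text):
    """Split a document into (front_matter, body).

    front_matter is the raw YAML text between the opening and closing
    '---' lines (delimiters excluded), or None if there is no header.
    This is a hypothetical helper, not a Jupytext function.
    """
    lines = text.splitlines()
    # A header must start on the very first line
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return header, body
    # No closing delimiter found: treat everything as body
    return None, text


header, body = split_front_matter("---\ntitle: Sample header\n---\n\nEtc\n")
```

Note that the body is rebuilt from split lines, so a trailing newline in the input is not preserved; re-reading the original file is safer if an exact copy is needed.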

Also, do we agree that the challenge you describe above could be solved by #321? If we are confident that any Markdown file, converted to a notebook by Jupytext (without being executed) and then converted back to Markdown by nbconvert, comes back unchanged, then maybe we don't need to distinguish whether the Markdown file was conceived as a notebook or not?

@choldgraf
Contributor Author

@mwouts ah cool, I appreciate your attention to detail on the round-trip stuff :-)

I see that the first cell does correctly have the YAML header metadata that you describe!

And you're right that I believe #321 would make this less of a problem, with one caveat: I do still need the non-jupytext YAML metadata for some of my pages. I can grab it from the first raw cell, which should be OK (it might be useful if that raw cell had a little metadata like jupytext: extra_header added to it). If I can read in any markdown cell with Jupytext and know that cells without Jupytext front-matter will behave like "regular" markdown cells, I think that would simplify things a bit.
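Grabbing the extra header from the first raw cell could look roughly like the sketch below. It is an illustration under assumptions: extra_header_from_first_cell is a hypothetical name, the notebook is accessed as a plain dictionary (NotebookNode objects support the same access), and each cell source is assumed to be a single string, as it is for in-memory v4 notebooks.

```python
def extra_header_from_first_cell(nb):
    """Return the front matter stored in the first raw cell, or None.

    Hypothetical helper: it only recognises a raw cell whose source
    starts and ends with a '---' line, as in the test shown earlier.
    """
    cells = nb.get("cells", [])
    if not cells:
        return None
    first = cells[0]
    if first.get("cell_type") != "raw":
        return None
    lines = first.get("source", "").splitlines()
    if len(lines) >= 2 and lines[0].strip() == "---" and lines[-1].strip() == "---":
        # Drop the two delimiter lines and keep the YAML text
        return "\n".join(lines[1:-1])
    return None


# A dict with the same shape as a notebook read from a Markdown file
nb = {
    "cells": [
        {"cell_type": "raw", "source": "---\ntitle: Sample header\n---"},
        {"cell_type": "markdown", "source": "Etc"},
    ]
}
extra = extra_header_from_first_cell(nb)
```

Without a marker such as the extra_header metadata suggested above, this check cannot tell a header raw cell apart from any other raw cell that happens to be wrapped in --- lines.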

@mwouts
Owner

mwouts commented Sep 21, 2019

Chris, I will be working on #321, but maybe your approach is safer. I mean, I am afraid the composition of Jupytext and nbconvert won't be exactly the identity (some extra blank lines are being added, etc). So, maybe you want something like this:

from jupytext import reads, writes
from nbformat.v4.nbbase import new_markdown_cell, new_notebook

# Open notebook
nb = reads(
    '''---
title: A Markdown file with a front matter YAML
subtitle: but no 'jupyter' section
---

Etc
''', 'md')

# Not a Jupytext document? (.get avoids a KeyError if the metadata is absent)
if nb.metadata.get('jupytext', {}).get('notebook_metadata_filter') == '-all':
    # Recover the Markdown content
    md = writes(nb, 'md')  # or re-read the file
    # Replace the notebook with a new one, made of just one Markdown cell
    nb = new_notebook(cells=[new_markdown_cell(md)])

mwouts added this to the 1.3.0 milestone Sep 21, 2019

@mwouts
Owner

mwouts commented Oct 13, 2019

@choldgraf, I think we're done with this issue, so I'll close it (and #321 is ready now). Please reopen if there's anything I've not answered here.

mwouts closed this as completed Oct 13, 2019