Capture YAML header key/values that aren't in `jupyter:`? #336

choldgraf · 2019-09-20T14:37:46Z

I'm playing around with using Jupytext to read in content in Jupyter Book, and one challenge I've run into is that it seems only the keys/values in the jupyter: field for the front-matter YAML are retained when you read in a text file with Jupytext. Do you know if:

There's a way to embed any front-matter YAML in the resulting NotebookNode object, regardless of its parent key?
There's a function within Jupytext that will read in the front-matter YAML that I can just use as a utility function? (we have one in Jupyter Book but I'm always a fan of deleting functions if they're implemented elsewhere :-) )

What I'm considering doing is something like the following:

For any book content, read in the file with jupytext. If it's originally a text file, check the resulting NotebookNode for any jupytext metadata. If that metadata exists, then do nothing. If that metadata does not exist, then replace all the cells with a single markdown cell which has the contents of the original text file (this is because jupytext changes the behavior of "default markdown" documents which I don't want).

The trick here is that I'd like to be able to then retain the other frontmatter that might have existed in the file, but right now it seems Jupytext gets rid of it. Any ideas here?

mwouts · 2019-09-21T14:10:16Z

Thanks @choldgraf for asking. First, I should say that I was obsessed with round-trip identity when I developed Jupytext. In other words, no part of the YAML header is lost. The part that is not under the jupyter section goes to a raw cell that is prepended to the notebook. See for instance this test:

jupytext/tests/test_header.py

Lines 51 to 65 in 057d626

    def test_header_to_metadata_and_cell_metadata():
        text = """---
    title: Sample header
    jupyter:
      mainlanguage: python
    ---
    """
        lines = text.splitlines()
        metadata, _, cell, pos = header_to_metadata_and_cell(lines, '')

        assert metadata == {'mainlanguage': 'python'}
        assert cell.cell_type == 'raw'
        assert cell.source == """---
    title: Sample header
    ---"""

Unfortunately I don't think any of the Jupytext functions related to the front-matter YAML are generic enough to be exposed outside of the Jupytext package. Currently the corresponding code is in the function jupytext.header.header_to_metadata_and_cell used in the test above. But again, I don't think you will want to use it...

Also, do we agree that the challenge you describe above could be solved by #321? If we can be confident that any Markdown file, converted to a notebook by Jupytext (without being executed) and then converted back to Markdown by nbconvert, comes back unchanged, then maybe we don't need to distinguish whether the Markdown file was conceived as a notebook or not?

choldgraf · 2019-09-21T14:25:29Z

@mwouts ah cool, I appreciate your attention to detail on the round-trip stuff :-)

I see that the first cell does correctly have the YAML header metadata that you describe!

And you're right, I believe #321 would make this less of a problem. The caveat is that I do still need the non-jupytext YAML metadata for some of my pages, but I can grab it from the first raw cell, which should be OK (it might be useful if that raw cell had a little metadata like jupytext: extra_header added to it). If I can read in any markdown file with Jupytext and know that files without Jupytext front-matter will behave like "regular" markdown, I think that would simplify things a bit.
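For reference, grabbing the header back out of that first raw cell could look roughly like the sketch below. The helper name `front_matter_from_first_raw_cell` is hypothetical (it is not part of Jupytext), and it assumes the raw cell source has the `---` ... `---` shape shown in the test above. It only relies on dict-style access, so it should work on a NotebookNode as well as on a plain dict.

```python
def front_matter_from_first_raw_cell(nb):
    """Return the YAML text held in a leading raw cell, or None.

    Hypothetical helper, not a Jupytext function. Assumes the first cell,
    if it carries front matter, is a raw cell whose source starts and ends
    with a '---' line, as in the test quoted earlier in this thread.
    """
    cells = nb.get("cells", [])
    if not cells or cells[0].get("cell_type") != "raw":
        return None
    lines = cells[0].get("source", "").strip().splitlines()
    # Need at least the opening and closing '---' delimiters
    if len(lines) < 2 or lines[0].strip() != "---" or lines[-1].strip() != "---":
        return None
    # Everything between the delimiters is the YAML payload
    return "\n".join(lines[1:-1])


# Plain dicts stand in for a notebook here; nbformat is not required
example_nb = {
    "cells": [
        {"cell_type": "raw", "source": "---\ntitle: Sample header\n---"},
        {"cell_type": "markdown", "source": "Etc"},
    ]
}
yaml_text = front_matter_from_first_raw_cell(example_nb)
```

The returned string can then be handed to whichever YAML parser the project already uses.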

mwouts · 2019-09-21T16:32:10Z

Chris, I will be working on #321, but maybe your approach is safer. I mean, I am afraid the composition of Jupytext and nbconvert won't be exactly the identity (some extra blank lines are being added, etc). So, maybe you want something like this:

from jupytext import reads, writes
from nbformat.v4 import new_markdown_cell, new_notebook

# Open notebook
nb = reads(
'''---
title: A Markdown file with a front matter YAML
subtitle: but no 'jupyter' section
---

Etc
''', 'md')

# Not a Jupytext document? Use .get() so a missing key does not raise KeyError
jupytext_metadata = nb.metadata.get('jupytext', {})
if jupytext_metadata.get('notebook_metadata_filter') == '-all':
    # Recover the Markdown content
    md = writes(nb, 'md')  # or re-read the file
    # Replace the notebook with a new one, made of just one Markdown cell
    nb = new_notebook(cells=[new_markdown_cell(md)])
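
If the file is re-read instead (the "or re-read the file" option in the comment above), the front matter can be separated from the body with plain string handling so that it is kept next to the single Markdown cell. This is only a sketch: `split_front_matter` is a hypothetical name, not a Jupytext or Jupyter Book function, and it assumes the header is delimited by `---` lines at the very top of the file.

```python
def split_front_matter(text):
    """Split Markdown text into (yaml_header, body).

    Hypothetical helper. Returns (None, text) when the text does not start
    with a '---' line or when the closing '---' line is missing.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "\n".join(lines[1:i])
            # Drop the blank lines that usually follow the header
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return header, body
    # Opening delimiter without a closing one: treat as ordinary text
    return None, text


sample = """---
title: A Markdown file with a front matter YAML
subtitle: but no 'jupyter' section
---

Etc
"""
header, body = split_front_matter(sample)
```

Here `header` holds the two YAML lines and `body` holds the remaining Markdown, which could go into the single Markdown cell.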

mwouts · 2019-10-13T01:39:47Z

@choldgraf , I think we're done with this issue, so I'll close it (and #321 is ready now). Please reopen if there's anything I've not answered here.

mwouts added this to the 1.3.0 milestone Sep 21, 2019

mwouts closed this as completed Oct 13, 2019
