Infer MyST notebook files by looking for ```{code-cell} blocks #120

Closed
choldgraf opened this issue Apr 1, 2020 · 21 comments
Labels
enhancement New feature or request

Comments

@choldgraf
Member

I'm writing up the documentation on MyST notebooks in the CLI, and I found that it is a bit cumbersome to ask users to explicitly create Jupytext headers for their MyST notebooks. It's easy if you're using Jupytext, but someone creating a MyST notebook from scratch won't be able to remember:

jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: '0.8'
    jupytext_version: 1.4.1+dev
kernelspec:
  display_name: Python 3
  language: python
  name: python3

Could we add an extra check here: https://github.com/ExecutableBookProject/MyST-NB/blob/0919b71ab5efc33823007bc2d92d64a8c7d217da/myst_nb/converter.py#L46

such that, if no jupytext: block is found in the metadata, it scans the lines for one that begins with ```{code-cell} and, if one is found, assumes that this is a myst-nb file? We could even raise a warning in this case, saying "I'm inferring that this is a myst notebook; you should make it explicit by adding this YAML metadata".
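A minimal sketch of the proposed check, for illustration only: it is not MyST-NB's actual converter code, and the helper names `split_front_matter` and `is_myst_notebook` are hypothetical. It assumes the header is a `---`-delimited YAML block and treats a `format_name: myst` line as an explicit declaration; otherwise it falls back to scanning for a `{code-cell}` fence and warns, as suggested above.

```python
import re
import warnings

# Matches the opening fence of a MyST code cell, e.g. ```{code-cell} or ```{code-cell} ipython3
CODE_CELL = re.compile(r"^```\{code-cell\}")


def split_front_matter(text):
    """Return (front_matter, body) for text that starts with a '---' YAML block."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text


def is_myst_notebook(text):
    """Hypothetical heuristic: explicit Jupytext header first, then a {code-cell} scan."""
    front_matter, body = split_front_matter(text)
    # An explicit Jupytext header declaring the myst format settles it.
    if any(line.strip() == "format_name: myst" for line in front_matter.splitlines()):
        return True
    # Fallback: look for a {code-cell} fence anywhere in the body.
    if any(CODE_CELL.match(line) for line in body.splitlines()):
        warnings.warn(
            "Inferring that this is a MyST notebook; add a jupytext header "
            "to the YAML front matter to make this explicit."
        )
        return True
    return False
```

Note that the fallback may have to read the whole file before it can answer "no", which is the cost raised in the next comment.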

choldgraf added the enhancement (New feature or request) label Apr 1, 2020
@chrisjsewell
Member

chrisjsewell commented Apr 8, 2020

You don't need all of that though, just:

jupytext:
  text_representation:
    format_name: myst
kernelspec:
  display_name: Python 3
  name: python3

Is that really that cumbersome? Inferring the format would also likely be far more time-consuming, because you would have to read through the entirety of every markdown file to check for {code-cell}, whereas now only the first few lines need to be read.
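To illustrate the cost argument: reading only the front matter touches a bounded number of lines and stops at the closing `---`, whereas a `{code-cell}` scan has to reach the end of any file that contains no cells. This is a hypothetical helper, assuming `---`-delimited YAML front matter and an arbitrary 50-line cap.

```python
from itertools import islice


def read_front_matter(stream, max_lines=50):
    """Read only the leading '---' YAML block from a text stream.

    Stops at the closing '---' (or after max_lines), so the rest of the
    file is never read. Returns the front-matter lines, or [] if none.
    """
    first = stream.readline()
    if first.strip() != "---":
        return []
    collected = []
    for line in islice(stream, max_lines):
        if line.strip() == "---":
            return collected
        collected.append(line.rstrip("\n"))
    return []  # no closing delimiter within max_lines: treat as no front matter
```

Used as `with open("page.md", encoding="utf-8") as f: read_front_matter(f)`, it leaves the file positioned at the first body line.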

@choldgraf
Member Author

choldgraf commented Apr 8, 2020

Hmmm - I do worry that it is too much for people to remember. Here are the points of confusion that I can think of:

  • remember the structure of the yaml

    e.g., is it

    jupytext:
    kernelspec:
    

    or

    jupytext:
        kernelspec:
    
  • remember the specific names used (e.g. is it jupytext or jupyter? is it kernel or kernelspec?)

  • remember the name of the kernel (it's python3 on some machines but called something different on others, and the large majority of users never realize this: they don't think about what a "kernel" is, they just "use python")

In the end I guess that most people will need to look up another myst-nb file locally or on the internet and then copy/paste the header information to make it work. That feels like unnecessary complexity to me. My guess is that most people would want a myst markdown file to behave the same way that a notebook works: you just "create the file" and don't have to worry about the details

I think one way that we could get around this would be to have a shorthand for myst-nb, something simple like myst-nb: true at the top, and then when creating the ipynb file it would choose reasonable defaults for all of those values. Or, we could add a helper function in jupyter book that checks for myst-nb: true in the header of pages and replaces the header with the proper jupytext header.
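A rough sketch of the second option, the helper that swaps a `myst-nb: true` flag for a full header. Both the flag and `expand_shorthand` are hypothetical, and the default kernel `python3` / `Python 3` is an assumption that may not match the kernels registered on a given machine. The header fields follow the minimal set suggested earlier in the thread.

```python
# Hypothetical default header; the kernel name "python3" is an assumption.
DEFAULT_HEADER = """\
jupytext:
  text_representation:
    format_name: myst
kernelspec:
  display_name: Python 3
  name: python3"""


def expand_shorthand(text):
    """Replace a 'myst-nb: true' line in the YAML front matter with a full Jupytext header."""
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        return text  # no front matter: leave the file untouched
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break  # reached the end of the front matter without finding the flag
        if line.strip() == "myst-nb: true":
            lines[i] = DEFAULT_HEADER
            return "\n".join(lines)
    return text
```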

@chrisjsewell
Member

you just "create the file" and don't have to worry about the details

Well when you say "just create", you still have to install jupyter first, then open a new notebook within the editor you loaded by instantiating jupyter lab and selecting from the available kernels (if more than one).
"Just create" would mean actually writing the notebook from scratch as a JSON file, in which case you would have to include the kernel name.

A CLI command would be the way; whereby you supply the file(s) to insert the metadata on, and it would give you the same option of available kernels as you get in jupyterlab/notebook.

@jstac
Member

jstac commented Apr 8, 2020

Could this be partly solved by a .mnb file extension? If so, isn't that worth considering?

To me this seems more explicit and hence more informative. Even if you still need to specify the kernel, some of the confusion is removed.

@chrisjsewell
Member

I think that ship has already sailed; using a different file extension was already ruled out in #82.

@choldgraf
Member Author

choldgraf commented Apr 8, 2020

@chrisjsewell I wonder if this is something that could be added on the jupytext side. Something like jupytext init path/to/markdown.md <format-flavor>. And all it would do is add the proper YAML at the top of the file for jupytext.

We could also easily add this just for MyST-markdown with the jupyter-book tool. Something like jupyter-book myst path/to/file.md and, if it had an ipynb extension, it would convert it to myst markdown, and if it had an .md extension, it would add the jupytext myst markdown header to the YAML

Also just to be clear - I think that for the MyST-NB repository, it is fine asking people to hand-type their own myst-markdown config. The people using that repo will be more developer types. I am more worried about Jupyter Book users, who are usually not developers and have lower tolerance for UX complexity

edit: if I create a regular markdown file and run jupytext myfile.md --to myst then it doesn't add any header information, so this would either need to be an addition to jupytext, or a custom function in our CLI

@jstac
Member

jstac commented Apr 8, 2020

Also just to be clear - I think that for the MyST-NB repository, it is fine asking people to hand-type their own myst-markdown config. The people using that repo will be more developer types. I am more worried about Jupyter Book users, who are usually not developers and have lower tolerance for UX complexity

I'm worried about that too.

I suppose it can be alleviated in the docs: the recommended workflow is to edit notebooks in Jupyter, and to use jupytext if you wish to work with text-based source files. This is helpful if you want to use version control, etc.

Not that many people read documentation. Most will look to examples. And that's a disadvantage of the QE example, now I think about it --- the source files are all myst-nb. Perhaps we need another where the source files are just ipynbs.

@chrisjsewell
Member

@chrisjsewell I wonder if this is something that could be added on the jupytext side. Something like jupytext init path/to/markdown.md <format-flavor>. And all it would do is add the proper YAML at the top of the file for jupytext.

Yeh by all means raise an issue in jupytext for this

We could also easily add this just for MyST-markdown with the jupyter-book tool. Something like jupyter-book myst path/to/file.md and, if it had an ipynb extension, it would convert it to myst markdown, and if it had an .md extension, it would add the jupytext myst markdown header to the YAML

Yeh maybe something like that.

Obviously it should be noted here that this is only an issue if you are directly creating the markdown file. If you are converting from a notebook jupytext --to myst notebook.ipynb, then all of this metadata gets added automatically.

@choldgraf
Member Author

yeah for sure - a few times now I've wanted to start writing a new markdown file from scratch that has a notebook structure, and I've gotten tripped up needing to find some YAML header boilerplate to copy from another file

@choldgraf
Member Author

choldgraf commented Apr 14, 2020

I asked around in jupytext, and the answer is here:

mwouts/jupytext#485 (comment)

it's basically

jupytext --set-formats md:pandoc --set-kernel - notebook.md

so maybe we add a little CLI command like

jupyter-book myst init myfile.md

or

jupyter-book myst-init myfile.md

and it initializes a myst file with the jupytext header using a default jupyter kernel. WDYT? I think that'd make it more discoverable than asking people to remember the jupytext invocation.
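A minimal sketch of what such an init step could do, assuming it simply prepends the minimal Jupytext header from earlier in the thread. `myst_init` is a hypothetical name, not an existing jupyter-book command, and the defaults `python3` / `Python 3` are assumptions. Files that already begin with front matter are skipped rather than merged, to keep the sketch simple.

```python
from pathlib import Path


def myst_init(path, kernel_name="python3", display_name="Python 3"):
    """Hypothetical sketch of a `myst-init` step: prepend a Jupytext MyST header.

    Files that already start with YAML front matter are left alone, to avoid
    writing a second header block.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    if text.startswith("---"):
        return f"{path} already has front matter; skipped"
    header = (
        "---\n"
        "jupytext:\n"
        "  text_representation:\n"
        "    format_name: myst\n"
        "kernelspec:\n"
        f"  display_name: {display_name}\n"
        f"  name: {kernel_name}\n"
        "---\n"
    )
    path.write_text(header + text, encoding="utf-8")
    return f"markdown file {path} was initialized as MyST markdown with kernel {kernel_name}"
```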

@chrisjsewell
Member

Sounds good

using a default jupyter kernel

I think it would be good to prompt the user for which kernel to use (using the code from jupyter kernelspec list), if they don't specify one. That should be pretty easy to do with click option callbacks
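One way to get the list to prompt from is `jupyter kernelspec list --json`. The sketch below assumes that output has the layout `{"kernelspecs": {name: {"resource_dir": ..., "spec": {"display_name": ..., ...}}}}`. The helper names are hypothetical, and `installed_kernels` needs `jupyter` on the PATH.

```python
import json
import subprocess


def parse_kernelspecs(json_text):
    """Map kernel name -> display name from `jupyter kernelspec list --json` output."""
    specs = json.loads(json_text)["kernelspecs"]
    return {name: info["spec"]["display_name"] for name, info in sorted(specs.items())}


def installed_kernels():
    """Ask Jupyter for the registered kernels (requires jupyter on PATH)."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_kernelspecs(result.stdout)
```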

@choldgraf
Member Author

Do you think as a first step that we could just tell people what kernel was used by adding an output message like:

markdown file {file} was initialized as MyST markdown with kernel {kernel}.

and if it became a point of confusion for users (e.g. if we got issues like "why doesn't my myst kernel work?!") then we could consider making it a forced choice by the users?

@chrisjsewell
Member

chrisjsewell commented Apr 14, 2020

No, I think it's best to just add it straight away, otherwise it will never get done. Just something dead simple like:

$ jupyter-book myst-init --kernel python2 a*.md
$ jupyter-book myst-init b*.md
Available kernels:
1)  python2                                             //anaconda/share/jupyter/kernels/python2
2)  python3                                             //anaconda/share/jupyter/kernels/python3
Please select a kernel [1]?

Note that nbformat fails validation if you don't set both display_name and name
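A small stdlib pre-check reflecting the note above: nbformat's v4 schema requires both `name` and `display_name` under `kernelspec`. This is a hypothetical helper, not nbformat's own validator, and it also flags empty values, which is stricter than a pure presence check.

```python
REQUIRED_KERNELSPEC_KEYS = ("name", "display_name")


def missing_kernelspec_keys(metadata):
    """Return the kernelspec keys nbformat requires that are missing or empty in `metadata`."""
    kernelspec = metadata.get("kernelspec") or {}
    return [key for key in REQUIRED_KERNELSPEC_KEYS if not kernelspec.get(key)]
```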

@choldgraf
Member Author

choldgraf commented Apr 14, 2020

I think this might confuse people (in my experience, most Jupyter users, and the vast majority of newer users, have no concept of "multiple kernels"...they just "start jupyter with python"). And in many cases where kernels are registered programmatically, it causes more confusion than it resolves. E.g. here's my kernelspec output:

$ jupyter kernelspec list
Available kernels:
  python38064bitdevconda4b8a6e5722f543c58b1e2eb07ef73d17    /home/choldgraf/.local/share/jupyter/kernels/python38064bitdevconda4b8a6e5722f543c58b1e2eb07ef73d17
  python3                                                   /home/choldgraf/anaconda/envs/dev/share/jupyter/kernels/python3

(I suspect that the first one is registered by vscode but I don't actually know where it came from)

If I have no mental model of environments, kernels, etc (which most Jupyter users do not, I would guess), I don't really know what to do about those two options. I'm worried that this confusion will then arise in the form of support questions etc in our repositories...

So I guess my thinking can be summarized as:

  • If people have a mental model of multiple kernels already, then they probably know that this is important to specify when you "register a myst markdown file with a kernel" and don't need to be forced to choose manually.
  • If people don't have a model of multiple kernels, then forcing them to choose may cause more confusion because it forces them to deal with that question without giving them context about what it means.

But, I also recognize why you think this is a good idea...hmmm.

If, considering what I said above, you still think we should get feedback about it, then I'll concede and we can build in a "kernel selector" step (maybe only do this if there's more than one kernel?).

@chrisjsewell
Member

But this is exactly what you get when you select a new notebook in Jupyter Lab:
I believe your words were: "My guess is that most people would want a myst markdown file to behave the same way that a notebook works" 😜

@chrisjsewell
Member

I suspect that the first one is registered by vscode but I don't actually know where it came from

Yep, vscode-python registers one per conda env

@chrisjsewell
Member

and yes, if only one kernel was registered then this would be selected without asking

@choldgraf
Member Author

I think the most-common ways to create new notebooks are:

  1. Through the classic notebook UI

  2. In JupyterLab

(though I don't have data on that, so I might be wrong). In both cases users just click "the python one" and don't necessarily need to know about kernels etc.

but, that aside - OK let's compromise on the following:

  1. API is of the form jupyter-book myst-init --kernel {kernelname} {glob-pattern}.md
  2. If no --kernel is given, then:
      • If there is only one kernel in the kernelspec, that is used
      • If there is > 1 kernel, then a prompt is given to ask users to manually specify

Sound good? Any strong preferences between jupyter-book myst init and jupyter-book myst-init? Can we think of any other kinds of commands that we could envision for a myst click group?
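The compromise above, sketched as plain functions. `choose_kernel` and `expand_targets` are hypothetical names, and a real CLI would more likely route this through click. Rejecting an unknown `--kernel` value is an added assumption, not part of the agreed rules. The prompt function is injectable so the logic can be tested without a terminal. `expand_targets` only matters when the shell has not already expanded the glob, for example when the pattern is quoted.

```python
import glob


def choose_kernel(kernels, requested=None, ask=input):
    """Pick a kernel following the agreed rules.

    kernels:   list of registered kernel names
    requested: value of --kernel, if given
    ask:       prompt function (injectable so the logic can be tested)
    """
    if requested is not None:
        if requested not in kernels:
            raise ValueError(f"Unknown kernel {requested!r}; available: {', '.join(kernels)}")
        return requested
    if not kernels:
        raise ValueError("No Jupyter kernels are registered")
    if len(kernels) == 1:
        return kernels[0]  # only one choice: use it without asking
    lines = ["Available kernels:"]
    lines += [f"{i}) {name}" for i, name in enumerate(kernels, start=1)]
    answer = ask("\n".join(lines) + "\nPlease select a kernel [1]? ").strip() or "1"
    index = int(answer) - 1
    if not 0 <= index < len(kernels):
        raise ValueError(f"Please choose a number between 1 and {len(kernels)}")
    return kernels[index]


def expand_targets(pattern):
    """Expand a glob such as 'b*.md' into a sorted list of markdown files."""
    return sorted(p for p in glob.glob(pattern) if p.endswith(".md"))
```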

@chrisjsewell
Member

chrisjsewell commented Apr 14, 2020

In both the cases you show, you'll get multiple kernels offered, if you have multiple kernels installed:

But yeh sounds good

Can we think of any other kinds of commands that we could envision for a myst click group?

Nothing important off the top of my head. But if you wanted, it wouldn't be difficult to have some "stats" commands, like counts of syntax elements. Or, some time down the line, linting and formatting
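As an illustration of the "stats" idea, here is a hypothetical directive counter. It assumes directives open with three or more backticks, or colons when MyST's colon-fence extension is enabled, followed by `{name}`. It is a line-based regex sketch, not a MyST parser, so it can miscount fences nested inside other fences.

```python
import re
from collections import Counter

# A MyST directive fence: three or more backticks (or colons) followed by {name}
DIRECTIVE = re.compile(r"^(?:`{3,}|:{3,})\{([\w-]+)\}")


def directive_counts(text):
    """Count MyST directives by name, e.g. {'code-cell': 2, 'note': 1}."""
    return Counter(
        m.group(1) for line in text.splitlines() if (m := DIRECTIVE.match(line.lstrip()))
    )
```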

@choldgraf
Member Author

I've got some of this functionality now here: executablebooks/cli#97

It's not quite as sophisticated as having a fully-interactive thing, but I think is a good start.

@chrisjsewell
Member

Closing this in favor of #214

3 participants