Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for parsing raw multiline strings #836

Closed
wants to merge 2 commits into from

Conversation

jimustafa
Copy link
Contributor

These changes allow for multiline strings containing Markdown to be prefixed with "r" or "R" to indicate that they are raw strings. This can help avoid noisy syntax highlighting for escape characters, especially when writing LaTeX, which uses a lot of backslashes.

@mwouts
Copy link
Owner

mwouts commented Aug 16, 2021

Hi @jimustafa , thanks for your pull request!

Don't you think that we should add a test to make sure that a round trip on a text notebook with a raw string preserves the raw string?

Also for those like me who didn't know about raw strings, I see that they are documented at https://docs.python.org/3/reference/lexical_analysis.html

@jimustafa
Copy link
Contributor Author

Greetings @mwouts. Thanks for the feedback. Yes, it would be good to have a test for this case. Are you suggesting a check that py->ipynb->py is idempotent?

@mwouts
Copy link
Owner

mwouts commented Aug 19, 2021

Hello @jimustafa , yes I think we should test the round trips, both py->ipynb->py and ipynb->py->ipynb. We'll see that at a later stage and I could even take care of that.

Now I have another question for you. Could you please provide more information about what made you consider using raw strings? I am inclined to think that you author notebooks in a text editor and that text editor gave you trouble with some strings? Is that correct? Could you please provide an example or such a string?

I am asking as I think in jupytext we already treat the markdown cells as raw strings. As you've seen, we just remove the triple quotes (I mean, rather than calling eval). So maybe, to make it easier for people who have escaped characters in their notebooks and encode Markdown cells as strings, we should encode these cells as raw strings as soon as they contain an escaped character? What do you think?

@jimustafa
Copy link
Contributor Author

Yes, I have been writing regular Python scripts with a text editor and converting them to notebooks using jupytext. It seems to be the case that jupytext treats strings properly as raw strings and does not have issues with escaped characters. I only tried with a raw triple-quoted string to stop the editor from coloring the escape characters in order to reduce the visual noise. Only then did I notice that the contents were not correctly rendered as Markdown. An example of a raw string to render in Markdown is as follows:

# %% [markdown]
R"""
## Description

This notebook computes the blackbody spectrum.
$$
B_\lambda(\lambda,T)=\frac{2hc^2}{\lambda^5}\frac{1}{e^{hc/(\lambda k_\mathrm{B}T)}-1}
$$
"""

I think the current approach works well; it just needs to be generalized a bit to allow for the opening triple quotes to be prefixed with an "r" or "R" to indicate a raw string. It may be the case that a raw string is the most suitable for encoding Markdown cells, it certainly seems appropriate for writing a Markdown cell in a Python script. Escape characters aside, it would be nice to support the raw string format as acceptable Python syntax.

@mwouts
Copy link
Owner

mwouts commented Aug 31, 2021

Hi @jimustafa , sorry I was busy for a while. Thanks for the example, do you mind if I reuse it into a test?

What I propose is the following:

  • (ipynb to py) If the Markdown cell contains \, then the cell is encoded with a raw string with r"""...""". If it does not contain \, then we encoded it as a plain string as before.
  • (py to ipynb) Plain and raw strings are supported

The impact on existing text notebooks should be minor (only cells with \ will be affected), and this change will turn the py file into a more standard one.

In terms of round trip, ipynb->py->ipynb will be completely stable, however py -> ipynb -> py will turn raw strings into plain strings if these strings do not contain \, and will turn R""" strings into r""" strings. Does that sound OK to you?

@jimustafa
Copy link
Contributor Author

All good, thanks for getting back to the discussion. Yes, please feel free to use the snippet.

Regarding your proposal, I would not change how ipynb->py encodes strings depending on the presence of \. For the py->ipynb case, this PR takes

# %% [markdown]
R"""
# Test
"""

and yields (partially)

 "cells": [
  {
   "cell_type": "markdown",
   "id": "d8046188",
   "metadata": {
    "cell_marker": "R\"\"\"\n,\n\"\"\""
   },
   "source": [
    "# Test"
   ]
  }
 ]

Then the ipynb->py step gives

# %% [markdown]
# # Test

So, it seems that the py->ipynb conversion is working, but not ipynb->py. Without raw strings, the cell_marker metadata is simply "\"\"\"", so maybe the different cell_marker is causing problems.

@mwouts
Copy link
Owner

mwouts commented Sep 8, 2021

Hi @jimustafa , I have added a few tests & commits on top of yours at #850 , would you like to give a try to that version?

You can install it with

BUILD_JUPYTERLAB_EXTENSION=1 pip install git+https://github.com/mwouts/jupytext.git@use_raw_strings_in_py_scripts

(feel free to remove BUILD_JUPYTERLAB_EXTENSION=1 if you don't need it)

@jimustafa
Copy link
Contributor Author

Looks like #850 works! Thanks!

Now, should we start thinking about raw f-strings 😉 Not even sure if that should be expected to work... Well, maybe not interpolation, but, even without interpolation, it would be a valid triple-quoted string. For example:

# %% [markdown]
Rf"""
A $\LaTeX$ expression
"""

@mwouts
Copy link
Owner

mwouts commented Sep 14, 2021

Thank you @jimustafa for giving it a try!

Ha ha I like the f-string suggestion... This reminds me of #498 and of the python-markdown extension

However I don't think it is a good idea to add the f qualifier for Markdown cells that look like f-strings, at least at the moment, because a) I have no clue how popular the python-markdown extension is, b) AFAIK it is not available for Jupyter Lab and c) the extension uses pairs of curly brackets while f-strings use single brackets... Hope you're not disappointed!

@jimustafa
Copy link
Contributor Author

Hey @mwouts. No disappointment at all! Thanks for the discussion, and for pointing me to #498.

Looks like your modifications in #850 did the trick. Your work on jupytext is appreciated, and hopefully I can find other opportunities to contribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants