add support for parsing raw multiline strings #836

jimustafa · 2021-08-13T06:55:40Z

These changes allow for multiline strings containing Markdown to be prefixed with "r" or "R" to indicate that they are raw strings. This can help avoid noisy syntax highlighting for escape characters, especially when writing LaTeX, which uses a lot of backslashes.
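For readers unfamiliar with the issue: in a regular Python string literal, some backslash sequences common in LaTeX are valid escapes, while a raw string keeps every backslash. The snippet below shows plain Python semantics only. Jupytext itself does not evaluate the Markdown string, so this explains why editors highlight these sequences, not how jupytext behaves.

```python
# In a regular string, "\f" is the form-feed escape, so the backslash disappears.
plain = "\frac{1}{2}"
# In a raw string, the backslash is kept literally.
raw = r"\frac{1}{2}"

print(len(plain), len(raw))       # 10 11
print(plain.startswith("\x0c"))   # True: "\f" became a single form-feed character
print(raw.startswith("\\f"))      # True: backslash followed by "f"
```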

mwouts · 2021-08-16T06:42:13Z

Hi @jimustafa, thanks for your pull request!

Don't you think that we should add a test to make sure that a round trip on a text notebook with a raw string preserves the raw string?

Also for those like me who didn't know about raw strings, I see that they are documented at https://docs.python.org/3/reference/lexical_analysis.html

jimustafa · 2021-08-16T08:20:04Z

Greetings @mwouts. Thanks for the feedback. Yes, it would be good to have a test for this case. Are you suggesting a check that py->ipynb->py is idempotent?

mwouts · 2021-08-19T17:12:03Z

Hello @jimustafa, yes, I think we should test both round trips, py->ipynb->py and ipynb->py->ipynb. We can look at that at a later stage, and I could even take care of it.

Now I have another question for you. Could you please provide more information about what made you consider using raw strings? I am inclined to think that you author notebooks in a text editor and that the editor gave you trouble with some strings. Is that correct? Could you please provide an example of such a string?

I am asking as I think in jupytext we already treat the markdown cells as raw strings. As you've seen, we just remove the triple quotes (I mean, rather than calling eval). So maybe, to make it easier for people who have escaped characters in their notebooks and encode Markdown cells as strings, we should encode these cells as raw strings as soon as they contain an escaped character? What do you think?
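The difference between stripping the triple quotes and evaluating the literal can be sketched as follows. `strip_triple_quotes` is a hypothetical helper written for illustration, not jupytext's actual function; `ast.literal_eval` stands in for "calling eval".

```python
import ast

# A Markdown cell written as a (non-raw) triple-quoted string in a .py file.
source = r'''"""
$\frac{1}{2}$
"""'''


def strip_triple_quotes(text: str) -> str:
    """Hypothetical helper: drop the opening and closing triple quotes
    without interpreting any escape sequences."""
    lines = text.split("\n")
    if lines[0] != '"""' or lines[-1] != '"""':
        raise ValueError("not a triple-quoted cell")
    return "\n".join(lines[1:-1])


stripped = strip_triple_quotes(source)  # backslash preserved: $\frac{1}{2}$
evaluated = ast.literal_eval(source)    # "\f" turned into a form feed
```

Stripping the quotes therefore already behaves like reading a raw string, which is why escaped characters survive the conversion.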

jimustafa · 2021-08-20T06:44:01Z

Yes, I have been writing regular Python scripts in a text editor and converting them to notebooks using jupytext. Jupytext does seem to treat these strings as raw strings already, and it has no issues with escaped characters. I only tried a raw triple-quoted string to stop the editor from coloring the escape characters and reduce the visual noise. Only then did I notice that the contents were not correctly rendered as Markdown. Here is an example of a raw string that should render as Markdown:

```python
# %% [markdown]
R"""
## Description

This notebook computes the blackbody spectrum.
$$
B_\lambda(\lambda,T)=\frac{2hc^2}{\lambda^5}\frac{1}{e^{hc/(\lambda k_\mathrm{B}T)}-1}
$$
"""
```

I think the current approach works well; it just needs to be generalized a bit so that the opening triple quotes may be prefixed with an "r" or "R" to indicate a raw string. A raw string may well be the most suitable way to encode Markdown cells; it certainly seems appropriate when writing a Markdown cell in a Python script. Escape characters aside, it would be nice to support raw strings, since they are valid Python syntax.
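A minimal sketch of the generalization, assuming a cell is recognized when its first line is an optional `r`/`R` prefix followed by `"""` and its last line is `"""`. The pattern and `parse_string_cell` are hypothetical and not the code from this PR.

```python
import re

# Hypothetical pattern: optional r/R prefix, then triple double quotes, alone on the line.
OPENING = re.compile(r'^([rR]?)"""\s*$')


def parse_string_cell(lines):
    """Return (prefix, markdown_lines) for a cell written as a triple-quoted
    string, or None if the cell does not have that form."""
    if len(lines) < 2:
        return None
    match = OPENING.match(lines[0])
    if match is None or lines[-1].strip() != '"""':
        return None
    return match.group(1), lines[1:-1]


raw_cell = parse_string_cell(['R"""', "# Test", '"""'])
plain_cell = parse_string_cell(['"""', "# Test", '"""'])
other_cell = parse_string_cell(['f"""', "# Test", '"""'])  # other prefixes are not accepted
```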

mwouts · 2021-08-31T21:34:26Z

Hi @jimustafa, sorry, I was busy for a while. Thanks for the example; do you mind if I reuse it in a test?

What I propose is the following:

- (ipynb to py) If the Markdown cell contains \, then the cell is encoded as a raw string, r"""...""". If it does not contain \, then it is encoded as a plain string, as before.
- (py to ipynb) Both plain and raw strings are supported.

The impact on existing text notebooks should be minor (only cells with \ will be affected), and this change will turn the py file into a more standard one.
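The proposed ipynb->py rule can be sketched in a few lines. `encode_markdown_cell` is a hypothetical name, and the sketch ignores Markdown that itself contains triple quotes.

```python
def encode_markdown_cell(source: str) -> str:
    # Hypothetical sketch of the proposed rule: use r""" only when the
    # Markdown contains a backslash, otherwise keep a plain triple-quoted string.
    opening = 'r"""' if "\\" in source else '"""'
    return "\n".join(["# %% [markdown]", opening, source, '"""'])


plain_cell = encode_markdown_cell("# Test")
raw_cell = encode_markdown_cell("A $\\LaTeX$ expression")
```

Only cells containing a backslash change form, which is why the impact on existing text notebooks should stay small.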

In terms of round trip, ipynb->py->ipynb will be completely stable, however py -> ipynb -> py will turn raw strings into plain strings if these strings do not contain \, and will turn R""" strings into r""" strings. Does that sound OK to you?
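The round-trip behavior described above can be checked with a self-contained sketch. `to_py` and `to_ipynb` are hypothetical stand-ins for the two conversions, reduced to the body of a single Markdown cell.

```python
import re

OPENING = re.compile(r'^([rR]?)"""$')


def to_py(source: str) -> str:
    # Hypothetical ipynb -> py step: raw string only if a backslash is present.
    opening = 'r"""' if "\\" in source else '"""'
    return "\n".join([opening, source, '"""'])


def to_ipynb(text: str) -> str:
    # Hypothetical py -> ipynb step: accept """, r""" and R""" openings.
    lines = text.split("\n")
    if not OPENING.match(lines[0]) or lines[-1] != '"""':
        raise ValueError("not a triple-quoted Markdown cell")
    return "\n".join(lines[1:-1])


# ipynb -> py -> ipynb: the Markdown source is preserved exactly.
for source in ["# Test", "$\\LaTeX$"]:
    assert to_ipynb(to_py(source)) == source

# py -> ipynb -> py: the string prefix is normalized.
normalized_raw = to_py(to_ipynb('R"""\n$\\LaTeX$\n"""'))  # R""" becomes r"""
normalized_plain = to_py(to_ipynb('r"""\n# Test\n"""'))   # r""" becomes """
```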

jimustafa · 2021-09-01T06:48:47Z

All good, thanks for getting back to the discussion. Yes, please feel free to use the snippet.

Regarding your proposal, I would not change how ipynb->py encodes strings depending on the presence of \. For the py->ipynb case, this PR takes

```python
# %% [markdown]
R"""
# Test
"""
```

and yields (partially)

```json
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d8046188",
   "metadata": {
    "cell_marker": "R\"\"\"\n,\n\"\"\""
   },
   "source": [
    "# Test"
   ]
  }
 ]
```

Then the ipynb->py step gives

```python
# %% [markdown]
# # Test
```

So, it seems that the py->ipynb conversion is working, but not ipynb->py. Without raw strings, the cell_marker metadata is simply "\"\"\"", so maybe the different cell_marker is causing problems.
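One possible reading of this symptom, sketched under explicit assumptions: if the writer only recognizes the plain `"""` marker, an unrecognized marker such as the one stored above would make it fall back to the default commented Markdown, which matches the `# # Test` output. `write_markdown_cell` is a hypothetical illustration, not jupytext's actual writer.

```python
def write_markdown_cell(source: str, cell_marker: str) -> str:
    # Hypothetical writer that only knows the plain triple-quote marker.
    if cell_marker == '"""':
        return "\n".join(["# %% [markdown]", '"""', source, '"""'])
    # Any other marker falls back to commenting out each Markdown line.
    commented = ["# " + line if line else "#" for line in source.split("\n")]
    return "\n".join(["# %% [markdown]"] + commented)


plain_output = write_markdown_cell("# Test", '"""')
# The cell_marker value from the JSON above, once decoded.
raw_output = write_markdown_cell("# Test", 'R"""\n,\n"""')
```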

mwouts · 2021-09-08T22:53:57Z

Hi @jimustafa, I have added a few tests and commits on top of yours in #850. Would you like to give that version a try?

You can install it with

```shell
BUILD_JUPYTERLAB_EXTENSION=1 pip install git+https://github.com/mwouts/jupytext.git@use_raw_strings_in_py_scripts
```

(feel free to remove BUILD_JUPYTERLAB_EXTENSION=1 if you don't need it)

jimustafa · 2021-09-09T06:20:17Z

Looks like #850 works! Thanks!

Now, should we start thinking about raw f-strings? 😉 I'm not even sure that should be expected to work... Maybe not interpolation, but even without interpolation, it would be a valid triple-quoted string. For example:

```python
# %% [markdown]
Rf"""
A $\LaTeX$ expression
"""
```
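As noted, such a cell is valid Python: `Rf` is an accepted string prefix, and without any braces nothing is interpolated, so the value is identical to the plain raw string.

```python
# Without braces, a raw f-string has nothing to interpolate.
rf_text = Rf"""
A $\LaTeX$ expression
"""
raw_text = r"""
A $\LaTeX$ expression
"""

print(rf_text == raw_text)  # True
```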

mwouts · 2021-09-14T20:52:56Z

Thank you @jimustafa for giving it a try!

Ha ha, I like the f-string suggestion... This reminds me of #498 and of the python-markdown extension.

However, I don't think it is a good idea to add the f qualifier to Markdown cells that look like f-strings, at least for now, because a) I have no idea how popular the python-markdown extension is, b) AFAIK it is not available for JupyterLab, and c) the extension uses pairs of curly brackets while f-strings use single brackets... Hope you're not disappointed!
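Point c) can be seen directly in Python: inside an f-string, single braces interpolate an expression, while doubled braces are the escape for a literal brace. A python-markdown style `{{x}}` placeholder would therefore come out as the literal text `{x}` if the cell were treated as an f-string.

```python
x = 42

# Single braces: the expression is evaluated and inserted.
single = f"value: {x}"
# Doubled braces: an escaped literal brace, so nothing is evaluated.
double = f"value: {{x}}"

print(single)  # value: 42
print(double)  # value: {x}
```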

jimustafa · 2021-09-15T07:13:44Z

Hey @mwouts. No disappointment at all! Thanks for the discussion, and for pointing me to #498.

Looks like your modifications in #850 did the trick. Your work on jupytext is appreciated, and hopefully I can find other opportunities to contribute.

add support for parsing raw multiline strings

74f10d5

fix formatting to pass pre-commit checks

0577997

mwouts mentioned this pull request Sep 8, 2021

Encode Markdown cells with a backslash as raw strings in py:percent scripts #850

Merged

jimustafa closed this Sep 15, 2021

mwouts mentioned this pull request Dec 9, 2021

Deprecation warning due to invalid escape sequences. #499

Open
