
Break in compatibility with jupyterlab-lsp + python-lsp-ruff since 0.0.285 #6847

Closed
felix-cw opened this issue Aug 24, 2023 · 24 comments
Labels: cli (Related to the command-line interface)

@felix-cw
Contributor

I'm not 100% sure this is the right project to raise this in, since the problem comes from the interaction between python-lsp-ruff and jupyterlab-lsp. I'm only raising it here because the issue arose from a change in Ruff.

Since 0.0.285, the jupyterlab-lsp plugin now gives the following diagnostic message:

SyntaxError: Expected a Jupyter Notebook (.ipynb), which must be internally stored as JSON, but found a Python source file: expected value at line 1 column 1

I believe this is because jupyterlab-lsp extracts the Python code from the notebook cells, then python-lsp-ruff passes it to Ruff via stdin and specifies --stdin-filename. Since #6628, this makes Ruff expect ipynb JSON rather than Python code, which causes the error.
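
For reference, the behaviour can be reproduced directly from the command line by piping plain Python source to Ruff while giving the stdin filename an .ipynb extension (the filename here is arbitrary):

```sh
# Plain Python on stdin, but an .ipynb stdin-filename: since 0.0.285 Ruff
# tries to parse the input as notebook JSON and reports the error above.
echo 'import os' | ruff check --stdin-filename Untitled.ipynb -
```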

@dhruvmanila
Member

Hey, thanks for opening the issue!

First of all, your observation is spot on: --stdin-filename is used by python-lsp-ruff [1], and JupyterLab is probably sending the source code as Python (instead of JSON) to the server [2] (they have a concept of a Virtual Document). Here, the server is python-lsp-ruff, which invokes the Ruff executable.

You can see the URI for the document which has an extension of .ipynb:

[Screenshot, 2023-08-25: the document URI shown with an .ipynb extension]

A few solutions which come to my mind:

  • python-lsp-ruff would always update the document URI to use the .py extension, but that would mean any per-file-ignores using the .ipynb extension wouldn't work
  • A new CLI flag to specify the source type explicitly (similar to the --parser flag in Prettier):
     --parser <flow|babel|babel-flow|babel-ts|typescript|acorn|espree|meriyah|css|less|scss|json|json5|json-stringify|graphql|markdown|mdx|vue|yaml|glimmer|html|angular>
                              Which parser to use.
    

Feel free to share any other suggestions :)

Footnotes

  1. https://github.com/python-lsp/python-lsp-ruff/blob/f6707dfab9c39cbab23b8bc89646341164435afc/pylsp_ruff/plugin.py#L503-L505

  2. https://jupyterlab-lsp.readthedocs.io/en/latest/Architecture.html#positioning-system

@charliermarsh added the cli (Related to the command-line interface) and needs-decision (Awaiting a decision from a maintainer) labels on Aug 25, 2023
@felix-cw
Contributor Author

felix-cw commented Oct 3, 2023

Hi all. Sorry for posting on an old issue. I would prefer option 2, a new CLI flag, so that options such as per-file-ignores can work both in a Jupyter LSP context and when running Ruff on notebooks from the command line, pre-commit, etc.

An example of a per-file-ignore that can definitely be scoped to ipynb files is B018.
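
For example, a sketch of what that could look like on the command line (assuming the usual `pattern:rule` form of the flag):

```sh
# Ignore B018 ("useless expression") in notebooks only, while keeping the
# rule enabled for ordinary .py files.
ruff check --per-file-ignores "*.ipynb:B018" .
```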

@zanieb
Member

zanieb commented Oct 3, 2023

Could our notebook parser (or a step upstream of it) just be updated to wrap the content in a single cell if it's not already? I don't see a parser option as a user-facing setting.

@felix-cw
Contributor Author

felix-cw commented Oct 3, 2023

I think that could work. My concern was that since Ruff's output includes cell numbers, it would interfere with the part of jupyterlab-lsp which maps line numbers in the linter output back to the original cells.
However, it looks like with --format=json it just gives line numbers, so it should be fine.
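
For instance, something like the following (filename is arbitrary) shows the shape of that output:

```sh
# Diagnostics are reported with row/column locations rather than cell numbers.
echo 'import os' | ruff check --format=json --stdin-filename example.py -
```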

@charliermarsh
Member

I think the exclusion of cell numbers from the JSON format is itself a bug (see, e.g., #7664). Broadly, I'm worried that treating Python source as Jupyter could cause other problems, some unanticipated (e.g., if you accepted the code via stdin, we'd then output notebook JSON when formatting or autofixing).

Without doing deeper research, my current preferences would be something like:

  1. Figure out how to get jupyterlab-lsp to send us the full notebook. This would be the most reliable approach, and it would avoid unnecessarily converting from Jupyter to Python and back to Jupyter.
  2. Add a dedicated flag to determine the source type (like --parser). I think this is reasonable, even if only intended for LSPs and other programmatic clients.
  3. If we parse a notebook, and it's not JSON, fall back to parsing it as Python? This is sort of similar to Zanie's suggestion, but internally we'd treat the whole thing as if it were Python rather than trying to convert it to Jupyter. This feels like a structural error, though; I don't love it.

@zanieb
Member

zanieb commented Oct 3, 2023

Agree that 1. would be preferable. 2. is fine with me if we hide it from the CLI help menu until there are more user-focused use-cases.

@felix-cw
Contributor Author

felix-cw commented Oct 6, 2023

Thanks everyone! I thought that it might be too complicated to get python-lsp-ruff/jupyterlab-lsp to use linters that work on notebooks directly, so I made a PR implementing the --parser option. Thanks for the opportunity to contribute to such a useful and amazing project!

@MichaReiser
Member

My concern with 2 is that, compared to Prettier, we don't support different parsers. I prefer a CLI flag that is more high level and instead allows overriding the file type (mime type remapping), which can be useful in use cases other than the one mentioned here.

@charliermarsh
Member

Can you spell out in a bit more detail what this CLI flag would look like and/or how it would differ from the above?

@dhruvmanila
Member

overriding the file type (mime type remapping)

Wouldn't that be in conflict with the inferred filetype from the filename's extension provided via --stdin-filename? Or, would that mean that the inference will be avoided?

FWIW, in ruff-dev, I added an option --jupyter which would just mean that the content of the Python file is from a Jupyter Notebook. This helped in development where I didn't need to run a Jupyter server for testing purposes. I think this is a similar case where we want to tell Ruff that the content of this file is from a different filetype.

@MichaReiser
Member

Wouldn't that be in conflict with the inferred filetype from the filename's extension provided via --stdin-filename? Or, would that mean that the inference will be avoided?

What I have in mind is that you can map extensions to other file types. I don't have a good Python example, but JSON supports different standards, and a user may decide that all their .json files are JSON5. They could then configure *.json = JSON5. I guess it's similar to --parser, except that we define the language rather than the parser (because we don't support different Python parsers).

We could expose the same option on the CLI and it could then be used as --extension ipynb=python or similar.

@charliermarsh
Member

Interesting, so --extension ipynb=python could be repeated?

@zanieb
Member

zanieb commented Oct 20, 2023

I kind of like that API

@krassowski

krassowski commented Oct 20, 2023

Hi, some context from jupyter-lsp side:

  • when jupyter-lsp was created the LSP protocol had exclusively support for plain text files
  • while the most recent LSP version (3.17) embraced notebooks, it does so in a bit of a convoluted way, sadly not aligned with Jupyter standards; one consequence is that even if jupyterlab-lsp implements support for the notebook messages from 3.17, it will still not pass the true notebook-format JSON but rather a quasi-notebook structure implementing the NotebookDocument interface, which diverges from the Jupyter ipynb standard (e.g. kind instead of cell_type, executionSummary.executionOrder instead of execution_count, etc.) and in which the cell sources are not actually stored but abstracted out to other documents which need to be retrieved separately.
  • there are some pros and cons for jupyter-lsp sending concatenated source rather than pure JSON:
    • (+) we can piece together nested documents having different languages (e.g. ipython cells with %%sql can be hidden from the parent ruff server)
    • (+) we do not need to worry about stripping metadata/outputs, which can weigh hundreds of MB in a notebook
    • (-) we lose structure information and thus the ability to provide some cool hints such as "3 empty cells" or "Code in a markdown cell will not be executed, did you change the cell type by accident"

The last issue, the loss of structure, could be alleviated if Jupyter ends up formalising a canonical text-based representation of notebooks (currently being discussed from the Markdown-first perspective in jupyter/enhancement-proposals#103), and even more so if that representation includes a way to preserve cell-level metadata. But this has less of an impact on the discussion here, beyond perhaps the naming of a future option (ipynb is a very concrete Jupyter standard, but future Jupyter notebook serialization formats may allow other formats, so it may be best to avoid option names like --jupyter, or to only make them an alias for, say, --ipynb).

jupyter-lsp could potentially abuse the LSP protocol and send the JSON dump as if it were plain text. I would welcome PRs (in jupyterlab/jupyterlab-lsp) which explore this idea, but as of now I am not convinced how well it would work and how much work is needed to make it practical. Again, the problem with the idea that ruff can format notebooks is that if we want quick reply times, we still need to strip the large visualisations from cell outputs and then stitch them back together (the time it takes 100 MB to travel over the wire, especially for remote servers, far outweighs the benefits of ruff being blazing fast); if ruff were to rearrange/remove/add/merge cells when reformatting, this would be a bit hard to achieve (but possible if we just replaced the outputs with dummy tokens for the round trip).

@dhruvmanila
Member

We could expose the same option on the CLI and it could then be used as --extension ipynb=python or similar.

So, the mapping will be from file extension to the language (or file extension to file extension, ipynb=py?). This means one could also do py=ipynb, which would mean that the Python file should contain JSON representing the Notebook. I would prefer to introduce this as a hidden flag for now, until we finalize a general way of providing multi-language support.


jupyter-lsp could potentially abuse LSP protocol and send the JSON dump as if it was plain text.

Yes, we're doing this in astral-sh/ruff-lsp#264, where the LSP creates a JSON text representing the Notebook in VS Code, which is what is sent to the Ruff CLI. This works for us because we're in control of what fields we use from the JSON format, and I agree that it might be difficult to incorporate this in a general way for multiple tools.

@felix-cw
Contributor Author

We could expose the same option on the CLI and it could then be used as --extension ipynb=python or similar.

I've had a go at implementing this idea as a hidden flag. Is it OK if I raise a PR for it and close the previous PR?

@dhruvmanila
Member

I'm not exactly sure if this is going to be enough. Ruff keeps track of the cell boundaries which are used to separate import blocks in each cell. So, for example, take the following two cells:

Cell 1:

from math import pi
import os

Cell 2:

from pathlib import Path
import sys

Correct me if I'm wrong, but the concatenated source code that Ruff would receive through JupyterLab LSP would be:

from math import pi
import os
from pathlib import Path
import sys

The imports would be sorted as:

import os
import sys
from math import pi
from pathlib import Path

And then mapped back to the Notebook as:

Cell 1:

import os
import sys

Cell 2:

from math import pi
from pathlib import Path

Notice that the imports have been moved between cells. Ruff's native notebook implementation wouldn't do that; it would only sort the imports within each cell.
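
A rough sketch of that behaviour, treating the concatenated cell sources as one Python file (filename is arbitrary; when checking from stdin with --fix, Ruff writes the fixed source to stdout):

```sh
# The four imports are contiguous in the concatenated source, so I001 sorts
# them as a single block; mapped back to the notebook, that moves imports
# across cell boundaries.
printf 'from math import pi\nimport os\nfrom pathlib import Path\nimport sys\n' |
  ruff check --select I001 --fix --stdin-filename cells.py -
```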

@felix-cw
Contributor Author

That's interesting. Are there any rules other than I001 where the diagnostic can span multiple cells?
As far as I know, jupyterlab-lsp doesn't currently support formatting/fixing, so it can't move import statements between cells.
Is the concern that this will lead to false positives?

@charliermarsh
Member

@felix-cw - I001 is the only rule whose diagnostic can span multiple cells, but there are multiple rules whose fixes can span multiple cells (i.e., you may need to add an import to the top of the file in order to access a new standard library module for the code edit).

@dhruvmanila
Member

As far as I know, right now jupyterlab-lsp doesn't support formatting/fixing so it can't move import statements between cells.

Oh, interesting! Thanks for letting us know. So, it seems that they don't support code actions / formatting, which are the two main ways that we would edit Notebook content. But they do plan to support them in the future (jupyter-lsp/jupyterlab-lsp#985).

Is the concern that this will lead to a false-positive?

I'm not exactly sure. My main concern is that I don't know how Jupyterlab LSP maps the location information between the actual Notebook source and the concatenated source. This would probably be updated when they plan to integrate editing cell content via code actions / formatting.

There is documentation on their source-map model (https://github.com/jupyter-lsp/jupyterlab-lsp/blob/main/docs/Architecture.ipynb), which seems to map (line, character) values between multiple types of documents and to concatenate the source code with 2 blank lines?

We could, theoretically, support diagnostics using any of the possible solutions mentioned in this issue, but it would probably break when they add support for code actions.

@krassowski

I think you do not need to worry about how jupyterlab-lsp handles mapping between cells and text content; I am sure this will work well. And yes, there is a branch with code-actions support, but I cannot test it against ruff yet because it just rejects text content on the basis of extension ;)

dhruvmanila pushed a commit that referenced this issue Nov 8, 2023
…le extension (#8373)

## Summary

This PR addresses the incompatibility with `jupyterlab-lsp` +
`python-lsp-ruff` arising from the inference of source type from file
extension, raised in #6847.

In particular it follows the suggestion in
#6847 (comment) to
specify a mapping from file extension to source type.

The source types are

- python
- pyi
- ipynb

Usage:

```sh
ruff check --no-cache --stdin-filename Untitled.ipynb --extension ipynb:python
```

Unlike the original suggestion, `:` instead of `=` is used to associate
file extensions with languages, since that is what is used with
`--per-file-ignores`, an existing option that accepts a mapping.

## Test Plan

2 tests added to `integration_test.rs` to ensure the override works as
expected

---------

Co-authored-by: Charlie Marsh <[email protected]>
@dhruvmanila removed the needs-decision (Awaiting a decision from a maintainer) label on Nov 8, 2023
@dhruvmanila
Member

Resolved in #8373

@felix-cw Do you mind providing the next steps required to make this integration work? Is it mostly on the user side? If so, I can update the documentation. Are there any changes required from either jupyterlab-lsp / python-lsp-ruff side?

@felix-cw
Contributor Author

felix-cw commented Nov 8, 2023

Thanks for merging my pull request!

I think we need to add --extension ipynb:python to the call to Ruff that python-lsp-ruff makes. I'm not sure about the side effects of this when it is used outside of jupyterlab-lsp, though, so it might need to be behind a config option. I'll reach out to the projects.
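
Roughly, the resulting invocation for a notebook cell document would then look something like this (a sketch, not necessarily the exact arguments the plugin uses):

```sh
# The virtual document keeps its .ipynb name, but --extension tells Ruff to
# treat the piped cell source as plain Python.
echo 'import os' | ruff check --stdin-filename Untitled.ipynb --extension ipynb:python -
```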

@dhruvmanila
Member

This is resolved when using v2.0.0 of python-lsp-ruff. Big thanks to @felix-cw for driving this change forward 🥳

https://github.com/python-lsp/python-lsp-ruff/releases/tag/v2.0.0
