Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pre-commit removes # -*- coding: utf-8 -*- pragma #907

Closed
stephanecollot opened this issue Jan 26, 2022 · 11 comments · Fixed by #915
Closed

pre-commit removes # -*- coding: utf-8 -*- pragma #907

stephanecollot opened this issue Jan 26, 2022 · 11 comments · Fixed by #915
Milestone

Comments

@stephanecollot
Copy link

Hello,

My CICD is running pre-commit run
And it wants to remove the first line of percent file generate with
jupytext --to py notebook.ipynb --set-formats "ipynb,py:percent"

The first line is:
# -*- coding: utf-8 -*-

Related to the following hook: https://github.com/pre-commit/pre-commit-hooks/blob/master/pre_commit_hooks/fix_encoding_pragma.py

I would like to know what is the best way to handle this issue? Remove the line? Add some setting in pre-commit or jupytext?

@mwouts
Copy link
Owner

mwouts commented Feb 2, 2022

Thank you @stephanecollot for reporting this.

My expectation is that this encoding line should be preserved by Jupytext.
But I have never tested this in the context of the pre-commit hook, so I'll add a test for that, and keep you updated.

Also I was curious to see if there is still a reason nowadays to declare explicitly the utf-8 encoding on Python files, and indeed on this SO question I see this comment:

It won't hurt to declare it, and some editors may use it.

@stephanecollot
Copy link
Author

Hi @mwouts

I looked more in details, and I was referring to the wrong hook, it is not fix-encoding-pragma, it is pyupgrade that create the issue.
It is run with this command: pre-commit run --all-files --show-diff-on-failure

And this config .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
    -   id: pyupgrade
        name: pyupgrade
        description: Automatically upgrade syntax for newer versions.
        entry: pyupgrade
        language: python
        types: [python]
        files: ''
        minimum_pre_commit_version: 0.15.0
        args: ['--py36-plus']

I looked into pyupgrade and they actually force to remove it, and there is no option to disable only this.

Relevant links:
https://github.com/asottile/pyupgrade#-coding--comment
asottile/pyupgrade#89
https://github.com/asottile/pyupgrade/pull/197/files

So maybe a better idea to remove it from jupytext, or add an option?

@mwouts
Copy link
Owner

mwouts commented Feb 4, 2022

Hi @stephanecollot , I am wondering if the issue you encounter here is not a particular case of a larger problem.
One thing that makes the jupytext pre-commit hook very special is that it acts on multiple files (at least when you use the --sync command).

I don't have experience with pyupgrade yet, but I believe that the right approach is to run pyupgrade also on the notebook.
Take for instance our example with jupytext --sync + black:

hooks:
- id: jupytext
args: [--sync, --pipe, black]
additional_dependencies:
- black==20.8b1 # Matches hook
- repo: https://github.com/psf/black
rev: 20.8b1
hooks:
- id: black

I think you'll have to do the same for pyupgrade, I mean:

  1. Make sure that the jupytext hook happens first (i.e. is located above) the pyupgrade hook
  2. Add something like --pipe, 'ipyupgrade ----py36-plus {}' (not tested yet) to the args entry of the jupytext hook
  3. Add ipyupgrade to the additional_dependencies of the jupytext hook, ideally in the same version as for your ipyupgrade hook (which you may want to keep for acting on non-paired .py files)

I am sorry I don't have time to test this atm, but hopefully next week I'll cover this with a new test (and if that turns out to solve the issue we will also add this to the documentation).

@mwouts mwouts added this to the 1.13.7 milestone Feb 4, 2022
@stephanecollot
Copy link
Author

stephanecollot commented Feb 6, 2022

Hi,
I'm not using the jupytext pre-commit hook.
I'm using the jupyter lab plugin with the following notebook meta data settings (I don't push notebook to git, because of sensitive data):

image

I tried to remove the line below the highlighted line, but not difference, it still add the encoding file in the py file after modification of the notebook.

@mwouts
Copy link
Owner

mwouts commented Feb 8, 2022

I see! To summarise,

  • In Jupyter Lab, the notebook has the "encoding" metadata in memory
  • But on disk, the pre-commit hook removed the encoding in the .py file

thus the two representations are not consistent.

My expectation is that, when you click on "save" in Jupyter Lab, you should get a warning that the notebook has changed on disk, and you are asked if you want to "reload" or "overwrite".

Is this the case?

Now, if you click on "reload", then the latest edit from the pre-commit hook will be reloaded and the encoding removed, but if you click on overwrite obviously you get the encoding back. So, since you want to take the edits from the pre-commit hook, you should use "reload", not "overwrite"... Does this make sense?

@stephanecollot
Copy link
Author

Yes exactly.

But every single time I modify the notebook the encoding is back.
And every single commit, I will get this warning, which is always a bit "scary", and because I don't want to loose the cell output locally.
I agree it is not a huge issue but it will make the experience smoother.
I might just disable pyupgrade on the notebook folder

@stephanecollot
Copy link
Author

stephanecollot commented Feb 8, 2022

I read your unit test quickly, nice.

But somehow I think it does work on my side.

  1. Remove the encoding line in the py file.
  2. Open the notebook file in jupyter lab
  3. Modify it
  4. Save it
  5. It is asking my to revert or overwrite. I click revert. The modification is removed.
  6. Modify notebook again (or even without any modification)
  7. Save it
  8. encoding is back again in the py file

jupytext version 1.13.6

@mwouts
Copy link
Owner

mwouts commented Feb 9, 2022

Thanks @stephanecollot for the additional inputs.

Rather than "revert", you need to "reload", to make sure that you get the version of the notebook without the encoding. I would have expected a "reload" button on the dialog when Jupyter detects that the file has changed on disk. Anyway, you can also click on "File -> Reload", or even close and reopen your notebook in Jupyter.

After a reload you should not have the jupytext.encoding metadata in the notebook any more (given that it was removed from the .py file)

because I don't want to loose the cell output locally

I guess you are using text notebooks. Why not use paired notebooks and ignore .ipynb files in your .gitignore? If you do so then cell outputs will persist when you reload the notebook.

@stephanecollot
Copy link
Author

stephanecollot commented Feb 9, 2022

Yes we use paired notebooks with this setup, sorry I should have made this clearer from the beginning.

Full disclosure: I just don't have the jupyterlab-jupytext extension that gives the context menu (I cannot install this extension at the moment due to our protected environment). Basically I create a notebook, I run jupytext --to py notebook.ipynb --set-formats "ipynb,py:percent" and then the pairing is working, meaning if I modify one the other one is updated and vice-versa, after every save.

So I tried everything, Revert, Overwrite, reload from disk, refresh reopening, and the encoding first line in the .py come back everytime after saving the ipynb.
Screenshot 2022-02-09 at 10 39 22

I also tried:

.1) Delete the ipynb.
.2) Modify the .py to remove the first encoding line, so it looks like that:

# ---
# jupyter:
#   jupytext:
#     cell_metadata_filter: -execution
#     custom_cell_magics: kql
#     formats: ipynb,py:percent
#     notebook_metadata_filter: -kernelspec
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.13.6
# ---

.3) Then to "retrigger" the pairing I click on this:
Screenshot 2022-02-09 at 11 19 54
.4) Modify and save the .py file from this notebook edition

.5) The ipynb is automatically recreated and its jupyter notebook metadata is:

Screenshot 2022-02-09 at 11 38 49

.6) Remove the "encoding" property in the metadata, save the ipynb

.7) The first encoding line is back in the .py file

@stephanecollot
Copy link
Author

stephanecollot commented Feb 9, 2022

I just look into the code and I saw:

try:
cell.source.encode("ascii")
except (UnicodeEncodeError, UnicodeDecodeError):
lines.append(comment + _UTF8_HEADER)
break

I just tried now with an empty notebook, and it works, no first encoding line added.
So It means have some non ascii characters in the notebook I was facing the issue with.

Sorry!

@mwouts
Copy link
Owner

mwouts commented Feb 9, 2022

Oh that's interesting - maybe it's time to remove this - I'll give it a try and keep you posted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants