-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable integration with pre-commit #698
Conversation
a0b7303
to
b5330b2
Compare
I was actually just about to make this PR myself that I mentioned back in #580, @johnpatton :). The untracked feature is nice, but I think its antipattern to have any pre-commit hooks modify the git index, particular because of how pre-commit infra does a git stash / git pop at the beginning/end of dirty repos. I worry this could cause errors and corrupt the repo (it should definitely complain about it, but the user should be forced to update the git index manually after the pre-commit hook as has run. |
The better way of doing this to default the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on the test suite, but ultimately I think this pre-commit hook is too narrow/specific and doesn't address other common use cases of JuPyText (like generating Google Colab notebooks from scripts).
@JohnPaton Happy to make the changes myself on this PR if given write access.
hooks: | ||
- id: jupytext | ||
args: [--to, py:percent, --pipe, black] | ||
- repo: https://github.com/psf/black |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessary or dieal and could conflict with otehr black hooks. It should just be added as an additional_dependency
but the end user if needed at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just an informational example, it's very normal to have a black hook already present for your python code and if you want your generated code not to get into a hook fail loop, you have to pipe it to black
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is abusing the virtualenv of the pre-commit hook (it should only require the bear minimum requirements and other hooks if they absolutely build upon one another). It should really be installed using additional_depenencies if needed by the end user (see the blog post test I adjusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added black in additional_depenencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree with additional_dependencies
(I think there is a typo in additional_depenencies
).
Also you say you must ensure that all hooks will pass (...).
I understand that we are obliged to do so when we use --to
, but maybe if we use --sync
it's just "more convenient" as we need to run the hook just once, not twice. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JohnPaton , what would you think of removing this example? I was trying to add a test for it, but we seem to have tricky issues - maybe we add that in another PR?
@requires_pre_commit
def test_pre_commit_hook_sync_black(tmpdir):
# get the path and revision of this repo, to use with pre-commit
repo_root = str(Path(__file__).parent.parent.resolve())
repo_rev = system("git", "rev-parse", "HEAD", cwd=repo_root).strip()
git = git_in_tmpdir(tmpdir)
# set up the tmpdir repo with pre-commit
pre_commit_config_yaml = dedent(
f"""
repos:
- repo: {repo_root}
rev: {repo_rev}
hooks:
- id: jupytext
args: [--sync, --pipe, black]
additional_dependencies:
- black==19.10b0 # Matches hook
- repo: https://github.com/psf/black
rev: 19.10b0
hooks:
- id: black
language_version: python3
"""
)
tmpdir.join(".pre-commit-config.yaml").write(pre_commit_config_yaml)
git("add", ".pre-commit-config.yaml")
with tmpdir.as_cwd():
pre_commit(["install", "--install-hooks"])
# write test notebook and output file
nb = new_notebook(
cells=[new_code_cell("1+1")],
metadata={
"jupytext": {"formats": "ipynb,py:percent", "main_language": "python"}
},
)
nb_file = tmpdir.join("test.ipynb")
py_file = tmpdir.join("test.py")
write(nb, str(nb_file))
git("add", ".")
# First attempt to commit fails with message
# "Output file test.py is not tracked in the git index"
with pytest.raises(SystemExit):
git("commit", "-m", "fails")
git("commit", "-m", "succeeds")
assert "test.ipynb" in git("ls-files")
assert "test.py" in git("ls-files")
# py_file should have been reformated by black
assert "\n1 + 1\n" in py_file.read_text()
docs/using-pre-commit.md
Outdated
@@ -27,35 +27,41 @@ Note that these hooks do not update the `.ipynb` notebook when you pull. Make su | |||
|
|||
## Using Jupytext with the pre-commit package manager | |||
|
|||
Using Jupytext with the [pre-commit package manager](https://pre-commit.com/) is another option. You could add the following to your `.pre-commit-config.yaml` file: | |||
``` | |||
Using Jupytext with the [pre-commit package manager](https://pre-commit.com/) is another option. You could add the following to your `.pre-commit-config.yaml` file to convert all staged notebooks to python scripts in `py:percent` format (the default): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a use case where the old way of doing it actually preferred, and that's when you want to run locally installed code. This commit hook is great for a general use case where conversion is the only part of JuPyText you want to use, but execution requires being in the local sandbox.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this comment. Why does it matter if it's installed locally or in the sandbox for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requirements that can't be specified by pip install or are difficult to (like say the local master version of a pip package).
docs/using-pre-commit.md
Outdated
repos: | ||
- repo: local | ||
- repo: https://github.com/mwouts/jupytext | ||
rev: master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really should be pinned to a specific version or left for the user to specify. Otherwise it can cause a bunch of issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is an example.
.pre-commit-hooks.yaml
Outdated
language: python | ||
entry: jupytext --add-untracked | ||
pass_filenames: true | ||
files: ipynb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of files why not use types: [jupyter]
like nbstripout uses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't work on my machine, probably I had an older pre-commit installed but I wasn't sure when it was added so I did it this way. But we will remove this and just pass all files, ignoring the ones that we aren't trying to process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JohnPaton, yeah I've run into this before. You need to run pip install -U identify
. Annoying that you can specify a min version of identify in the hook (you can for pre-commit though), but it is the recommended way to do it and will make it a lot easier for end users to override the behavior and support other types.
files: ipynb | |
types: [ipynb, md, rst, python] |
list of supported types: https://github.com/pre-commit/identify/blob/master/identify/extensions.py
docs/using-pre-commit.md
Outdated
language: system | ||
- id: jupytext | ||
args: [--to, py:light] | ||
- id: jupytext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have one hook id with --to py:light --to markdown
it's much more efficient.
(I agree with demonstrating multiple hooks, but this is not the use case for them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I didn't know Jupytext supported that syntax, thanks :)
Hi @Skylion007 , thanks for joining the conversation! Well with @JohnPaton and yourself in the thread I am going to learn a LOT about pre-commit hooks, I am looking forward to it! I don't know what you will think about that, but... I'd like to make the choices about this hook in a test-driven manner. I mean, I am thinking of a series of use cases that I'd like to see supported, like
In addition to these use cases, I'd like to document and test (IMO the main purpose of writing the blog is to test that this is going to be usable 😄) how to
Which are your own expectations? How big is the overlap with mine? Regarding the timing, I hope to have a draft for the blog by tomorrow or Friday, I'll share it with you both then, and then I hope to be able to iterate quite fast, within one or two weeks I mean. |
In regards to three use cases:
By default, pre-commit hooks will only run on edited files for speed. Having it run on all files / the entire configuration is easily doable though.
I'd say the overlap is quite large. |
John, Aaron, I've invited you to https://github.com/mwouts/pre_commit_hooks_for_jupyter_notebooks. The first thing that I'd like to chat about is this "test" (WIP as well):
I like the idea to use such a test to document explicitly our expectations. But at this stage I am not sure that |
@@ -0,0 +1,10 @@ | |||
- id: jupytext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a copy of this hook using the system language so that it runs off the system version of JuPytext, happy to write it @JohnPaton . We can even reuse the node by taking advantage of some nice advanced YAML features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need that? Users can just do (incomplete example)
- repo: local
hooks:
id: jupytext-local
language: system
entry: jupytext
if they want that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some suggestions
I've given you write access to the PR @Skylion007 - I've got a pretty busy few days ahead so I'll be a bit less active than I'd like, sorry in advance |
db9c36a
to
a25df97
Compare
Okay @mwouts @Skylion007, I've made some pretty big changes based on all the feedback (really feeling this collab guys, nice working with you 🏆) I removed
I have also removed default arguments beyond those two from the hook configuration, so that the user is forced to specify what they are trying to do. I still need to update the docs but wanted your feedback :) |
Hi @JohnPaton , @Skylion007 , thank you so much for your work here, I also enjoy very much this collaboration! I completely agree with the Regarding the Also, while trying the hook I saw a potential issue with the new build system for Jupytext - @JohnPaton you might not have seen it before because it was introduced just last week. I'll report on that at #706 . |
I also hit that build issue now @mwouts because I rebased 😬 |
Hmmm, but if they are not using |
Oh yes that's right... By the way I wanted to ask why you don't add Regarding the build issue, I'll try to address it tonight. Meanwhile a quick fix would be to comment out the part of |
rev: #CURRENT_TAG/COMMIT_HASH | ||
hooks: | ||
- id: jupytext | ||
args: [--from, ipynb, --to, py:light, --to, markdown] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry there can only be one value for --to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my bad for that suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still have to update the doc - would you like to take 07b323b?
Almost there I think. The one last problem is that the hook seems to be updating both files in
|
Thank you @JohnPaton , you've done a hard work here! Well you're right, when the Now I'll try to explain the reason for that... it is rooted in the other usage of Jupytext - when Jupytext is used within Jupyter. In that context, the user may edit either the Maybe my preferred way to solve this would be, thus, to find a way to update the timestamp of the Another option is to assume that users of the pre-commit hook won't be using Jupytext's contents manager, and to skip the update of the unmodified file. Or maybe the problem will exist anyway ( @Skylion007 , did that an issue appear in the local version of the hook? Do you have any advice on how to address this? Thanks |
Yeah, to be honest the way I solve this is by deleting one of the files and
then rerunning it. It's not elegant, but it works. Yeah this is one of the
biggest issues with using it as a pre-commit hook currently. Updating the
timestamps of both files might be the way to go here. We can safely do that
now that we require_serial: True in our JuPyText hook.
We don't need to actually write the file btw, we can set the date modified
by using os.utime https://docs.python.org/3/library/os.html#os.utime
I prefer this solution as it still satisfies the JuPyText contents manager
and frankly both files should be updated when one is modified (even if the
update is a NOOP). This behavior could be changed by a flag (we may want to
group all these behaviors under a master flag pre-commit mode).
…On Mon, Jan 18, 2021 at 6:16 PM Marc Wouts ***@***.***> wrote:
Thank you @JohnPaton <https://github.com/JohnPaton> , you've done a hard
work here!
Well you're right, when the .py file is more recent than the .ipynb file,
--sync updates the .ipynb file and then updates (the timestamp) of the .py
file.
Now I'll try to explain the reason for that... it is rooted in the other
usage of Jupytext - when Jupytext is used within Jupyter. In that context,
the user may edit either the .py file or the .ipynb file. We always take
the .py file as the source for inputs, and we *throw an error* (we ask
the user to delete one of the two files) if the .py file is older than
the .ipynb file. This works (and is compatible with --sync, that was
developed later on) as Jupytext always writes the .ipynb file before the
.py version.
Maybe my preferred way to solve this would be, thus, to find a way to
update the timestamp of the .py file (currently done at
https://github.com/mwouts/jupytext/blob/master/jupytext/cli.py#L691-L693
by actually writing the file) that does not attract the attention of
pre-commit. Do you think this is an option?
Another option is to assume that users of the pre-commit hook won't be
using Jupytext's contents manager, and to skip the update of the unmodified
file.
Or maybe the problem will exist anyway (.ipynb more recent than .py
especially on large files) when we pull a commit from a remote? If so,
maybe we could skip the update of the unmodified file, and use a
post-commit hook to finally update the (timestamp) of the .py file to
make it more recent than the .ipynb?
@Skylion007 <https://github.com/Skylion007> , did that an issue appear in
the local version of the hook? Do you have any advice on how to address
this? Thanks
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#698 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPVMX2PHQO3KGFFZ765P4TS2S6LFANCNFSM4VOSIZLQ>
.
|
Yes! I'll try to do that tonight (don't rewrite the |
If it was only timestamps, it would notify that the timestamp is being updated, right? I don't see any "[jupytext] Sync timestamp of '{}'".format(nb_file) line in my output above |
I haven't checked, but I believe pre-commit uses git to see which files don't match the index. It looks like git doesn't store any metadata about the file like ACLs or modified time so that might work: https://stackoverflow.com/a/2179825 https://github.com/pre-commit/pre-commit/blob/bf85379619e4aa5019412f3f3237a8c3c3e225e3/pre_commit/commands/run.py#L199 |
Hi @JohnPaton , maybe this commit 22d6961 can help? It generalizes the timestamp update (already there for the |
I'm afraid that wasn't enough either, it seems like both files are still being modified
Not sure if this matters: |
Hi @JohnPaton , yes you're right the issue is with the fact that Here are a few workarounds that I am thinking of...
I think that 1. is clearly the simple approach. And FYI I might also consider 3 because I think it will be required in case of triple-paired notebooks ( |
Alright, let's go with option 1 for now then! Then we can get this bad boy merged 🚀 |
Relevant tests will be skipped in that case
Hi @JohnPaton , @Skylion007 , we're getting ready! I am done with the review and I'd like to discuss a few details:
|
@mwouts Squashing commits will preserve co-authorship if done correctly through the Github API (via the co-author tag in the commit description). Just leaving the default verbose commit description should preserve everything.
I'd much prefer 3 in the long term. It's sane behavior. |
Done! Thank you so much @JohnPaton and @Skylion007 . I am so happy to see this merged! What an impressive work! I will create a follow-up issue to keep track of the few improvements that we have identified above, and also try to make progress on the blog post... I'll keep you posted! |
This is a first attempt at an implementation of the suggestion in #691 to enable "proper" integration with the pre-commit framework. The changes summarized:
.pre-commit-hooks.yaml
file, which tellspre-commit
how to install this package (pre-commit does this internally), which command to run, and what the default options are--add-untracked
, which adds any output files that weren't previously tracked bygit
to the index. Two points to note here:pre-commit
doesn't consider changes to them if they aren't in the index (pre-commit requires all files in the index be unchanged by all hooks before passing)The net effect is a proper pre-commit hook, which will only allow a commit as long as the configured Jupytext command doesn't change any of the files in the index, or add any new ones. This is how pre-commit's hooks are expected to operate.
Questions:
--add-untracked
option so publicly?