
Precommit workflows - clean py generation #446

Open
psychemedia opened this issue Feb 27, 2020 · 2 comments

@psychemedia

There are a couple of issues relating to precommit workflows (eg #292) as well as "filtering" cell outputs (eg #337 (comment) ) and I was wondering if there are any demonstrated workflows that use a pre-commit hook to generate light python scripts with active-ipynb cells filtered out?

It strikes me there are a couple of issues with a naive set-up where you commit a cleaned py file into the directory you found it. eg if I am developing an .ipynb notebook with all sorts of active-ipynb cells that I don't want to commit, paired to a .py file that I am using as a module in another notebook, and run a precommit that strips it of active-ipynb cells, then the py file on disk will be more recently timestamped than the .ipynb file that generated its original form. As such, the notebook will warn of the more recent py file and allow it to either be overwritten or to overwrite the notebook (which would result in a loss of active-ipynb cells).

One way round this would be to develop in a src directory, and then commit to a release directory. A more hacky route would be to generate the cleaned py file, commit it, and then regenerate the complete paired py file. But that would be really messy, clunky, and downright horrible, right?

Associated with this, I still wonder: would it be useful to have a filter built in to jupytext that can filter out Jupytext tagged cells, eg active-ipynb cells, when used as a pre-commit hook?

@mwouts
Owner

mwouts commented Mar 4, 2020

Oh, this is interesting!

First, maybe you could write a hook that updates the release file from the src file with something like this:

```python
import jupytext

nb = jupytext.read('src/notebook.py')
nb.cells = [cell for cell in nb.cells if 'active-ipynb' not in cell.metadata.get('tags', [])]
jupytext.write(nb, 'release/notebook.py')
```
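To make the filtering step concrete without needing jupytext installed, here is a minimal sketch of the same logic written against plain dicts, whose shape mirrors what `jupytext.read` returns (each cell carries a `metadata` mapping with an optional `tags` list); the function name and the sample cells are illustrative, not part of the jupytext API:

```python
# Hypothetical helper mirroring the snippet above: keep only cells that do
# NOT carry the given tag. Plain dicts stand in for nbformat cells, which
# have the same metadata/tags structure.

def strip_tagged_cells(cells, tag="active-ipynb"):
    """Return the cells whose metadata does not carry the given tag."""
    return [c for c in cells if tag not in c.get("metadata", {}).get("tags", [])]

cells = [
    # a "clean" function cell you want in the release py file
    {"cell_type": "code", "source": "def f(x):\n    return x + 1",
     "metadata": {}},
    # a scratch/testing cell tagged active-ipynb, to be stripped
    {"cell_type": "code", "source": "f(41)  # scratch check",
     "metadata": {"tags": ["active-ipynb"]}},
]

clean = strip_tagged_cells(cells)
print(len(clean))  # 1 — only the untagged function cell survives
```

A hook script would then assign the filtered list back to `nb.cells` before `jupytext.write`, exactly as in the snippet above.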

But your question also makes me think about the roles of the ipynb vs text files. Currently they are: input cells in the text file, and output cells in the ipynb file. When I read your comment I am wondering if what you're after is: selected input cells in the text file, and unselected input cells, plus output cells, in the ipynb file... What do you think?

@psychemedia
Author

psychemedia commented Mar 4, 2020

So... I want to use the notebook as a notebook... with half-baked ideas in, lots of functions split out into multiple steps so I can look at the output of each step in a function in all its gory detail. And then the function itself, hopefully in a working state.

But if I do write a beautiful clean function, I don't want you to see all the horrible mess and broken bits and failed attempts that I've still got hanging around in my notebook.

So what I want to do is strip out all those horrible bits.

Also, when I'm testing my function in another notebook, I want to load it in as a simple py module; and when I do load it in to that other notebook, I don't want that other notebook to bork on trying to import it because of all the broken bits in the py file that derive from all manner of other code cells in the original ipynb.

A simple active-ipynb tag on my working/testing/doodle cells handles the second case, but I need something to provide a sanitised view of the notebook in the first case.
