Possibility to keep cell output when refactoring e.g. in PyCharm #464

stephenkraemer · 2020-03-20T12:05:30Z

First of all, thanks for this great tool, what a game changer!

Use case: I often switch between editing the python script of a notebook in PyCharm and the notebook itself in jupyterlab. A major use case is to use PyCharms refactoring capabilities, to rename variables etc. Refactorings may hit several notebooks at once, for example if I rename functions in modules used across the notebooks of an entire project.

The problem: If I go back from the script level to notebooks with jupytext --sync, the output of all cell affected by refactorings is lost - I end up with possibly several notebooks which have lost output in a few cells each, which means I need to rerun them all again before sharing with collaborators. However, since I am only doing basic refactoring, I am (sufficiently) sure that the outputs won't change.

Would it be possible to allow keeping cell outputs on user request, as long as some conditions are met (e.g. the number and order of cells does not change, just the content of cells)? Maybe this is already possible and I missed it - in this case sorry for the lengthy prelude!

Something similar is already possible with jupytext --pipe, but this functionality seems to be geared towards code formatters and similar tools.

I am also aware that PyCharm may allow directly refactoring notebooks in the future (https://youtrack.jetbrains.com/issue/PY-30784), however this will be a Pro feature if I am not mistaken, and it will also not cover users of other IDEs/editors.

Thank you!

mwouts · 2020-03-20T18:22:23Z

Hello @stephenkraemer, thank you for your kind words, much appreciated!

Yes, what you are looking for should be possible. The way it works, currently, is the following:

Jupytext reads the two notebooks: .py (without outputs) and .ipynb (with outputs)
The two notebooks are merged into one with the function combine_inputs_with_outputs in combine.py:
That fonction iterates on both the two series of cells, and when the input match (up to space characters and comma, i.e. changes that could incur under black), it adds the output to the notebook.

We could work on a new implementation that is more index than content oriented, but maybe keep a priority on content? Maybe we could loop another time and allocate the outputs for which the content did not match, based on index that time? What do you think?

stephenkraemer · 2020-03-23T07:03:36Z

Thanks for your quick response, help and interest! Off the top of my head it seems difficult to combine both content and index matching. But it may make the functionality more generally useful. If you are interested in getting a pitch for this, I would look into combine.py and get back to you. Unfortunately, it might take me a while to get to it, but I'd still be happy to give it a try!

mwouts · 2020-03-23T08:44:34Z

Hello @stephenkraemer, thanks! Well I don't want to take to much of your time. Unless you are really willing to try this out yourself (which you can, of course!), I think I should be able to propose something later this week, and you help would be very useful for the testing/feedback part. Is that ok with you?

stephenkraemer · 2020-03-23T14:12:41Z

Hi @mwouts, if you can have a look at this yourself that would of course be even better! I just did not want to ask for a new feature without offering my help. I am looking forward to testing or discussing anything in this direction!
Thanks a lot!

mwouts · 2020-03-29T13:16:10Z

Hello @stephenkraemer , I have a tentative implementation... Would you like to give a try to the PR #474 ? You can install that version locally with

pip install git+https://github.com/mwouts/jupytext.git@new_combine_rules_464

stephenkraemer · 2020-03-29T14:15:12Z

Hello @mwouts,
awesome, thanks a ton! I'll try it tomorrow and get back to you!

mwouts · 2020-03-30T17:56:01Z

Thank you @stephenkraemer. I will also be testing that version on my side for a few days - today at least everything went as expected!

stephenkraemer · 2020-03-31T16:36:18Z

Hi @mwouts! I am still very excited about this, just wanted to let you know that something came up outside of work and I am a bit behind schedule, so unfortunately there is no update from my side yet. But I'll have more opportunity to test this in the course of the week, and I am very much looking forward to it.

mwouts · 2020-04-01T06:22:57Z

No problem @stephenkraemer ! Take your time, and no horries, we don't have a deadline here 😄

mwouts · 2020-04-03T22:54:18Z

I've been using this for one weeks, and it seems to work fine.

mwouts · 2020-04-05T17:46:30Z

This is now available on pip: pip install jupytext==1.4.2. Please let me know if it works all right for you!

stephenkraemer · 2020-04-14T12:42:48Z

Hi @mwouts,
I am terribly sorry for the delay. This was due to the exceptional circumstances of the current times, and hopefully did not seem ungrateful.

Today, I have generated a few toy examples for my major use cases, which you have matched perfectly with your solution - awesome!:

Restructure notebooks: I pair the notebook with py, ipynb and md; then I use emacs to view the markdown structure and edit it. Afterwards, I use jupytext --sync. It worked great with my toy examples.
Refactor a notebook, without changing its structure (cell order and count): this is also really nice!
Refactor multiple notebooks and underlying library modules at the same time, without changing the notebook structure: refactoring a module in pycharm which is imported into several notebooks worked perfectly, without losing any output.

My current thoughts: I assume the algorithm would not be able to handle it predictably if I combined multiple actions at once, ie if I restructured the notebook, split cells and refactored it at the same time. This is totally fine for me, but might perhaps lead to confusing results for some users? I will look into this in more detail in the next days, and also use this "in production". If I find any case worth your attention, I'll report it here!

Thanks again very much!

mwouts · 2020-04-15T06:56:13Z

Thanks @stephenkraemer for your feedback! Really there is nothing to worry about, already you had contributed a very nice use case, and I was confident that the fix was mostly going to work...

I am glad to read that your first tests are positive! Indeed if you do multiple changes at once (i.e. refactor + reorder) you may create a more difficult situation for the matching algorithm and that could lead to lost outputs, or unexpected matches. Still I expect that if you sync frequently, you won't loose any output. Anyway please let me know what you find, that is always interesting, especially if you find something that can be improved.

mwouts added this to the 1.4.2 milestone Mar 23, 2020

mwouts mentioned this issue Mar 29, 2020

New rules for combining inputs with outputs #474

Merged

mwouts closed this as completed in #474 Apr 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possibility to keep cell output when refactoring e.g. in PyCharm #464

Possibility to keep cell output when refactoring e.g. in PyCharm #464

stephenkraemer commented Mar 20, 2020 •

edited

Loading

mwouts commented Mar 20, 2020

stephenkraemer commented Mar 23, 2020

mwouts commented Mar 23, 2020

stephenkraemer commented Mar 23, 2020 •

edited

Loading

mwouts commented Mar 29, 2020

stephenkraemer commented Mar 29, 2020

mwouts commented Mar 30, 2020

stephenkraemer commented Mar 31, 2020

mwouts commented Apr 1, 2020

mwouts commented Apr 3, 2020

mwouts commented Apr 5, 2020

stephenkraemer commented Apr 14, 2020

mwouts commented Apr 15, 2020

Possibility to keep cell output when refactoring e.g. in PyCharm #464

Possibility to keep cell output when refactoring e.g. in PyCharm #464

Comments

stephenkraemer commented Mar 20, 2020 • edited Loading

mwouts commented Mar 20, 2020

stephenkraemer commented Mar 23, 2020

mwouts commented Mar 23, 2020

stephenkraemer commented Mar 23, 2020 • edited Loading

mwouts commented Mar 29, 2020

stephenkraemer commented Mar 29, 2020

mwouts commented Mar 30, 2020

stephenkraemer commented Mar 31, 2020

mwouts commented Apr 1, 2020

mwouts commented Apr 3, 2020

mwouts commented Apr 5, 2020

stephenkraemer commented Apr 14, 2020

mwouts commented Apr 15, 2020

stephenkraemer commented Mar 20, 2020 •

edited

Loading

stephenkraemer commented Mar 23, 2020 •

edited

Loading