Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibility to keep cell output when refactoring e.g. in PyCharm #464

Closed
stephenkraemer opened this issue Mar 20, 2020 · 13 comments · Fixed by #474
Closed

Possibility to keep cell output when refactoring e.g. in PyCharm #464

stephenkraemer opened this issue Mar 20, 2020 · 13 comments · Fixed by #474
Milestone

Comments

@stephenkraemer
Copy link

stephenkraemer commented Mar 20, 2020

First of all, thanks for this great tool, what a game changer!

Use case: I often switch between editing the python script of a notebook in PyCharm and the notebook itself in jupyterlab. A major use case is to use PyCharms refactoring capabilities, to rename variables etc. Refactorings may hit several notebooks at once, for example if I rename functions in modules used across the notebooks of an entire project.

The problem: If I go back from the script level to notebooks with jupytext --sync, the output of all cell affected by refactorings is lost - I end up with possibly several notebooks which have lost output in a few cells each, which means I need to rerun them all again before sharing with collaborators. However, since I am only doing basic refactoring, I am (sufficiently) sure that the outputs won't change.

Would it be possible to allow keeping cell outputs on user request, as long as some conditions are met (e.g. the number and order of cells does not change, just the content of cells)? Maybe this is already possible and I missed it - in this case sorry for the lengthy prelude!

Something similar is already possible with jupytext --pipe, but this functionality seems to be geared towards code formatters and similar tools.

I am also aware that PyCharm may allow directly refactoring notebooks in the future (https://youtrack.jetbrains.com/issue/PY-30784), however this will be a Pro feature if I am not mistaken, and it will also not cover users of other IDEs/editors.

Thank you!

@mwouts
Copy link
Owner

mwouts commented Mar 20, 2020

Hello @stephenkraemer, thank you for your kind words, much appreciated!

Yes, what you are looking for should be possible. The way it works, currently, is the following:

  • Jupytext reads the two notebooks: .py (without outputs) and .ipynb (with outputs)
  • The two notebooks are merged into one with the function combine_inputs_with_outputs in combine.py:
  • That fonction iterates on both the two series of cells, and when the input match (up to space characters and comma, i.e. changes that could incur under black), it adds the output to the notebook.

We could work on a new implementation that is more index than content oriented, but maybe keep a priority on content? Maybe we could loop another time and allocate the outputs for which the content did not match, based on index that time? What do you think?

@stephenkraemer
Copy link
Author

Thanks for your quick response, help and interest! Off the top of my head it seems difficult to combine both content and index matching. But it may make the functionality more generally useful. If you are interested in getting a pitch for this, I would look into combine.py and get back to you. Unfortunately, it might take me a while to get to it, but I'd still be happy to give it a try!

@mwouts
Copy link
Owner

mwouts commented Mar 23, 2020

Hello @stephenkraemer, thanks! Well I don't want to take to much of your time. Unless you are really willing to try this out yourself (which you can, of course!), I think I should be able to propose something later this week, and you help would be very useful for the testing/feedback part. Is that ok with you?

@mwouts mwouts added this to the 1.4.2 milestone Mar 23, 2020
@stephenkraemer
Copy link
Author

stephenkraemer commented Mar 23, 2020

Hi @mwouts, if you can have a look at this yourself that would of course be even better! I just did not want to ask for a new feature without offering my help. I am looking forward to testing or discussing anything in this direction!
Thanks a lot!

@mwouts
Copy link
Owner

mwouts commented Mar 29, 2020

Hello @stephenkraemer , I have a tentative implementation... Would you like to give a try to the PR #474 ? You can install that version locally with

pip install git+https://github.com/mwouts/jupytext.git@new_combine_rules_464

@stephenkraemer
Copy link
Author

Hello @mwouts,
awesome, thanks a ton! I'll try it tomorrow and get back to you!

@mwouts
Copy link
Owner

mwouts commented Mar 30, 2020

Thank you @stephenkraemer. I will also be testing that version on my side for a few days - today at least everything went as expected!

@stephenkraemer
Copy link
Author

Hi @mwouts! I am still very excited about this, just wanted to let you know that something came up outside of work and I am a bit behind schedule, so unfortunately there is no update from my side yet. But I'll have more opportunity to test this in the course of the week, and I am very much looking forward to it.

@mwouts
Copy link
Owner

mwouts commented Apr 1, 2020

No problem @stephenkraemer ! Take your time, and no horries, we don't have a deadline here 😄

@mwouts
Copy link
Owner

mwouts commented Apr 3, 2020

I've been using this for one weeks, and it seems to work fine.

@mwouts
Copy link
Owner

mwouts commented Apr 5, 2020

This is now available on pip: pip install jupytext==1.4.2. Please let me know if it works all right for you!

@stephenkraemer
Copy link
Author

Hi @mwouts,
I am terribly sorry for the delay. This was due to the exceptional circumstances of the current times, and hopefully did not seem ungrateful.

Today, I have generated a few toy examples for my major use cases, which you have matched perfectly with your solution - awesome!:

  1. Restructure notebooks: I pair the notebook with py, ipynb and md; then I use emacs to view the markdown structure and edit it. Afterwards, I use jupytext --sync. It worked great with my toy examples.
  2. Refactor a notebook, without changing its structure (cell order and count): this is also really nice!
  3. Refactor multiple notebooks and underlying library modules at the same time, without changing the notebook structure: refactoring a module in pycharm which is imported into several notebooks worked perfectly, without losing any output.

My current thoughts: I assume the algorithm would not be able to handle it predictably if I combined multiple actions at once, ie if I restructured the notebook, split cells and refactored it at the same time. This is totally fine for me, but might perhaps lead to confusing results for some users? I will look into this in more detail in the next days, and also use this "in production". If I find any case worth your attention, I'll report it here!

Thanks again very much!

@mwouts
Copy link
Owner

mwouts commented Apr 15, 2020

Thanks @stephenkraemer for your feedback! Really there is nothing to worry about, already you had contributed a very nice use case, and I was confident that the fix was mostly going to work...

I am glad to read that your first tests are positive! Indeed if you do multiple changes at once (i.e. refactor + reorder) you may create a more difficult situation for the matching algorithm and that could lead to lost outputs, or unexpected matches. Still I expect that if you sync frequently, you won't loose any output. Anyway please let me know what you find, that is always interesting, especially if you find something that can be improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants