-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for using cell ID in diffing and merging #639
Conversation
Markdown and raw cells do not have execution count prompts: The cell IDs tend to be long and non-user-readable. The would likely fit next to the prompt: Is this what you had in mind? I can send a PR against this one. What would you say about only showing those when hovering over the cell to reduce visual clutter?
Another common habit which will lead to this is when working with e.g. visualisation, if a user duplicates the cell, modifies the code, runs the other cell, compares visualisation results and depending on satisfaction removes one or the other. I was wondering whether it should be: if cell_a_id == cell_b_id:
return True rather than: return cell_a_id == cell_b_id I guess this would have much less of a performance benefit (i.e. only benefit if notebook structure is largely unchanged). |
I opened vidartf#3 implementing the UI changes exposing the cell IDs. I think that these could be improved iteratively later. I had also measured the execution time to understand how much improvement this PR brings. In a notebook of 1000 code cells (10 unique cells duplicated 100 times), benchmarked with
This is pretty impressive, especially for the case of a notebook where content of each cell was modified. |
With vidartf#3 this case would be seen as: |
@krassowski I changed how ID comparison is done now. Now it simply uses ID equality as the first pass (any cells that have the same ID will be considered equal, and used for the first pass of creating coherent snakes). It should be more consistent with current behavior when diffing/merging notebooks without IDs (or partial transitions). I'm pretty confident that this should not affect the performance for notebooks with cell IDs, but it might affect performance for mixed notebooks (not sure if positively or negatively). Note that both the current and previous implementation allows for cells to be considered equal even if they have different ID fields. @minrk suggested at JupyterCon that this should be safe in the case where no cell IDs match each on either side, but it might not for other scenarios, so maybe that should be the only scenario where the non-ID comparison is invoked? I.e. if a base notebook with IDs branch, and two sides each add the same exact cell other than the ID, nbdime should not consider them equal. Formulated as code: if {a.id for a in notebook_A} & {b.id for b in notebook_B}: # check for intersection of IDs
notebook_predicates['cells'] = [ compare_cell_by_ids ] # only ID comparison is considered
else:
notebook_predicates['cells'] = [ # IDs have no impact here, so ignore and fall back on old implementation
compare_cell_approximate,
compare_cell_moderate,
compare_cell_strict,
] I would be very happy to have opinions on either of the above points! |
These arguments are probably why I would tilt for the current behavior of allowing non-ID identical cells to match, but there might be counter-argument that I haven't considered(?). |
For the most part this is true, except for not very realistic scenario of a notebook with cells randomly reshuffled, in which case there is still a performance boost by a factor of two (rather than six as before).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether there is anything else you would like to do here, but it seems good to go for me. If you believe this needs further user testing, maybe it would be a good idea to merge it together with JupyterLab 4.0 port and release an alpha/beta for more users to test out?
I was waiting on input on vidartf#3, but I can merge and iterate on that PR. |
0c721b5
to
72dd1bd
Compare
bot please update playwright snapshots |
This works for notebook diff (tested on example8) but not on the merge view (tested on example8).
|
Playwright windows-latest snapshots updated. |
An object that encapsulates differs and predicates, and also the new "is_atomic", which is added so that we can mark e.g. cell IDs as atomic. DiffConfig could potentially also be used in the future to better config notebook ignores etc?
When comparing identical sources we can avoid a full diff by a quick equality check.
Avoid recursing for string mimetypes that are identical
In the strictest check, cells without ID is never considered equal to cells with IDs, but they can be in the two less-strict comparisons. I.e. if a notebook ends up with only some cells having IDs, and then IDs are added to those cells later, then in the first pass the existing cells that all had IDs are first matched to each other, and only in the next passes are the cells that gained IDs compared with their old ID-less versions.
Always use line based diffing for source.
Enable previously not working test
Note: While I have a fix for the failure locally, we still don't have a way to visualize a conflict on cell IDs. The changes in vidartf#3 only display the change in the diff case. I can add a placeholder for now: "conflict on cell ID, use raw text editor to resolve". |
Once you push your changes I can thread the cell IDs in the merge case too, though I am not sure on the design. |
Good for collecting debug data on merge decisions.
@krassowski pushed changes here, I expect some UI test will need re-rendering, but I will let you have a look at improvements first (or maybe in follow-up PR, as branches here seem to be getting rebased, so new PRs are probably less messy). |
Adds tests for the recently added ability to output merge decisions as raw JSON. As the tests indicate, this is a breaking behavior, but this is a major version release, so should be ok.
bot please update playwright snapshots |
Playwright windows-latest snapshots updated. |
Playwright ubuntu-22.04 snapshots updated. |
This is needed to have download test work (assumes no conflicts).
bot please update playwright snapshots |
Playwright ubuntu-22.04 snapshots updated. |
Playwright windows-latest snapshots updated. |
Click should only register as final after mouse up with primary button.
@krassowski merged. Let's do any further improvement for cell ID conflicts display/resolution in a follow-up PR. |
Fixes #553.
Previous cell ID efforts have been focused on having them not have nbdime fall over, and make sure nbdime doesn't drop them completely (#566). This PR makes nbdime start using the IDs to make better diffs! While the PR does a bunch of extra things as well (some optimization, some refactoring to have nbdime treat the id field as atomic, some test fixes, improving the
union
merge strategy), the main logical changes for the cell IDs are relatively small (the changes tocompare_cell_X
functions in nbdime/diffing/notebooks.py). To break it down:I want to run some extra manual tests on the change of the differ for the source field, so leaving as draft for now. We also need to have a way to surface the cell ID in the web apps. Maybe include it in the [N] input prompt? This will be useful e.g. if a user has removed an empty cell at the bottom of a notebook, then automatically readded it by executing the last cell with Shift + Enter. They will look identical in the web UI, but nbdime will now insist that an empty cell was removed, and a new one added.