Use with git gives messed git diffs #130

Open
grzegorz700 opened this issue Oct 18, 2024 · 3 comments
grzegorz700 commented Oct 18, 2024

When we use this extension with git-based systems, it produces 10 (5 × 2) lines of diff for every changed cell.

   "metadata": {
    "execution": {
  -   "iopub.execute_input": "2024-10-14T13:10:30.905308Z",
  -   "iopub.status.busy": "2024-10-14T13:10:30.904740Z",
  -   "iopub.status.idle": "2024-10-14T13:10:30.908169Z",
  -   "shell.execute_reply": "2024-10-14T13:10:30.907722Z",
  -   "shell.execute_reply.started": "2024-10-14T13:10:30.905290Z"
  +   "iopub.execute_input": "2024-10-14T19:16:26.414571Z",
  +   "iopub.status.busy": "2024-10-14T19:16:26.413960Z",
  +   "iopub.status.idle": "2024-10-14T19:16:26.417570Z",
  +   "shell.execute_reply": "2024-10-14T19:16:26.417137Z",
  +   "shell.execute_reply.started": "2024-10-14T19:16:26.414551Z"
    }
   },
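To make the 10 (5 × 2) count concrete, here is a minimal Python sketch. It assumes the notebook JSON is serialized one key per line (here via `json.dumps(..., indent=1)`, which is an assumption about the on-disk layout), and `cell_metadata` and `count_changed_lines` are hypothetical helper names. The timestamps are the ones from the diff above.

```python
import difflib
import json

KEYS = [
    "iopub.execute_input",
    "iopub.status.busy",
    "iopub.status.idle",
    "shell.execute_reply",
    "shell.execute_reply.started",
]

def cell_metadata(stamps):
    """Build one cell's metadata block from five timestamps (in KEYS order)."""
    return {"metadata": {"execution": dict(zip(KEYS, stamps))}}

def count_changed_lines(old, new):
    """Count removed plus added lines in a line-based diff of two JSON objects."""
    old_lines = json.dumps(old, indent=1).splitlines()
    new_lines = json.dumps(new, indent=1).splitlines()
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="")
    return sum(
        1
        for line in diff
        # Skip the "---"/"+++" file headers; count only real -/+ lines.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )

before = cell_metadata([
    "2024-10-14T13:10:30.905308Z",
    "2024-10-14T13:10:30.904740Z",
    "2024-10-14T13:10:30.908169Z",
    "2024-10-14T13:10:30.907722Z",
    "2024-10-14T13:10:30.905290Z",
])
after = cell_metadata([
    "2024-10-14T19:16:26.414571Z",
    "2024-10-14T19:16:26.413960Z",
    "2024-10-14T19:16:26.417570Z",
    "2024-10-14T19:16:26.417137Z",
    "2024-10-14T19:16:26.414551Z",
])
changed = count_changed_lines(before, after)  # 5 removed + 5 added lines
```

Re-running a cell without touching its source changes all five timestamps, so every re-executed cell contributes its own block of removed and added lines.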

This problem is well known and has no perfect solution. Based on many reference solutions, including the list from Stack Overflow, the good advice in jupyterlab/jupyterlab#9444 (comment), and other Stack Overflow answers, I propose my partial workaround setup.

Partial workaround:

Using this extension without massive diffs is based on two stages:

Prevent the timestamps from being pushed to git:

  1. Create or edit a .gitattributes file in the root of your repository:
touch .gitattributes
  2. Add the following line to the .gitattributes file:
*.ipynb filter=clean_meta_ipynb
  3. Run:
git config filter.clean_meta_ipynb.clean "jupyter nbconvert --to notebook --stdin --stdout --ClearMetadataPreprocessor.enabled=True"

or

git config filter.clean_meta_ipynb.clean "nbstripout-fast --keep-output --keep-count --textconv"
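For illustration only, this is roughly what a narrower clean filter could do if we wanted to drop just the timing block and keep everything else. It is a sketch, not what nbconvert or nbstripout-fast actually run: `strip_execution_metadata` and `clean` are hypothetical names, and the output formatting (`indent=1`, trailing newline) is an assumption chosen to resemble how notebooks are usually written.

```python
import json

def strip_execution_metadata(notebook):
    """Remove only each cell's "execution" timing block (modifies the dict in place)."""
    for cell in notebook.get("cells", []):
        # Outputs, execution counts, tags and other metadata are left untouched.
        cell.get("metadata", {}).pop("execution", None)
    return notebook

def clean(text):
    """Take notebook JSON as text and return it without the timing metadata."""
    notebook = strip_execution_metadata(json.loads(text))
    return json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
```

A git clean filter receives the file content on stdin and stages whatever it writes to stdout, so a script built around this would end with something like `sys.stdout.write(clean(sys.stdin.read()))`. The file in the working tree keeps its timestamps; only the staged version is cleaned.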

Prevent the diffs from being displayed in jupyterlab-git:

  1. Check where your nbdime config files (named nbdime_config.json) are located:
 jupyter --paths
  2. Create or update your nbdime_config.json (e.g. ~/.jupyter/nbdime_config.json).
  3. Add the following lines to it:
{
  "NbDiff": {
    "Ignore": {
      "/metadata": true,
      "/cells/*/metadata": true
    }
  },
  "Extension": {
    "Ignore": {
      "/metadata": true,
      "/cells/*/metadata": true
    }
  }
}

Alternatively, we could try a more precise exclusion such as "/cells/*/metadata": ["execution"].
  4. Restart JupyterLab.
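If an nbdime_config.json already exists, steps 2 and 3 amount to merging the ignore rules into it rather than overwriting it. A minimal sketch of that merge, assuming the file is plain JSON; `add_ignore_rules` and `update_config_file` are hypothetical helper names, and the path has to be whichever config directory `jupyter --paths` reports:

```python
import json
from pathlib import Path

# The same ignore rules as in the JSON above.
IGNORE_RULES = {"/metadata": True, "/cells/*/metadata": True}

def add_ignore_rules(config):
    """Return a copy of an nbdime config dict with the ignore rules merged in."""
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    for section in ("NbDiff", "Extension"):
        ignore = merged.setdefault(section, {}).setdefault("Ignore", {})
        ignore.update(IGNORE_RULES)
    return merged

def update_config_file(path):
    """Merge the rules into the file at `path`, creating it if it is missing."""
    path = Path(path).expanduser()
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_ignore_rules(existing), indent=2) + "\n")
```

For example, `update_config_file("~/.jupyter/nbdime_config.json")` would update the file at the location mentioned in step 2. Any other settings already in the file are preserved.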

Drawbacks:

  • we lose the execution time info when, for example, we revert files in git (the timestamps are not tracked in git or anywhere else).
  • longer staging or diffing time (fixed with nbstripout-fast)

I am posting this solution especially for people who want to use this extension without having to remove other information from notebooks (e.g. outputs).

However, I would love to see a better solution.

@mlucool
Member

mlucool commented Oct 18, 2024

This question is probably better addressed outside this plugin, but have you tried https://github.com/deshaw/nbstripout-fast? With it, nbdime does not show timestamp diffs, and the timestamps are not committed.

@grzegorz700
Author

Thank you for the library reference. nbstripout-fast suits this purpose well: it speeds things up and removes one of the drawbacks of this partial solution. I'll update the post with this second, quicker solution as well. It's hard to find the best place to track this type of problem. Most discussions mainly cover removing outputs from cells, not metadata. This extension produces a substantial amount of frequently changing metadata, so I decided to post this problem and my partial solution here.

Feel free to close this issue if you want. I wrote my post to help others because I didn't find any good solution to this particular sub-problem in other places, issues, or Stack Overflow threads. Now I hope it will be easier to find, regardless of whether the issue is closed or not.

@mlucool
Member

mlucool commented Oct 23, 2024

this extension produces a substantial amount of frequently changing metadata

This extension actually produces no metadata; it's just a renderer. For simplicity, we simply turn on an option in JupyterLab itself to produce it.
