Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rough draft of a Git remote helper to store Git repositories in OSF projects #100

Closed
wants to merge 3 commits into from

Conversation

mih
Copy link
Member

@mih mih commented Jun 21, 2020

It essentially copies and adjusts https://github.com/datalad/git-remote-rclone
in that it uses a local repo mirror to push and fetch refs to and from,
and uploads a compressed archive to .git/ of an OSF project that is
identified by a URL of type osf://<projectid>.

Because request latency is high, the entire repo is represented as two
files:

  • a small text file listing the refs in the repo
  • a 7z archive containing all of the actual content

Here is what it can do:

% mkdir newrepo
% cd newrepo
% git init
Initialized empty Git repository in /tmp/newrepo/.git/
% touch some
% git add some
% git commit -m initial
[master (root-commit) c552b2b] initial
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 some
% git remote add osf osf://vtha6
% git push --set-upstream osf master
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), done.
Building bitmaps: 100% (1/1), done.
Total 3 (delta 0), reused 0 (delta 0)
Computing commit graph generation numbers: 100% (1/1), done.
Upload repository archive
To osf://vtha6
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'osf'.

% cd ..
% git clone osf://vtha6 newrepoclone
Cloning into 'newrepoclone'...
fatal: bad revision 'HEAD'
100%|██████████████████████████████████████████████████| 83.0/83.0 [00:00<00:00, 519kbytes/s]
Downloading repository archive
100%|███████████████████████████████████████████████| 7.99k/7.99k [00:00<00:00, 1.04Mbytes/s]
Extracting repository archive

% git -C newrepoclone log -1 --oneline |cat
c552b2b initial
% git -C newrepo log -1 --oneline |cat
c552b2b initial

TODO:

  • there is substantial code overlap with https://github.com/datalad/git-remote-rclone
    that should be refactored, ideally
  • there is also some overlap with the special remote implementation
  • a clone yields an immediate fatal: bad revision 'HEAD' output,
    that seems to come before any of this code is executed, no idea where
    this is coming from

mih added 2 commits June 20, 2020 20:23
…rojects

It essentially copies and adjusts https://github.com/datalad/git-remote-rclone
in that it uses a local repo mirror to push and fetch refs to and from,
and uploads a compressed archive to `.git/` of an OSF project that is
identified by a URL of type `osf://<projectid>`.

Because request latency is high, the entire repo is represented as two
files:

- a small text file listing the refs in the repo
- a 7z archive containing all of the actual content

Here is what it can do:

```
% mkdir newrepo
% cd newrepo
% git init
Initialized empty Git repository in /tmp/newrepo/.git/
% touch some
% git add some
% git commit -m initial
[master (root-commit) c552b2b] initial
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 some
% git remote add osf osf://vtha6
% git push --set-upstream osf master
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), done.
Building bitmaps: 100% (1/1), done.
Total 3 (delta 0), reused 0 (delta 0)
Computing commit graph generation numbers: 100% (1/1), done.
Upload repository archive
To osf://vtha6
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'osf'.

% cd ..
% git clone osf://vtha6 newrepoclone
Cloning into 'newrepoclone'...
fatal: bad revision 'HEAD'
100%|██████████████████████████████████████████████████| 83.0/83.0 [00:00<00:00, 519kbytes/s]
Downloading repository archive
100%|███████████████████████████████████████████████| 7.99k/7.99k [00:00<00:00, 1.04Mbytes/s]
Extracting repository archive

% git -C newrepoclone log -1 --oneline |cat
c552b2b initial
% git -C newrepo log -1 --oneline |cat
c552b2b initial
```

TODO:

- there is substantial code overlap with https://github.com/datalad/git-remote-rclone
  that should refactored, ideally
- there is also some overlap with the special remote implementation
- a `clone` yields an immediate `fatal: bad revision 'HEAD'` output,
  that seems to come before any of this code is executed, no idea where
  this is coming from
Copy link
Member

@bpoldrack bpoldrack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
I didn't actually try, but went through the code only. Just two remarks. Otherwise looks good to me!

self.log('Downloading repository archive')
repo_handle = [
f for f in self.osfstorage.files
if f.path == '/.git/repo.7z'][0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This

repo_handle = [
    f for f in self.osfstorage.files
    if f.path == '/.git/repo.7z'][0]

was potentially done before (Line 184) when self.get_remote_state() was called. May be get_remote_state should return that handle, too, to allow to avoid sending yet another request?

datalad_osf/git_remote.py Show resolved Hide resolved
@bpoldrack bpoldrack mentioned this pull request Jun 23, 2020
5 tasks
@mih
Copy link
Member Author

mih commented Jun 26, 2020

So the fatal: bad revision 'HEAD' is an artifact of having the remote helper deployed with pip install -e .. Git must be doing some kind of fancy analysis, and tripping over the results:

/tmp % GIT_TRACE=2 GIT_TRACE_SETUP=2 git clone osf://some here2  
10:42:37.404238 git.c:442               trace: built-in: git clone osf://some here2
Cloning into 'here2'...
10:42:37.407438 run-command.c:663       trace: run_command: git-remote-osf origin osf://some
10:42:37.525559 trace.c:375             setup: git_dir: /tmp/here2/.git
10:42:37.525589 trace.c:376             setup: git_common_dir: /tmp/here2/.git
10:42:37.525592 trace.c:377             setup: worktree: /home/mih/hacking/datalad/osf
10:42:37.525595 trace.c:378             setup: cwd: /home/mih/hacking/datalad/osf
10:42:37.525598 trace.c:379             setup: prefix: (null)
10:42:37.525600 git.c:442               trace: built-in: git describe --tags --dirty --always --long --match '*'
10:42:37.525758 chdir-notify.c:65       setup: chdir from '/home/mih/hacking/datalad/osf' to '/home/mih/hacking/datalad/osf'
10:42:37.525765 chdir-notify.c:40       setup: reparent packed-refs to '/tmp/here2/.git/packed-refs'
10:42:37.525768 chdir-notify.c:40       setup: reparent files-backend $GIT_DIR to '/tmp/here2/.git'
10:42:37.525771 chdir-notify.c:40       setup: reparent files-backend $GIT_COMMONDIR to '/tmp/here2/.git'
fatal: bad revision 'HEAD'

It sets CWD and worktree to a directory that contains the source tree of the package which is installed in editable fashion. This has nothing to do with the actual clone source or target. I have no idea how or why is happens, but the issues goes away with a non-editable deployment via pip install ..

@bpoldrack
Copy link
Member

bpoldrack commented Jun 26, 2020

Hm. Thinking about this, since this smells like we should be aware of the underlying issue.

Is the virtualenv you're installing editable into underneath that repo? Because that would suggest to me, that the remote helper is running with its location being CWD, resulting in git calls from within the remote helper checking the .git upstairs.

Edit:
Nope. That's not it. Now leaning towards the entry point loading relying on where the module is to be found.

@mih
Copy link
Member Author

mih commented Jun 26, 2020

But note that all of the Git inference is happening before the remote helper actually runs. It still happens, if all the helper does is exit(0).

@bpoldrack
Copy link
Member

But note that all of the Git inference is happening before the remote helper actually runs. It still happens, if all the helper does is exit(0).

I agree, that it looks as if that was the case in the output above, but when I change the actual executable to not import anything but sys and os and let main just exit 0, this is what I get:

14:16:15.770777 git.c:439               trace: built-in: git clone osf://dfzsr newrepoclone
Cloning into 'newrepoclone'...
14:16:15.820198 run-command.c:663       trace: run_command: git-remote-osf origin osf://dfzsr

... and exits 128

@bpoldrack
Copy link
Member

And now just loading the entry point, but not actually calling it for comparison w/ above:

14:28:02.970483 git.c:439               trace: built-in: git clone osf://dfzsr newrepoclone
Cloning into 'newrepoclone'...
14:28:02.981453 run-command.c:663       trace: run_command: git-remote-osf origin osf://dfzsr
14:28:03.166739 trace.c:375             setup: git_dir: /tmp/newrepoclone/.git
14:28:03.166765 trace.c:376             setup: git_common_dir: /tmp/newrepoclone/.git
14:28:03.166769 trace.c:377             setup: worktree: /home/ben/work/proj/datalad/datalad-osf
14:28:03.166772 trace.c:378             setup: cwd: /home/ben/work/proj/datalad/datalad-osf
14:28:03.166774 trace.c:379             setup: prefix: (null)
14:28:03.166777 git.c:439               trace: built-in: git describe --tags --dirty --always --long --match '*'
14:28:03.166917 chdir-notify.c:65       setup: chdir from '/home/ben/work/proj/datalad/datalad-osf' to '/home/ben/work/proj/datalad/datalad-osf'
14:28:03.166924 chdir-notify.c:40       setup: reparent packed-refs to '/tmp/newrepoclone/.git/packed-refs'
14:28:03.166928 chdir-notify.c:40       setup: reparent files-backend $GIT_DIR to '/tmp/newrepoclone/.git'
14:28:03.166932 chdir-notify.c:40       setup: reparent files-backend $GIT_COMMONDIR to '/tmp/newrepoclone/.git'
fatal: bad revision 'HEAD'

So, it is happening not before the remote helper is executed, but before our code is executed: during loading the entry point via importlib. Dark magic.

@kyleam
Copy link
Collaborator

kyleam commented Jun 26, 2020

fatal: bad revision 'HEAD'

[...] I have no idea how or why is happens, but the issues goes away with a non-editable deployment via pip install ..

It's due to versioneer's logic that runs git underneath, in particular _version.git_pieces_from_vcs.

Here's a minimal example that triggers it.

demo
#!/bin/sh

set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/git-remote-chdir-XXXXXXX)"
git init src
(
    cd src
    cat >setup.py <<'EOF'
from setuptools import setup
setup(name="foo",
      entry_points={"console_scripts": ["git-remote-foo=foo.bar:main"]})
EOF

    mkdir foo
    cat >foo/__init__.py <<'EOF'
import os
import subprocess
subprocess.run(["git", "describe", "--always"], cwd=os.path.dirname(__file__))
EOF

    cat >foo/bar.py <<'EOF'
def main():
    assert 0
EOF

    git add foo setup.py
    git commit -m'c0'
)

python3 -m venv ./env
. ./env/bin/activate
pip install -e src

export GIT_TRACE2=1
export GIT_TRACE_SETUP=1
git clone foo://b c

My understanding, based on looking into git rebase -x <tests that invoke git> failures in the past, is that the proper fix is something like this:

diff --git a/scratch.sh b/scratch.sh
index 3045667..a4faaba 100755
--- a/scratch.sh
+++ b/scratch.sh
@@ -16,6 +16,12 @@ EOF
     cat >foo/__init__.py <<'EOF'
 import os
 import subprocess
+
+out = subprocess.run(["git", "rev-parse", "--local-env-vars"],
+                     capture_output=True, encoding="utf8")
+for ev in out.stdout.splitlines():
+    os.environ.pop(ev, None)
+
 subprocess.run(["git", "describe", "--always"], cwd=os.path.dirname(__file__))
 EOF
 

Or in the context of _version.py:

diff --git a/datalad_osf/_version.py b/datalad_osf/_version.py
index 7ae8c4b..6d65063 100644
--- a/datalad_osf/_version.py
+++ b/datalad_osf/_version.py
@@ -225,7 +225,15 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
     if sys.platform == "win32":
         GITS = ["git.cmd", "git.exe"]
 
-    out, rc = run_command(GITS, ["rev-parse", "--git-dir"], cwd=root,
+    local_envs, rc = run_command(GITS, ["rev-parse", "--local-env-vars"],
+                                 cwd=root, hide_stderr=True)
+    if rc != 0:
+        raise NotThisMethod("'git rev-parse --local-env-vars' returned error")
+    env = os.environ.copy()
+    for ev in local_envs.splitlines():
+        env.pop(ev, None)
+
+    out, rc = run_command(GITS, ["rev-parse", "--git-dir"], cwd=root, env=env,
                           hide_stderr=True)
     if rc != 0:
         if verbose:
@@ -237,12 +245,12 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
     describe_out, rc = run_command(GITS, ["describe", "--tags", "--dirty",
                                           "--always", "--long",
                                           "--match", "%s*" % tag_prefix],
-                                   cwd=root)
+                                   cwd=root, env=env)
     # --long was added in git-1.5.5
     if describe_out is None:
         raise NotThisMethod("'git describe' failed")
     describe_out = describe_out.strip()
-    full_out, rc = run_command(GITS, ["rev-parse", "HEAD"], cwd=root)
+    full_out, rc = run_command(GITS, ["rev-parse", "HEAD"], cwd=root, env=env)
     if full_out is None:
         raise NotThisMethod("'git rev-parse' failed")
     full_out = full_out.strip()
@@ -294,12 +302,12 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
         # HEX: no tags
         pieces["closest-tag"] = None
         count_out, rc = run_command(GITS, ["rev-list", "HEAD", "--count"],
-                                    cwd=root)
+                                    cwd=root, env=env)
         pieces["distance"] = int(count_out)  # total number of commits
 
     # commit date: see ISO-8601 comment in git_versions_from_keywords()
     date = run_command(GITS, ["show", "-s", "--format=%ci", "HEAD"],
-                       cwd=root)[0].strip()
+                       cwd=root, env=env)[0].strip()
     pieces["date"] = date.strip().replace(" ", "T", 1).replace(" ", "", 1)
 
     return pieces

@mih
Copy link
Member Author

mih commented Jul 2, 2020

Thanks @kyleam ! I filed python-versioneer/python-versioneer#210

I will close this PR here to not add additional distraction on top. The content is included in #106 where I will also address the open point raised by @bpoldrack

@mih mih closed this Jul 2, 2020
mih added a commit that referenced this pull request Jul 2, 2020
@mih mih deleted the nf-gitremote branch June 9, 2023 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants