Rough draft of a Git remote helper to store Git repositories in OSF projects #100
…rojects

It essentially copies and adjusts https://github.com/datalad/git-remote-rclone in that it uses a local repo mirror to push and fetch refs to and from, and uploads a compressed archive to `.git/` of an OSF project that is identified by a URL of type `osf://<projectid>`.

Because request latency is high, the entire repo is represented as two files:

- a small text file listing the refs in the repo
- a 7z archive containing all of the actual content

Here is what it can do:

```
% mkdir newrepo
% cd newrepo
% git init
Initialized empty Git repository in /tmp/newrepo/.git/
% touch some
% git add some
% git commit -m initial
[master (root-commit) c552b2b] initial
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 some
% git remote add osf osf://vtha6
% git push --set-upstream osf master
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), done.
Building bitmaps: 100% (1/1), done.
Total 3 (delta 0), reused 0 (delta 0)
Computing commit graph generation numbers: 100% (1/1), done.
Upload repository archive
To osf://vtha6
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'osf'.
% cd ..
% git clone osf://vtha6 newrepoclone
Cloning into 'newrepoclone'...
fatal: bad revision 'HEAD'
100%|██████████████████████████████████████████████████| 83.0/83.0 [00:00<00:00, 519kbytes/s]
Downloading repository archive
100%|███████████████████████████████████████████████| 7.99k/7.99k [00:00<00:00, 1.04Mbytes/s]
Extracting repository archive
% git -C newrepoclone log -1 --oneline |cat
c552b2b initial
% git -C newrepo log -1 --oneline |cat
c552b2b initial
```

TODO:

- there is substantial code overlap with https://github.com/datalad/git-remote-rclone that should be refactored, ideally
- there is also some overlap with the special remote implementation
- a `clone` yields an immediate `fatal: bad revision 'HEAD'` output that seems to come before any of this code is executed; no idea where this is coming from
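For readers unfamiliar with the layout, the small refs file from the description could be handled along these lines. This is only a sketch: the function names and the one-`<sha> <refname>`-per-line format are assumptions for illustration, not taken from the code in this PR.

```python
def format_refs(refs):
    """Serialize a {refname: sha} mapping, one '<sha> <refname>' per line.

    The line format is an assumption made for this sketch.
    """
    return "".join(f"{sha} {name}\n" for name, sha in sorted(refs.items()))


def parse_refs(text):
    """Inverse of format_refs(); blank lines are ignored."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs
```

Because the listing is tiny compared to the archive, a helper could fetch and parse it first and only download the 7z archive when the refs differ from the local mirror.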
Nice!
I didn't actually try, but went through the code only. Just two remarks. Otherwise looks good to me!
```python
self.log('Downloading repository archive')
repo_handle = [
    f for f in self.osfstorage.files
    if f.path == '/.git/repo.7z'][0]
```
This

```python
repo_handle = [
    f for f in self.osfstorage.files
    if f.path == '/.git/repo.7z'][0]
```

was potentially done before (Line 184) when `self.get_remote_state()` was called. Maybe `get_remote_state` should return that handle, too, to avoid sending yet another request?
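A rough sketch of that suggestion, assuming the file listing is only iterated once and both handles are kept. `RemoteFile`, `find_repo_files`, and the `/.git/refs` path are hypothetical names used for illustration; only `/.git/repo.7z` appears in the PR.

```python
from collections import namedtuple

# Stand-in for the file objects yielded by the storage listing (hypothetical).
RemoteFile = namedtuple("RemoteFile", ["path"])


def find_repo_files(files, refs_path="/.git/refs",
                    archive_path="/.git/repo.7z"):
    """Scan the listing once and return (refs_handle, archive_handle).

    Either element is None if the corresponding file is absent.
    The default refs_path is an assumption made for this sketch.
    """
    refs_handle = archive_handle = None
    for f in files:
        if f.path == refs_path:
            refs_handle = f
        elif f.path == archive_path:
            archive_handle = f
    return refs_handle, archive_handle
```

With something like this, `get_remote_state` could store the archive handle next to the refs, so the later download step would not need to list the remote files again.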
So the
It sets CWD and worktree to a directory that contains the source tree of the package which is installed in editable fashion. This has nothing to do with the actual clone source or target. I have no idea how or why this happens, but the issue goes away with a non-editable deployment via
Hm. Thinking about this, since this smells like we should be aware of the underlying issue. Is the virtualenv you're installing editable into underneath that repo? Because that would suggest to me that the remote helper is running with its location being CWD, resulting in git calls from within the remote helper checking the .git upstairs. Edit:
But note that all of the Git inference is happening before the remote helper actually runs. It still happens if all the helper does is
I agree, that it looks as if that was the case in the output above, but when I change the actual executable to not import anything but
... and exits 128
And now just loading the entry point, but not actually calling it for comparison w/ above:
So, it is happening not before the remote helper is executed, but before our code is executed: during loading the entry point via importlib. Dark magic.
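The observation above matches how Python imports work in general: resolving an entry point such as `package.module:function` imports the parent package first, so anything at the top level of its `__init__.py` runs before the entry-point function is ever called. A small self-contained illustration follows; `demo_pkg` and `load_entry_point` are made-up names for this sketch, not the actual console-script loader.

```python
import importlib
import os
import sys
import tempfile


def load_entry_point(spec):
    """Resolve 'package.module:function' to a callable without calling it."""
    module_name, func_name = spec.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "demo_pkg")
    os.mkdir(pkg)
    # Top-level code in __init__.py runs as a side effect of the import.
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("LOADED = True\n")
    with open(os.path.join(pkg, "bar.py"), "w") as f:
        f.write("CALLED = []\n\ndef main():\n    CALLED.append(True)\n")
    sys.path.insert(0, tmp)
    try:
        main = load_entry_point("demo_pkg.bar:main")
    finally:
        sys.path.remove(tmp)

# The package-level code has run, although main() was never invoked.
init_ran = sys.modules["demo_pkg"].LOADED
main_ran = bool(sys.modules["demo_pkg.bar"].CALLED)
```

Here `init_ran` ends up true while `main_ran` stays false, which is the same ordering seen with the remote helper.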
It's due to versioneer's logic that runs git underneath, in particular

Here's a minimal example that triggers it.

```sh
#!/bin/sh
set -eu
cd "$(mktemp -d ${TMPDIR:-/tmp}/git-remote-chdir-XXXXXXX)"
git init src
(
cd src
cat >setup.py <<'EOF'
from setuptools import setup
setup(name="foo",
      entry_points={"console_scripts": ["git-remote-foo=foo.bar:main"]})
EOF
mkdir foo
cat >foo/__init__.py <<'EOF'
import os
import subprocess
subprocess.run(["git", "describe", "--always"], cwd=os.path.dirname(__file__))
EOF
cat >foo/bar.py <<'EOF'
def main():
    assert 0
EOF
git add foo setup.py
git commit -m'c0'
)
python3 -m venv ./env
. ./env/bin/activate
pip install -e src
export GIT_TRACE2=1
export GIT_TRACE_SETUP=1
git clone foo://b c
```

My understanding, based on looking into

```diff
diff --git a/scratch.sh b/scratch.sh
index 3045667..a4faaba 100755
--- a/scratch.sh
+++ b/scratch.sh
@@ -16,6 +16,12 @@ EOF
 cat >foo/__init__.py <<'EOF'
 import os
 import subprocess
+
+out = subprocess.run(["git", "rev-parse", "--local-env-vars"],
+                     capture_output=True, encoding="utf8")
+for ev in out.stdout.splitlines():
+    os.environ.pop(ev, None)
+
 subprocess.run(["git", "describe", "--always"], cwd=os.path.dirname(__file__))
 EOF
```
Or in the context of _version.py:

```diff
diff --git a/datalad_osf/_version.py b/datalad_osf/_version.py
index 7ae8c4b..6d65063 100644
--- a/datalad_osf/_version.py
+++ b/datalad_osf/_version.py
@@ -225,7 +225,15 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
     if sys.platform == "win32":
         GITS = ["git.cmd", "git.exe"]
 
-    out, rc = run_command(GITS, ["rev-parse", "--git-dir"], cwd=root,
+    local_envs, rc = run_command(GITS, ["rev-parse", "--local-env-vars"],
+                                 cwd=root, hide_stderr=True)
+    if rc != 0:
+        raise NotThisMethod("'git rev-parse --local-env-vars' returned error")
+    env = os.environ.copy()
+    for ev in local_envs.splitlines():
+        env.pop(ev, None)
+
+    out, rc = run_command(GITS, ["rev-parse", "--git-dir"], cwd=root, env=env,
                           hide_stderr=True)
     if rc != 0:
         if verbose:
@@ -237,12 +245,12 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
     describe_out, rc = run_command(GITS, ["describe", "--tags", "--dirty",
                                           "--always", "--long",
                                           "--match", "%s*" % tag_prefix],
-                                   cwd=root)
+                                   cwd=root, env=env)
     # --long was added in git-1.5.5
     if describe_out is None:
         raise NotThisMethod("'git describe' failed")
     describe_out = describe_out.strip()
-    full_out, rc = run_command(GITS, ["rev-parse", "HEAD"], cwd=root)
+    full_out, rc = run_command(GITS, ["rev-parse", "HEAD"], cwd=root, env=env)
     if full_out is None:
         raise NotThisMethod("'git rev-parse' failed")
     full_out = full_out.strip()
@@ -294,12 +302,12 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
         # HEX: no tags
         pieces["closest-tag"] = None
         count_out, rc = run_command(GITS, ["rev-list", "HEAD", "--count"],
-                                    cwd=root)
+                                    cwd=root, env=env)
         pieces["distance"] = int(count_out)  # total number of commits
 
     # commit date: see ISO-8601 comment in git_versions_from_keywords()
     date = run_command(GITS, ["show", "-s", "--format=%ci", "HEAD"],
-                       cwd=root)[0].strip()
+                       cwd=root, env=env)[0].strip()
     pieces["date"] = date.strip().replace(" ", "T", 1).replace(" ", "", 1)
 
     return pieces
```
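The same idea can be written as a small standalone helper. This is a sketch only: `git_local_env_vars` and `scrub_git_env` are illustrative names, and unlike the first patch above it returns a cleaned copy instead of mutating `os.environ`.

```python
import os
import subprocess


def git_local_env_vars():
    """Ask Git which environment variables are repository-local.

    Requires a `git` executable on PATH.
    """
    out = subprocess.run(["git", "rev-parse", "--local-env-vars"],
                         capture_output=True, encoding="utf8", check=True)
    return out.stdout.splitlines()


def scrub_git_env(env, local_vars):
    """Return a copy of env without the given repository-local variables."""
    cleaned = dict(env)
    for name in local_vars:
        cleaned.pop(name, None)
    return cleaned
```

A caller running inside a remote helper could then pass `env=scrub_git_env(os.environ, git_local_env_vars())` to `subprocess.run`, so that the child `git` process discovers its repository from `cwd` rather than from variables such as `GIT_DIR` inherited from the parent `git clone`.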
Thanks @kyleam! I filed python-versioneer/python-versioneer#210. I will close this PR here to not add additional distraction on top. The content is included in #106, where I will also address the open point raised by @bpoldrack.
Pointed out by @bpoldrack in #100 (comment)