Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc.api: doesn't work with repo with SSH URL #3111

Closed
jorgeorpinel opened this issue Jan 11, 2020 · 7 comments · Fixed by #3166
Closed

dvc.api: doesn't work with repo with SSH URL #3111

jorgeorpinel opened this issue Jan 11, 2020 · 7 comments · Fixed by #3166
Labels
bug Did we break something? p1-important Important, aka current backlog of things to do

Comments

@jorgeorpinel
Copy link
Contributor

$ dvc version
DVC version: 0.80.0
Python version: 3.7.6
Platform: Darwin-19.0.0-x86_64-i386-64bit
Binary: False
Package: brew

This script

import csv
import dvc.api

with dvc.api.open(
  "sea_ice.csv",
  repo="[email protected]:iterative/df_sea_ice_no_header.git"
) as fd:
  reader = csv.reader(fd)
  for row in reader:
    print(row[0])

Throws

Traceback (most recent call last):
  File "tests/open-test.py", line 6, in <module>
    repo="[email protected]:iterative/df_sea_ice_no_header.git"
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/Users/drbidibombom/DVC-repos/api-tests/.env/lib/python3.7/site-packages/dvc/api.py", line 78, in _open
    with _make_repo(repo, rev=rev) as _repo:
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/Users/drbidibombom/DVC-repos/api-tests/.env/lib/python3.7/site-packages/dvc/api.py", line 98, in _make_repo
    yield Repo(repo_url)
  File "/Users/drbidibombom/DVC-repos/api-tests/.env/lib/python3.7/site-packages/dvc/repo/__init__.py", line 77, in __init__
    root_dir = self.find_root(root_dir)
  File "/Users/drbidibombom/DVC-repos/api-tests/.env/lib/python3.7/site-packages/dvc/repo/__init__.py", line 145, in find_root
    raise NotDvcRepoError(root)
dvc.exceptions.NotDvcRepoError: you are not inside of a dvc repository (checked up to mount point '/')

It looks like it's trying to use my cwd . as the repo (and I'm not working from a DVC repo).

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Jan 11, 2020
@ghost ghost added the bug Did we break something? label Jan 11, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Jan 11, 2020
@jorgeorpinel
Copy link
Contributor Author

p.s. If I try the HTTP URL it throws a different error: That it can't find the repo.

@ghost
Copy link

ghost commented Jan 11, 2020

Caused by urlparse not being able to understand the git URL

dvc/dvc/api.py

Lines 64 to 71 in 8787ac0

@contextmanager
def _make_repo(repo_url, rev=None):
if not repo_url or urlparse(repo_url).scheme == "":
assert rev is None, "Custom revision is not supported for local repo"
yield Repo(repo_url)
else:
with external_repo(url=repo_url, rev=rev) as repo:
yield repo

An ugly patch would be to do:

diff --git a/dvc/api.py b/dvc/api.py
index ffb044a0..ce9f3b44 100644
--- a/dvc/api.py
+++ b/dvc/api.py
@@ -87,7 +87,7 @@ def read(path, repo=None, rev=None, remote=None, mode="r", encoding=None):
 
 @contextmanager
 def _make_repo(repo_url, rev=None):
-    if not repo_url or urlparse(repo_url).scheme == "":
+    if not repo_url or (urlparse(repo_url).scheme == "" and not urlparse(repo_url).path.startswith('git@')):
         assert rev is None, "Custom revision is not supported for local repo"
         yield Repo(repo_url)
     else:

@ghost
Copy link

ghost commented Jan 11, 2020

Need to research the correct way to parse those Git URLs.

@ghost
Copy link

ghost commented Jan 11, 2020

https://github.com/coala/git-url-parse

looks good 👌


EDIT: It doesn't work thaaat well, it has some issues parsing local URL's and https

@Suor
Copy link
Contributor

Suor commented Jan 13, 2020

We might want to always use external_repo() instead.

@efiop efiop added the p2-medium Medium priority, should be done, but less important label Jan 13, 2020
@jorgeorpinel jorgeorpinel changed the title api: open doesn't work with private GH repo (SSH URL) dvc.api.open: doesn't work with private GH repo (SSH URL) Jan 13, 2020
@efiop efiop added p1-important Important, aka current backlog of things to do and removed p2-medium Medium priority, should be done, but less important labels Jan 14, 2020
@skshetry
Copy link
Member

This issue also affects public GH repo accessed through SSH URL, and not just dvc.api.open but, all of the public APIs, i.e. open, read, get_url, and summon, etc. that uses _make_repo() (ref: #3111 (comment)).

So, the following throws NotDvcRepoError as well:

import dvc.api

url = dvc.api.get_url("get-started/data.xml", repo="[email protected]:iterative/dataset-registry")

@skshetry skshetry changed the title dvc.api.open: doesn't work with private GH repo (SSH URL) dvc.api: doesn't work with repo with SSH URL Jan 15, 2020
@skshetry
Copy link
Member

@jorgeorpinel, I renamed the title, to reflect that it affects all public apis for all SSH urls. 🙂

@ghost ghost self-assigned this Jan 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? p1-important Important, aka current backlog of things to do
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants