dvc list: handle local repos differently? #3590

Closed
ahmed-shariff opened this issue Apr 4, 2020 · 25 comments
Assignees
Labels
feature request Requesting a new feature help wanted p3-nice-to-have It should be done this or next sprint product: VSCode Integration with VSCode extension research

Comments

@ahmed-shariff

ahmed-shariff commented Apr 4, 2020

When I run the command dvc list . from any sub-directory of the project I get the following error:

ERROR: failed to list '.' - Failed to clone repo '.' to '/tmp/tmp2qmfnem7dvc-clone': Cmd('git') failed due to: exit code(128)
  cmdline: git clone --no-single-branch -v . /tmp/tmp2qmfnem7dvc-clone
  stderr: 'fatal: repository '.' does not exist
'

Though it works when executed from the root directory of the project

DVC: 0.91.1 (arch linux;pip)


UPDATED (@shcheklein):

repurposed it a bit - #3590 (comment)

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Apr 4, 2020
@efiop
Contributor

efiop commented Apr 5, 2020

Hi @ahmed-shariff !

dvc list expects a URL to a Git repo, and . is not one unless you are in the Git repo root, just as you can't git clone a subdirectory. Theoretically, we could check whether the URL you pass is a subdirectory of a Git repo, but I'm not sure it is worth the effort, and it would probably lead to misuse, with people trying to use it as ls in their subdirs.

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Apr 5, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Apr 5, 2020
@ahmed-shariff
Author

I see. Thank you for the clarification.

@shcheklein
Member

To be honest I think it makes sense to handle this as a special case:

  • don't clone
  • if URL is a local path that is part of the DVC repo - show the result of dvc list <path-to-Git-root> URL. This means that if I run dvc list . I just see the files in the current location, which is probably the most expected result.
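
A minimal Python sketch of the special case proposed above: walk up from the given local path until a directory containing .git is found, then express the path relative to that root. This yields the pair that dvc list <path-to-Git-root> <path> would need. The function name split_repo_and_path is hypothetical and not part of DVC, and detecting the root through a .git entry is an assumption made for illustration only.

```python
from pathlib import Path
from typing import Optional, Tuple


def split_repo_and_path(target: str) -> Optional[Tuple[Path, str]]:
    """Return (git_root, path_relative_to_root), or None outside a Git repo.

    Hypothetical helper: it is not DVC's implementation, only an
    illustration of resolving a local path without cloning anything.
    """
    location = Path(target).resolve()
    # Check the location itself first, then each parent up to the filesystem root.
    for candidate in (location, *location.parents):
        if (candidate / ".git").exists():
            return candidate, location.relative_to(candidate).as_posix()
    return None
```

Under this reading, running the command from data/raw inside a project would resolve to the project root plus the relative path data/raw, and running it from the root itself would resolve to the root plus ".".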

@iterative/engineering @casperdcl thoughts?

(reopening, since it's annoying to remember the special syntax when I deal with the local repo, and other users were caught by surprise)

@shcheklein shcheklein reopened this Apr 12, 2020
@shcheklein shcheklein changed the title dvc list from project subdirectory throws error dvc list: handle local repos differently? Apr 12, 2020
@casperdcl
Contributor

@shcheklein I agree

@jamessergeant

jamessergeant commented Apr 23, 2020

I'm using a local remote and I receive the same error when running dvc list <path-to-local-remote>. Is this expected behaviour?

For reference the "local remote" is actually a mounted network drive.

ERROR: failed to list '/mnt/dr_dvc/vision/dataset_registry' - Failed to clone repo '/mnt/dr_dvc/vision/dataset_registry' to '/tmp/tmp9bkq340cdvc-clone': Cmd('git') failed due to: exit code(128)
  cmdline: git clone --no-single-branch -v /mnt/dr_dvc/vision/dataset_registry /tmp/tmp9bkq340cdvc-clone
  stderr: 'fatal: repository '/mnt/dr_dvc/vision/dataset_registry' does not exist

@jorgeorpinel
Contributor

jorgeorpinel commented Apr 23, 2020

Hi @jamessergeant, dvc list expects the path or URL to the DVC repository itself, not to a remote storage location. In fact I believe it doesn't check remotes at all to produce the list. Whether the data exists in remote storage is not guaranteed by dvc list. You have to attempt dvc get or dvc import to find out.

jorgeorpinel added a commit to iterative/dvc.org that referenced this issue Apr 23, 2020
@efiop efiop added p3-nice-to-have It should be done this or next sprint feature request Requesting a new feature labels Apr 23, 2020
@efiop efiop added help wanted and removed awaiting response we are waiting for your reply, please respond! :) labels Jun 23, 2020
@efiop
Contributor

efiop commented Mar 8, 2021

For the record: we are no longer cloning local repos, opening them directly instead. The only thing left is to make the CLI convenient for local use. E.g.

dvc list # should be same as dvc list .
cd subdir && dvc list # should be same as dvc list . subdir
dvc list subdir # should be the same as dvc list . subdir

This is a bit odd in terms of CLI argument semantics, as it will have to rely on some heuristics, but it should still be pretty convenient. An alternative might be to make the URL explicit, similar to early dvc list implementations, e.g. dvc list path_in_repo --url url, but that might be an even harder pill to swallow. Both approaches will raise questions about dvc import/get too, but those are clearly unusual to use locally.

@efiop
Contributor

efiop commented Mar 8, 2021

Current CLI:

usage: dvc list [-h] [-q | -v] [-R] [--dvc-only] [--rev [<commit>]] url [path]      

and with the proposed heuristics it will be:

usage: dvc list [-h] [-q | -v] [-R] [--dvc-only] [--rev [<commit>]] [url] [path]

and if

  • no url and no path - try to list current directory in the current dvc project
  • no path and url is a local dir or vice-versa - try to list url directory

A problem that we are creating here for our future selves: not being able to accept multiple targets (same problem as we have right now in list/get/import but now worse). An explicit --project/--url/etc for that would make it clearer. CC @dberenbaum @jorgeorpinel
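
The two rules above could be sketched in Python roughly as follows. This is only one possible reading of the proposal, not DVC's actual argument handling: resolve_list_args and _is_inside are hypothetical names, and treating a single argument as an in-project path only when it is an existing directory inside the current project is an assumption.

```python
import os
from typing import Optional, Tuple


def _is_inside(candidate: str, root: str) -> bool:
    """True if candidate is root itself or lies somewhere below it."""
    return os.path.commonpath([candidate, root]) == root


def resolve_list_args(
    url: Optional[str], path: Optional[str], cwd: str, project_root: str
) -> Tuple[str, str]:
    """Map optional [url] [path] arguments to an explicit (url, path) pair."""
    root = os.path.abspath(project_root)
    if url is None and path is None:
        # No arguments: list the current directory of the current project.
        return root, os.path.relpath(os.path.abspath(cwd), root)
    if path is None:
        candidate = os.path.abspath(os.path.join(cwd, url))
        if os.path.isdir(candidate) and _is_inside(candidate, root):
            # A single argument naming a local directory in this project.
            return root, os.path.relpath(candidate, root)
        # Otherwise keep the existing meaning: the argument is a repo URL.
        return url, "."
    # Both arguments given: behave like the current CLI.
    return url, path
```

The ambiguity mentioned in the thread shows up in the middle branch: a single positional argument has to be classified by inspecting the filesystem, which an explicit --url or --project flag would avoid.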

@jorgeorpinel
Contributor

jorgeorpinel commented Mar 9, 2021

I'm all for unifying list, get, import UI

not being able to accept multiple targets (same problem as we have right now in list/get/import but now worse). An explicit --project/--url/etc for that would make it clearer

Not seeing a need to list multiple targets. Maybe get/import but what about using the import-url interface instead, where url includes location and path? That makes it easy to accept several ones.

BTW is this issue solved/outdated?

@efiop
Contributor

efiop commented Mar 9, 2021

Not seeing a need to list multiple targets. Maybe get/import but what about using the import-url interface instead, where url includes location and path? That makes it easy to accept several ones.

@jorgeorpinel Doesn't work with git urls.

BTW is this issue solved/outdated?

Not the last part of it regarding handling local path as a target. Hence my questions.

@dberenbaum
Collaborator

Not seeing a need to list multiple targets.

That's my initial thought. Are we aware of a need for this? If this was a new command, I'd prefer --url, but I wouldn't push to change it if there's no need.

Both approaches will raise questions about dvc import/get too, but those are clearly unusual to use locally.

By locally, you mean from inside the repo itself? I can't imagine dvc import . path being useful. Or are there other questions these changes raise about import/get?

@efiop
Contributor

efiop commented Mar 9, 2021

That's my initial thought. Are we aware of a need for this? If this was a new command, I'd prefer --url, but I wouldn't push to change it if there's no need.

@dberenbaum No requests or anything yet. Just looking into a possible future 🙂

By locally, you mean from inside the repo itself? I can't imagine dvc import . path being useful. Or are there other questions these changes raise about import/get?

Yep, from within the project or from another local project.

Btw, another interesting confusion is that people tend to pass gs://, s3://, or another dvc remote as the argument instead of a git url. So maybe an explicit --url or, better, --project flag would clear up the confusion in all of the commands. That would even open the possibility of unifying import with import-url (and get with get-url) into one command each (dvc import and dvc get) in the future, since we'd have an explicit flag to differentiate the use cases of otherwise very similar commands. Though there has been some arguing about it even back when it was introduced (wish we had rfcs from back then 😉 ).

Anyway, a quick, local and intuitive solution is to go with that [url] [path] solution I've suggested above. If everyone is okay with it, of course.

@jorgeorpinel
Contributor

In short, make url optional, default to .? Sounds good, but ideally should apply to get/import* too (for UI consistency).

For the future, if --url helps unify get and import interfaces I'm all for it.

@shcheklein
Member

Getting back to this (as I'm playing more with it). It would significantly improve the usability of dvc list locally if we gave it ls semantics (recognizing the cwd automatically).

E.g. I was trying to see what outputs exist in the https://github.com/iterative/get-started-experiments/:

cd data
cd fashion-mnist
dvc list .

It returns root:

.dvcignore
.env
.gitignore
README.md
...
dvc.yaml
src

Trying:

dvc list . .

Also returns root.

dvc list . data/fashion-mnist/prepared

Fails:

ERROR: failed to list '.' - The path 'data/fashion-mnist/prepared' does not exist in the target repository '/Users/ivan/Projects/get-started-experiments' neither as a DVC output nor as a Git-tracked file.

dvc list -R . data/fashion-mnist

Also fails

and so on ... to be honest, I'm lost how can I list them at this point ... looks there are a few bugs + this behavior that is inconsistent depending on the (path, cwd) pair

@shcheklein
Member

I think this will also be part of making list, diff, etc. stable enough to integrate properly with VS Code.

@skshetry
Member

skshetry commented Jun 7, 2021

to be honest, I'm lost how can I list them at this point ... looks there are a few bugs + this behavior that is inconsistent depending on the (path, cwd) pair

The workaround for now is to do dvc list $(git rev-parse --show-toplevel) <path relative to git root>.

@machalx

machalx commented Aug 28, 2021

Hi, any update on dvc list for the already cloned repo? It still returns the project's root.

@efiop
Contributor

efiop commented Aug 29, 2021

@machalx No updates so far, unfortunately 🙁

@Toekan

Toekan commented Jan 14, 2022

Hi,

Thanks for your work on dvc.

I have a related question: I'm trying to use dvc get path/to/local_dvc_project name_of_file_to_download and I'm getting:

ERROR: failed to get 'name_of_file_to_download' from 'local_dvc_project' - Failed to clone repo 'local_dvc_project'.

Has there been any work on allowing dvc get to be used without git? I saw people wondering about use cases: I want to use dvc to download some files at runtime in a docker container, at which point I have no git credentials. Ideally I would just use boto3, but at the moment I'm not sure how to reconstruct the path to the file I want to download.
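
On reconstructing the path: as a hedged Python sketch, DVC 2.x (current at the time of this comment) stored each tracked file in remote storage under a key derived from the md5 recorded in its .dvc file, with the first two hex characters as a directory and the rest as the file name. The helper name remote_object_key and the prefix parameter are hypothetical. Newer DVC releases (3.x) place objects under an additional files/md5/ prefix, and tracked directories are stored through a .dir manifest rather than a single object, so this covers only the single-file case under the older layout.

```python
def remote_object_key(md5: str, prefix: str = "") -> str:
    """Build the object key for a file hash, assuming the DVC 2.x remote layout.

    md5 is the value of the md5 field in the .dvc file; prefix is the
    path part of the remote URL, if any (hypothetical parameter).
    """
    key = f"{md5[:2]}/{md5[2:]}"
    cleaned = prefix.strip("/")
    return f"{cleaned}/{key}" if cleaned else key
```

The resulting key, together with the bucket name from the remote URL, is what an S3 client would need to fetch the object directly.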

@pared
Contributor

pared commented Jan 19, 2022

@Toekan let's move the conversation to #7270

@dberenbaum
Collaborator

Hi, any update on dvc list for the already cloned repo? It still returns the project's root.

You should be able to list the contents of the relative path now, although the first argument will still be interpreted as the repo url (like dvc list . [relative_path]).

@efiop efiop closed this as completed Jul 27, 2023
@efiop
Contributor

efiop commented Jul 27, 2023

I'll close this for now.
