Using path relative to DVC project root #779

Closed
prihoda opened this issue Jun 15, 2018 · 3 comments
Labels
enhancement Enhances DVC

prihoda commented Jun 15, 2018

Hi guys. I was wondering whether we could make use of DVC's knowledge of the project root path.

I'm regularly referencing my /scripts folder from DVC files inside my /data folder using relative paths such as ../../../scripts/something.py, which can get quite annoying.

Would it be possible to reference paths relative to the DVC project root?

  • In dependencies and outputs, paths starting with / would be handled as relative to project root (since we cannot reference paths outside the DVC project anyway)
  • In commands, an environment variable could be set to be able to use $DVCROOT/scripts/something.py.

efiop commented Jun 15, 2018

Hi @prihoda !

> In dependencies and outputs, paths starting with / would be handled as relative to project root (since we cannot reference paths outside the DVC project anyway)

Actually, we have support for external local files merged into master and for those /path is a valid one.

> In commands, an environment variable could be set to be able to use $DVCROOT/scripts/something.py.

Unfortunately, I can't see a good way for dvc to set up those env vars automatically without getting into your bashrc. Maybe I'm missing something.

In the upcoming 0.9.8 you could set up a remote pointing at the project's root directory and reference that in the dependencies/outputs of your stages. I.e.:

$ pwd
/path/to/myrepo  # DVC project's root directory
$ dvc remote add dvcroot /path/to/myrepo
$ dvc run -d remote://dvcroot/something.py ...

But it still looks a bit too long. We could, though, consider treating dvcroot as a scheme (i.e. dvcroot://scripts/something), which looks a bit better IMHO. That said, this method applies to the -d/-o notation and probably not to the cmd itself, as I'm not sure we should touch it. Do you specify ../../../scripts/something.py in your cmd as well, or just in -d/-o?

Another option would be to do as git does (i.e. git rev-parse --git-dir or something like that) and introduce a dvc command that prints the root dir, so that you could reference it in your commands (both -d/-o and the cmd itself). I.e. dvc run -d $(dvc config root)/scripts/something.py ... python $(dvc config root)/scripts/something.py. That sounds like the best way to go, in my opinion. Would that be suitable for you?
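
For illustration, such a command could locate the root by walking up from the current directory until it finds the `.dvc` directory that marks a DVC project. This is only a sketch of the idea, not how dvc implements it; the function name `find_dvc_root` is hypothetical.

```python
import os
import tempfile


def find_dvc_root(start):
    """Walk upward from `start` until a directory containing `.dvc` is found.

    Hypothetical helper for illustration; not part of dvc's API.
    """
    current = os.path.realpath(start)
    while True:
        if os.path.isdir(os.path.join(current, ".dvc")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            raise FileNotFoundError("no .dvc directory found above " + start)
        current = parent


if __name__ == "__main__":
    # Build a throwaway project layout and look up its root from a nested dir.
    with tempfile.TemporaryDirectory() as tmp:
        repo = os.path.realpath(tmp)
        os.makedirs(os.path.join(repo, ".dvc"))
        nested = os.path.join(repo, "data", "raw")
        os.makedirs(nested)
        print(find_dvc_root(nested) == repo)
```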


prihoda commented Jun 15, 2018

Yup I'm using the path to the script also in the command.

Not sure how DVC runs the command, but you can pass environment variables to commands from python like so:

import subprocess, os

my_env = os.environ.copy()
my_env["DVCPATH"] = "/some/path"
subprocess.Popen("echo $DVCPATH/file.txt", env=my_env, shell=True)
# /some/path/file.txt

The remotes look cool, I'll look into the docs when they are ready.


efiop commented Jun 15, 2018

> Yup I'm using the path to the script also in the command

Thanks for clarifying. Looks like dvc config root (or maybe dvc root, or something similar) would be the best choice. I'll take a closer look into it soon.

> Not sure how DVC runs the command, but you can pass environment variables to commands from python like so:

Unfortunately, that would not work in the general case, because you also need to be able to use that env var in the -d/-o notation for dvc run, and we would not be able to intercept it there: the evaluation is performed by sh. The only way around that would be to wrap $DVCPATH (or $DVCROOT) in single quotes so that dvc could parse it itself, which would cause a lot of confusion. E.g. here are two commands:

dvc run -d $DVCROOT/scripts/something.py ... python $DVCROOT/scripts/something.py
dvc run -d '$DVCROOT'/scripts/something.py ... 'python $DVCROOT/scripts/something.py'

The first one will evaluate to:

dvc run -d /scripts/something.py ... python /scripts/something.py

because the shell doesn't know the DVCROOT env var. So this method would cause a lot of annoyances.
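
The difference is easy to reproduce. The sketch below assumes a POSIX `sh` is available and explicitly drops `DVCROOT` from the environment; the helper name `shell_echo` is made up for this example.

```python
import os
import subprocess

# Run the shell with DVCROOT deliberately absent from its environment.
env = {k: v for k, v in os.environ.items() if k != "DVCROOT"}


def shell_echo(text):
    """Let sh expand `text` and return what echo prints (illustrative helper)."""
    result = subprocess.run(
        ["sh", "-c", "echo " + text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Unquoted: sh expands the unset variable to an empty string.
unquoted = shell_echo("$DVCROOT/scripts/something.py")
# Single-quoted: sh passes the literal text through, so a program like dvc
# would have to expand it itself.
quoted = shell_echo("'$DVCROOT'/scripts/something.py")

print(unquoted)
print(quoted)
```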

On the other hand, the proposed dvc root would have to return a relative path to the dvc root and would work like so:

$ dvc root
# ../../../
$ dvc run -d $(dvc root)/scripts/something.py ... python $(dvc root)/scripts/something.py

which is acceptable.
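
As a rough sketch of what such a command would compute, the relative path can be derived from the project root and the current directory. The function name `relative_root` and the example directories are assumptions for illustration, and `posixpath` is used so the result does not depend on the host OS.

```python
import posixpath


def relative_root(root, cwd):
    """Return the path from `cwd` back up to `root` (hypothetical helper)."""
    return posixpath.relpath(root, start=cwd)


# Assumed layout: the stage is run from three levels below the project root.
rel = relative_root("/path/to/myrepo", "/path/to/myrepo/data/raw/images")
script = posixpath.join(rel, "scripts", "something.py")

print(rel)     # ../../..
print(script)  # ../../../scripts/something.py
```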

Thanks,
Ruslan
