Stating dependencies between scripts/modules #1401

Closed
erdnaavlis opened this issue Dec 3, 2018 · 5 comments
Labels
feature request Requesting a new feature

Comments

@erdnaavlis

Hello!

I hope the explanation below is clear. Please let me know otherwise.

Say I have a utils.py with some awesome helpful classes that I reuse frequently in a certain dvc tracked repo.

Say I have:

  • script1.py that uses utils.py and that takes data0.csv and processes it to data1.csv
  • script2.py that uses utils.py and that takes data1.csv and processes it to data2.csv
  • script3.py that uses utils.py and that takes data2.csv and processes it to data3.csv
  • etc ...

(In the example above all scripts are part of the same pipeline, but they could be from different pipelines.)

The point that could perhaps be improved is that, as far as I know, for each data*.csv I have to add both the corresponding script and utils.py to its dependencies. And of course this can cascade if utils.py depends on utils1.py, which depends on utils2.py, etc. If that is the case, then every time utils.py is a dependency I have to remember to include the other utils*.py files as dependencies as well.

Is there a way in dvc to say that scriptB.py depends on scriptA.py, so that every time scriptB.py is a dependency, scriptA.py is also an implicit dependency?
Like a variant or an alternative to dvc run where the "output" is not a data file but a .py?
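
To make the request concrete, here is a minimal sketch of the expansion being asked for. It is not an existing dvc feature: the `declared` mapping and the `expand_dependencies` helper are hypothetical names used only for illustration. Given a hand-written mapping from each script to the scripts it depends on, the helper returns the full transitive list, which is then turned into `-d` arguments for `dvc run`.

```python
def expand_dependencies(direct, declared):
    """Return `direct` plus everything reachable through `declared`, in first-seen order."""
    ordered = []
    seen = set()
    stack = list(reversed(direct))
    while stack:
        item = stack.pop()
        if item in seen:
            continue  # already handled; this also protects against cycles
        seen.add(item)
        ordered.append(item)
        # Push this item's declared dependencies so they are visited next.
        stack.extend(reversed(declared.get(item, [])))
    return ordered


# Hypothetical mapping (an assumption, not a dvc file format):
# each key implicitly pulls in the files listed as its value.
declared = {
    "script1.py": ["utils.py"],
    "utils.py": ["utils1.py"],
    "utils1.py": ["utils2.py"],
}

deps = expand_dependencies(["script1.py", "data0.csv"], declared)

# Build the repeated -d arguments that would be passed to `dvc run`.
dep_flags = [flag for dep in deps for flag in ("-d", dep)]
print(" ".join(dep_flags))
```

With something like this, only the direct dependencies would have to be listed by hand, and the utils*.py chain would follow from the mapping.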

Thanks in advance!

@efiop
Contributor

efiop commented Dec 4, 2018

Hi @andrethrill !

Amazing idea! We could totally support that by implementing something like --no-remove-outs (we will figure out a better name, suggestions are welcome 🙂), see #1214, and using it with the already existing -O|--outs-no-cache option. So your command would look like dvc run -d scriptA.py -O scriptB.py --no-remove-outs.

Thanks,
Ruslan

@ghost ghost added the feature request Requesting a new feature label Dec 5, 2018
@efiop
Contributor

efiop commented Apr 4, 2019

#1214 implemented --persist-no-cache, which is basically the same as the --no-remove-outs described above. One problem still keeps us from fully supporting this way of stating dependencies: when dvc sees a stage with exactly 1 dependency and 1 output, it assumes that it is a dvc import. A workaround is to use either a dummy dependency/output or a dummy command (something like echo ""). The latter workaround allows stating dependencies even in the current state:

dvc run -d scriptA.py --persist-no-cache scriptB.py 'echo ""'

Feel free to try it out and let us know what you think 🙂

Created #1830 for that import issue

@efiop efiop closed this as completed Apr 4, 2019
@anotherbugmaster
Contributor

I'm sorry, but how exactly does this solve the problem stated above?

The topic starter asked for a way to automatically specify Python import dependencies. Could you implement some kind of language-specific plugin for that?
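
As a rough illustration of what such a plugin might do, the sketch below walks a script's imports with the standard `ast` module and collects the local .py files it reaches, transitively. The helper names `local_imports` and `import_closure` are hypothetical, and the sketch assumes a flat layout where `import utils` maps to `utils.py` under a given root; packages, relative imports, and dynamic imports are not handled.

```python
import ast
from pathlib import Path


def local_imports(path, root):
    """Return the set of .py files under `root` that `path` imports directly."""
    tree = ast.parse(Path(path).read_text(), filename=str(path))
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    found = set()
    for name in names:
        # Assumption: module "a.b" lives at root/a/b.py; anything else
        # (standard library, installed packages) is ignored.
        candidate = Path(root) / (name.replace(".", "/") + ".py")
        if candidate.is_file():
            found.add(candidate)
    return found


def import_closure(script, root):
    """Return `script` plus every local module it imports, transitively, sorted."""
    seen = set()
    stack = [Path(script)]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(local_imports(current, root) - seen)
    return sorted(seen)
```

The returned list could then be passed as dependencies of the stage that runs the script. Because the files are only parsed, never executed, this is safe to run on scripts with side effects.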

Thanks.

@anotherbugmaster
Contributor

anotherbugmaster commented Jan 27, 2020

I think the workaround for now is to manually compile the Python entry point like this:

python -m compileall script_to_run.py

and to add script_to_run.pyc as a dependency of the subsequent scripts. The Python interpreter doesn't recompile a .py file whose dependencies haven't changed, which is exactly what we need in this case.
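
One thing to verify before relying on this: as far as I know, CPython validates a cached .pyc against its own source file only (modification time and size, or a source hash), so a change in an imported module may not change script_to_run.pyc by itself. A different technique with the same "single file as dependency" idea is a stamp file holding one digest over the script and the modules it uses. The sketch below is an assumption-laden illustration, not part of dvc; `combined_digest`, `write_stamp`, and the stamp file name are hypothetical.

```python
import hashlib
from pathlib import Path


def combined_digest(paths):
    """Return one SHA-256 hex digest covering the names and contents of `paths`."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on the order the files were listed in.
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def write_stamp(paths, stamp_file):
    """Write the combined digest to `stamp_file` only if it changed; return the digest."""
    stamp = Path(stamp_file)
    value = combined_digest(paths)
    if not stamp.exists() or stamp.read_text() != value:
        stamp.write_text(value)
    return value
```

For example, `write_stamp(["script_to_run.py", "utils.py"], "script_to_run.deps")` would rewrite the stamp only when one of the listed files changes, so the stamp file could stand in for the whole group as a single dependency.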

@efiop
Contributor

efiop commented Jan 27, 2020

Hi @anotherbugmaster !

Looks like I've indeed missed something in the original request, but for the issue you've described we have another ticket, #1577. Please take a look and let us know if that is what you mean :)

Thanks for the feedback!
