Replies: 13 comments
-
Related core issues: #3439, #2378.
No need for the … Something else we've recommended in the past is to keep a version number in a code comment (and update it whenever the imports change, which can be automated). That way …
-
P.S. Are you planning to contribute a draft of this short how-to, @aguschin? Feel free to. I'd only like to clarify when/which imports can be expected to change the results. Ideally only internal project imports, not really external libraries, right? I.e., I'm not sure about recommending depending on package config files (like …
-
It's a good question, @jorgeorpinel! For the minimal example, we could easily add …
If this seems good for the how-to, I can create a draft covering both cases, with an explanation of why we need the more advanced one.
You are correct, some libraries usually can't change the result (one obvious example would be … I hope it's clearer now. Please share your thoughts on this :)
-
We can definitely mention this as a tip, but I wouldn't make it an "official" part of the how-to steps. Also, DVC is language-agnostic, and this approach is very Python-centric. Probably not every platform has a command to print a list of required packages.
So why not use a pre-commit hook? 🙂 Sorry, I'm still not convinced that this is the intended use of DVC stages. It can definitely be a nice trick, though.
The 2 cases I see are: a) Make the pipeline depend on a file containing the list of required packages and versions (with …
TBH, if this is very rare (more of an error or edge case), I wouldn't recommend route (a) in our docs. WDYT, @dberenbaum? (And in that case, with only (b), would the doc be too short to be worth it?) Final thought: this may be more of a Best Practice than a how-to. How-tos are more like a FAQ: something users ask about regularly. I'd check with the core team first to get a better sense of how often this question comes up. They handle most support cases.
-
P.S. Best Practices is a hypothetical new docs section we've been planning for years 😆 So if this is a BP, we can just add a checkbox to iterative/dvc.org#72 for now.
-
I think it's a really useful suggestion that could help in some circumstances, but I also understand the hesitancy to introduce it as a definitive how-to. A few questions that come to mind:
Despite those concerns, this suggestion doesn't need to be universally applicable to be useful. It's more about which methods make sense to mention, and what the right way to frame them is. I like the best practices suggestion, or maybe a blog post, but I'm interested to hear what others think.
-
@dberenbaum, thank you for the good points on this!
@jorgeorpinel, thanks for raising these concerns! I think a few of them are answered above; a few more:
A pre-commit hook runs just before a commit, so it doesn't solve the problem. Suppose I've updated my environment, run several experiments, and then staged a commit. I could write a hook that updates my requirements.txt (or whatever file we use) at that point, but the experiments I ran locally are already associated with the previous commit, whose recorded environment is outdated by then.
In this case I don't think it's a how-to; it's more like a practice that is useful if you want to prevent this issue (usually adopted after stumbling upon it a few times). For a blog post this could also be a little short, but I don't know how long a blog post is supposed to be.
Do we have an example of this? I couldn't find one, unfortunately. Personally, I've never used this kind of practice, so it's a bit hard to imagine exactly why it could be preferable. It would be great to see one to understand how this practice can be applied.
-
```python
"""My beautiful script
Version 0.1.7  # Bump patch no. if dependencies were updated.
...
"""
import ...
```
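The version-comment approach mentioned earlier notes that the bump can be automated. Below is a minimal sketch of one way to do it, assuming the `Version X.Y.Z` docstring line shown above and that you know which internal module files the script imports. The helper names (`bump_patch`, `fingerprint`, `bump_if_changed`) are hypothetical, and persisting the previous fingerprint (e.g. in a small file read by a hook) is left to the caller.

```python
import hashlib
import re
from pathlib import Path
from typing import Iterable

# Assumed docstring format, as in the example above: a line starting "Version X.Y.Z".
VERSION_RE = re.compile(r"^(Version )(\d+)\.(\d+)\.(\d+)", re.MULTILINE)


def bump_patch(source: str) -> str:
    """Increment the patch number of the first 'Version X.Y.Z' line in `source`."""
    def repl(m):
        return f"{m.group(1)}{m.group(2)}.{m.group(3)}.{int(m.group(4)) + 1}"

    new_source, count = VERSION_RE.subn(repl, source, count=1)
    if count == 0:
        raise ValueError("no 'Version X.Y.Z' line found")
    return new_source


def fingerprint(module_files: Iterable[Path]) -> str:
    """SHA-256 over the paths and contents of the internal modules the script imports."""
    digest = hashlib.sha256()
    for path in sorted(module_files):
        digest.update(path.as_posix().encode() + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()


def bump_if_changed(source: str, old_fp: str, new_fp: str) -> str:
    """Bump the script's version only if the imported modules' fingerprint changed."""
    return bump_patch(source) if old_fp != new_fp else source
```

Assuming the script is listed under the stage's `deps`, rewriting its docstring changes the file's hash, so `dvc repro` reruns the stage even though the script's own code did not change.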
-
What I meant is: if you want a stage to handle dependencies, why not have that stage create the desired environment instead of only reporting it? That way, the environment becomes not only a versioned dependency but also reproducible.
Yes, sounds good. My question (directed at anyone who's interested) is: to what extent should reproducibility of the environment and infrastructure be a goal of DVC? Maybe this belongs in a separate discussion.
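A minimal sketch of the "create the environment instead of only reporting it" idea, assuming a pinned requirements file is the stage's dependency and the stage command calls this code. The standard-library `venv` module is used here only as one possible environment manager; `create_env` and `env_python` are hypothetical names, not DVC features.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Interpreter path inside a venv (Windows uses Scripts/, POSIX uses bin/)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path, requirements: Path, with_pip: bool = True) -> Path:
    """(Re)create a virtual environment at `env_dir` from a pinned requirements file."""
    # clear=True deletes any previous contents, so the env depends only on `requirements`.
    venv.EnvBuilder(clear=True, with_pip=with_pip).create(env_dir)
    python = env_python(env_dir)
    pinned = [
        line
        for line in requirements.read_text().splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if pinned:
        # Install exactly what the file lists; raises if pip is missing or a pin fails.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

With this design the stage's input is the declared requirements file rather than whatever happens to be installed, so editing a pin is what triggers a rebuild.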
-
My 2 cents on this. I see two different problems:
I would consider them separately (even if they might overlap). To my mind, if we have a … On the other hand, making a pipeline that can detect changes in the environment can be beneficial during the development phase, so that …
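To make such change detection more informative during development, one could compare two `pip freeze`-style snapshots (e.g. the committed one and a freshly generated one) and report what differs. A minimal sketch; `parse_snapshot` and `diff_snapshots` are hypothetical helpers, and only pinned `name==version` lines are considered.

```python
from __future__ import annotations


def parse_snapshot(text: str) -> dict[str, str]:
    """Map lower-cased package name -> pinned version from 'name==version' lines."""
    pinned: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and unpinned entries such as '-e ...' or 'pkg @ URL'.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pinned[name.strip().lower()] = version.strip()
    return pinned


def diff_snapshots(old: str, new: str) -> dict[str, list[str]]:
    """Report packages added, removed, or re-versioned between two snapshots."""
    before, after = parse_snapshot(old), parse_snapshot(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            name for name in before.keys() & after.keys() if before[name] != after[name]
        ),
    }
```

Running a check like this before an experiment, rather than at commit time, would surface environment drift while the results are still being produced.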
-
Good point. I took the conversation towards reproducibility because in other parts of DVC, we generally have stages reproduce the versioned outputs, so a stage might:
However, it makes sense to start with … I still prefer it as a blog post or best practice for a specific use case rather than a definitive how-to, partly because it feels a bit like a hack (not a bad thing; it's probably the best workaround given DVC's constraints). If we make it too "official," we may signal that DVC supports tracking package dependencies, which I don't think is accurate right now. For example, if the output of the next stage in the pipeline doesn't change, I think …
-
Interesting debate; I'm coming from #6115. Just a quick comment about the … This might look like an edge case, but switching versions of CUDA/cuDNN could significantly affect model performance and training/inference speed, and could even break the … In the end, we opted for combining … At least until an alternative solution is implemented on the DVC side or full agreement is reached on what "best practice" should be recommended.
-
Should this be moved to the dvc repo discussions for now?
-
The results of `dvc repro` could be different if we update libraries in our environment, but `dvc repro` still won't spot that. It would be great to have some suggested solutions for this in the docs, for example in "User Guide > How to". For a typical Python setup, the workaround would be to add a stage with `pip freeze`, something like this: …
If there are other good solutions, it would be great to add them too.
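A minimal sketch of such a snapshot step, written in Python so it does not rely on the `pip` CLI being on `PATH`. It approximates `pip freeze` with `importlib.metadata` (it does not reproduce editable-install or direct-URL entries). The script name `freeze_env.py` and the output file `requirements-lock.txt` are hypothetical.

```python
"""Hypothetical freeze_env.py: snapshot installed packages, similar to `pip freeze`."""
from __future__ import annotations

import sys
from importlib.metadata import distributions


def installed_packages() -> list[str]:
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()  # a set drops exact duplicates (same package seen on two paths)
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs without a Name field
            lines.add(f"{name}=={dist.version}")
    # Deterministic ordering keeps the file content, and thus its hash, stable.
    return sorted(lines, key=lambda s: (s.lower(), s))


def write_snapshot(path: str) -> None:
    """Write one requirement per line; an empty environment yields an empty file."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in installed_packages())


if __name__ == "__main__":
    write_snapshot(sys.argv[1] if len(sys.argv) > 1 else "requirements-lock.txt")
```

The stage running this script could be marked `always_changed: true` in `dvc.yaml` so it re-executes on every `dvc repro`; downstream stages that list `requirements-lock.txt` under `deps` would then rerun only when the package list actually changes, since DVC compares dependency hashes.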