Using parameters on pipeline commands #4525
Comments
The idea sounds like a nice addition to current functionality.
@fredtcaroli
(similar for eval)
That could be the case if you're running a
I'm aware there are a few ways to work around this; all I'm saying is that none of them are simple and they all require some amount of boilerplate code. Let me tell you a bit about my context... I'm one of the people responsible for my (large) company's data science standards, and we have recently adopted DVC as our data and model versioning tool.
@fredtcaroli you are right: upon repro, we should not require the user to provide the params that they already provided during
You are right about this one, too. We do lack a straightforward way of passing parameters to the stage command. Allowing parameters to be passed the way that (for example) GitHub Actions allows would be a nice enhancement.
I wonder if it can be done on top of PR #4463? @fredtcaroli @pared WDYT?
@fredtcaroli that's a great scenario! We had this discussion in #3633, and it seems like this issue is a duplicate of it. But your proposal (the bestest way) is close to the one we came up with during that discussion. I even like your description more (we just need to decide if jinja is the best way to expand variables). There are plans to implement this in the near future, together with more advanced
Closing as the parametrization can also track used variables. See #3633
I currently have a couple of stages that use the same script. For example, I have a single script for fetching data from my data lake, but I call it twice: once for training data and another for testing data. There are a couple of ways I can accomplish that.
Naive way
So my fetch_data.py receives a date range. Training data is the entire month of April, and testing data is the first week of May.
This works OK, but I'd like to run retraining every month, so these dates need to be parameterized somehow.
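For readers following along, here is a minimal sketch of what a script that "receives a date range" might look like. The `--start` and `--end` flag names, the `parse_date_range` helper, and the ISO date format are assumptions made for illustration; the original post does not show the script's actual interface.

```python
import argparse
from datetime import date


def parse_date_range(argv):
    """Parse a date range from command-line arguments.

    The --start/--end flags are hypothetical names, not taken from the issue.
    """
    parser = argparse.ArgumentParser(description="Fetch data for a date range")
    # date.fromisoformat accepts strings such as "2020-04-01"
    parser.add_argument("--start", type=date.fromisoformat, required=True)
    parser.add_argument("--end", type=date.fromisoformat, required=True)
    args = parser.parse_args(argv)
    if args.start > args.end:
        parser.error("--start must not be after --end")
    return args.start, args.end
```

In the naive setup, each stage would hard-code its own values for these two arguments, which is why every monthly retraining run means editing the pipeline by hand.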
A better way
Let's set up a param.yaml file this time. We'll need to tell our script how to read that parameter, though.
This is a bit better. There are definitely a couple of hoops you need to jump through, but it gets the job done.
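The kind of boilerplate this approach needs could look roughly like the sketch below. It assumes the parameters file has already been loaded into a dictionary (for example with a YAML loader), and that dates live under a hypothetical `fetch.<dataset>` section. The key names, the example dates, and the `date_range_from_params` helper are all made up for illustration.

```python
from datetime import date


def _as_date(value):
    # A YAML loader may already return date objects for unquoted ISO dates
    return value if isinstance(value, date) else date.fromisoformat(value)


def date_range_from_params(params, dataset):
    """Return (start, end) for one dataset from an already-parsed params mapping."""
    try:
        section = params["fetch"][dataset]
    except KeyError:
        raise KeyError(f"no 'fetch.{dataset}' section in the parameters file") from None
    return _as_date(section["start"]), _as_date(section["end"])


# Hypothetical contents of the parameters file after loading
params = {
    "fetch": {
        "train": {"start": "2020-04-01", "end": "2020-04-30"},
        "test": {"start": "2020-05-01", "end": "2020-05-07"},
    }
}

train_start, train_end = date_range_from_params(params, "train")
```

The script then only needs to be told which dataset it is fetching, but every script that wants to use the parameters file has to carry lookup code like this.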
You could also instead bake in a stage parameter that tells you which parameters to read. That might be better for some cases.
The bestest way
Now here's my feature request. Imagine a world where I can simply set up my script parameters to read directly from the parameters file. Here's my suggestion:
See what I did there? I wouldn't even need to specify the params specification, since you can infer it from the cmd. That could also play nicely with other parameter files, like so:
So what do you think? That would greatly simplify my workflow, and I think it would be a great addition to this already awesome tool.
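To make the idea of inferring parameters from the cmd concrete, here is a rough sketch of how placeholder expansion could work. The `{{ dotted.key }}` placeholder syntax, the helper names, and the example command are assumptions chosen for illustration; this is not DVC's implementation, and the thread itself leaves open whether jinja is the right way to expand variables.

```python
import re

# Hypothetical placeholder syntax: {{ dotted.key.path }}
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def used_params(cmd):
    """List the dotted parameter keys referenced in a command template, in order."""
    return PLACEHOLDER.findall(cmd)


def lookup(params, dotted_key):
    """Walk a nested mapping following a dotted key such as 'fetch.train.start'."""
    value = params
    for part in dotted_key.split("."):
        value = value[part]
    return value


def expand(cmd, params):
    """Replace every placeholder in cmd with the matching parameter value."""
    return PLACEHOLDER.sub(lambda m: str(lookup(params, m.group(1))), cmd)


# Hypothetical parameters and command template
params = {"fetch": {"train": {"start": "2020-04-01", "end": "2020-04-30"}}}
cmd = "python fetch_data.py --start {{ fetch.train.start }} --end {{ fetch.train.end }}"

print(used_params(cmd))
print(expand(cmd, params))
```

Because the referenced keys can be read straight off the command template, a tool could track them as dependencies without a separate params list, which is the point the proposal makes.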