Add only-missing option to kedro run command #30
Comments
Hi @gotin! We're so glad you've been able to extend the functionality of the CLI. We deliberately left the CLI open for users to extend. Could you talk to us about how only_missing works for you?
When I'm developing a pipeline incrementally on a huge amount of data, running the whole pipeline from the beginning is time-consuming and unproductive. So I want to use an 'only-missing' flag to run only the nodes that were just created and haven't been run yet, skipping the nodes that have already been run.
I almost filed a duplicate of this issue.
I would also suggest that Kedro add a run_only_missing option to the kedro run command.
@gotin @Minyus We're in the process of reorganising how Kedro projects are run, and ideally we would like to add an easy way for people to add their own run options. We also plan to extend the set of options to the
I’m sure the node option and the from-inputs option are good for some situations, but they don’t work for mine. All I really want is to invoke run_only_missing from the kedro command. I just want to run the new nodes that haven’t run since they were added.
Same opinion as @gotin. @idanov Regarding the extended options, I would suggest an interface like this.
This is an example that runs only missing nodes AND the "mymodel" node AND nodes starting with "myreport_", EXCEPT nodes starting with "myreport_detailed_", after showing a warning prompt (y/N) if a file already exists and would be overwritten.
@Minyus @gotin We don't intend to add
Closing this as per comment above and under #60.
Description
Runner has a run_only_missing method, which is very useful, especially when a pipeline is being developed incrementally, but the kedro run command doesn't have an option for it. An option that invokes Runner#run_only_missing would be useful.
Context
If kedro run had an option for Runner#run_only_missing, we could skip steps that have already been completed, which would make pipeline development more productive.
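To make the idea of "skipping steps that are already done" concrete, here is a rough toy sketch in plain Python. It does not use Kedro and is not how Runner#run_only_missing is implemented; `Node`, `select_missing`, and the dataset names are all hypothetical. It assumes a node can be skipped only when every one of its outputs already exists, and it ignores complications such as in-memory intermediate datasets.

```python
from typing import List, NamedTuple, Set


class Node(NamedTuple):
    """Hypothetical stand-in for a pipeline node (not a Kedro class)."""
    name: str
    inputs: List[str]
    outputs: List[str]


def select_missing(nodes: List[Node], existing: Set[str]) -> List[str]:
    """Return the names of nodes with at least one output not yet persisted.

    Assumption: `existing` holds the names of datasets already saved
    by earlier runs; nodes whose outputs are all present are skipped.
    """
    return [
        node.name
        for node in nodes
        if any(out not in existing for out in node.outputs)
    ]


# Hypothetical three-step pipeline where only the newest node has not run.
nodes = [
    Node("clean", ["raw"], ["cleaned"]),
    Node("features", ["cleaned"], ["features"]),
    Node("train", ["features"], ["model"]),
]
existing = {"raw", "cleaned", "features"}
to_run = select_missing(nodes, existing)
```

With these inputs only the newly added `train` node is selected, which is the incremental-development case described above.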
Possible Implementation
Add an argument for only_missing behavior to the run function in kedro_cli.py, and add a matching only_missing argument to the main function in run.py.
(I have already tried this implementation in my local environment, and it works well.)
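The change could look roughly like the sketch below. This is not the actual patch: it uses argparse and a stub runner so that it stays dependency-free, whereas the real kedro_cli.py is built on click. `StubRunner`, `build_parser`, and `main` are hypothetical names, and the `(pipeline, catalog)` arguments are an assumption about what the runner methods take.

```python
import argparse
from typing import List


class StubRunner:
    """Hypothetical stand-in for a Kedro runner, for illustration only."""

    def run(self, pipeline, catalog) -> str:
        return "full"

    def run_only_missing(self, pipeline, catalog) -> str:
        return "only_missing"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run-sketch")
    # Boolean flag: present means "skip nodes whose outputs already exist".
    parser.add_argument(
        "--only-missing",
        action="store_true",
        help="Run only nodes with missing outputs.",
    )
    return parser


def main(argv: List[str], runner, pipeline, catalog) -> str:
    """Dispatch to run_only_missing when the flag is set, else a full run."""
    args = build_parser().parse_args(argv)
    if args.only_missing:
        return runner.run_only_missing(pipeline, catalog)
    return runner.run(pipeline, catalog)
```

Keeping the flag as a plain boolean that only selects which runner method is called means the default behaviour of the command stays unchanged when the flag is omitted.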