Add only-missing option to kedro run command #30

Closed
gotin opened this issue Jun 24, 2019 · 8 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@gotin

gotin commented Jun 24, 2019

Description

Runner has a run_only_missing method, which is very useful, especially when a pipeline is being developed incrementally, but the kedro run command doesn't have an option for it. An option that invokes Runner#run_only_missing would be useful.

Context

If kedro run had an option for invoking Runner#run_only_missing, we could skip steps that have already been done, which would make pipeline development more productive.

Possible Implementation

Add an only_missing argument to the run function in kedro_cli.py, and add a matching only_missing argument to the main function in run.py.
(This implementation has already been tried at my local environment, and it's working well)
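
As a rough illustration of the change described above, here is a minimal sketch of the dispatch. It is not the code from the project template: `StubRunner`, `parse_args`, and this `main` signature are hypothetical, and `argparse` stands in for whatever option parsing kedro_cli.py actually uses so that the example stays standard-library only. The only names taken from the issue are `run` and `run_only_missing`.

```python
import argparse


class StubRunner:
    """Hypothetical stand-in for a Kedro runner; it only records the call made."""

    def __init__(self):
        self.calls = []

    def run(self, pipeline, catalog):
        self.calls.append("run")

    def run_only_missing(self, pipeline, catalog):
        self.calls.append("run_only_missing")


def main(runner, pipeline, catalog, only_missing=False):
    """Sketch of run.py's main: pick the runner method based on the flag."""
    if only_missing:
        return runner.run_only_missing(pipeline, catalog)
    return runner.run(pipeline, catalog)


def parse_args(argv):
    """Sketch of the CLI side: a boolean --only-missing flag (assumed name)."""
    parser = argparse.ArgumentParser(prog="kedro run")
    parser.add_argument(
        "--only-missing",
        action="store_true",
        help="Run only the nodes whose outputs are missing.",
    )
    return parser.parse_args(argv)


args = parse_args(["--only-missing"])
runner = StubRunner()
# pipeline and catalog are placeholders here; a real project would build them.
main(runner, pipeline=None, catalog=None, only_missing=args.only_missing)
print(runner.calls)  # ['run_only_missing']
```

Without the flag, `only_missing` defaults to `False` and the ordinary `run` path is taken, so existing behaviour is unchanged.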

Checklist

Include labels so that we can categorise your issue:

  • [Type: Enhancement]
  • [Priority: Medium]
gotin added the "Issue: Feature Request" label (New feature or improvement to existing feature) Jun 24, 2019
@yetudada
Contributor

yetudada commented Jul 2, 2019

Hi @gotin! We're so glad you've been able to extend the functionality of CLI. We have left it open for users to extend uses of the CLI. Could you talk to us about how only_missing works for you?

@gotin
Author

gotin commented Jul 8, 2019

When I'm developing a pipeline incrementally with a huge amount of data, running the whole pipeline from the beginning is time-consuming and unproductive. So I want an 'only-missing' flag that runs only the nodes that were just created and haven't been run yet, skipping nodes that have already been run.
I think we could use the --tag option of the kedro run command for this purpose, but sometimes it's difficult for me to tag each node with a suitable name.
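
To make the idea of "run only what is missing" concrete, here is a simplified model of the selection. It is an assumption-laden sketch and not Kedro's actual `run_only_missing` algorithm: nodes are plain `(name, inputs, outputs)` tuples given in execution order, `existing` is a set of dataset names that are already persisted, and the function name `nodes_to_run` is made up for this example.

```python
def nodes_to_run(nodes, existing):
    """Return the names of nodes that need to run.

    A node is selected if any of its outputs is not in `existing`, or if any
    of its inputs is produced by a node that was already selected (so that
    downstream results are rebuilt from fresh data).
    """
    to_build = set()  # datasets that will be (re)produced in this run
    selected = []
    for name, inputs, outputs in nodes:
        missing_output = any(out not in existing for out in outputs)
        rebuilt_input = any(inp in to_build for inp in inputs)
        if missing_output or rebuilt_input:
            selected.append(name)
            to_build.update(outputs)
    return selected


# Hypothetical three-step pipeline, listed in execution order.
pipeline = [
    ("clean", ["raw"], ["cleaned"]),
    ("make_features", ["cleaned"], ["features"]),
    ("train", ["features"], ["model"]),
]

# Only the newly added "train" node has no stored output yet.
print(nodes_to_run(pipeline, existing={"raw", "cleaned", "features"}))
# ['train']
```

Unlike tagging, this selection needs no per-node naming effort; it is derived entirely from which outputs already exist.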

@Minyus
Contributor

Minyus commented Jul 12, 2019

I almost filed a duplicate of this issue.
I totally agree with @gotin .

run_only_missing feature is a big advantage of kedro and it is inconvenient to modify 2 modules (run.py and kedro_cli.py) every time after running kedro new to use the feature.

I would suggest kedro add run_only_missing option to kedro run command too.

@idanov
Member

idanov commented Jul 16, 2019

@gotin @Minyus We're in the process of reorganising some things around how Kedro projects are run, and ideally we would like to add an easy way for people to add their own options for running. We also plan to extend the set of options to the run command with options like kedro run --node <node-name> or kedro run --from-inputs <input1>,<input2>. Would that solve the problem for you?

@gotin
Author

gotin commented Jul 17, 2019

I’m sure the node and from-inputs options are good for some situations, but they don’t work for mine. All I really want is to invoke run_only_missing from the kedro command. I just want to run the new nodes that haven’t been run since they were added.

@Minyus
Contributor

Minyus commented Jul 17, 2019

Same opinion as @gotin
run_only_missing is what we want.

@idanov Regarding the extended options, I would suggest an interface like this.

kedro run \
--only_missing \
--force_include mymodel \
--force_include myreport_* \
--force_exclude myreport_detailed_* \
--warn_overwriting

This is an example to run only missing nodes AND "mymodel" node AND nodes starting with "myreport_" EXCEPT nodes starting with "myreport_detailed_" after showing a warning prompt (y/N) if a file already exists and will be overwritten.
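
The selection rule described above could be sketched as follows. This is only an illustration of the proposed semantics, not an existing Kedro feature: `select_nodes` and its parameters are hypothetical, the patterns are treated as shell-style wildcards via the standard library's `fnmatch`, and the sketch assumes that an exclude pattern wins over both "missing" and "force include". The `--warn_overwriting` prompt is left out.

```python
from fnmatch import fnmatchcase


def select_nodes(all_nodes, missing, force_include=(), force_exclude=()):
    """Pick nodes that are missing or force-included, minus force-excluded ones.

    all_nodes: node names in pipeline order.
    missing: set of node names whose outputs do not exist yet.
    force_include / force_exclude: shell-style patterns such as "myreport_*".
    """
    selected = []
    for name in all_nodes:
        included = name in missing or any(
            fnmatchcase(name, pattern) for pattern in force_include
        )
        excluded = any(fnmatchcase(name, pattern) for pattern in force_exclude)
        if included and not excluded:
            selected.append(name)
    return selected


# Hypothetical node names matching the command-line example.
all_nodes = [
    "prepare",
    "mymodel",
    "myreport_summary",
    "myreport_detailed_a",
    "new_node",
]

print(
    select_nodes(
        all_nodes,
        missing={"new_node"},
        force_include=["mymodel", "myreport_*"],
        force_exclude=["myreport_detailed_*"],
    )
)
# ['mymodel', 'myreport_summary', 'new_node']
```

On a real command line the wildcard patterns would normally need quoting (for example `'myreport_*'`) so that the shell does not expand them against file names before the command sees them.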

@idanov
Member

idanov commented Jul 19, 2019

@Minyus @gotin We don't intend to add --only_missing as an option to the kedro run command. However, the addition of KedroContext here will enable us to give users the means to easily add arguments to their own projects, so you can add --only_missing to your project if you want. I hope that will be helpful for your use case.

@lorenabalan
Contributor

Closing this as per comment above and under #60.

5 participants