[KED-974] Running the pipeline from the last failed node #82

Closed
anuarora1990 opened this issue Aug 28, 2019 · 3 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@anuarora1990

When you run a pipeline with 100+ nodes and it fails partway through, it is very difficult to identify the nodes that didn't run, so you have to rerun it from the start, which can be very time-consuming. Example: a pipeline has about 250 tasks to complete and fails on task 201. There is no functionality to rerun only the failed task and the remaining tasks from the last run without manually identifying what has not run.

Context

This could save a lot of time when a large number of nodes is involved.

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.
For the last run, capture the node that failed and the nodes that had not yet run, and provide an option to run from the failed node of the last run.
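
The capture step described above could look roughly like the following. This is a minimal sketch in plain Python, not Kedro's runner: the function name `run_with_resume_info`, the report keys, and the assumption that tasks run sequentially in a fixed order are all hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, List, Tuple


def run_with_resume_info(
    tasks: List[Tuple[str, Callable[[], None]]],
) -> Dict[str, object]:
    """Run tasks in order; on failure, report what still needs to run.

    Hypothetical helper: `tasks` is a list of (name, callable) pairs that are
    assumed to already be in a valid execution order.
    """
    completed: List[str] = []
    for index, (name, func) in enumerate(tasks):
        try:
            func()
        except Exception as exc:
            # The failed task and everything after it must be rerun.
            remaining = [task_name for task_name, _ in tasks[index:]]
            return {
                "failed": name,
                "completed": completed,
                "remaining": remaining,
                "error": repr(exc),
            }
        completed.append(name)
    return {"failed": None, "completed": completed, "remaining": [], "error": None}


if __name__ == "__main__":
    def ok() -> None:
        pass

    def broken() -> None:
        raise ValueError("simulated node failure")

    report = run_with_resume_info([("load", ok), ("clean", broken), ("train", ok)])
    print(report["failed"], report["remaining"])
```

Persisting the `remaining` list (for example to a file) would let a later run start from the failed task instead of from the beginning.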

Possible Alternatives

(Optional) Describe any alternative solutions or features you've considered.

  1. Run the pipeline from the start again after fixing the code in the failed node.

anuarora1990 added the "Issue: Feature Request" (New feature or improvement to existing feature) label Aug 28, 2019
lorenabalan changed the title from "Running the pipeline from the last failed node" to "[KED-974] Running the pipeline from the last failed node" Aug 28, 2019
@lorenabalan
Contributor

lorenabalan commented Aug 28, 2019

Thanks for raising this @anuarora1990, we'll look into finding a suitable solution! I've updated the title with our internal ticket number to make it easier to track.

@lorenabalan
Contributor

Sorry @anuarora1990, this should have been closed a while ago. Starting with Kedro version 0.15.1, the logs suggest a resume command after a pipeline fails (kedro run --from-inputs a,b,c). It's important to note that the command won't work if the inputs are MemoryDatasets, since data held only in memory is not available to a new run. We might revisit this in the future, but for now I will be closing this issue. Please feel free to open a new one if you have additional feedback.
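
To illustrate what resuming from a set of inputs means, here is a minimal sketch in plain Python. It is not Kedro's implementation: the `nodes` mapping and the function name `select_from_inputs` are assumptions made for illustration. It selects every node that consumes one of the given datasets, directly or through the outputs of other selected nodes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: node name -> (input dataset names, output dataset names)
NodeSpec = Tuple[Set[str], Set[str]]


def select_from_inputs(nodes: Dict[str, NodeSpec], start_inputs: Set[str]) -> List[str]:
    """Return the nodes that depend, directly or transitively, on `start_inputs`."""
    available = set(start_inputs)
    selected: List[str] = []
    changed = True
    while changed:
        changed = False
        for name, (inputs, outputs) in nodes.items():
            if name not in selected and inputs & available:
                selected.append(name)
                # Outputs of a selected node make its downstream nodes eligible too.
                available |= outputs
                changed = True
    return selected


if __name__ == "__main__":
    pipeline = {
        "clean": ({"raw"}, {"cleaned"}),
        "features": ({"cleaned"}, {"feats"}),
        "train": ({"feats"}, {"model"}),
    }
    # Resume from the persisted "cleaned" dataset: "clean" itself is skipped.
    print(select_from_inputs(pipeline, {"cleaned"}))
```

In this sketch the starting datasets must be loadable from storage when the new run begins, which is why a resume point that exists only in memory cannot be used.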

@Minyus
Contributor

Minyus commented Jan 29, 2020

@anuarora1990

I suffered from the same issue and discussed it at:

#30
#60

To solve the pain point, I implemented a pipeline resuming option as a wrapper around Kedro.
Here is the GitHub repository:

https://github.com/Minyus/pipelinex

You do not need to specify the Kedro pipeline nodes using the from-inputs option if you set the only_missing option to True.

Hope this helps.
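
As a rough illustration of the "only missing" idea (this is not PipelineX's actual code; the function name, the `nodes` mapping, and the `exists` callback are hypothetical), a runner could skip every node whose outputs are all already present and run only the rest:

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical representation: node name -> (input dataset names, output dataset names)
NodeSpec = Tuple[Set[str], Set[str]]


def nodes_with_missing_outputs(
    nodes: Dict[str, NodeSpec],
    exists: Callable[[str], bool],
) -> List[str]:
    """Return the nodes that have at least one output not yet persisted."""
    return [
        name
        for name, (_inputs, outputs) in nodes.items()
        if not all(exists(output) for output in outputs)
    ]


if __name__ == "__main__":
    pipeline = {
        "clean": ({"raw"}, {"cleaned"}),
        "features": ({"cleaned"}, {"feats"}),
        "train": ({"feats"}, {"model"}),
    }
    # Assume only these datasets were written before the previous run failed.
    persisted = {"raw", "cleaned"}
    print(nodes_with_missing_outputs(pipeline, lambda name: name in persisted))
```

With this approach the nodes to rerun are derived from what is already on disk, so the user does not have to name them.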
