Incremental runs/"Run only missing" #221
Comments
I'm going to link some of #225 to this. It had some great ideas that we could use here.
This was brought up by a user recently (cc @pascalwhoop), but the title of the issue might make it difficult to locate. "Run only missing", "incremental runs", "change detection" could be some possible themes. It is worth noting that to make this feasible, However, this would also move us closer to the "actually-an-orchestrator" territory, which we've been trying to avoid. I think making
This would be useful for interactive runs too. Stateful runs would also open up a "lineage" problem: if pipeline_1 creates dataset_1 and pipeline_2 depends on dataset_1, is it possible to re-create the whole run history? These are all interesting and useful features, but they are also very challenging.
Related: https://openlineage.io/
After reading more on data pipelines and Change Data Capture (this is the blog post that prompted me to come here: https://debezium.io/blog/2018/07/19/advantages-of-log-based-change-data-capture/), I think calling this "Change Capture" is quite confusing. I will rename the issue for clarity.
This is essentially a duplicate of #2307.
Description
We're taking the principle of Change Data Capture a step further and looking at a way for Kedro to recognise code, parameter and data changes, and to re-run only the sections of the pipeline that are affected by those changes, including anything downstream of them.
You have called this "run only missing" in #82 and #30, and we're finally getting smart about it.
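To make the idea above concrete, here is a minimal sketch of one way change detection could work. It is not Kedro's API or an agreed design: `Node`, `fingerprints` and `stale_nodes` are hypothetical names, nodes are assumed to be given in topological order, and raw datasets are assumed to come with a content hash already computed. Each node's fingerprint hashes its code, its parameters and the fingerprints of its inputs, so an upstream change propagates downstream without extra bookkeeping.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a pipeline node (not Kedro's Node class)."""
    name: str
    code: str                      # source text of the node function
    inputs: tuple = ()
    outputs: tuple = ()
    params: dict = field(default_factory=dict)


def _digest(*parts: str) -> str:
    """SHA-256 over the parts, with a separator so ("ab", "c") != ("a", "bc")."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def fingerprints(nodes, source_data):
    """Return {node name: fingerprint}.

    `nodes` must be in topological order; `source_data` maps each raw
    dataset name to a hash of its contents (assumed to be supplied).
    """
    data_fp = dict(source_data)
    node_fp = {}
    for node in nodes:
        fp = _digest(
            node.code,
            json.dumps(node.params, sort_keys=True),
            *(data_fp[name] for name in node.inputs),
        )
        node_fp[node.name] = fp
        # An output is identified by the node that produced it, so any
        # upstream change alters every downstream fingerprint.
        for out in node.outputs:
            data_fp[out] = _digest(fp, out)
    return node_fp


def stale_nodes(nodes, source_data, previous):
    """Compare against the fingerprints saved by the previous run.

    Returns (names of nodes to re-run, current fingerprints to persist).
    """
    current = fingerprints(nodes, source_data)
    stale = [n.name for n in nodes if previous.get(n.name) != current[n.name]]
    return stale, current
```

On a first run `previous` is empty, so every node is stale. On later runs only nodes whose code, parameters or (transitive) inputs changed are returned. This also shows why the feature needs state: the fingerprints from the last run have to be stored somewhere between runs.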
Context
This should shorten your development time, because you would no longer have to re-run the entire pipeline after every change.