Improve Node Grouping in Kedro Deployment #4319

DimedS · 2024-11-11T11:43:02Z

Overview

Part of #4317. Users have expressed the need to merge multiple Kedro nodes into a single task on deployment platforms for better clarity and efficiency. Current plugins offer limited support for this, often requiring manual grouping, which complicates deployment and reduces performance.

User Insights and Challenges

"Combining nodes into single tasks improves overview, but we currently have to manually group them in Databricks."
"We can convert a single node to a Kubeflow Component, but deploying 400 nodes as separate containers adds complexity."
"Running each Kedro node in a separate container could make a small node execute in one or two seconds, but Argo’s longer pod startup time would make this inefficient."

Problem Statement

How can we design a flexible and efficient node grouping mechanism - using tags, namespaces, pipelines, or other methods - to maximise usefulness for users and streamline the deployment process?

Proposed Solution

Centralised Grouping Functionality: Instead of developing node grouping features separately for each plugin, centralise this functionality within the Kedro framework. This approach would standardise and simplify node grouping, making it easier to implement and maintain across different deployment platforms.

Tech Design - 4th & 6th December

It was decided to use namespaces for node grouping purposes and to implement helper functions within Kedro to simplify the deployment of grouped nodes with namespaces. Full summary of TD

Next steps:

datajoely · 2024-11-11T12:02:54Z

This is the most important problem for me. It's also tightly coupled with dependency management - the minute we make it easier to isolate different parts of the pipeline to be run on different containers you get into dependency isolation questions.

datajoely · 2024-11-11T12:03:19Z

Users today also tend towards tags because namespaces are a pain to use

DimedS · 2024-12-09T10:18:24Z

Summary of Tech Design Sessions on Node Grouping (4, 6 Dec 2024)

Video

Discussed:

Node Grouping Methods: Reviewed different methods for grouping nodes in Kedro, including tags, pipelines, and namespaces. Discussed the pros and cons of each approach and how they affect deployment when converting Kedro pipelines to various platforms.
Deployment Options: Considered two options for deploying grouped nodes:
1. Helper functions within Kedro combined with modifications to deployment plugins.
2. Using synthetic nodes.

Decisions:

Adopting Namespaces for Deployment:
- Reasoning:
  Despite observed challenges with namespaces (e.g., limited adoption and UI complexities), namespaces are considered the right tool for this use case. They offer value to users and should be more widely utilized. Enhancing deployment functionality based on namespaces, alongside improving the UI, will likely boost their adoption.
Choosing Helper Functions + Plugins Over Synthetic Nodes:
- Reasoning:
  Synthetic nodes are technically complex, less useful for users, and harder to implement. Instead, focusing on helper functions and plugin modifications aligns better with Kedro’s approach and encourages collaboration with plugin maintainers to ensure proper maintenance.

Next Steps:

Prototype Development:
- Develop a deployment solution prototype using helper functions in Kedro, tested with the kedro-airflow plugin to enable namespace deployment to Airflow. Results will be shared in the Tech Design session, approximately in January.
Namespace UI Improvements:
- Collaborate with Kedro-Viz (@Huongg) to improve the UI for namespaces and drive increased adoption. (See issue Enhance Kedro Namespaces adoption #4343.)
Engagement with Plugin Developers:
- Continue engaging with deployment plugin developers to establish a stable plugin maintenance model. (See issue Improve Third-Party Deployment Plugins Reliability and Compatibility #4318.)

Additional Information:

Feel free to add your thoughts or suggestions here. If there’s anything to update or clarify in the summary, please let me know.
A link to the video recordings of both sessions will be published soon.

DimedS · 2024-12-09T11:43:33Z

From @marrrcin:
I was not able to attend the follow-up.
For plugin development, it would be super helpful to have a utility function in the official Kedro API that accepts:

- the pipeline
- the grouping criterion

and returns a list of nodes grouped by that criterion.
I don't know if that's how you plan to expose it.
By iterating over the resulting groups, plugins will be able to easily do an M:N mapping instead of 1:1. Plugins will then be able to create orchestrator-specific pipelines which invoke something like kedro run --from-nodes --to-nodes.
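
A minimal sketch of the kind of utility described above, not an existing Kedro API. `NodeStub`, `group_nodes`, and `by_top_level_namespace` are hypothetical names; `NodeStub` stands in for a Kedro node and carries only the attributes the sketch needs. The printed command assumes a Kedro version where `kedro run` accepts `--nodes`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class NodeStub:
    """Hypothetical stand-in for a Kedro node (illustration only)."""
    name: str
    namespace: Optional[str] = None


def group_nodes(
    nodes: Iterable[NodeStub],
    key: Callable[[NodeStub], Optional[str]],
) -> dict[str, list[str]]:
    """Group node names by key(node).

    A node for which the key returns None becomes its own single-node group,
    so ungrouped nodes still map to exactly one deployment task.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for node in nodes:
        group = key(node)
        groups[group if group is not None else node.name].append(node.name)
    return dict(groups)


def by_top_level_namespace(node: NodeStub) -> Optional[str]:
    """Grouping criterion: the first segment of the node's namespace, if any."""
    return node.namespace.split(".")[0] if node.namespace else None


nodes = [
    NodeStub("clean", namespace="data_processing"),
    NodeStub("join", namespace="data_processing"),
    NodeStub("train", namespace="modeling.variant_1"),
    NodeStub("report"),
]

grouped = group_nodes(nodes, by_top_level_namespace)

# A plugin could turn each group into one orchestrator task.
for group, names in grouped.items():
    print(f"{group}: kedro run --nodes={','.join(names)}")
```

Passing the criterion as a callable keeps the helper agnostic about whether groups come from namespaces, tags, or something else; a plugin only needs the resulting mapping.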

astrojuanlu · 2024-12-09T11:49:51Z

> Despite observed challenges with namespaces (e.g., limited adoption and UI complexities), namespaces are considered the right tool for this use case.

I value the desire to keep momentum and to devote more time to understanding what namespaces do and how they work, so that at least we can discuss them more intelligently.

But going forward I think there's an opportunity to explore innovative new solutions, or just make namespaces an implementation detail so that they continue to exist but they become invisible for the user and get swept under a more usable API layer. I know I sound like a broken record but I'll say it again: more documentation will not fix bad Developer Experience.

In short: I agree to continue exploring them (on the grounds of keeping the momentum on this topic and not having to do another knowledge sharing session in ~12 months), but I think we should timebox this effort and put a deadline on when we think we're ready to go back to the drawing board and continue iterating as a team.

datajoely · 2024-12-09T14:41:40Z

I think namespaces are critical for any way we eventually unify and simplify deployment to orchestrators. I'm arguing dependency isolation is basically the same problem, but would like to see if others agree?

marrrcin · 2024-12-10T08:30:02Z

I agree with the need for dependency isolation for large projects 👍🏻

For grouping by the namespace - although it's fine to have anything to group on, namespaces were (at least for me) used to group larger chunks of the pipeline, e.g.:

<ns1 = data processing part> --> <ns2 = modeling variant 1>
                             --> <ns2 = modeling variant 2>

etc. (like in the dynamic pipelines https://getindata.com/blog/kedro-dynamic-pipelines/ ).

So grouping by the namespace will actually require thinking about the target deployment as soon as you start writing the pipeline, which means it will impact the data catalog/parameters creation too (because of namespace prefixes) = more cognitive load.

Plus the projects might end up with having a lot of "synthetic" namespaces, just for the sake of preparing the pipeline for an orchestrator.
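
To make the prefixing concern concrete, here is a small sketch of how names change once a pipeline is wrapped in a namespace. The `namespaced` helper is hypothetical and only imitates the convention of prefixing dataset names with `<namespace>.` and parameters as `params:<namespace>.<name>`; the dataset and namespace names are made up for illustration.

```python
def namespaced(name: str, namespace: str) -> str:
    """Return the name a dataset or parameter would carry inside a namespace.

    Illustration only: imitates the prefixing convention, not Kedro's own code.
    """
    prefix = "params:"
    if name.startswith(prefix):
        return f"{prefix}{namespace}.{name[len(prefix):]}"
    return f"{namespace}.{name}"


# Names used by a node before any namespace is applied.
original = ["model_input", "params:learning_rate"]

# The same names after wrapping the pipeline in a deployment-driven namespace.
renamed = [namespaced(name, "modeling_variant_1") for name in original]

for before, after in zip(original, renamed):
    print(f"{before} -> {after}")
```

Every renamed entry needs a matching catalog or parameters key, which is why introducing a namespace purely for deployment also touches configuration.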

astrojuanlu · 2024-12-10T11:46:59Z

Copy-pasting some comments about dependency isolation to #4147 and collapsing them here

deepyaman · 2024-12-10T18:41:48Z

> Copy-pasting some comments about dependency isolation to #4147 and collapsing them here

OK, but I'm going to just reiterate my points specifically related to node grouping and deployment, so that they don't get skipped/misconstrued as being only relevant for dependency management:

Supporting complex node grouping is unnecessary, at least to start with; that's akin to the micro-packaging over-engineering problem. The vast majority of users would benefit from being able to deploy each modular pipeline (with its own namespace) separately. This also creates pretty clear boundaries where node persistence, etc. may be required.

TL;DR I agree with the idea of namespaces and supporting deployment based on namespaces, but I further posit that namespaces just being per modular pipeline is sufficient, even for deployment, and that will also make them easier to adopt.
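
As a rough sketch of what "one deployment task per namespaced modular pipeline" could look like, the snippet below derives dependencies between namespaces from node inputs and outputs. `NodeStub` and `group_dependencies` are hypothetical names and the node data is invented; the `--namespace` option in the printed command is an assumption whose exact spelling depends on the Kedro version.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class NodeStub:
    """Hypothetical stand-in for a Kedro node (illustration only)."""
    name: str
    namespace: str
    inputs: tuple
    outputs: tuple


def group_dependencies(nodes):
    """Map each namespace to the set of namespaces that produce its inputs."""
    producer = {out: node.namespace for node in nodes for out in node.outputs}
    deps = {node.namespace: set() for node in nodes}
    for node in nodes:
        for dataset in node.inputs:
            upstream = producer.get(dataset)
            # Datasets produced inside the same namespace are internal to the task.
            if upstream is not None and upstream != node.namespace:
                deps[node.namespace].add(upstream)
    return deps


nodes = [
    NodeStub("clean", "data_processing", ("raw",), ("cleaned",)),
    NodeStub("join", "data_processing", ("cleaned",), ("model_input",)),
    NodeStub("train_1", "modeling_variant_1", ("model_input",), ("model_1",)),
    NodeStub("train_2", "modeling_variant_2", ("model_input",), ("model_2",)),
]

deps = group_dependencies(nodes)

# One task per namespace, scheduled after the namespaces it depends on.
order = list(TopologicalSorter(deps).static_order())
for namespace in order:
    upstream = ", ".join(sorted(deps[namespace])) or "nothing"
    print(f"kedro run --namespace={namespace}  # runs after: {upstream}")
```

In this sketch, any dataset that crosses a namespace boundary (here `model_input`) would need to be persisted, since the producing and consuming tasks run separately.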

merelcht added this to the Deployment milestone Nov 11, 2024

merelcht added this to Kedro Framework Nov 11, 2024

merelcht moved this to To Do in Kedro Framework Nov 11, 2024

merelcht assigned DimedS Nov 11, 2024

DimedS mentioned this issue Nov 11, 2024: Enhance Kedro Deployment #4317

iamelijahko self-assigned this Nov 11, 2024

DimedS mentioned this issue Nov 21, 2024: Enhance Kedro Namespaces adoption #4343

Galileo-Galilei mentioned this issue Nov 30, 2024: MLflow Child Runs per Pipeline Galileo-Galilei/kedro-mlflow#448

github-actions bot mentioned this issue Dec 1, 2024: Monthly issue metrics report #4358

This was referenced Dec 11, 2024:
- Prototype for node grouping deployment solution using namespaces #4376
- Modify kedro-airflow plugin to support namespace-based DAG grouping kedro-org/kedro-plugins#962

DimedS added the Type: Parent Issue label Dec 11, 2024

This was referenced Dec 13, 2024:
- User research of low user-adoption of namespace (TBD with PM and Design Lead) #4382
- Investigate using Kedro-viz to increase user-adoption of namespace and other node grouping methods #4383
