Cache node output in generic pipeline flow local execution #3126
-
Hi @NeethuES-intel, TL;DR: it is not possible to exchange [large] data between two nodes without persisting it. This applies to all runtimes (local execution in JupyterLab, Kubeflow Pipelines, and Apache Airflow). Conceptually, every pipeline node is executed in isolation from the other nodes. Even though Kubeflow Pipelines and Apache Airflow support direct data exchange (via inputs and outputs) and indirect data exchange (via references to persistent storage) between nodes, they explicitly restrict the amount of data that can be exchanged directly. Consider, for example, a scenario where one node produces several gigabytes of data that are consumed by another node. Passing that data directly can cause major issues, depending on the kind of 'communication channels' the runtimes use. Support for local pipeline execution was primarily introduced to streamline initial pipeline development, whereas the other runtimes would be used in production, where large amounts of data need to be processed. We therefore didn't implement direct data exchange for this runtime. Hope this helps!
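Since the data has to be persisted anyway, one lightweight pattern for a generic pipeline is to have the preprocessing node write its result to a file in the working directory (and declare that file as an output in the node's properties so downstream runtimes also make it available), and have the inference node read it back. The sketch below is only an illustration, not an Elyra API: the file name, array shapes, and preprocessing step are assumptions.

```python
# preprocess.py -- hypothetical first node of the generic pipeline.
# Persists the preprocessed image batch to the working directory so the
# next node can pick it up. "preprocessed.npy" is an assumed file name;
# in Elyra you would list it as an output file of this node.
import numpy as np

def preprocess(raw_images: np.ndarray) -> np.ndarray:
    # Placeholder preprocessing: normalize pixel values to [0, 1].
    return raw_images.astype(np.float32) / 255.0

# Stand-in for the real image batch (N x C x H x W).
raw = np.random.randint(0, 256, size=(16, 3, 224, 224), dtype=np.uint8)
np.save("preprocessed.npy", preprocess(raw))
```

```python
# inference.py -- hypothetical last node. Memory-mapping the cached file
# avoids loading the whole batch into RAM at once, which keeps the
# read overhead modest when the cached images are large.
import numpy as np

batch = np.load("preprocessed.npy", mmap_mode="r")
for image in batch:
    pass  # run the OpenVINO inference on `image` here
```

For local execution all nodes share the same file system, so the cost of this hand-off is plain local disk I/O; with `np.save`/`np.load` (optionally memory-mapped) that overhead is usually small compared to the inference itself.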
-
Hi,
I am new to Elyra and getting started with it. I am trying to build a generic pipeline that includes some image pre-processing steps as node components. The last node performs the image inference using the Intel OpenVINO toolkit. Is there a way I can cache the image output from the node components so it can be used in the last node? Storing and reading the images from the file system will slow down processing, since these images are large. Any input with regard to caching images would be really helpful.
Thanks.