Cache node output in generic pipeline flow local execution #3126
-
Hi @NeethuES-intel, TL;DR: it is not possible to exchange [large] data between two nodes without persisting it. This applies to all runtimes (local execution in JupyterLab, Kubeflow Pipelines, and Apache Airflow). Conceptually, every pipeline node is executed in isolation from the other nodes. Even though Kubeflow Pipelines and Apache Airflow support direct data exchange (via inputs and outputs) and indirect data exchange (via references to persistent storage) between nodes, they explicitly restrict the amount of data that can be exchanged directly. Consider, for example, a scenario where one node produces several gigabytes of data that are consumed by another node. Passing that data directly can cause major issues, depending on the kind of 'communication channels' the runtimes use. Support for local pipeline execution was primarily introduced to streamline initial pipeline development, whereas the other runtimes would be used in production, where large amounts of data need to be processed. We therefore didn't implement direct data exchange for this runtime. Hope this helps!
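Since the data has to be persisted anyway, one lightweight pattern for a generic pipeline is to have the preprocessing node write its result to a file in the working directory (and declare that file as an output in the node's properties so downstream runtimes also make it available), and have the inference node read it back. The sketch below is only an illustration, not an Elyra API: the file name, array shapes, and preprocessing step are assumptions.

```python
# preprocess.py -- hypothetical first node of the generic pipeline.
# Persists the preprocessed image batch to the working directory so the
# next node can pick it up. "preprocessed.npy" is an assumed file name;
# in Elyra you would list it as an output file of this node.
import numpy as np

def preprocess(raw_images: np.ndarray) -> np.ndarray:
    # Placeholder preprocessing: normalize pixel values to [0, 1].
    return raw_images.astype(np.float32) / 255.0

# Stand-in for the real image batch (N x C x H x W).
raw = np.random.randint(0, 256, size=(16, 3, 224, 224), dtype=np.uint8)
np.save("preprocessed.npy", preprocess(raw))
```

```python
# inference.py -- hypothetical last node. Memory-mapping the cached file
# avoids loading the whole batch into RAM at once, which keeps the
# read overhead modest when the cached images are large.
import numpy as np

batch = np.load("preprocessed.npy", mmap_mode="r")
for image in batch:
    pass  # run the OpenVINO inference on `image` here
```

For local execution all nodes share the same file system, so the cost of this hand-off is plain local disk I/O; with `np.save`/`np.load` (optionally memory-mapped) that overhead is usually small compared to the inference itself.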
-
Hi,
I am new to Elyra and getting started with it. I am trying to build a generic pipeline that includes some image pre-processing steps as node components. The last node performs the image inference using the Intel OpenVINO toolkit. Is there a way I can cache the image output from the node components so it can be used in the last node? Storing and reading the images from the file system will slow down processing, since these images are large. Any input with regard to caching images would be really helpful.
Thanks.