How to dynamically register a pipeline #1853

Closed
niartnelis opened this issue Sep 16, 2022 · 5 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@niartnelis

Description

How can pipelines be registered dynamically, without registering them through the register_pipelines method in pipeline_registry.py?

Context

I would like to add nodes and pipelines dynamically through an API instead of building fixed pipelines in code. At present I can add nodes and pipelines dynamically and execute those pipelines dynamically, but I cannot register them dynamically, which makes it impossible to run them from the command line or to visualize them.

Possible Implementation

I hope that the directory structure is not too rigid. Can I remove the pipeline directory and the pipeline.py and node.py files beneath it? I don't quite understand the role of this fixed layer of directory structure. What impact does this structural organization have, e.g. on deployment or operations?

@niartnelis niartnelis added the Issue: Feature Request New feature or improvement to existing feature label Sep 16, 2022
@deepyaman deepyaman changed the title from "<Title>How to dynamically register a pipeline" to "How to dynamically register a pipeline" Sep 16, 2022
@deepyaman
Member

I haven't heard of this use case previously, but I think I can point you in the right direction:

  • The registry references the pipelines variable in kedro.framework.project; see
    pipelines = _ProjectPipelines()
  • This variable calls the register_pipelines method, as you stated; however, nothing technically says that function can't access some other data. For reference, here's the code where register_pipelines is picked up:
    register_pipelines = self._get_pipelines_registry_callable(
        self._pipelines_module
    )
    project_pipelines = register_pipelines()
  • Finally, you would have to call pipelines.configure() each time to pick up the latest state; see
    register_pipelines = self._get_pipelines_registry_callable(
        self._pipelines_module
    )
    project_pipelines = register_pipelines()
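
The three points above can be illustrated with a minimal sketch. It is a plain-Python stand-in, not Kedro's own code: PIPELINE_SPECS and LazyRegistry are hypothetical names, and pipelines are represented as simple lists of step names. It only shows why a cached registry needs configure() to be called again before it sees state that register_pipelines reads from elsewhere.

```python
from typing import Callable, Dict, List

# Hypothetical runtime store; an API endpoint could add entries to it.
PIPELINE_SPECS: Dict[str, List[str]] = {"__default__": ["split", "train"]}


def register_pipelines() -> Dict[str, List[str]]:
    """Build the name-to-pipeline mapping from external state on every call."""
    return {name: list(steps) for name, steps in PIPELINE_SPECS.items()}


class LazyRegistry:
    """Simplified stand-in for a lazily loaded registry (an assumption, not Kedro's class)."""

    def __init__(self, loader: Callable[[], Dict[str, List[str]]]) -> None:
        self._loader = loader
        self._content: Dict[str, List[str]] = {}
        self._is_loaded = False

    def configure(self) -> None:
        """Discard the cached mapping so the next access calls the loader again."""
        self._content = {}
        self._is_loaded = False

    def _load(self) -> None:
        # Call the loader only once until configure() resets the cache.
        if not self._is_loaded:
            self._content = self._loader()
            self._is_loaded = True

    def __getitem__(self, name: str) -> List[str]:
        self._load()
        return self._content[name]

    def __contains__(self, name: str) -> bool:
        self._load()
        return name in self._content


pipelines = LazyRegistry(register_pipelines)
```

In this sketch, an entry added to PIPELINE_SPECS after the first lookup stays invisible until pipelines.configure() is called, which mirrors the "call configure() each time" step.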

> I hope that the directory structure is not too rigid. Can I remove the pipeline directory and the pipeline.py and node.py files beneath it? I don't quite understand the role of this fixed layer of directory structure. What impact does this structural organization have, e.g. on deployment or operations?

In order to work with some built-in functionality, a certain structure is expected, but you can technically change it (e.g. compare the "standard" structure in spaceflights to the "simplified" one in the pandas-iris starter).

Sorry this isn't a complete answer; please feel free to follow up if it's not helpful in terms of next steps to investigate. Perhaps sharing a bit more context on what the dynamic pipeline definition and registration workflow would look like in practice would also help.

@niartnelis
Author

niartnelis commented Sep 18, 2022

I have another question. In a project newly created with "kedro new --starter=pandas-iris", in the src/{projectname}/__main__.py file, how can I pass in a custom DataCatalog without defining it in the catalog.yml configuration file? This is also a step toward building the pipeline dynamically.

When I pass in params, the internal feed_dict adds the param prefix to each entry, so the entries do not become valid variables.
[Three screenshots attached to the original comment]
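
One possible reading of the prefix behaviour described above: parameters are stored under dataset names of the form params:<key>, and nodes refer to them by that string in their inputs rather than as Python variable names. The sketch below is a plain-Python stand-in for that naming scheme, not Kedro's own code; build_feed_dict and the sample parameters are hypothetical.

```python
from typing import Any, Dict


def build_feed_dict(params: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten parameters into catalog-style entries with a 'params:' prefix."""
    # The whole dictionary stays available under a single 'parameters' entry.
    feed_dict: Dict[str, Any] = {"parameters": params}

    def add_param(name: str, value: Any) -> None:
        feed_dict[f"params:{name}"] = value
        # Nested dictionaries also get dotted entries, e.g. 'params:model.depth'.
        if isinstance(value, dict):
            for key, nested in value.items():
                add_param(f"{name}.{key}", nested)

    for name, value in params.items():
        add_param(name, value)
    return feed_dict


feed_dict = build_feed_dict({"test_size": 0.2, "model": {"depth": 3}})
```

Under this scheme a node would list "params:test_size" among its inputs; the bare name test_size is never an entry on its own.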

@deepyaman
Member

I'm not sure I fully follow your question. If you want to define a catalog (and pipelines) dynamically, maybe you shouldn't use the CLI that expects a more standard Kedro project structure.

Instead, you can construct a runner object and pass your pipeline and catalog objects to it, as in the examples in https://kedro.readthedocs.io/en/stable/nodes_and_pipelines/run_a_pipeline.html#run-pipelines-with-io (if you expand the hidden block).
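
The idea of handing a runner a pipeline and a catalog that were both built at runtime can be sketched without any project structure. This is a plain-Python stand-in under stated assumptions: run_sequential, the Node tuple layout, and the sample functions are hypothetical and are not Kedro's API.

```python
from typing import Any, Callable, Dict, List, Tuple

# A node here is (function, input names, output name); this layout is an assumption.
Node = Tuple[Callable[..., Any], List[str], str]


def run_sequential(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in order, reading inputs from and writing outputs to the catalog."""
    data = dict(catalog)  # copy so the caller's catalog is not mutated
    for func, inputs, output in nodes:
        data[output] = func(*(data[name] for name in inputs))
    return data


def scale(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]


def total(values: List[float]) -> float:
    return sum(values)


# Both the node list and the catalog are built at runtime, with no YAML or registry.
nodes: List[Node] = [
    (scale, ["raw", "params:factor"], "scaled"),
    (total, ["scaled"], "result"),
]
catalog = {"raw": [1.0, 2.0, 3.0], "params:factor": 2.0}
outputs = run_sequential(nodes, catalog)
```

In Kedro itself the corresponding pieces would be a Pipeline, a DataCatalog, and a runner such as SequentialRunner; constructor arguments differ between versions, so the linked documentation page is the place to check the exact calls.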

@niartnelis
Author

Thank you very much for your reply. It is indeed possible to do as you said. I will ask again if I encounter any problems in the future.

@deepyaman
Member

Great! I'm going to close this, then; please feel free to open a new issue (and reference this one, if necessary), should you run into related issues down the road!
