Handling of Exceptions in MLPipeline #377

Closed
daniel-ressi opened this issue Nov 30, 2022 · 4 comments
Labels
enhancement New feature or request need-design-decision Several ways of implementation are possible and one must be chosen

Comments

@daniel-ressi

Description

Errors in the MLPipeline are overshadowed by a NotImplementedError exception, which makes debugging more complex than necessary.

Context

This bug occurs only if an exception is raised in the training pipeline of the MLPipeline (MLPipeline.training).
It is not critical, as the relevant error message is still shown above the NotImplementedError traceback.

Steps to Reproduce

If required, I can prepare a more complete example, but this should be enough to reproduce the issue.

  1. Add `raise ValueError("My debug message")` to any node that is part of the training pipeline of an MLPipeline, using kedro > 0.11.

Expected Result

I expect a ValueError to be raised with the message "My debug message". In addition, kedro suggests how to resume the run from the nodes that did not complete, and this suggestion is actually the cause of the issue.

Actual Result

During handling of the above exception, another exception occurred:


╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ kedro:8 in <module>                     │
│                                                                                                  │
│   5 from kedro.framework.cli import main                                                         │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(main())                                                                         │
│   9                                                                                              │
│                                                                                                  │
│ python3.9/site-packages/kedro/framework/cli/cli. │
│ py:211 in main                                                                                   │
│                                                                                                  │
│   208 │   """                                                                                    │
│   209 │   _init_plugins()                                                                        │
│   210 │   cli_collection = KedroCLI(project_path=Path.cwd())                                     │
│ ❱ 211 │   cli_collection()                                                                       │
│   212                                                                                            │
│                                                                                                  │
│ python3.9/site-packages/click/core.py:1130 in    │
│ __call__                                                                                         │
│                                                                                                  │
│ python3.9/site-packages/kedro/framework/cli/cli. │
│ py:139 in main                                                                                   │
│                                                                                                  │
│   136 │   │   )                                                                                  │
│   137 │   │                                                                                      │
│   138 │   │   try:                                                                               │
│ ❱ 139 │   │   │   super().main(                                                                  │
│   140 │   │   │   │   args=args,                                                                 │
│   141 │   │   │   │   prog_name=prog_name,                                                       │
│   142 │   │   │   │   complete_var=complete_var,                                                 │
│                                                                                                  │
│ python3.9/site-packages/click/core.py:1055 in    │
│ main                                                                                             │
│                                                                                                  │
│ python3.9/site-packages/click/core.py:1657 in    │
│ invoke                                                                                           │
│                                                                                                  │
│ python3.9/site-packages/click/core.py:1404 in    │
│ invoke                                                                                           │
│                                                                                                  │
│ python3.9/site-packages/click/core.py:760 in     │
│ invoke                                                                                           │
│                                                                                                  │
│ python3.9/site-packages/kedro/framework/cli/proj │
│ ect.py:366 in run                                                                                │
│                                                                                                  │
│   363 │   node_names = _get_values_as_tuple(node_names) if node_names else node_names            │
│   364 │                                                                                          │
│   365 │   with KedroSession.create(env=env, extra_params=params) as session:                     │
│ ❱ 366 │   │   session.run(                                                                       │
│   367 │   │   │   tags=tag,                                                                      │
│   368 │   │   │   runner=runner(is_async=is_async),                                              │
│   369 │   │   │   node_names=node_names,                                                         │
│                                                                                                  │
│ python3.9/site-packages/kedro/framework/session/ │
│ session.py:407 in run                                                                            │
│                                                                                                  │
│   404 │   │   )                                                                                  │
│   405 │   │                                                                                      │
│   406 │   │   try:                                                                               │
│ ❱ 407 │   │   │   run_result = runner.run(                                                       │
│   408 │   │   │   │   filtered_pipeline, catalog, hook_manager, session_id                       │
│   409 │   │   │   )                                                                              │
│   410 │   │   │   self._run_called = True                                                        │
│                                                                                                  │
│ python3.9/site-packages/kedro/runner/runner.py:8 │
│ 8 in run                                                                                         │
│                                                                                                  │
│    85 │   │   │   self._logger.info(                                                             │
│    86 │   │   │   │   "Asynchronous mode is enabled for loading and saving data"                 │
│    87 │   │   │   )                                                                              │
│ ❱  88 │   │   self._run(pipeline, catalog, hook_manager, session_id)                             │
│    89 │   │                                                                                      │
│    90 │   │   self._logger.info("Pipeline execution completed successfully.")                    │
│    91                                                                                            │
│                                                                                                  │
│ python3.9/site-packages/kedro/runner/sequential_ │
│ runner.py:73 in _run                                                                             │
│                                                                                                  │
│   70 │   │   │   │   run_node(node, catalog, hook_manager, self._is_async, session_id)           │
│   71 │   │   │   │   done_nodes.add(node)                                                        │
│   72 │   │   │   except Exception:                                                               │
│ ❱ 73 │   │   │   │   self._suggest_resume_scenario(pipeline, done_nodes, catalog)                │
│   74 │   │   │   │   raise                                                                       │
│   75 │   │   │                                                                                   │
│   76 │   │   │   # decrement load counts and release any data sets we've finished with           │
│                                                                                                  │
│ python3.9/site-packages/kedro/runner/runner.py:1 │
│ 86 in _suggest_resume_scenario                                                                   │
│                                                                                                  │
│   183 │   │   postfix = ""                                                                       │
│   184 │   │   if done_nodes:                                                                     │
│   185 │   │   │   node_names = (n.name for n in remaining_nodes)                                 │
│ ❱ 186 │   │   │   resume_p = pipeline.only_nodes(*node_names)                                    │
│   187 │   │   │   start_p = resume_p.only_nodes_with_inputs(*resume_p.inputs())                  │
│   188 │   │   │                                                                                  │
│   189 │   │   │   # find the nearest persistent ancestors of the nodes in start_p                │
│                                                                                                  │
│ python3.9/site-packages/kedro_mlflow/pipeline/pi │
│ peline_ml.py:173 in only_nodes                                                                   │
│                                                                                                  │
│   170 │   │   )                                                                                  │
│   171 │                                                                                          │
│   172 │   def only_nodes(self, *node_names: str) -> "Pipeline":  # pragma: no cover              │
│ ❱ 173 │   │   raise NotImplementedError(MSG_NOT_IMPLEMENTED)                                     │
│   174 │                                                                                          │
│   175 │   def only_nodes_with_namespace(                                                         │
│   176 │   │   self, node_namespace: str                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
NotImplementedError: This method is not implemented because it does not make sense for 'PipelineML'. Manipulate directly the training pipeline and recreate the 'PipelineML' with 'pipeline_ml_factory' factory.
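The traceback shows why the original error is overshadowed: `SequentialRunner._run` catches the node's exception and calls `_suggest_resume_scenario` before re-raising, and that helper calls `only_nodes`, which `PipelineML` does not implement. Because the `NotImplementedError` is raised inside the `except` block, Python propagates it instead and keeps the original `ValueError` only as its implicit `__context__`. The minimal pure-Python sketch below reproduces that pattern; `FakePipelineML`, `run_node` and `SketchRunner` are hypothetical stand-ins, not the actual kedro or kedro-mlflow classes.

```python
class FakePipelineML:
    """Hypothetical stand-in for a pipeline whose only_nodes is not implemented."""

    def only_nodes(self, *node_names: str) -> "FakePipelineML":
        raise NotImplementedError("slicing is not supported for this pipeline")


def run_node() -> None:
    # The failing node from the reproduction steps.
    raise ValueError("My debug message")


class SketchRunner:
    """Mimics the try/except structure visible in the traceback (hypothetical)."""

    def _suggest_resume_scenario(self, pipeline: FakePipelineML) -> None:
        # The resume hint is computed by slicing the pipeline; here slicing raises.
        pipeline.only_nodes("remaining_node")

    def run(self, pipeline: FakePipelineML) -> None:
        try:
            run_node()
        except Exception:
            # If this call raises, the bare `raise` below is never reached.
            self._suggest_resume_scenario(pipeline)
            raise


try:
    SketchRunner().run(FakePipelineML())
except Exception as exc:
    caught = exc

print(type(caught).__name__)              # NotImplementedError
print(type(caught.__context__).__name__)  # ValueError
print(caught.__context__)                 # My debug message
```

The original error is not lost, which is why it still appears above the second traceback, but it is no longer the exception that reaches the caller.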

Your Environment


  • kedro and kedro-mlflow version used (pip show kedro and pip show kedro-mlflow): 0.18.3 and 0.11.4
  • Python version used (python -V): 3.9
  • Operating system and version: macOS 12.5.1

Does the bug also happen with the last version on master?

Yes, I tried it out.

@daniel-ressi
Author

Thanks in advance for your support. My suggestion would be to simply call only_nodes on the training pipeline of the MLPipeline.
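A minimal sketch of this suggestion, assuming a simplified model in which a pipeline is just an ordered list of node names. `SimplePipeline` and `PipelineMLSketch` are hypothetical classes that only borrow the `only_nodes` and `training` names from this thread; they are not the kedro or kedro-mlflow implementations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimplePipeline:
    """Hypothetical stand-in for a kedro Pipeline: an ordered list of node names."""

    node_names: List[str] = field(default_factory=list)

    def only_nodes(self, *names: str) -> "SimplePipeline":
        missing = set(names) - set(self.node_names)
        if missing:
            raise ValueError(f"Pipeline does not contain nodes named {sorted(missing)}")
        # Keep the pipeline's own order, not the order of the arguments.
        return SimplePipeline([n for n in self.node_names if n in names])


@dataclass
class PipelineMLSketch:
    """Hypothetical wrapper holding a training pipeline, loosely modelled on PipelineML."""

    training: SimplePipeline

    def only_nodes(self, *names: str) -> SimplePipeline:
        # Delegate to the training pipeline instead of raising NotImplementedError.
        # The result is a plain pipeline: the ML-specific wrapper is dropped.
        return self.training.only_nodes(*names)


ml = PipelineMLSketch(training=SimplePipeline(["preprocess", "train", "evaluate"]))
print(ml.only_nodes("evaluate", "train").node_names)  # ['train', 'evaluate']
```

The design trade-off is visible in the return type: slicing works, but what comes back is no longer an ML pipeline, so any behaviour the wrapper adds on top of training does not apply to the partial run.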

@Galileo-Galilei
Owner

Hi @daniel-ressi, you're not the first one to notice this behaviour. Unfortunately, kedro filters your pipeline to suggest a resume scenario, and this breaks the PipelineML object. This is the correct behaviour: you should not use the suggested command, because it will not work with PipelineML, which assumes you are running the entire pipeline and not only part of it.

However, given how annoying this stacktrace is, I am considering changing the behaviour and only issuing a warning. The risk is that some people will run their entire pipeline before noticing that the PipelineML object does not work as intended.
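A minimal sketch of the warning-based alternative, assuming a simplified wrapper that stores its training pipeline as a list of node names. `PipelineMLWarnSketch` and its warning text are hypothetical, not kedro-mlflow's actual behaviour. Because slicing succeeds, the runner's resume suggestion would no longer fail, and the node's original error would be the one re-raised, while the warning tells the user that a partial run is not a full PipelineML run.

```python
import warnings
from typing import List


class PipelineMLWarnSketch:
    """Hypothetical wrapper: slicing is allowed, but it emits a warning instead of raising."""

    def __init__(self, training_nodes: List[str]) -> None:
        self.training_nodes = list(training_nodes)

    def only_nodes(self, *names: str) -> List[str]:
        warnings.warn(
            "Slicing this pipeline keeps only part of the training pipeline; "
            "the resulting run will not behave like a full PipelineML run.",
            UserWarning,
            stacklevel=2,
        )
        # Return the requested nodes in their original pipeline order.
        return [n for n in self.training_nodes if n in names]


pipeline = PipelineMLWarnSketch(["preprocess", "train", "evaluate"])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record the warning even if it was already shown
    sliced = pipeline.only_nodes("train")

print(sliced)                       # ['train']
print(caught[0].category.__name__)  # UserWarning
```

This makes the risk mentioned above concrete: nothing stops the sliced run, so the warning is the only signal the user gets.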

I will try to find a way to avoid obscuring the original stacktrace, but I have no straightforward solution for now, sorry.

@daniel-ressi
Author

daniel-ressi commented Dec 1, 2022

Thanks for your swift response. Is the issue that kedro's resume scenario would amount to running only the training pipeline and not the PipelineML? I would upvote a solution that just warns the user about these implications.

Ideally, it would be possible to disable the resume scenario suggestion for a PipelineML run, but this does not seem possible, as the suggestion is not made through a hook but by the Runner itself.
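One way to read this idea, sketched in pure Python: a runner subclass that skips the resume suggestion when the pipeline is an ML pipeline, so the node's original exception is re-raised untouched. All class names are hypothetical. In kedro, the suggestion lives in the runner's private `_suggest_resume_scenario` method, as the traceback shows, so overriding it in a real custom runner would rely on a private API that may change between versions.

```python
class MLPipelineStub:
    """Hypothetical marker class standing in for PipelineML."""

    def only_nodes(self, *names: str) -> "MLPipelineStub":
        raise NotImplementedError("slicing is not supported")


class BaseRunnerSketch:
    """Hypothetical runner mirroring the try/except pattern from the traceback."""

    def _suggest_resume_scenario(self, pipeline) -> None:
        # Slicing the pipeline to build a resume hint.
        pipeline.only_nodes("remaining_node")

    def run(self, pipeline, node_func) -> None:
        try:
            node_func()
        except Exception:
            self._suggest_resume_scenario(pipeline)
            raise


class QuietRunnerSketch(BaseRunnerSketch):
    def _suggest_resume_scenario(self, pipeline) -> None:
        if isinstance(pipeline, MLPipelineStub):
            # Skip the hint: a partial rerun would not be a valid ML pipeline run.
            return
        super()._suggest_resume_scenario(pipeline)


def failing_node() -> None:
    raise ValueError("My debug message")


try:
    QuietRunnerSketch().run(MLPipelineStub(), failing_node)
except Exception as exc:
    surfaced = exc

print(repr(surfaced))  # ValueError('My debug message')
```

The cost of this design is that it moves knowledge of the ML pipeline into the runner, which is why a hook-based switch would be cleaner if one existed.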

Either way, great work @Galileo-Galilei!

@Galileo-Galilei Galileo-Galilei added enhancement New feature or request need-design-decision Several ways of implementation are possible and one must be chosen labels Oct 25, 2023
@Galileo-Galilei Galileo-Galilei moved this from 🆕 New to 🔖 Ready in kedro-mlflow roadmap Oct 28, 2023
Galileo-Galilei added a commit that referenced this issue Oct 22, 2024
* implement missing pipeline ml slicing functionalities

* pass tests

---------

Co-authored-by: Yolan Honoré-Rougé <[email protected]>
@Galileo-Galilei
Owner

Closed by #601

@github-project-automation github-project-automation bot moved this from 🔖 Ready to ✅ Done in kedro-mlflow roadmap Oct 25, 2024
Projects
Status: ✅ Done
Development

No branches or pull requests

2 participants