Refactor run methods more into abstract method #4353

merelcht · 2024-11-26T15:00:34Z

Description

Resolves #4290

Development notes

Refactored the logic of the various _run() implementations and merged it into the abstract _run() method on the base runner.
Added an abstract _get_executor() method, implemented by each runner.
Moved the logic that determines the number of workers in ParallelRunner and ThreadRunner into a shared validate_max_workers method.
Changed the hook_manager argument of the runners to accept None, which ParallelRunner needs because the hook manager can't be serialised.
Updated TestSuggestResumeScenario for SequentialRunner: because it now uses a ThreadPoolExecutor, the resume suggestions can vary between runs. I manually created a project with the same pipelines and verified that the suggestions always work, even when they vary.
Added TestSuggestResumeScenario tests for ThreadRunner.
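The refactor described above follows the template-method pattern: the base class owns the shared execution loop and each concrete runner only chooses an executor. The sketch below illustrates that idea under stated assumptions. `SketchRunner`, `SketchSequentialRunner`, `SketchThreadRunner` and the body of `validate_max_workers` are hypothetical stand-ins, not Kedro's actual code, and the "nodes" are plain callables.

```python
from __future__ import annotations

import os
from abc import ABC, abstractmethod
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable


def validate_max_workers(max_workers: int | None) -> int:
    """Hypothetical shared helper: default to the CPU count, reject non-positive values."""
    if max_workers is None:
        return os.cpu_count() or 1
    if max_workers <= 0:
        raise ValueError("max_workers must be a positive integer")
    return max_workers


class SketchRunner(ABC):
    """Base class that owns the shared execution loop (template-method pattern)."""

    def __init__(self, max_workers: int | None = None) -> None:
        self._max_workers = validate_max_workers(max_workers)

    @abstractmethod
    def _get_executor(self, max_workers: int) -> Executor:
        """Each concrete runner only decides which executor runs the work."""

    def _run(self, tasks: list[Callable[[], object]]) -> list[object]:
        # Never ask for more workers than there are tasks, and always at least one.
        workers = min(self._max_workers, max(len(tasks), 1))
        with self._get_executor(workers) as pool:
            futures = [pool.submit(task) for task in tasks]
            # Results are collected in submission order.
            return [future.result() for future in futures]


class SketchSequentialRunner(SketchRunner):
    def _get_executor(self, max_workers: int) -> Executor:
        # A single worker thread keeps execution strictly one task at a time.
        return ThreadPoolExecutor(max_workers=1)


class SketchThreadRunner(SketchRunner):
    def _get_executor(self, max_workers: int) -> Executor:
        return ThreadPoolExecutor(max_workers=max_workers)


print(SketchSequentialRunner()._run([lambda: 1, lambda: 2]))  # [1, 2]
```

A process-pool subclass would slot in the same way, but everything it submits must be picklable, which is the reason given above for letting hook_manager be None in the ParallelRunner.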

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

Read the contributing guidelines
Signed off each commit with a Developer Certificate of Origin (DCO)
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change in the RELEASE.md file
Added tests to cover my changes
Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team


noklam · 2024-11-29T09:24:01Z

kedro/runner/runner.py

+    ) -> ThreadPoolExecutor | ProcessPoolExecutor:
+        """Abstract method to provide the correct executor (e.g., ThreadPoolExecutor or ProcessPoolExecutor)."""
+        pass
+
    @abstractmethod  # pragma: no cover


Is this still an abstractmethod?

Yes, because every Runner class still needs a _run() method, and you can still override it when creating a custom Runner. Creating a custom runner is now easier, though, because the base _run() method already provides most of the logic to build on.


noklam · 2024-11-29T09:45:27Z

kedro/runner/thread_runner.py

-                    )
-
-                    self._release_datasets(node, catalog, load_counts, pipeline)
+        super()._run(


Is it still needed if it's just using the base method?

Yes, because _run() is still an abstract method, so every subclass has to define it, even when the override only delegates to super()._run().
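Python's `abc` module is what makes the override mandatory: a subclass that leaves an abstract method undefined cannot be instantiated, even when the base method has a body. A minimal sketch with hypothetical class names (not Kedro's):

```python
from abc import ABC, abstractmethod


class BaseRunnerSketch(ABC):
    @abstractmethod
    def _run(self, tasks):
        """Abstract, but still provides a shared default implementation."""
        return [task() for task in tasks]


class MissingOverride(BaseRunnerSketch):
    """Does not define _run, so it stays abstract."""


class DelegatingRunner(BaseRunnerSketch):
    def _run(self, tasks):
        # The override is required by ABC, yet it can simply reuse the base logic.
        return super()._run(tasks)


try:
    MissingOverride()
except TypeError as exc:
    print(f"Cannot instantiate: {exc}")

print(DelegatingRunner()._run([lambda: 1, lambda: 2]))  # [1, 2]
```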

noklam · 2024-11-29T09:50:34Z

kedro/runner/runner.py

+from collections import Counter, deque
+from concurrent.futures import (
+    FIRST_COMPLETED,
+    ProcessPoolExecutor,
+    ThreadPoolExecutor,
+    wait,


Would this import multiprocessing as a dependency? I recall that in the past we had issues with ShelveStore, because even importing the library caused problems in restricted environments like AWS Lambda.

Hmm, yeah, I think it does. Do you remember what the problem was with that?
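Accessing `ProcessPoolExecutor` from `concurrent.futures` loads `concurrent.futures.process`, which imports `multiprocessing`. If that import turns out to matter in restricted environments, one possible mitigation (an assumption here, not something this PR is known to do) is to defer it until a process pool is actually requested. `get_executor` is a hypothetical helper:

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def get_executor(kind: str, max_workers: int) -> Executor:
    """Hypothetical factory that only imports process-pool machinery on demand."""
    if kind == "thread":
        return ThreadPoolExecutor(max_workers=max_workers)
    if kind == "process":
        # Deferred import: this module does not pull in concurrent.futures.process
        # (and therefore multiprocessing) until a process pool is requested.
        from concurrent.futures import ProcessPoolExecutor

        return ProcessPoolExecutor(max_workers=max_workers)
    raise ValueError(f"Unknown executor kind: {kind!r}")


with get_executor("thread", max_workers=2) as pool:
    print(pool.submit(sum, [1, 2, 3]).result())  # 6
```

Whether this actually avoids the earlier AWS Lambda problem would need to be verified in that environment.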

idanov

It's amazing how similar all the runners look now, I love it!


idanov

Awesome, this makes all the runners look almost identical!


DimedS

Thank you for the PR, @merelcht! Great job on unifying and simplifying the runners' code; it looks great to me! I've left one comment suggesting a potential complexity optimisation.

DimedS · 2024-12-10T15:05:26Z

kedro/runner/runner.py

+
+        with self._get_executor(max_workers) as pool:
+            while True:
+                ready = {n for n in todo_nodes if node_dependencies[n] <= done_nodes}


I propose reducing the complexity of that line in a separate PR, because it becomes more critical now that the runners are unified. As I understand it, the previous implementation of the SequentialRunner (the most popular runner) simply executed an ordered list of nodes one after another. The current approach, however, loops over all remaining nodes after each executed node, which results in an overall complexity greater than quadratic.
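To make the cost concrete, the sketch below mimics the readiness scan from the diff (`ready = {n for n in todo_nodes if node_dependencies[n] <= done_nodes}`) with a single worker that finishes one node at a time, and counts the subset checks. `scan_schedule` and the integer node IDs are hypothetical; Kedro nodes and futures are replaced by a simple queue. On a linear chain of n nodes the scan performs n(n+1)/2 checks, and each check is itself a set comparison.

```python
from collections import deque


def scan_schedule(deps: dict[int, set[int]]) -> tuple[list[int], int]:
    """Hypothetical scheduler that rescans every pending node after each completion."""
    todo = set(deps)
    done: set[int] = set()
    running: deque[int] = deque()
    order: list[int] = []
    checks = 0
    while True:
        ready = set()
        for node in todo:
            checks += 1
            if deps[node] <= done:  # all dependencies finished?
                ready.add(node)
        todo -= ready
        running.extend(sorted(ready))  # "submit" every ready node
        if not running:
            if todo:
                raise ValueError(f"Unschedulable nodes: {sorted(todo)}")
            break
        # Single worker: exactly one submitted node completes per round.
        node = running.popleft()
        done.add(node)
        order.append(node)
    return order, checks


chain = {0: set(), **{i: {i - 1} for i in range(1, 100)}}
order, checks = scan_schedule(chain)
print(checks)  # 5050
```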

I believe we can achieve linear complexity by maintaining a count of unmet dependencies for each node. As nodes are completed, we decrement the counter for their dependents and mark a node as ready when its counter reaches zero.
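The counter-based approach described here is essentially Kahn's topological-sort algorithm. A minimal sketch, again with a hypothetical `counter_schedule` function and plain hashable node IDs instead of Kedro nodes: it builds a reverse-dependency map once and then touches each edge exactly once, so the total work is O(V + E).

```python
from __future__ import annotations

from collections import deque
from typing import Hashable


def counter_schedule(deps: dict[Hashable, set[Hashable]]) -> list[Hashable]:
    """Order nodes by tracking how many unmet dependencies each one still has."""
    remaining = {node: len(parents) for node, parents in deps.items()}
    dependents: dict[Hashable, list[Hashable]] = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            dependents[parent].append(node)

    ready = deque(node for node, count in remaining.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()  # in a real runner: a completed future
        order.append(node)
        for child in dependents[node]:
            remaining[child] -= 1
            if remaining[child] == 0:  # last dependency just finished
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("Pipeline contains a cycle")
    return order


diamond = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(counter_schedule(diamond))  # ['a', 'b', 'c', 'd']
```

In a runner, newly ready nodes would be submitted to the executor instead of appended to a local queue, but the bookkeeping stays the same.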

That's a good point @DimedS. I'll create a new issue for this, so it's not forgotten.

merelcht and others added 11 commits November 4, 2024 18:06

First refactor step to make _run() implementations more similar

fc9c5c1

Signed-off-by: Merel Theisen <[email protected]>

Move common logic to abstract _run method, use executor for sequential runner as well

62569f7

Signed-off-by: Merel Theisen <[email protected]>

Refactor max workers logic into shared helper method

dcd6991

Signed-off-by: Merel Theisen <[email protected]>

Add resume scenario logic

58abcf7

Signed-off-by: Merel Theisen <[email protected]>

Small cleanup

9e2e278

Signed-off-by: Merel Theisen <[email protected]>

Clean up

3a9dde6

Signed-off-by: Merel Theisen <[email protected]>

Fix mypy checks

af955b8

Signed-off-by: Merel Theisen <[email protected]>

Merge branch 'main' into refactor/abstract-more-into-run-method

39ac276

Fix sequential runner test

a2db9cf

Signed-off-by: Merel Theisen <[email protected]>

Fix thread runner

41154a2

Signed-off-by: Merel Theisen <[email protected]>

Ignore coverage for abstract method

186f845

Signed-off-by: Merel Theisen <[email protected]>

merelcht self-assigned this Nov 26, 2024

merelcht and others added 7 commits November 26, 2024 16:46

Merge branch 'main' into refactor/abstract-more-into-run-method

123f678

Try fix thread runner test on 3.13

b589fa2

Signed-off-by: Merel Theisen <[email protected]>

Fix thread runner test

00ca950

Signed-off-by: Merel Theisen <[email protected]>

Fix sequential runner test on windows

33b3ec6

Signed-off-by: Merel Theisen <[email protected]>

More flexible options for resume suggestion in thread runner tests

351e0ef

Signed-off-by: Merel Theisen <[email protected]>

Clean up + make resume tests the same

6659132

Signed-off-by: Merel Theisen <[email protected]>

Merge branch 'main' into refactor/abstract-more-into-run-method

f4b3610

merelcht commented Nov 27, 2024

tests/runner/test_sequential_runner.py (outdated)

merelcht and others added 2 commits November 27, 2024 11:22

Update tests/runner/test_sequential_runner.py

2fa4fa8

Signed-off-by: Merel Theisen <[email protected]>

Clean up

b3268ce

Signed-off-by: Merel Theisen <[email protected]>

merelcht marked this pull request as ready for review November 27, 2024 11:43

merelcht requested review from idanov and noklam November 27, 2024 11:44

noklam reviewed Nov 29, 2024

Merge branch 'main' into refactor/abstract-more-into-run-method

7a42cd5

merelcht requested a review from DimedS December 2, 2024 11:20

idanov reviewed Dec 3, 2024

kedro/runner/runner.py (outdated)

kedro/runner/sequential_runner.py (outdated)

kedro/runner/runner.py (outdated)

Address review comments

a990155

Signed-off-by: Merel Theisen <[email protected]>

idanov approved these changes Dec 10, 2024

kedro/runner/thread_runner.py (outdated)

kedro/runner/sequential_runner.py (outdated)

kedro/runner/sequential_runner.py (outdated)

merelcht commented Dec 10, 2024

kedro/runner/parallel_runner.py (outdated)

merelcht and others added 3 commits December 10, 2024 15:35

Apply suggestions from code review

1c76953

Co-authored-by: Ivan Danov <[email protected]> Signed-off-by: Merel Theisen <[email protected]>

Merge branch 'main' into refactor/abstract-more-into-run-method

3b36f8a

Fix lint

be2e285

Signed-off-by: Merel Theisen <[email protected]>

DimedS approved these changes Dec 10, 2024

merelcht mentioned this pull request Dec 10, 2024

Improve complexity of run algorithm #4373

Open

merelcht enabled auto-merge (squash) December 10, 2024 15:33

merelcht merged commit f6ecca5 into main Dec 10, 2024
34 checks passed

merelcht deleted the refactor/abstract-more-into-run-method branch December 10, 2024 15:35
