Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop fitting pipeline after last fit block #132

Merged
merged 6 commits into from
Jan 8, 2021
Merged

Stop fitting pipeline after last fit block #132

merged 6 commits into from
Jan 8, 2021

Conversation

sarahmish
Copy link
Contributor

Resolve #131.

if block_name == self._last_fit_block:
early_stop = True

if (not early_stop) or (block_name in output_blocks):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (block_name in output_blocks) should be changed to just output_blocks.

The logic here is:

  1. If output_blocks is not empty, there is still at least 1 block from which we want the output
  2. The block could not be "this one", but rather one that is later down the chain.
  3. If we check for name match, we may skip the current block, and then we will not be able to produce the next ones, which we want to capture.

So, the if here would be:

if fit_pending or output_blocks:
    self._produce_block...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to modify below, on line 798 (I cannot add a comment there because it has no changes):

The if output_variables is not None and not output_blocks can be changed to:

if output_variables:
    if not output_blocks:
        # do the return logic here
elif not fit_pending:
    return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @csala for the review. I went ahead with the changes, I'm wondering if the return logic is a bit difficult to trace now

for block_name, block in self.blocks.items():
    # perform regular procedure

    if output_variables:
        if not output_blocks:
            # find result

            if debug:
                return result, debug_info

            return result

        elif not fit_pending:
            if debug:
                return debug_info

            return

# start_ check

if debug:
    return debug_info

Ideally, it's a lot easier for me to see what is happening when there are less returns to follow. I don't have a solution to this but I would prefer to break instead and let the last return carry it out.

if output_variables:
    if not output_blocks:
        # return logic
elif not fit_pending:
    break

@@ -767,6 +771,7 @@ def fit(self, X=None, y=None, output_=None, start_=None, debug=False, **kwargs):
debug_info = defaultdict(dict)
debug_info['debug'] = debug.lower() if isinstance(debug, str) else 'tmio'

early_stop = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early_stop sounds like an extraordinary thing but this functionality is just regular behavior.
I suggest inverting the meaning of the variable and calling it fit_pending: "If True, there are still blocks that are pending to be fitted"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -777,7 +782,10 @@ def fit(self, X=None, y=None, output_=None, start_=None, debug=False, **kwargs):

self._fit_block(block, block_name, context, debug_info)

if (block_name != self._last_block_name) or (block_name in output_blocks):
if block_name == self._last_fit_block:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this check above line 776, so this is the first thing that we do on the loop.

The reason is that if we have a start_ that falls after the last_fit_block, we will skip this and keep producing blocks until the end unnecessarily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@sarahmish sarahmish requested a review from csala January 7, 2021 23:33
@csala csala merged commit 8446048 into MLBazaar:master Jan 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop pipeline fitting after the last block with fit method
2 participants