Skip to content
This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Fixes case where optional user inputs broke computation #133

Merged
merged 1 commit into from
Jun 21, 2022

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented May 30, 2022

The execute function gets all upstream nodes of the required node to compute.
This will mean that there will likely be "user input" nodes to cycle through. When
we were computing the DFS value for them, we would assume they were required.

To illustrate, if you had a function that had optional input, baz e.g.

def foo(bar: int, baz: float = 1.0) -> float:

And then requested foo from the driver, there would be three nodes to compute: foo, bar, baz.

This meant that if you did not pass in a value for baz, and baz
was a user input, Hamilton would complain that a required node
was not provided. Even though it was not required for computation.

So to fix that, in execute any user node is now marked with optional.
I believe this is fine to do, because if this is not the case, there will be a
node in the graph that will have baz as a REQUIRED dependency, and
thus things will break appropriately.

To help with that, I also fixed and added some unit tests.

One unit test is to ensure that we don't remove passing in None values
as part of the kwargs to the function. Since that's what we do now, and this
was another way to fix this bug, which I think would be the wrong solution.

To reproduce the bug:

# repro.py
import pandas as pd
def foo(bar: int = None) -> pd.Series:
    return pd.Series([bar])
from hamilton import driver
import repro

dr = driver.Driver({}, repro)
df = dr.execute(['foo'])
print(df)

Causes:
NotImplementedError: bar was expected to be passed in but was not.

Changes

  • any user node, when we come to compute, from a top level is optional.

Testing

  1. Adds unit tests.
  2. Repro case now results in:
0  None

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Python - local testing

  • python 3.6
  • python 3.7

@skrawcz
Copy link
Collaborator Author

skrawcz commented May 30, 2022

Ray issue is due to ray-project/ray#25282

@skrawcz skrawcz requested a review from elijahbenizzy May 30, 2022 22:17
@skrawcz skrawcz force-pushed the stabilize_node_execution_a_bit branch from a234bbb to cd8611c Compare May 30, 2022 22:56
@elijahbenizzy
Copy link
Collaborator

Ok, I think I see what you're getting at here. i think its OK, but not 100% sure. I guess my question is why would anyone be asking for the user-provided input as an output? Feels a little wonky, but I don't see a great reason why not...

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good -- if I fully understand. Don't 100% get the original use-case this was breaking though...

@skrawcz
Copy link
Collaborator Author

skrawcz commented May 31, 2022

I think this looks good -- if I fully understand. Don't 100% get the original use-case this was breaking though...

added example breaking case to description.

@skrawcz skrawcz requested a review from elijahbenizzy May 31, 2022 04:33
@elijahbenizzy
Copy link
Collaborator

I think this looks good -- if I fully understand. Don't 100% get the original use-case this was breaking though...

Ok I think I understand. Surprised we didn't catch this the first time, and it feels a little hacky, but I think its a fine solution.

I think this looks good -- if I fully understand. Don't 100% get the original use-case this was breaking though...

added example breaking case to description.

Alright, yep, makes sense. Thanks!

tests/test_graph.py Outdated Show resolved Hide resolved
@skrawcz skrawcz force-pushed the stabilize_node_execution_a_bit branch from cd8611c to ba32928 Compare June 21, 2022 04:11
The execute function gets all upstream nodes of the required node to compute.
This will mean that there will likely be "user input" nodes to cycle through. When
we were computing the DFS value for them, we would assume they were required.

To illustrate, if you had a function that had optional input, `baz` e.g.

```python

def foo(bar: int, baz: float = 1.0) -> float:
```
This meant that if you did not pass in a value for `baz`, and `baz`
was a user input, Hamilton would complain that a required node
was not provided. Even though it was not required for computation.

So to fix that, in execute any user node is now marked with `optional`.
I believe this is fine to do, because if this is not the case, there will be a
node in the graph that will have `baz` as a REQUIRED dependency, and
thus things will break appropriately.

To help with that, I also fixed and added some unit tests.

One unit test is to ensure that we don't remove passing in `None` values
as part of the kwargs to the function. Since that's what we do now, and this
was another way to fix this bug, which I think would be the wrong way to go about it.

Otherwise I added tests to ensure that node order does not change the result too.
@skrawcz skrawcz force-pushed the stabilize_node_execution_a_bit branch from ba32928 to 21c007e Compare June 21, 2022 05:11
@skrawcz skrawcz merged commit 15271fa into main Jun 21, 2022
@skrawcz skrawcz deleted the stabilize_node_execution_a_bit branch June 21, 2022 17:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants