This repository has been archived by the owner on Jul 3, 2023. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fixes case where optional user inputs broke computation
The execute function gets all upstream nodes of the node it is asked to compute, so there will likely be "user input" nodes to cycle through. When computing the DFS value for those nodes, we assumed they were required.

To illustrate, suppose you had a function with an optional input, `baz`, e.g.

```python
def foo(bar: int, baz: float = 1.0) -> float:
    ...
```

If you did not pass in a value for `baz`, and `baz` was a user input, Hamilton would complain that a required node was not provided, even though it was not required for computation.

To fix that, execute now marks any user node as `optional`. I believe this is fine to do: if `baz` really is required, some node in the graph will have `baz` as a REQUIRED dependency, and things will break appropriately.

To help with that, I also fixed and added some unit tests. One unit test ensures that we don't strip `None` values from the kwargs passed to the function. Passing them through is what we do now; stripping them was another way to fix this bug, but I think it would be the wrong solution. The other tests ensure that node order does not change the result.
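As a rough illustration of the behavior described above, here is a minimal, standard-library-only sketch. It is not Hamilton's actual implementation: `execute`, `visit`, `needs_baz`, and the `missing` sentinel are hypothetical names made up for this example, and the toy checks requiredness per dependency edge rather than reproducing the real graph code.

```python
import inspect
from typing import Any, Callable, Dict, List


def foo(bar: int, baz: float = 1.0) -> float:
    """Node with one required input (bar) and one optional input (baz)."""
    return bar * baz


def needs_baz(baz: float) -> float:
    """Node that REQUIRES baz, so a missing baz must still fail here."""
    return baz + 1.0


def execute(target: str, functions: List[Callable], inputs: Dict[str, Any]) -> Any:
    """Toy depth-first evaluation (an assumption-laden stand-in, not Hamilton's execute)."""
    by_name = {fn.__name__: fn for fn in functions}
    computed: Dict[str, Any] = {}
    missing = object()  # sentinel meaning "optional input was not supplied"

    def visit(name: str, optional: bool) -> Any:
        if name in computed:
            return computed[name]
        if name in by_name:
            kwargs = {}
            for param in inspect.signature(by_name[name]).parameters.values():
                # A parameter with a default is an optional dependency.
                has_default = param.default is not inspect.Parameter.empty
                value = visit(param.name, optional=has_default)
                if value is not missing:
                    kwargs[param.name] = value
            result = by_name[name](**kwargs)
        elif name in inputs:
            result = inputs[name]  # an explicit None is passed through, not dropped
        elif optional:
            return missing  # let the function fall back to its own default
        else:
            raise ValueError(f"Required input {name!r} was not provided")
        computed[name] = result
        return result

    return visit(target, optional=False)


# baz is omitted, so foo falls back to its default of 1.0
print(execute("foo", [foo, needs_baz], {"bar": 2}))  # → 2.0
```

Asking this sketch for `needs_baz` without supplying `baz` still raises a `ValueError`, which mirrors the "things will break appropriately" argument: the input is only optional where every consumer has a default for it.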
Showing 4 changed files with 126 additions and 2 deletions.
    @@ -0,0 +1,32 @@
    import importlib
    import logging
    import sys

    import pandas as pd
    from hamilton import driver

    logging.basicConfig(stream=sys.stdout)
    initial_columns = {  # load from actuals or wherever -- this is our initial data we use as input.
        # Note: these values don't have to be all series, they could be a scalar.
        'signups': pd.Series([1, 10, 50, 100, 200, 400]),
        'spend': pd.Series([10, 10, 20, 40, 40, 50]),
    }
    # we need to tell hamilton where to load function definitions from
    module_name = 'my_functions'
    module = importlib.import_module(module_name)
    dr = driver.Driver(initial_columns, module)  # can pass in multiple modules
    # we need to specify what we want in the final dataframe.
    output_columns = [
        'spend',
        'signups',
        'avg_3wk_spend',
        'spend_per_signup',
        'spend_zero_mean_unit_variance'
    ]
    # let's create the dataframe!
    df = dr.execute(output_columns)
    print(df.to_string())

    # To visualize do `pip install sf-hamilton[visualization]` if you want these to work
    # dr.visualize_execution(output_columns, './my_dag.dot', {}, graphviz_kwargs=dict(graph_attr={'ratio': '1'}))
    # dr.display_all_functions('./my_full_dag.dot', graphviz_kwargs=dict(graph_attr={'ratio': '1'}))