cli added config support and validate command #729

Merged: 11 commits, Mar 7, 2024
19 changes: 17 additions & 2 deletions examples/cli/README.md
@@ -13,13 +13,28 @@ Test the installation with

# Features

Currently 4 commands:

## Commands
- `build`: creates a Hamilton `Driver` from specified modules. It's useful to validate the dataflow definition.
- `validate`: calls `Driver.validate_execution()` for a set of `inputs` and `overrides` passed through the `--context` option.
- `view`: calls `dr.display_all_functions()` on the built `Driver`
- `version`: generates node hashes based on their source code, and a dataflow hash from the collection of node hashes.
- `diff`: get a diff of added/deleted/edited nodes between the current version of Python modules and another git reference (`default=HEAD`, i.e., the last committed version). You can get a visualization of the diffs.

## Options
- all commands receive `MODULES`, which is a list of paths to Python modules to assemble into a single dataflow
- all commands receive `--context` (`-ctx`), which is a file (`.py` or `.json`) that includes top-level headers (see `config.py` and `config.json` in this repo for examples):
  - `HAMILTON_CONFIG`: `typing.Mapping` passed to `driver.Builder.with_config()`
  - `HAMILTON_FINAL_VARS`: `typing.Sequence` passed to `driver.validate_execution(final_vars=...)`
  - `HAMILTON_INPUTS`: `typing.Mapping` passed to `driver.validate_execution(inputs=...)`
  - `HAMILTON_OVERRIDES`: `typing.Mapping` passed to `driver.validate_execution(overrides=...)`
  - Using a `.py` context file provides more flexibility than `.json` to define inputs and overrides objects.
- all commands receive a `--name` (`-n`), which is used to name the output file (when the command produces a file). If `None`, a file name will be derived from the `MODULES` argument.
- When using a command that generates a file:
  - passing a file path: will output the file with this name at this location
  - passing a directory: will output the file with the `--name` value (either explicit or default derived from `MODULES`) at this location
  - passing a file path with the name `default`: will output the file with the name replaced by the `--name` value at this location. This is useful when you need to specify a type via filename. For example, `hamilton view -o /path/to/default.pdf my_dataflow.py` will create the file `/path/to/my_dataflow.pdf`. (This behavior may change)
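
The three output rules above can be read as a small path-resolution function. The following is a minimal sketch of that reading, not the CLI's actual code: `resolve_output` and its parameters are hypothetical names, a path without a suffix is assumed to be a directory, and no extension is added in the directory case because the text does not specify one.

```python
from pathlib import Path


def resolve_output(output: str, name: str) -> Path:
    """Hypothetical helper mirroring the three output rules described above."""
    path = Path(output)
    # Directory rule: the file is named after --name inside that directory.
    # Assumption: a path without a suffix is treated as a directory.
    if path.is_dir() or path.suffix == "":
        return path / name
    # "default" rule: swap the placeholder stem for --name and keep the
    # suffix, so the file type can still be chosen via the filename.
    if path.stem == "default":
        return path.with_name(f"{name}{path.suffix}")
    # File path rule: any other file path is used unchanged.
    return path


print(resolve_output("/path/to/default.pdf", "my_dataflow"))
```

Under these assumptions, the call shown resolves to the same `/path/to/my_dataflow.pdf` location as the `hamilton view -o /path/to/default.pdf my_dataflow.py` example.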


See [DOCS.md](./DOCS.md) for the full reference.

# Usage
6 changes: 6 additions & 0 deletions examples/cli/config.json
@@ -0,0 +1,6 @@
{
  "HAMILTON_CONFIG": {
    "holiday": "halloween"
  },
  "HAMILTON_FINAL_VARS": ["customers_df", "customer_summary_table"]
}
3 changes: 3 additions & 0 deletions examples/cli/config.py
@@ -0,0 +1,3 @@
HAMILTON_CONFIG = dict(config_exists="true")

HAMILTON_FINAL_VARS = ["config_when", "customer_summary_table"]
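
Both context files above expose the same top-level headers, one as JSON keys and one as Python module-level names. As a rough illustration of how such files could be read into a single dictionary, here is a standard-library sketch. It is an assumption about the general approach, not the loader shipped in this PR; `load_context` and `CONTEXT_HEADERS` are hypothetical names.

```python
import json
import runpy
from pathlib import Path
from typing import Any, Dict

# Header names as listed in the README section of this PR.
CONTEXT_HEADERS = (
    "HAMILTON_CONFIG",
    "HAMILTON_FINAL_VARS",
    "HAMILTON_INPUTS",
    "HAMILTON_OVERRIDES",
)


def load_context(path: str) -> Dict[str, Any]:
    """Hypothetical loader: read a .json or .py context file into a dict."""
    file = Path(path)
    if file.suffix == ".json":
        raw = json.loads(file.read_text())
    elif file.suffix == ".py":
        # Execute the file and collect its top-level names.
        raw = runpy.run_path(str(file))
    else:
        raise ValueError(f"unsupported context file type: {file.suffix!r}")
    # Keep only the recognized headers; headers not defined are left out.
    return {key: raw[key] for key in CONTEXT_HEADERS if key in raw}
```

Because a `.py` file is executed rather than parsed, it can build arbitrary Python objects for inputs and overrides, which is the flexibility the README attributes to `.py` context files. It also means a `.py` context file should only be loaded from a trusted source.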
11 changes: 9 additions & 2 deletions examples/cli/module_v1.py
@@ -1,9 +1,16 @@
 import pandas as pd

-from hamilton.function_modifiers import extract_columns
+from hamilton.function_modifiers import config, extract_columns


-def customers_df(customers_path: str = "customers.csv") -> pd.DataFrame:
+@config.when(holiday="halloween")
+def customers_df__halloween() -> pd.DataFrame:
+    """Example of using @config.when function modifier"""
+    return pd.read_csv("/path/to/halloween/customers.csv")
+
+
+@config.when_not(holiday="halloween")
+def customers_df__default(customers_path: str = "customers.csv") -> pd.DataFrame:
     """Load the customer dataset."""
     return pd.read_csv(customers_path)
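
The two decorated functions both implement a node named `customers_df`; the `holiday` config value decides which one is used, and the `__halloween` / `__default` suffixes are dropped from the node name. The toy resolver below illustrates only that selection rule using the standard library. It is not Hamilton's implementation: `when`, `when_not`, and `resolve` are hypothetical stand-ins, and the functions return strings instead of DataFrames.

```python
from typing import Any, Callable, Dict, List

# Candidate implementations, registered by the toy decorators below.
_CANDIDATES: List[Callable[..., Any]] = []


def when(**conditions: Any) -> Callable:
    """Toy stand-in for @config.when: applies if every key equals its value."""
    def decorator(fn: Callable) -> Callable:
        fn._applies = lambda cfg: all(cfg.get(k) == v for k, v in conditions.items())
        _CANDIDATES.append(fn)
        return fn
    return decorator


def when_not(**conditions: Any) -> Callable:
    """Toy stand-in for @config.when_not: applies if no key equals its value."""
    def decorator(fn: Callable) -> Callable:
        fn._applies = lambda cfg: all(cfg.get(k) != v for k, v in conditions.items())
        _CANDIDATES.append(fn)
        return fn
    return decorator


def resolve(cfg: Dict[str, Any]) -> Dict[str, Callable[..., Any]]:
    """Map node names to the implementation selected by `cfg`."""
    nodes: Dict[str, Callable[..., Any]] = {}
    for fn in _CANDIDATES:
        if fn._applies(cfg):
            # Drop the "__suffix" so both variants share one node name.
            nodes[fn.__name__.split("__")[0]] = fn
    return nodes


@when(holiday="halloween")
def customers_df__halloween() -> str:
    return "halloween customers"


@when_not(holiday="halloween")
def customers_df__default() -> str:
    return "default customers"


print(resolve({"holiday": "halloween"})["customers_df"].__name__)
```

With the `config.json` from this PR (`"holiday": "halloween"`), the halloween variant is the one selected; with any other value, or no `holiday` key at all, the default variant is used.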
