Feast Integration #322
Signed-off-by: Samhita Alla <[email protected]>
# * - ``get_historical_features()``
#   - Enrich an entity dataframe with historical feature values for either training or batch scoring.
@task
def store_offline(parquet_file: FlyteFile, repo_path: str) -> (str, str):
Can we just create this as a pre-built task plugin, where, given a parquet file, the task automatically uploads the data to the Feast offline feature store?
@kumare3: What do we do about this? When I write a plugin, the input will not just be a parquet file, we'll have to take features, primary key, etc. Similarly, when retrieving the offline features, the user has to give the primary key and datetime values. The same applies to online features as well.
        }
    )

    retrieval_job = fs.get_historical_features(
where are these historical features stored?
For now, locally.
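To make the thread above concrete: `get_historical_features()` enriches an entity dataframe with the feature values that were valid at each row's timestamp. The block below is a minimal pure-Python sketch of that kind of point-in-time lookup, not Feast's implementation. The function name `point_in_time_join`, the record layout, and the sample values (including the `pulse` column) are assumptions made for illustration only.

```python
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows, key="Hospital Number", ts="timestamp"):
    """Attach to each entity row the newest feature row at or before its timestamp.

    Hypothetical helper for illustration; Feast performs this join internally.
    """
    enriched = []
    for entity in entity_rows:
        # Feature rows for the same entity that are not newer than the entity row.
        candidates = [
            f for f in feature_rows if f[key] == entity[key] and f[ts] <= entity[ts]
        ]
        latest = max(candidates, key=lambda f: f[ts], default=None)
        merged = dict(entity)
        if latest is not None:
            merged.update({k: v for k, v in latest.items() if k not in (key, ts)})
        enriched.append(merged)
    return enriched


# Made-up sample data: two feature rows for the same entity key.
feature_rows = [
    {"Hospital Number": "A1", "timestamp": datetime(2021, 6, 1), "pulse": 60},
    {"Hospital Number": "A1", "timestamp": datetime(2021, 7, 1), "pulse": 72},
]
entity_rows = [
    {"Hospital Number": "A1", "timestamp": datetime(2021, 6, 15)},
    {"Hospital Number": "A1", "timestamp": datetime(2021, 7, 15)},
]
enriched = point_in_time_join(entity_rows, feature_rows)
```

Each entity row picks up the feature value that was current at its own timestamp, which is why the two rows above end up with different `pulse` values.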
project: feature_engineering
registry: data/registry.db
provider: local
online_store:
    path: data/online_store.db
# One key difference between the online store and the data source is that only the latest feature values are stored per entity key; no historical values are stored.
# Our dataset has two such entries with the same ``Hospital Number`` but different timestamps. Only the data point with the latest timestamp is picked from the online store.
@task
def store_online(repo_path: str) -> str:
This too can already be a task type, right?
FeastOnlineStoreTask
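The online-store behavior described in the code comment above (only the latest feature values are kept per entity key) can be sketched in plain Python. This is an illustration of the semantics under assumed data, not how Feast stores anything: `latest_per_entity`, the `timestamp` and `pulse` fields, and the sample rows are hypothetical.

```python
from datetime import datetime


def latest_per_entity(rows, key="Hospital Number", ts="timestamp"):
    """Keep only the newest row for each entity key (hypothetical helper)."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Replace the stored row only when this one is strictly newer.
        if current is None or row[ts] > current[ts]:
            latest[row[key]] = row
    return latest


# Made-up rows: "A1" appears twice with different timestamps.
rows = [
    {"Hospital Number": "A1", "timestamp": datetime(2021, 6, 1), "pulse": 60},
    {"Hospital Number": "B2", "timestamp": datetime(2021, 6, 3), "pulse": 80},
    {"Hospital Number": "A1", "timestamp": datetime(2021, 7, 1), "pulse": 72},
]
online_view = latest_per_entity(rows)
```

Here the older `A1` row is dropped, mirroring the case of two entries with the same ``Hospital Number`` mentioned in the comment.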
@task
def store_online(repo_path: str) -> str:
    store = FeatureStore(repo_path=repo_path)
    store.materialize(
why is the time hard coded?
We can take inputs from the user. But then, it can be days, hours, minutes, ... etc. We'll have to ask the user to give four inputs: two specifying the start and end time format, and the other two specifying their respective values.
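One way the four-input idea above might look is sketched below, assuming the window is expressed as offsets back from a reference time. The helper `materialize_window`, its parameter names, and the set of allowed units are hypothetical and not part of this PR; only the standard library's `timedelta` is used.

```python
from datetime import datetime, timedelta

# Units accepted here are an assumption; they are a subset of timedelta's keyword arguments.
ALLOWED_UNITS = {"weeks", "days", "hours", "minutes", "seconds"}


def materialize_window(start_unit, start_value, end_unit, end_value, now=None):
    """Return (start_date, end_date) computed as offsets back from `now`.

    Hypothetical helper: two inputs give the units, two give their values.
    """
    for unit in (start_unit, end_unit):
        if unit not in ALLOWED_UNITS:
            raise ValueError(f"unsupported time unit: {unit!r}")
    now = now or datetime.now()
    start_date = now - timedelta(**{start_unit: start_value})
    end_date = now - timedelta(**{end_unit: end_value})
    if start_date > end_date:
        raise ValueError("start of the window must not be later than its end")
    return start_date, end_date


# Fixed reference time so the result is reproducible.
reference = datetime(2021, 9, 1, 12, 0)
start_date, end_date = materialize_window("days", 7, "minutes", 10, now=reference)
```

The resulting pair could then be handed to `store.materialize(start_date=start_date, end_date=end_date)` in place of hard-coded values.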
Signed-off-by: Eduardo Apolinario <[email protected]>
* Initial version
* Add venv to dockerfile
* Rename feast integration dir
* Configure minio in the image
* Refactoring + retrieve offline features
* Remove all_together
* Attempt to add s3 credentials to image
* Fix s3 endpoint
* custom provider
* Transform FeatureView prior to executing queries
* Set PYTHONPATH
* Set PYTHONPATH to multiple values
* Remove "custom_provider" from path
* Replace minio endpoint
* Print env vars
* Set FEAST_S3_ENDPOINT_URL while building feature store
* Remove minio credentials from image
* Add aws env vars
* Remove mention to local provider
* Remove piping of registry object
* Create random path via FlyteContext
* Revert "Remove piping of registry object" (this reverts commit ccdf326)
* Clean up feature description and remove debugging statements
* Add tasks up to `train_model`
* Rename workflow
* Comment use of custom provider
* Rename workflow
* fix error in training (Signed-off-by: Samhita Alla <[email protected]>)
* Add TODO
* Import feature_eng tasks directly
* Add store_online task
* Copy remote file to a local file and replace batch_source in materialize
* Add some debugging statements and fix local execution parameter
* Add remaining steps to workflow
* Regenerate requirements files
* Regenerate requirements and put replacement of remote files back in custom provider
* Add more logging
* Regenerate requirements again
* Add workflow return type
* Include a directory prefix in the model filename
* Remove unused overrides in custom provider and comment use of localize_feature_view
* Add type transformer
* Pipe _Feature_Store to all interactions with feast
* Remove unnecessary override in custom provider
* Rearrange initialization of FeatureStore for better legibility
* Revert "Remove unnecessary override in custom provider" (this reverts commit 2808ba0)
* Use create_node to enforce order
* Remove unused function
* Guard env vars behind a check
* Expose inputs to workflow
* Task to build FeatureStore
* Do not guard env vars behind a check
* Experiment with converted_df
* Comments
* Remove commented code from type transformer
* Remove unused portion of sandbox.config
* Remove TODO
* Remove registry parameter from local execution
* No need for type transformers
* Remove mentions to type transformers
* Copy README.rst from #322
* Step 3 of guide on adding a new integration
* Remove extraneous print statement and turn comments into docstrings in custom provider
* Comments on README.rst
* Fix link to feast
* Fix serialization of feast_integration dir

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Samhita Alla <[email protected]>
@reference_task(
    project="flytesnacks",
    domain="development",
    name="feast_integration.feature_eng_tasks.mean_median_imputer",
    version="v1",
)
def mean_median_imputer(
    dataframe: pd.DataFrame,
    imputation_method: str,
) -> FlyteSchema:
    ...


@reference_task(
    project="flytesnacks",
    domain="development",
    name="feast_integration.feature_eng_tasks.univariate_selection",
    version="v1",
)
def univariate_selection(
    dataframe: pd.DataFrame, num_features: int, data_class: str, feature_view_name: str
) -> pd.DataFrame:
    ...
...
Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production.
Integration between Flyte and Feast can help users take their models and features from prototyping all the way to production cost-effectively and efficiently.