feat: Incorporate substrait ODFVs into ibis-based offline store queries #4102
Signed-off-by: tokoko <[email protected]>
LGTM. Maybe a bad question, but does this mean the substrait transformation is only available in the ibis offline store? Is it possible to use it with other offline stores? Suppose I have S3 file sources and want to use the substrait transform directly.
@HaoXuAI No, it's applicable to all offline stores. The difference is that for non-ibis offline stores the whole dataset will be collected to a single process as an arrow table and
What this PR adds on top of this is that for ibis-based offline stores, collecting to arrow is no longer necessary and instead of
P.S. The duckdb offline store, which is for now the only ibis-based implementation, can work with s3 file sources even now.
lgtm!
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24)

### Bug Fixes

* Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6))
* Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff))
* Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084)
* Changed the code the way mysql container is initialized. ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126)
* Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594))
* Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669)
* Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde))
* Get rid of empty string `name_alias` during feature view projection deserialization ([#4116](#4116)) ([65056ce](65056ce))
* Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164))
* Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0))
* Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae))
* Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7))
* Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7))
* Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb))
* Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7))
* Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4))
* Update doc ([#4153](#4153)) ([e873636](e873636))
* Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3))
* Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96))
* Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11))

### Features

* Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc))
* Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0))
* Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98))
* Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419))
* Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48))
* Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745))
* Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5))
* Adding support for dictionary writes to online store ([#4156](#4156)) ([abfac01](abfac01))
* Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640))
* Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef))
* Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562))
* Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621))
* Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9))
* Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4))
* Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150))
* Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f))
* Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea))
* Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716))
* Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d))
* Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114)

### Reverts

* Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
What this PR does / why we need it:
This PR changes the offline flow for ODFV execution in ibis-based offline stores (currently only duckdb). Instead of collecting raw offline store output into a pyarrow table and applying a substrait transformation with acero, ibis-based offline stores now receive ibis functions and directly extend the ibis logical plan, meaning ODFVs will be executed by the offline store engine itself.
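The difference between the two flows can be illustrated with a toy model in plain Python. This is only a sketch: `ToyPlan`, `odfv`, and the column names are hypothetical stand-ins invented for illustration, not the ibis, substrait, or Feast APIs. It shows the general idea of recording a transformation as part of a lazy plan so that the engine runs it, rather than materializing the data first and transforming it afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class ToyPlan:
    """Hypothetical stand-in for a lazy logical plan (not the ibis API)."""

    source: List[Row]
    steps: List[Callable[[Row], Row]] = field(default_factory=list)

    def mutate(self, fn: Callable[[Row], Row]) -> "ToyPlan":
        # Extending the plan only records the step; nothing is computed yet.
        return ToyPlan(self.source, self.steps + [fn])

    def execute(self) -> List[Row]:
        # The "engine" runs every recorded step in a single pass over the data.
        out = []
        for row in self.source:
            for step in self.steps:
                row = step(row)
            out.append(row)
        return out


def odfv(row: Row) -> Row:
    # Hypothetical on-demand feature: the sum of two input features.
    return {**row, "conv_plus_acc": row["conv_rate"] + row["acc_rate"]}


rows = [
    {"conv_rate": 0.5, "acc_rate": 0.25},
    {"conv_rate": 1.0, "acc_rate": 2.0},
]

# Old-style flow: materialize the raw output first, then transform it outside the engine.
collected = ToyPlan(rows).execute()
eager = [odfv(r) for r in collected]

# New-style flow: the transformation becomes part of the plan and runs inside the engine.
lazy = ToyPlan(rows).mutate(odfv).execute()
```

Both flows produce the same rows; what changes is where the work happens and whether an intermediate copy of the whole dataset has to be collected first.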
The feature necessitated a couple of underlying changes in `substrait` ODFVs: `substrait` ODFVs now also store ibis python functions in protos so that these functions can be applied directly in `transform_ibis`. This can probably be avoided in the future if there's a good substrait-to-ibis compiler in place, but currently it is the simplest solution. Note: there are other problems here as well. Substrait plans need to know precisely what input columns they will consume, which is problematic in the offline flow because there's no way to know beforehand which features the user will request in the `get_historical_features` call. To correctly apply ODFVs, one would need to apply each ODFV to a relevant subset of columns and then join all tables back together (kind of like how it's done in the online flow), which is a horrible waste of resources.

With this PR, all of the planned ibis/substrait features will now be in place in the python sdk. The next steps will be to enable native substrait ODFV execution in non-python sdks as well.
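The "apply each ODFV to a subset of columns and join back" approach mentioned in the description can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions: `apply_per_subset`, the `driver_id` key, and the two example ODFVs are all hypothetical and do not come from the Feast codebase. Each ODFV declares a fixed set of input columns, as a plan with a fixed input schema would, so the data has to be projected once per ODFV and the outputs merged back by key.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
# Each hypothetical ODFV is a pair: (fixed input columns, row-level function).
Odfv = Tuple[List[str], Callable[[Row], Row]]


def apply_per_subset(rows: List[Row], key: str, odfvs: List[Odfv]) -> List[Row]:
    # Start from a copy of the full rows, indexed by the join key.
    joined = {row[key]: dict(row) for row in rows}
    for input_cols, fn in odfvs:
        # Project to exactly the columns this ODFV declared: one extra copy per ODFV.
        subset = [{c: row[c] for c in [key] + input_cols} for row in rows]
        for sub_row in subset:
            # Join the ODFV output back onto the full row by key.
            joined[sub_row[key]].update(fn(sub_row))
    return list(joined.values())


odfvs: List[Odfv] = [
    (["a", "b"], lambda r: {"a_plus_b": r["a"] + r["b"]}),
    (["c"], lambda r: {"c_doubled": r["c"] * 2}),
]
rows = [
    {"driver_id": 1, "a": 1.0, "b": 2.0, "c": 3.0},
    {"driver_id": 2, "a": 4.0, "b": 5.0, "c": 6.0},
]
result = apply_per_subset(rows, "driver_id", odfvs)
```

The per-ODFV projection and join are the overhead the description calls wasteful; a function applied directly to the full table needs neither step.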
Which issue(s) this PR fixes:
Fixes #3979