
feat: Initial Bytewax materialization engine #2974

Merged: 4 commits merged on Aug 15, 2022

whoahbot (Collaborator)

What this PR does / why we need it:

This PR adds a Bytewax batch materialization engine.

Which issue(s) this PR fixes:

Fixes #

whoahbot force-pushed the bytewax_engine branch 2 times, most recently from 9a673b0 to be0af00 (July 28, 2022 16:08)
Review thread on docs/reference/batch-materialization/bytewax.md (outdated, resolved).

```python
{
    "command": ["sh", "-c", "sh ./entrypoint.sh"],
    "env": job_env,
    "image": "bytewax/bytewax-feast:latest",
```
Member:

I'm guessing this is https://hub.docker.com/r/bytewax/bytewax-feast. Can we add some docs on which dependencies this container comes with and how it's built?

Additionally, I think this should be configurable (so that Feast users can supply a custom container with a custom online store implementation), with this image as the default.
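As a sketch of the configurability suggested above, the container entry could read its image from a settings object whose default is the current image. `BytewaxEngineSettings` and `container_spec` are hypothetical names for illustration, not Feast's actual config classes, and only the three fields visible in the review excerpt are modeled.

```python
from dataclasses import dataclass
from typing import Dict, List

# Default image named in the review excerpt.
DEFAULT_IMAGE = "bytewax/bytewax-feast:latest"


@dataclass(frozen=True)
class BytewaxEngineSettings:
    """Hypothetical engine settings; the field name is illustrative only."""

    image: str = DEFAULT_IMAGE


def container_spec(settings: BytewaxEngineSettings, job_env: List[Dict[str, str]]) -> dict:
    """Build the container entry from the excerpt, with the image made configurable."""
    return {
        "command": ["sh", "-c", "sh ./entrypoint.sh"],
        "env": job_env,
        "image": settings.image,
    }


# Default: users who configure nothing keep the published image.
default_spec = container_spec(BytewaxEngineSettings(), job_env=[])

# Override: a user-built image, e.g. one bundling a custom online store.
custom_spec = container_spec(
    BytewaxEngineSettings(image="registry.example.com/feast-bytewax-custom:dev"),
    job_env=[{"name": "EXAMPLE_VAR", "value": "1"}],
)
```

Keeping the published image as the default means existing configurations keep working unchanged; the changelog further down lists a follow-up, "Custom Docker image for Bytewax batch materialization (#3099)".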

whoahbot (Collaborator, Author):

I think making it configurable is a good idea.

My thought was to create a repo that would be used to build the bytewax/bytewax-feast image and push it to Dockerhub. People that want to make changes can fork that repo and build whichever images they would like to use. What do you think?

I haven't taken that step yet, because I was building the Feast SDK from source so that it included the changes in this PR.

achals (Member), Jul 28, 2022:

I think having a repo that builds the image makes sense. Having it as part of the main repo may be good too, since that keeps everything in the same place. Doing it in a subsequent PR is fine by me.

whoahbot (Collaborator, Author):

That sounds good. I'll add it in a subsequent PR, since it will depend on being able to install a release version of Feast that includes this code.

achals (Member) commented Jul 28, 2022

/ok-to-test

codecov-commenter commented Jul 28, 2022

Codecov Report

Merging #2974 (383e3cb) into master (0ed1a63) will increase coverage by 8.50%.
The diff coverage is 44.00%.

```diff
@@            Coverage Diff             @@
##           master    #2974      +/-   ##
==========================================
+ Coverage   67.44%   75.94%   +8.50%
==========================================
  Files         169      203      +34
  Lines       14936    16939    +2003
==========================================
+ Hits        10074    12865    +2791
+ Misses       4862     4074     -788
```
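The report's headline figures can be checked by hand. A minimal sketch, assuming no partially covered lines (the diff lists only hits and misses, and they sum exactly to the line counts) and two-decimal truncation, which reproduces the 67.44% and 75.94% shown; rounding to nearest would instead give 67.45% and 75.95%.

```python
def coverage_hundredths(hits: int, misses: int) -> int:
    """Line coverage in hundredths of a percent, truncated toward zero.

    Assumes no partial lines, so tracked lines = hits + misses.
    """
    lines = hits + misses
    # Integer floor division avoids float error at the second decimal.
    return hits * 10_000 // lines


def fmt(hundredths: int) -> str:
    """Render non-negative hundredths of a percent, e.g. 6744 -> '67.44'."""
    return f"{hundredths // 100}.{hundredths % 100:02d}"


base = coverage_hundredths(hits=10074, misses=4862)  # master (0ed1a63)
head = coverage_hundredths(hits=12865, misses=4074)  # #2974 (383e3cb)

print(f"{fmt(base)}% -> {fmt(head)}% (+{fmt(head - base)} points)")
# prints: 67.44% -> 75.94% (+8.50 points)
```

The "+8.50%" in the report is therefore a change of 8.50 percentage points, not a relative increase.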
| Flag | Coverage Δ |
|---|---|
| integrationtests | `67.01% <44.00%> (-0.44%)` ⬇️ |
| unittests | `58.27% <44.00%> (?)` |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| sdk/python/feast/repo_config.py | `88.80% <ø> (+5.40%)` ⬆️ |
| setup.py | `0.00% <0.00%> (ø)` |
| ...on/materialization/contrib/bytewax/test_bytewax.py | `45.83% <45.83%> (ø)` |
| sdk/python/feast/cli.py | `41.59% <0.00%> (-0.10%)` ⬇️ |
| ...ocal_feast_tests/test_stream_feature_view_apply.py | |
| ...ffline_stores/contrib/athena_repo_configuration.py | `50.00% <0.00%> (ø)` |
| ...offline_stores/contrib/spark_repo_configuration.py | `100.00% <0.00%> (ø)` |
| ...b/cassandra_online_store/cassandra_online_store.py | `2.63% <0.00%> (ø)` |
| ...hon/feast/infra/utils/postgres/connection_utils.py | `48.00% <0.00%> (ø)` |
| ...line_stores/contrib/postgres_repo_configuration.py | `100.00% <0.00%> (ø)` |

... and 102 more


whoahbot force-pushed the bytewax_engine branch 3 times, most recently from dc21105 to dc7dd79 (August 2, 2022 20:55)
whoahbot requested a review from achals (August 9, 2022 15:22)
achals (Member) commented Aug 11, 2022

@whoahbot, I'm looking at this PR now, but would you mind signing your commits? You may also need a rebase.
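If "signing" here means the Developer Certificate of Origin sign-off (the `Signed-off-by:` trailer that the later commit message carries) rather than GPG signing, `git commit -s` adds that trailer, and `git commit --amend -s` adds it to the most recent existing commit. A small check for the trailer might look like the following; the regex is an illustrative assumption, not the rule the project's CI actually enforces.

```python
import re

# Illustrative pattern for a DCO trailer line: "Signed-off-by: Name <email>".
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>\s@]+@[^<>\s]+>$", re.MULTILINE)


def has_signoff(commit_message: str) -> bool:
    """Return True if any line of the message is a Signed-off-by trailer."""
    return SIGNOFF_RE.search(commit_message) is not None


unsigned = "feat: Initial Bytewax materialization engine"
signed = unsigned + "\n\nSigned-off-by: Jane Doe <jane@example.com>"
```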

whoahbot force-pushed the bytewax_engine branch 2 times, most recently from dc984e9 to d7a1b9c (August 15, 2022 15:31)
- Add integration test, by factoring out shared consistency test.
- Make the number of Pods dynamic, based on the number of .parquet
  file paths.
- Add instructions for creating a bytewax test cluster for
  integration testing.

Signed-off-by: Dan Herrera <[email protected]>
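The commit above makes the number of pods depend on the number of `.parquet` paths. One way to express that, assuming one pod per file and a Kubernetes Indexed Job (this may not be how the engine in this PR wires it up), is to set `completions` and `parallelism` to the file count and let each pod select its file through the `JOB_COMPLETION_INDEX` environment variable that Kubernetes sets for Indexed Jobs. `indexed_job_manifest`, `path_for_this_pod`, and the container name `materialize` are hypothetical.

```python
import os
from typing import List


def indexed_job_manifest(name: str, image: str, paths: List[str]) -> dict:
    """Sketch of a batch/v1 Job manifest that runs one pod per .parquet path."""
    if not paths:
        raise ValueError("no .parquet paths to materialize")
    pods = len(paths)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Indexed mode gives each pod a distinct index in 0..pods-1.
            "completionMode": "Indexed",
            "completions": pods,
            "parallelism": pods,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{"name": "materialize", "image": image}],
                }
            },
        },
    }


def path_for_this_pod(paths: List[str]) -> str:
    """Inside a pod: choose the path matching this pod's completion index."""
    index = int(os.environ["JOB_COMPLETION_INDEX"])
    return paths[index]
```

Each pod needs the same ordered list of paths as the process that created the Job (for example through an environment variable or a mounted file). In practice `parallelism` might also be capped below the file count to bound cluster load.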
achals (Member) left a comment:

/lgtm

feast-ci-bot (Collaborator):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, whoahbot


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

feast-ci-bot merged commit 55c61f9 into feast-dev:master on Aug 15, 2022
kevjumba pushed a commit that referenced this pull request Aug 25, 2022
# [0.24.0](v0.23.0...v0.24.0) (2022-08-25)

### Bug Fixes

* Check if on_demand_feature_views is an empty list rather than None for snowflake provider ([#3046](#3046)) ([9b05e65](9b05e65))
* FeatureStore.apply applies BatchFeatureView correctly ([#3098](#3098)) ([41be511](41be511))
* Fix Feast Java inconsistency with int64 serialization vs python ([#3031](#3031)) ([4bba787](4bba787))
* Fix feature service inference logic ([#3089](#3089)) ([4310ed7](4310ed7))
* Fix field mapping logic during feature inference ([#3067](#3067)) ([cdfa761](cdfa761))
* Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([0702310](0702310))
* Fix Java helm charts to work with refactored logic. Fix FTS image ([#3105](#3105)) ([2b493e0](2b493e0))
* Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([bfae6ac](bfae6ac))
* Fix release workflow to release 0.24.0 ([#3138](#3138)) ([a69aaae](a69aaae))
* Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([b26566d](b26566d))
* Fixing Web UI, which fails for the SQL registry ([#3028](#3028)) ([64603b6](64603b6))
* Force Snowflake Session to Timezone UTC ([#3083](#3083)) ([9f221e6](9f221e6))
* Make infer dummy entity join key idempotent ([#3115](#3115)) ([1f5b1e0](1f5b1e0))
* More explicit error messages ([#2708](#2708)) ([e4d7afd](e4d7afd))
* Parse inline data sources ([#3036](#3036)) ([c7ba370](c7ba370))
* Prevent overwriting existing file during `persist` ([#3088](#3088)) ([69af21f](69af21f))
* Register BatchFeatureView in feature repos correctly ([#3092](#3092)) ([b8e39ea](b8e39ea))
* Return an empty infra object from sql registry when it doesn't exist ([#3022](#3022)) ([8ba87d1](8ba87d1))
* Teardown tables for Snowflake Materialization testing ([#3106](#3106)) ([0a0c974](0a0c974))
* UI error when saved dataset is present in registry. ([#3124](#3124)) ([83cf753](83cf753))
* Update sql.py ([#3096](#3096)) ([2646a86](2646a86))
* Updated snowflake template ([#3130](#3130)) ([f0594e1](f0594e1))

### Features

* Add authentication option for snowflake connector ([#3039](#3039)) ([74c75f1](74c75f1))
* Add Cassandra/AstraDB online store contribution ([#2873](#2873)) ([feb6cb8](feb6cb8))
* Add Snowflake materialization engine ([#2948](#2948)) ([f3b522b](f3b522b))
* Adding saved dataset capabilities for Postgres  ([#3070](#3070)) ([d3253c3](d3253c3))
* Allow passing repo config path via flag ([#3077](#3077)) ([0d2d951](0d2d951))
* Contrib azure provider with synapse/mssql offline store and Azure registry store ([#3072](#3072)) ([9f7e557](9f7e557))
* Custom Docker image for Bytewax batch materialization ([#3099](#3099)) ([cdd1b07](cdd1b07))
* Feast AWS Athena offline store (again) ([#3044](#3044)) ([989ce08](989ce08))
* Implement spark offline store `offline_write_batch` method ([#3076](#3076)) ([5b0cc87](5b0cc87))
* Initial Bytewax materialization engine ([#2974](#2974)) ([55c61f9](55c61f9))
* Refactor feature server helm charts to allow passing feature_store.yaml in environment variables ([#3113](#3113)) ([85ee789](85ee789))

4 participants