feat: Implement spark offline store offline_write_batch method #3076

Merged 6 commits on Aug 18, 2022

Conversation

niklasvm
Collaborator

What this PR does / why we need it:

  • Implement the `offline_write_batch` method for the Spark offline store
  • Replace the Spark testing data sets with file-based Parquet data instead of a temporary view

This PR resolves a further set of failing integration tests.
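
As a rough illustration of what a batch write into an offline store usually involves, the sketch below validates an incoming batch against an expected schema, reorders its columns, and appends the rows. It is not taken from this PR's diff: `Schema`, `align_batch_to_schema`, and `offline_write_batch_sketch` are hypothetical names, the in-memory list stands in for the Parquet files the Spark store would append to, and the real method operates on Arrow tables and a Spark session rather than plain Python rows.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a table schema: ordered (column name, type name) pairs.
Schema = List[Tuple[str, str]]
Row = Dict[str, object]


def align_batch_to_schema(
    rows: Sequence[Row], batch_schema: Schema, expected: Schema
) -> List[tuple]:
    """Check the batch schema and return the rows in the expected column order."""
    # Column order may differ, but the set of (name, type) pairs must match.
    if sorted(batch_schema) != sorted(expected):
        raise ValueError(
            f"Schema mismatch: got {batch_schema}, expected {expected}"
        )
    columns = [name for name, _ in expected]
    return [tuple(row[column] for column in columns) for row in rows]


def offline_write_batch_sketch(
    store: List[tuple], rows: Sequence[Row], batch_schema: Schema, expected: Schema
) -> None:
    """Append-only write: validate first, then append, so a bad batch writes nothing."""
    store.extend(align_batch_to_schema(rows, batch_schema, expected))
```

Validating before appending means a mismatched batch leaves the existing data untouched, which is the property an append-only write path generally wants.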

Which issue(s) this PR fixes:

None

@niklasvm changed the title from "Add offline write batch" to "feat: add spark offline store offline_write_batch method" on Aug 12, 2022
@niklasvm changed the title from "feat: add spark offline store offline_write_batch method" to "feat: Implement spark offline store offline_write_batch method" on Aug 12, 2022
@codecov-commenter

codecov-commenter commented Aug 12, 2022

Codecov Report

Merging #3076 (5ed0cf0) into master (4b1476f) will decrease coverage by 9.24%.
The diff coverage is 16.27%.

@@            Coverage Diff             @@
##           master    #3076      +/-   ##
==========================================
- Coverage   67.27%   58.03%   -9.25%     
==========================================
  Files         169      203      +34     
  Lines       14859    16879    +2020     
==========================================
- Hits         9997     9796     -201     
- Misses       4862     7083    +2221     
| Flag | Coverage Δ |
|---|---|
| integrationtests | ? |
| unittests | 58.03% <16.27%> (?) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...ffline_stores/contrib/spark_offline_store/spark.py | 32.98% <11.76%> (ø) |
| ...s/contrib/spark_offline_store/tests/data_source.py | 40.74% <33.33%> (ø) |
| ...sts/integration/registration/test_universal_cli.py | 20.20% <0.00%> (-79.80%) ⬇️ |
| ...ts/integration/offline_store/test_offline_write.py | 26.08% <0.00%> (-73.92%) ⬇️ |
| ...fline_store/test_universal_historical_retrieval.py | 28.75% <0.00%> (-71.25%) ⬇️ |
| ...ests/integration/e2e/test_python_feature_server.py | 29.50% <0.00%> (-70.50%) ⬇️ |
| ...dk/python/tests/integration/e2e/test_validation.py | 27.55% <0.00%> (-69.30%) ⬇️ |
| ...s/integration/registration/test_universal_types.py | 32.25% <0.00%> (-67.75%) ⬇️ |
| sdk/python/feast/infra/online_stores/redis.py | 28.39% <0.00%> (-66.58%) ⬇️ |
| sdk/python/tests/integration/e2e/test_usage_e2e.py | 33.87% <0.00%> (-66.13%) ⬇️ |

... and 158 more


@niklasvm niklasvm marked this pull request as ready for review August 12, 2022 15:57
@niklasvm
Collaborator Author

/ok-to-test

@feast-ci-bot
Collaborator

@niklasvm: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@niklasvm
Collaborator Author

/assign @kevjumba

@achals
Member

achals commented Aug 17, 2022

@niklasvm can you rebase? I think we've fixed the lingering issues with the go unit tests.

@niklasvm
Collaborator Author

> @niklasvm can you rebase? I think we've fixed the lingering issues with the go unit tests.

Thank you. I've rebased and am now waiting for the tests to run.

@niklasvm
Collaborator Author

@achals it looks like the tests ran successfully, but 2 are still pending. I'm not sure whether you need to approve before they run.

Collaborator

@adchia adchia left a comment


/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, niklasvm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 5b0cc87 into feast-dev:master Aug 18, 2022
kevjumba pushed a commit that referenced this pull request Aug 25, 2022
# [0.24.0](v0.23.0...v0.24.0) (2022-08-25)

### Bug Fixes

* Check if on_demand_feature_views is an empty list rather than None for snowflake provider ([#3046](#3046)) ([9b05e65](9b05e65))
* FeatureStore.apply applies BatchFeatureView correctly ([#3098](#3098)) ([41be511](41be511))
* Fix Feast Java inconsistency with int64 serialization vs python ([#3031](#3031)) ([4bba787](4bba787))
* Fix feature service inference logic ([#3089](#3089)) ([4310ed7](4310ed7))
* Fix field mapping logic during feature inference ([#3067](#3067)) ([cdfa761](cdfa761))
* Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([0702310](0702310))
* Fix Java helm charts to work with refactored logic. Fix FTS image ([#3105](#3105)) ([2b493e0](2b493e0))
* Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([bfae6ac](bfae6ac))
* Fix release workflow to release 0.24.0 ([#3138](#3138)) ([a69aaae](a69aaae))
* Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([b26566d](b26566d))
* Fixing Web UI, which fails for the SQL registry ([#3028](#3028)) ([64603b6](64603b6))
* Force Snowflake Session to Timezone UTC ([#3083](#3083)) ([9f221e6](9f221e6))
* Make infer dummy entity join key idempotent ([#3115](#3115)) ([1f5b1e0](1f5b1e0))
* More explicit error messages ([#2708](#2708)) ([e4d7afd](e4d7afd))
* Parse inline data sources ([#3036](#3036)) ([c7ba370](c7ba370))
* Prevent overwriting existing file during `persist` ([#3088](#3088)) ([69af21f](69af21f))
* Register BatchFeatureView in feature repos correctly ([#3092](#3092)) ([b8e39ea](b8e39ea))
* Return an empty infra object from sql registry when it doesn't exist ([#3022](#3022)) ([8ba87d1](8ba87d1))
* Teardown tables for Snowflake Materialization testing ([#3106](#3106)) ([0a0c974](0a0c974))
* UI error when saved dataset is present in registry. ([#3124](#3124)) ([83cf753](83cf753))
* Update sql.py ([#3096](#3096)) ([2646a86](2646a86))
* Updated snowflake template ([#3130](#3130)) ([f0594e1](f0594e1))

### Features

* Add authentication option for snowflake connector ([#3039](#3039)) ([74c75f1](74c75f1))
* Add Cassandra/AstraDB online store contribution ([#2873](#2873)) ([feb6cb8](feb6cb8))
* Add Snowflake materialization engine ([#2948](#2948)) ([f3b522b](f3b522b))
* Adding saved dataset capabilities for Postgres ([#3070](#3070)) ([d3253c3](d3253c3))
* Allow passing repo config path via flag ([#3077](#3077)) ([0d2d951](0d2d951))
* Contrib azure provider with synapse/mssql offline store and Azure registry store ([#3072](#3072)) ([9f7e557](9f7e557))
* Custom Docker image for Bytewax batch materialization ([#3099](#3099)) ([cdd1b07](cdd1b07))
* Feast AWS Athena offline store (again) ([#3044](#3044)) ([989ce08](989ce08))
* Implement spark offline store `offline_write_batch` method ([#3076](#3076)) ([5b0cc87](5b0cc87))
* Initial Bytewax materialization engine ([#2974](#2974)) ([55c61f9](55c61f9))
* Refactor feature server helm charts to allow passing feature_store.yaml in environment variables ([#3113](#3113)) ([85ee789](85ee789))