fix: Fixing Spark min / max entity df event timestamps range return order #2735

levpickis · 2022-05-26T18:22:27Z

… max entity-DF event timestamps in the Spark offline store.

Signed-off-by: Lev Pickovsky [email protected]

What this PR does / why we need it:
This PR fixes the order of the elements returned by the `_get_entity_df_event_timestamp_range` method of the Spark offline store class when the entity-DF is provided as a string. The method returns a 2-tuple containing the min and max event timestamps found in the entity-DF. Currently, when the entity-DF is a string, it returns the max as the first element and the min as the second. The code that consumes these values appears to expect the min first, and the min is indeed returned first when the entity-DF is a Pandas DF. The issue was discovered through failing real-life tests of the Spark offline store, which this fix seems to have resolved.
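To illustrate the ordering contract described above, here is a minimal sketch in plain Python. It is not the Feast implementation: the function names `event_timestamp_range` and `swapped_event_timestamp_range` are hypothetical, and a list of `datetime` values stands in for the entity-DF's event timestamp column.

```python
from datetime import datetime
from typing import Iterable, Tuple


def event_timestamp_range(timestamps: Iterable[datetime]) -> Tuple[datetime, datetime]:
    """Return (min, max) of the given event timestamps, in that order.

    Hypothetical stand-in for the range calculation; not Feast's code.
    """
    values = list(timestamps)
    if not values:
        raise ValueError("no event timestamps provided")
    # Min first, max second: the order the PR description says callers expect.
    return min(values), max(values)


def swapped_event_timestamp_range(timestamps: Iterable[datetime]) -> Tuple[datetime, datetime]:
    """Hypothetical reproduction of the reported bug: max first, min second."""
    lo, hi = event_timestamp_range(timestamps)
    return hi, lo


if __name__ == "__main__":
    ts = [datetime(2022, 5, 3), datetime(2022, 5, 1), datetime(2022, 5, 2)]
    start, end = event_timestamp_range(ts)
    # A caller that unpacks (start, end) only gets a valid window with this order.
    assert start <= end
    bad_start, bad_end = swapped_event_timestamp_range(ts)
    assert bad_start > bad_end  # the swapped order yields an inverted window
```

A regression test along these lines could assert `start <= end` for both the string and the Pandas DF code paths, so that the two branches cannot drift apart again.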

codecov-commenter · 2022-05-26T19:20:20Z

Codecov Report

Merging #2735 (b181bde) into master (00ed65a) will decrease coverage by 12.06%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master    #2735       +/-   ##
===========================================
- Coverage   82.45%   70.39%   -12.07%     
===========================================
  Files         155      187       +32     
  Lines       12788    21929     +9141     
===========================================
+ Hits        10544    15436     +4892     
- Misses       2244     6493     +4249

| Flag | Coverage Δ | |
| --- | --- | --- |
| integrationtests | `68.95% <ø> (-3.45%)` | ⬇️ |
| unittests | `58.69% <ø> (-1.77%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| ...ffline_stores/contrib/spark_offline_store/spark.py | `37.41% <ø> (+0.40%)` | ⬆️ |
| ...ython/feast/embedded_go/online_features_service.py | `25.00% <0.00%> (-71.97%)` | ⬇️ |
| sdk/python/tests/unit/test_feature_views.py | `37.63% <0.00%> (-62.37%)` | ⬇️ |
| ...gration/online_store/test_push_online_retrieval.py | `42.10% <0.00%> (-57.90%)` | ⬇️ |
| sdk/python/feast/embedded_go/type_map.py | `44.00% <0.00%> (-56.00%)` | ⬇️ |
| .../integration/online_store/test_universal_online.py | `44.69% <0.00%> (-50.16%)` | ⬇️ |
| ...on/tests/integration/registration/test_registry.py | `55.75% <0.00%> (-40.13%)` | ⬇️ |
| sdk/python/feast/infra/aws.py | `33.09% <0.00%> (-40.11%)` | ⬇️ |
| sdk/python/tests/unit/test_data_sources.py | `65.06% <0.00%> (-34.94%)` | ⬇️ |
| ...sts/integration/registration/test_feature_store.py | `67.91% <0.00%> (-31.41%)` | ⬇️ |
| ... and 136 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

adchia · 2022-05-27T17:21:54Z

thanks for the contribution!

Would you mind adding a test for this?

kevjumba · 2022-06-17T17:52:15Z

@levpick are you still working on this?

achals · 2022-07-21T04:56:25Z

/lgtm

feast-ci-bot · 2022-07-25T14:05:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, levpick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [adchia]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

# [0.23.0](v0.22.0...v0.23.0) (2022-08-02)

### Bug Fixes

* Add dummy alias to pull_all_from_table_or_query ([#2956](#2956)) ([5e45228](5e45228))
* Bump version of Guava to mitigate cve ([#2896](#2896)) ([51df8be](51df8be))
* Change numpy version on setup.py and upgrade it to resolve dependabot warning ([#2887](#2887)) ([80ea7a9](80ea7a9))
* Change the feature store plan method to public modifier ([#2904](#2904)) ([0ec7d1a](0ec7d1a))
* Deprecate 3.7 wheels and fix verification workflow ([#2934](#2934)) ([040c910](040c910))
* Do not allow same column to be reused in data sources ([#2965](#2965)) ([661c053](661c053))
* Fix build wheels workflow to install apache-arrow correctly ([#2932](#2932)) ([bdeb4ae](bdeb4ae))
* Fix file offline store logic for feature views without ttl ([#2971](#2971)) ([26f6b69](26f6b69))
* Fix grpc and update protobuf ([#2894](#2894)) ([86e9efd](86e9efd))
* Fix night ci syntax error and update readme ([#2935](#2935)) ([b917540](b917540))
* Fix nightly ci again ([#2939](#2939)) ([1603c9e](1603c9e))
* Fix the go build and use CgoArrowAllocator to prevent incorrect garbage collection ([#2919](#2919)) ([130746e](130746e))
* Fix typo in CONTRIBUTING.md ([#2955](#2955)) ([8534f69](8534f69))
* Fixing broken links to feast documentation on java readme and contribution ([#2892](#2892)) ([d044588](d044588))
* Fixing Spark min / max entity df event timestamps range return order ([#2735](#2735)) ([ac55ce2](ac55ce2))
* Move gcp back to 1.47.0 since grpcio-tools 1.48.0 got yanked from pypi ([#2990](#2990)) ([fc447eb](fc447eb))
* Refactor testing and sort out unit and integration tests ([#2975](#2975)) ([2680f7b](2680f7b))
* Remove hard-coded integration test setup for AWS & GCP ([#2970](#2970)) ([e4507ac](e4507ac))
* Resolve small typo in README file ([#2930](#2930)) ([16ae902](16ae902))
* Revert "feat: Add snowflake online store ([#2902](#2902))" ([#2909](#2909)) ([38fd001](38fd001))
* Snowflake_online_read fix ([#2988](#2988)) ([651ce34](651ce34))
* Spark source support table with pattern "db.table" ([#2606](#2606)) ([3ce5139](3ce5139)), closes [#2605](#2605)
* Switch mysql log string to use regex ([#2976](#2976)) ([5edf4b0](5edf4b0))
* Update gopy to point to fork to resolve github annotation errors. ([#2940](#2940)) ([ba2dcf1](ba2dcf1))
* Version entity serialization mechanism and fix issue with int64 vals ([#2944](#2944)) ([d0d27a3](d0d27a3))

### Features

* Add an experimental lambda-based materialization engine ([#2923](#2923)) ([6f79069](6f79069))
* Add column reordering to `write_to_offline_store` ([#2876](#2876)) ([8abc2ef](8abc2ef))
* Add custom JSON table tab w/ formatting ([#2851](#2851)) ([0159f38](0159f38))
* Add CustomSourceOptions to SavedDatasetStorage ([#2958](#2958)) ([23c09c8](23c09c8))
* Add Go option to `feast serve` command ([#2966](#2966)) ([a36a695](a36a695))
* Add interfaces for batch materialization engine ([#2901](#2901)) ([38b28ca](38b28ca))
* Add pages for individual Features to the Feast UI ([#2850](#2850)) ([9b97fca](9b97fca))
* Add snowflake online store ([#2902](#2902)) ([f758f9e](f758f9e)), closes [#2903](#2903)
* Add Snowflake online store (again) ([#2922](#2922)) ([2ef71fc](2ef71fc)), closes [#2903](#2903)
* Add to_remote_storage method to RetrievalJob ([#2916](#2916)) ([109ee9c](109ee9c))
* Support retrieval from multiple feature views with different join keys ([#2835](#2835)) ([056cfa1](056cfa1))

fix: Fixing the return order of elements when calculating the min and…

b181bde

… max entity-DF event timestamps in the Spark offline store. Signed-off-by: Lev Pickovsky <[email protected]>

feast-ci-bot added the size/XS label May 26, 2022

kevjumba added the ok-to-test label Jun 21, 2022

feast-ci-bot assigned achals Jul 21, 2022

feast-ci-bot added the lgtm label Jul 21, 2022

adchia changed the title ~~fix: Fixing the return order of elements when calculating the min and…~~ fix: Fixing Spark min / max entity df event timestamps range return order Jul 25, 2022

adchia approved these changes Jul 25, 2022

feast-ci-bot added the approved label Jul 25, 2022

feast-ci-bot merged commit ac55ce2 into feast-dev:master Jul 25, 2022
