feat: Add column reordering to `write_to_offline_store` #2876

felixwang9817 · 2022-06-28T18:06:03Z

What this PR does / why we need it: In addition to adding column reordering logic, this PR adds logic for extracting the latest feature values into the SparkKafkaProcessor.
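As a rough illustration of the two ideas in this description, the sketch below reorders the columns of an in-memory table to match an expected schema order and keeps only the latest row per entity key. It uses plain Python dicts rather than the DataFrame or Arrow types Feast works with, and `reorder_columns`, `latest_rows`, and the column names are hypothetical, not taken from this PR.

```python
from typing import Any, Dict, List


def reorder_columns(
    table: Dict[str, List[Any]], expected: List[str]
) -> Dict[str, List[Any]]:
    """Return the table with its columns in the order given by `expected`."""
    if set(table) != set(expected):
        raise ValueError(
            f"columns {sorted(table)} do not match expected {sorted(expected)}"
        )
    # Dicts preserve insertion order, so rebuilding the dict fixes the order.
    return {name: table[name] for name in expected}


def latest_rows(
    rows: List[Dict[str, Any]], key: str, timestamp: str
) -> List[Dict[str, Any]]:
    """Keep only the most recent row for each distinct value of `key`."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[timestamp] > current[timestamp]:
            latest[row[key]] = row
    return list(latest.values())


# Hypothetical data: same columns as the schema, but in a different order.
table = {"conv_rate": [0.5], "driver_id": [1001], "event_timestamp": [1656439563]}
ordered = reorder_columns(table, ["driver_id", "event_timestamp", "conv_rate"])

rows = [
    {"driver_id": 1001, "event_timestamp": 1, "conv_rate": 0.1},
    {"driver_id": 1001, "event_timestamp": 2, "conv_rate": 0.2},
    {"driver_id": 1002, "event_timestamp": 1, "conv_rate": 0.9},
]
newest = latest_rows(rows, key="driver_id", timestamp="event_timestamp")
```

Reordering matters when a write path matches values to columns by position rather than by name; rejecting a mismatched column set up front keeps the error close to its cause.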

Which issue(s) this PR fixes:

Fixes #

codecov-commenter · 2022-06-28T18:14:49Z

Codecov Report

Merging #2876 (e54ea6e) into master (86e9efd) will decrease coverage by 0.08%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2876      +/-   ##
==========================================
- Coverage   80.68%   80.59%   -0.09%     
==========================================
  Files         176      176              
  Lines       15670    15663       -7     
==========================================
- Hits        12643    12624      -19     
- Misses       3027     3039      +12

| Flag | Coverage Δ | |
|---|---|---|
| integrationtests | `70.75% <100.00%> (-0.17%)` | ⬇️ |
| unittests | `59.34% <8.69%> (-0.01%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| sdk/python/feast/infra/offline_stores/bigquery.py | `87.69% <ø> (ø)` | |
| sdk/python/feast/infra/offline_stores/file.py | `93.99% <ø> (-0.43%)` | ⬇️ |
| sdk/python/feast/infra/offline_stores/redshift.py | `91.58% <ø> (-0.50%)` | ⬇️ |
| sdk/python/feast/infra/offline_stores/snowflake.py | `90.52% <ø> (ø)` | |
| sdk/python/feast/feature_store.py | `87.12% <100.00%> (+0.10%)` | ⬆️ |
| ...ests/integration/e2e/test_python_feature_server.py | `100.00% <100.00%> (ø)` | |
| ...ts/integration/offline_store/test_offline_write.py | `100.00% <100.00%> (ø)` | |
| ...ation/offline_store/test_push_offline_retrieval.py | `100.00% <100.00%> (ø)` | |
| ...gration/online_store/test_push_online_retrieval.py | `100.00% <100.00%> (ø)` | |
| sdk/python/tests/utils/online_read_write_test.py | `93.54% <0.00%> (-6.46%)` | ⬇️ |
| ... and 6 more | | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86e9efd...e54ea6e.

kevjumba · 2022-06-28T18:14:33Z

sdk/python/feast/infra/contrib/spark_kafka_processor.py


 class SparkProcessorConfig(ProcessorConfig):
    spark_session: SparkSession
-    processing_time: str
-    query_timeout: int
+    processing_time: str = "30 seconds"


I think we shouldn't set a default here, since we have no idea what the correct window should be. We should force the user to set the processing window.
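To make the trade-off concrete, here is a minimal sketch of a config with no defaults. It uses a plain dataclass named `StreamConfigSketch` as a stand-in; it is not the real `SparkProcessorConfig`, whose base class may behave differently.

```python
from dataclasses import dataclass


@dataclass
class StreamConfigSketch:
    # No defaults: the caller must choose both values explicitly.
    processing_time: str
    query_timeout: int


# Supplying both values works as usual.
config = StreamConfigSketch(processing_time="30 seconds", query_timeout=15)

# Omitting them fails at construction time instead of silently
# falling back to a window that may be wrong for the workload.
try:
    StreamConfigSketch()  # type: ignore[call-arg]
    missing_fields_rejected = False
except TypeError:
    missing_fields_rejected = True
```

With required fields, a misconfigured processor fails at construction rather than running with a guessed window.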

kevjumba · 2022-06-28T18:14:38Z

sdk/python/feast/infra/contrib/spark_kafka_processor.py

-    processing_time: str
-    query_timeout: int
+    processing_time: str = "30 seconds"
+    query_timeout: int = 15


kevjumba · 2022-06-28T18:28:02Z

sdk/python/feast/infra/contrib/spark_kafka_processor.py

 from feast.infra.contrib.stream_processor import (
    ProcessorConfig,
    StreamProcessor,
    StreamTable,
 )
 from feast.stream_feature_view import StreamFeatureView

+if TYPE_CHECKING:


Per offline conversation, this is dangerous. If we ever want to move the functionality into a supported passthrough function in feature store, this is a circular dependency.

I realized it actually isn't circular lol, updating
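For context, a small sketch of the `TYPE_CHECKING` pattern this thread is about. The standard-library `fractions` module stands in for `feast.feature_store`, and `as_ratio` is a hypothetical function; the point is only that the guarded import never runs at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers. This branch is never executed at
    # runtime, so the import cannot take part in an import cycle.
    from fractions import Fraction


def as_ratio(value: "Fraction") -> str:
    # The quoted annotation is not evaluated when the function is defined,
    # so the missing runtime name is not a problem.
    return f"{value.numerator}/{value.denominator}"
```

The cost is that the guarded name is unavailable at runtime, so it can only appear in annotations; code that needs to construct or inspect the class still requires a real import.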

kevjumba · 2022-06-28T18:32:28Z

sdk/python/feast/feature_store.py

+        )
+        source_columns = [column for column, _ in column_names_and_types]
+        source_columns = [
+            column for column in source_columns if not re.match("__|__$", column)


If there are columns with underscores, what is the behavior here? Does it just fail automatically? I'm confused about why we need this check; are we not writing to these internal columns?

yeah this isn't necessary; good catch
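To answer the behavior question directly, the snippet below runs the same pattern over a few made-up column names. Because `re.match` only matches at the start of the string, the filter drops columns that begin with a double underscore and nothing else; the `__$` alternative never contributes anything the first alternative does not already cover.

```python
import re

# Hypothetical column names chosen to exercise each case.
columns = ["driver_id", "__internal", "conv__rate", "trailing__", "__"]

# Same filter as in the diff: drop every column the pattern matches.
kept = [column for column in columns if not re.match("__|__$", column)]
```

Here `kept` retains `driver_id`, `conv__rate`, and `trailing__`, so columns with a double underscore in the middle or at the end pass through untouched.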

kevjumba

/lgtm

kevjumba

/lgtm

feast-ci-bot · 2022-06-30T20:42:26Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: felixwang9817, kevjumba

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [felixwang9817,kevjumba]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* Add feature extraction logic to batch writer
  Signed-off-by: Felix Wang <[email protected]>
* Enable StreamProcessor to write to both online and offline stores
  Signed-off-by: Felix Wang <[email protected]>
* Fix incorrect columns error message
  Signed-off-by: Felix Wang <[email protected]>
* Reorder columns in _write_to_offline_store
  Signed-off-by: Felix Wang <[email protected]>
* Make _write_to_offline_store a public method
  Signed-off-by: Felix Wang <[email protected]>
* Import FeatureStore correctly
  Signed-off-by: Felix Wang <[email protected]>
* Remove defaults for `processing_time` and `query_timeout`
  Signed-off-by: Felix Wang <[email protected]>
* Clean up `test_offline_write.py`
  Signed-off-by: Felix Wang <[email protected]>
* Do not do any custom logic for double underscore columns
  Signed-off-by: Felix Wang <[email protected]>
* Lint
  Signed-off-by: Felix Wang <[email protected]>
* Switch entity values for all tests using push sources to not affect other tests
  Signed-off-by: Felix Wang <[email protected]>

# [0.23.0](v0.22.0...v0.23.0) (2022-08-02)

### Bug Fixes

* Add dummy alias to pull_all_from_table_or_query ([#2956](#2956)) ([5e45228](5e45228))
* Bump version of Guava to mitigate cve ([#2896](#2896)) ([51df8be](51df8be))
* Change numpy version on setup.py and upgrade it to resolve dependabot warning ([#2887](#2887)) ([80ea7a9](80ea7a9))
* Change the feature store plan method to public modifier ([#2904](#2904)) ([0ec7d1a](0ec7d1a))
* Deprecate 3.7 wheels and fix verification workflow ([#2934](#2934)) ([040c910](040c910))
* Do not allow same column to be reused in data sources ([#2965](#2965)) ([661c053](661c053))
* Fix build wheels workflow to install apache-arrow correctly ([#2932](#2932)) ([bdeb4ae](bdeb4ae))
* Fix file offline store logic for feature views without ttl ([#2971](#2971)) ([26f6b69](26f6b69))
* Fix grpc and update protobuf ([#2894](#2894)) ([86e9efd](86e9efd))
* Fix night ci syntax error and update readme ([#2935](#2935)) ([b917540](b917540))
* Fix nightly ci again ([#2939](#2939)) ([1603c9e](1603c9e))
* Fix the go build and use CgoArrowAllocator to prevent incorrect garbage collection ([#2919](#2919)) ([130746e](130746e))
* Fix typo in CONTRIBUTING.md ([#2955](#2955)) ([8534f69](8534f69))
* Fixing broken links to feast documentation on java readme and contribution ([#2892](#2892)) ([d044588](d044588))
* Fixing Spark min / max entity df event timestamps range return order ([#2735](#2735)) ([ac55ce2](ac55ce2))
* Move gcp back to 1.47.0 since grpcio-tools 1.48.0 got yanked from pypi ([#2990](#2990)) ([fc447eb](fc447eb))
* Refactor testing and sort out unit and integration tests ([#2975](#2975)) ([2680f7b](2680f7b))
* Remove hard-coded integration test setup for AWS & GCP ([#2970](#2970)) ([e4507ac](e4507ac))
* Resolve small typo in README file ([#2930](#2930)) ([16ae902](16ae902))
* Revert "feat: Add snowflake online store ([#2902](#2902))" ([#2909](#2909)) ([38fd001](38fd001))
* Snowflake_online_read fix ([#2988](#2988)) ([651ce34](651ce34))
* Spark source support table with pattern "db.table" ([#2606](#2606)) ([3ce5139](3ce5139)), closes [#2605](#2605)
* Switch mysql log string to use regex ([#2976](#2976)) ([5edf4b0](5edf4b0))
* Update gopy to point to fork to resolve github annotation errors. ([#2940](#2940)) ([ba2dcf1](ba2dcf1))
* Version entity serialization mechanism and fix issue with int64 vals ([#2944](#2944)) ([d0d27a3](d0d27a3))

### Features

* Add an experimental lambda-based materialization engine ([#2923](#2923)) ([6f79069](6f79069))
* Add column reordering to `write_to_offline_store` ([#2876](#2876)) ([8abc2ef](8abc2ef))
* Add custom JSON table tab w/ formatting ([#2851](#2851)) ([0159f38](0159f38))
* Add CustomSourceOptions to SavedDatasetStorage ([#2958](#2958)) ([23c09c8](23c09c8))
* Add Go option to `feast serve` command ([#2966](#2966)) ([a36a695](a36a695))
* Add interfaces for batch materialization engine ([#2901](#2901)) ([38b28ca](38b28ca))
* Add pages for individual Features to the Feast UI ([#2850](#2850)) ([9b97fca](9b97fca))
* Add snowflake online store ([#2902](#2902)) ([f758f9e](f758f9e)), closes [#2903](#2903)
* Add Snowflake online store (again) ([#2922](#2922)) ([2ef71fc](2ef71fc)), closes [#2903](#2903)
* Add to_remote_storage method to RetrievalJob ([#2916](#2916)) ([109ee9c](109ee9c))
* Support retrieval from multiple feature views with different join keys ([#2835](#2835)) ([056cfa1](056cfa1))

feast-ci-bot added approved size/L labels Jun 28, 2022

felixwang9817 added the ok-to-test label Jun 28, 2022

kevjumba reviewed Jun 28, 2022

kevjumba approved these changes Jun 29, 2022

feast-ci-bot assigned kevjumba Jun 29, 2022

feast-ci-bot added the lgtm label Jun 29, 2022

felixwang9817 added 10 commits June 30, 2022 10:30

Add feature extraction logic to batch writer

0a152ee

Signed-off-by: Felix Wang <[email protected]>

Enable StreamProcessor to write to both online and offline stores

0ce6433

Signed-off-by: Felix Wang <[email protected]>

Fix incorrect columns error message

a47b39b

Signed-off-by: Felix Wang <[email protected]>

Reorder columns in _write_to_offline_store

791322b

Signed-off-by: Felix Wang <[email protected]>

Make _write_to_offline_store a public method

9970032

Signed-off-by: Felix Wang <[email protected]>

Import FeatureStore correctly

d1ce38f

Signed-off-by: Felix Wang <[email protected]>

Remove defaults for processing_time and query_timeout

5ff276c

Signed-off-by: Felix Wang <[email protected]>

Clean up test_offline_write.py

838363a

Signed-off-by: Felix Wang <[email protected]>

Do not do any custom logic for double underscore columns

b090a1d

Signed-off-by: Felix Wang <[email protected]>

Lint

b75883a

Signed-off-by: Felix Wang <[email protected]>

felixwang9817 force-pushed the streaming_tutorial branch from da60e79 to b75883a June 30, 2022 17:30

feast-ci-bot removed the lgtm label Jun 30, 2022

Switch entity values for all tests using push sources to not affect other tests

e54ea6e

Signed-off-by: Felix Wang <[email protected]>

kevjumba approved these changes Jun 30, 2022

feast-ci-bot added the lgtm label Jun 30, 2022

feast-ci-bot merged commit 8abc2ef into feast-dev:master Jun 30, 2022
