Add final_output_feature_names in Query context to avoid SELECT * EXCEPT #1911

MattDelac · 2021-09-28T11:00:29Z

Signed-off-by: Matt Delacour [email protected]

What this PR does / why we need it:
Unlike BigQuery, other offline stores such as Trino and Redshift don't support SELECT * EXCEPT(...).
Therefore, we should inject the list of columns we want at the end of a historical retrieval.

It also makes it easier to see which columns are included. A SELECT * EXCEPT is often hard to follow because nothing is explicit.
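As a rough illustration of the idea, here is a minimal sketch that builds the final SELECT from an explicit column list. It uses plain string formatting rather than the actual Jinja template to stay dependency-free, and the helper name `build_final_select` and the table name `joined_features` are hypothetical, not names from the Feast code base.

```python
from typing import Sequence


def build_final_select(
    final_output_feature_names: Sequence[str],
    source_table: str = "joined_features",  # hypothetical table/CTE name
) -> str:
    """Render the last SELECT of a retrieval query with an explicit column list."""
    if not final_output_feature_names:
        raise ValueError("at least one output column is required")
    # Double-quote each identifier (ANSI style). This sketch does not
    # escape quotes embedded in a column name.
    columns = ",\n    ".join(f'"{name}"' for name in final_output_feature_names)
    return f"SELECT\n    {columns}\nFROM {source_table}"


query = build_final_select(["driver_id", "event_timestamp", "conv_rate"])
print(query)
```

Because every output column is named, the same statement shape should work on engines that lack SELECT * EXCEPT, and a reader can see the result schema directly in the query.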

I also fix the Redshift template, which does not use the ROW_NUMBER logic as the BigQuery one does. This was a bug fixed 2 months ago.
I'm unsure why the unit tests did not fail before, even though I added tests at the time 🤷

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

We now pass the list of columns desired as the output of a historical retrieval to the Jinja template.

codecov-commenter · 2021-09-28T11:03:16Z

Codecov Report

Merging #1911 (6baf214) into master (6b10a82) will increase coverage by 0.12%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1911      +/-   ##
==========================================
+ Coverage   81.89%   82.02%   +0.12%     
==========================================
  Files          97       97              
  Lines        7739     7760      +21     
==========================================
+ Hits         6338     6365      +27     
+ Misses       1401     1395       -6

Flag	Coverage Δ
integrationtests	`74.19% <100.00%> (+0.16%)`	⬆️
unittests	`59.30% <35.48%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
sdk/python/feast/infra/offline_stores/bigquery.py	`80.00% <ø> (ø)`
sdk/python/feast/infra/offline_stores/redshift.py	`87.06% <ø> (+2.06%)`	⬆️
...python/feast/infra/offline_stores/offline_utils.py	`91.48% <100.00%> (+1.27%)`	⬆️
sdk/python/feast/infra/utils/aws_utils.py	`70.37% <100.00%> (-0.65%)`	⬇️
...fline_store/test_universal_historical_retrieval.py	`99.40% <100.00%> (+0.11%)`	⬆️
sdk/python/feast/infra/online_stores/sqlite.py	`98.83% <0.00%> (+2.32%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

sdk/python/feast/infra/offline_stores/redshift.py

woop · 2021-09-28T13:27:59Z

Strange that the tests didn't fail. Your proposed change here looks good to me.

MattDelac · 2021-09-28T13:59:46Z

I don't understand why the tests fail for Redshift.

The differences between the 2 files (BigQuery on the left vs Redshift on the right) are minimal

MattDelac · 2021-09-28T14:03:40Z

I don't understand why the tests fail for Redshift.

The differences between the 2 files (BigQuery on the left vs Redshift on the right) are minimal

Also, the fact that a CONCAT is missing in the Redshift template is disturbing to me. I will dig into the internals of Redshift to understand whether this is implicit in some manner.

adchia · 2021-10-14T14:59:04Z

Generally LGTM. Pending: we should revive the old test you wrote. cc @achals

Sorry was the test comment directed at me?

nope it was just an FYI

sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py

MattDelac · 2021-10-14T15:45:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MattDelac

The full list of commands accepted by this bot can be found here.

The pull request process is described here
Needs approval from an approver in each of these files:

Lol I did not do anything 🤷

adchia · 2021-10-14T21:45:43Z

seems like you have some bug in the redshift test.

Roughly, you can see the dataset here

https://github.com/feast-dev/feast/blob/master/sdk/python/tests/data/data_creator.py#L10-L10

It creates something like

| driver_id | value | ts_1     |
+-----------+-------+----------+
|         1 |   0.1 | 4 hr ago |
|         2 |  None | now      |
|         1 |   0.3 | 3 hr ago |
|         3 |     4 | 4 hr ago |
|         3 |     5 | 1 hr ago |

The test fails when checking the value for driver_id=3 after materializing data up to 2 hr ago. It seems likely that your timestamp change in the Redshift query accidentally materializes that last record (driver_id=3, value=5)?
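To make the expected behavior concrete, here is a minimal sketch in plain Python (not Feast code) that takes the rows above and keeps, per driver, the latest value at or before a cutoff. The helper name `latest_values_up_to` and the fixed `now` value are assumptions made for the illustration.

```python
from datetime import datetime, timedelta

# Fixed "now" so the example is reproducible (arbitrary, assumed value).
now = datetime(2021, 10, 14, 12, 0, 0)

# (driver_id, value, ts_1), mirroring the table above.
rows = [
    (1, 0.1, now - timedelta(hours=4)),
    (2, None, now),
    (1, 0.3, now - timedelta(hours=3)),
    (3, 4, now - timedelta(hours=4)),
    (3, 5, now - timedelta(hours=1)),
]


def latest_values_up_to(rows, cutoff):
    """Return {driver_id: value} using only rows with ts_1 <= cutoff."""
    latest = {}
    for driver_id, value, ts in rows:
        if ts > cutoff:
            continue  # row is newer than the materialization window
        if driver_id not in latest or ts > latest[driver_id][1]:
            latest[driver_id] = (value, ts)
    return {driver_id: value for driver_id, (value, _) in latest.items()}


materialized = latest_values_up_to(rows, now - timedelta(hours=2))
print(materialized)  # {1: 0.3, 3: 4}
```

With a cutoff of 2 hr ago, driver_id=3 should resolve to 4. Seeing 5 instead would suggest the upper timestamp bound is not being applied.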

sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py

adchia

/lgtm

feast-ci-bot · 2021-10-15T18:52:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, MattDelac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [MattDelac,adchia]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

MattDelac requested review from achals, adchia, felixwang9817, tsotnet, woop and a team as code owners September 28, 2021 11:00

feast-ci-bot added release-note needs-kind size/S labels Sep 28, 2021

MattDelac added kind/feature New feature or request ok-to-test and removed size/S labels Sep 28, 2021

feast-ci-bot removed the needs-kind label Sep 28, 2021

MattDelac self-assigned this Sep 28, 2021

feast-ci-bot added the size/S label Sep 28, 2021

MattDelac force-pushed the add_list_of_output_features_context branch from d5005a5 to 8aba10b Compare September 28, 2021 11:48

feast-ci-bot added size/M and removed size/S labels Sep 28, 2021

MattDelac commented Sep 28, 2021

sdk/python/feast/infra/offline_stores/redshift.py

MattDelac force-pushed the add_list_of_output_features_context branch 2 times, most recently from 8aba10b to 4a1e10e Compare September 28, 2021 12:47

MattDelac force-pushed the add_list_of_output_features_context branch from 4a1e10e to a822d27 Compare September 30, 2021 16:46

feast-ci-bot added size/L and removed size/M labels Sep 30, 2021

MattDelac force-pushed the add_list_of_output_features_context branch from a822d27 to fcaa721 Compare September 30, 2021 16:49

feast-ci-bot removed the size/L label Sep 30, 2021

MattDelac force-pushed the add_list_of_output_features_context branch 3 times, most recently from 4c2cef3 to b84a70c Compare October 14, 2021 10:08

adchia reviewed Oct 14, 2021

sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py

feast-ci-bot added the approved label Oct 14, 2021

MattDelac force-pushed the add_list_of_output_features_context branch 4 times, most recently from d1af021 to b5dd4d7 Compare October 14, 2021 18:56

adchia reviewed Oct 15, 2021

sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py

MattDelac force-pushed the add_list_of_output_features_context branch 2 times, most recently from 42507b6 to 0c17e5b Compare October 15, 2021 15:59

Matt Delacour added 8 commits October 15, 2021 13:03

Add final_output_feature_names in Query context to avoid SELECT * EXCEPT at the end

45e58d1

Signed-off-by: Matt Delacour <[email protected]>

Remove the drop_columns concept for AWS Redshift

4b69ef8

Signed-off-by: Matt Delacour <[email protected]>

Format files

023fcb9

Signed-off-by: Matt Delacour <[email protected]>

Add again integration tests about backfill rows

e82106f

Signed-off-by: Matt Delacour <[email protected]>

Add teardown to datasource creator

92819dd

Signed-off-by: Matt Delacour <[email protected]>

Remove teardown logic in tests as it s part of conftest

5fea569

Signed-off-by: Matt Delacour <[email protected]>

Fix linter

5247764

Signed-off-by: Matt Delacour <[email protected]>

Add pytest.mark.universal to new test

6baf214

Signed-off-by: Matt Delacour <[email protected]>

MattDelac force-pushed the add_list_of_output_features_context branch from 0c17e5b to 6baf214 Compare October 15, 2021 17:04

adchia approved these changes Oct 15, 2021

feast-ci-bot assigned adchia Oct 15, 2021

feast-ci-bot added the lgtm label Oct 15, 2021

feast-ci-bot merged commit cbfc72a into feast-dev:master Oct 15, 2021

adchia mentioned this pull request Jan 18, 2022

Redshift historical retrieval query performance regression #2222

Closed
