Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add final_output_feature_names in Query context to avoid SELECT * EXCEPT #1911

Merged

Conversation

MattDelac
Copy link
Collaborator

@MattDelac MattDelac commented Sep 28, 2021

Signed-off-by: Matt Delacour [email protected]

What this PR does / why we need it:
Contrary to BigQuery, other offline stores like Trino & Redshit don't support SELECT * EXCEPT(...)
Therefore, we should inject the list of the columns we want at the end of an historical retrieval.

It also makes it easier to understand which columns are included. It's often hard to understand those SELECT * EXCEPT as nothing is explicit

I also fix the Redshit template that is not using the ROW_NUMBER logic as in BigQuery. This was a bug fixed 2 months ago.
Unsure why the unit tests did not fail before even though I added tests at the time 🤷

image

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

We now pass the list of the columns desired as the output of an historical retrieval to the Jinja template

@codecov-commenter
Copy link

codecov-commenter commented Sep 28, 2021

Codecov Report

Merging #1911 (6baf214) into master (6b10a82) will increase coverage by 0.12%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1911      +/-   ##
==========================================
+ Coverage   81.89%   82.02%   +0.12%     
==========================================
  Files          97       97              
  Lines        7739     7760      +21     
==========================================
+ Hits         6338     6365      +27     
+ Misses       1401     1395       -6     
Flag Coverage Δ
integrationtests 74.19% <100.00%> (+0.16%) ⬆️
unittests 59.30% <35.48%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/infra/offline_stores/bigquery.py 80.00% <ø> (ø)
sdk/python/feast/infra/offline_stores/redshift.py 87.06% <ø> (+2.06%) ⬆️
...python/feast/infra/offline_stores/offline_utils.py 91.48% <100.00%> (+1.27%) ⬆️
sdk/python/feast/infra/utils/aws_utils.py 70.37% <100.00%> (-0.65%) ⬇️
...fline_store/test_universal_historical_retrieval.py 99.40% <100.00%> (+0.11%) ⬆️
sdk/python/feast/infra/online_stores/sqlite.py 98.83% <0.00%> (+2.32%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b10a82...6baf214. Read the comment docs.

@MattDelac MattDelac self-assigned this Sep 28, 2021
@MattDelac MattDelac force-pushed the add_list_of_output_features_context branch from d5005a5 to 8aba10b Compare September 28, 2021 11:48
@MattDelac MattDelac force-pushed the add_list_of_output_features_context branch 2 times, most recently from 8aba10b to 4a1e10e Compare September 28, 2021 12:47
@woop
Copy link
Member

woop commented Sep 28, 2021

Strange that the tests didn't fail. Your proposed change here looks good to me.

@MattDelac
Copy link
Collaborator Author

I don't understand why the tests fail for Redshift.

The difference between the 2 files (BigQuery on the left vs Redshift on the right) are minimal
image
image

@MattDelac
Copy link
Collaborator Author

MattDelac commented Sep 28, 2021

I don't understand why the tests fail for Redshift.

The difference between the 2 files (BigQuery on the left vs Redshift on the right) are minimal

Also the fact that there is a CONCAT missing in the redshift template is disturbing to me. I will dig into the internals of Redshift to understand if this is implicit in some manner

@MattDelac MattDelac force-pushed the add_list_of_output_features_context branch from 4a1e10e to a822d27 Compare September 30, 2021 16:46
@MattDelac MattDelac force-pushed the add_list_of_output_features_context branch from a822d27 to fcaa721 Compare September 30, 2021 16:49
@MattDelac MattDelac force-pushed the add_list_of_output_features_context branch 3 times, most recently from 4c2cef3 to b84a70c Compare October 14, 2021 10:08
@adchia
Copy link
Collaborator

adchia commented Oct 14, 2021

generally LG to me. pending we should revive the old test you wrote cc @achals

Sorry was the test comment directed at me?

nope it was just an FYI

@MattDelac
Copy link
Collaborator Author

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MattDelac

The full list of commands accepted by this bot can be found here.

The pull request process is described here
Needs approval from an approver in each of these files:

Lol I did not do anything 🤷

@MattDelac MattDelac force-pushed the add_list_of_output_features_context branch 4 times, most recently from d1af021 to b5dd4d7 Compare October 14, 2021 18:56
@adchia
Copy link
Collaborator

adchia commented Oct 14, 2021

seems like you have some bug in the redshift test.

Roughly, you can see the dataset here

https://github.com/feast-dev/feast/blob/master/sdk/python/tests/data/data_creator.py#L10-L10

It creates something like

| driver_id | value | ts_1 | 
+-----------+-------+------+
|         1 |   0.1 | 4 hr ago
|         2 |   None| now
|         1 |   0.3 | 3 hr ago
|         3 |     4 | 4 hr ago
|         3 |     5 | 1 hr ago

It fails after checking driver_id=3's value after materializing data up to 2 hr ago. Seems likely that your timestamp change for the redshift query made it accidentally materialize that last record (driver_id=3, value=5)?

@MattDelac MattDelac force-pushed the add_list_of_output_features_context branch 2 times, most recently from 42507b6 to 0c17e5b Compare October 15, 2021 15:59
@MattDelac MattDelac force-pushed the add_list_of_output_features_context branch from 0c17e5b to 6baf214 Compare October 15, 2021 17:04
Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, MattDelac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants