-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Contrib azure provider with synapse/mssql offline store and Azure registry store #3072
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3072 +/- ##
==========================================
+ Coverage 67.86% 76.03% +8.16%
==========================================
Files 176 214 +38
Lines 15507 17897 +2390
==========================================
+ Hits 10524 13608 +3084
+ Misses 4983 4289 -694
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
dba6b1f
to
7fdc2ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed, but we could theoretically setup CI by using https://docs.microsoft.com/en-us/azure/azure-sql-edge/disconnected-deployment and using test-containers.
## Functionality Matrix | ||
|
||
The set of functionality supported by offline stores is described in detail [here](overview.md#functionality). | ||
Below is a matrix indicating which functionality is supported by the Spark offline store. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not spark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
docs/roadmap.md
Outdated
@@ -10,7 +10,7 @@ The list below contains the functionality that contributors are planning to deve | |||
* [x] [Redshift source](https://docs.feast.dev/reference/data-sources/redshift) | |||
* [x] [BigQuery source](https://docs.feast.dev/reference/data-sources/bigquery) | |||
* [x] [Parquet file source](https://docs.feast.dev/reference/data-sources/file) | |||
* [x] [Synapse source (community plugin)](https://github.com/Azure/feast-azure) | |||
* [x] [Synapse source (contrib plugin)](https://docs.feast.dev/reference/data-sources/mssql) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be Azure Synapse + Azure SQL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
make_tzaware, | ||
) | ||
|
||
# Make sure spark warning doesn't raise more than once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not spark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed, we can kill this provider probably or use passthroughprovider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -297,6 +297,7 @@ def from_proto(data_source: DataSourceProto) -> Any: | |||
raise ValueError("Could not identify the source type being added.") | |||
|
|||
if data_source_type == DataSourceProto.SourceType.CUSTOM_SOURCE: | |||
data_source.data_source_class_type = "feast.infra.offline_stores.contrib.mssql_offline_store.mssqlserver_source.MsSqlServerSource" | |||
cls = get_data_source_class_from_type(data_source.data_source_class_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of hard coding the class type here, I believe it should be declared in the to_proto() function of the MsSqlServerSource class. This is done in case of the other offline contrib stores and also ensures that there are no conflicts with any other future custom sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
2eaebdd
to
8d92603
Compare
super().__init__(project_name) | ||
self.tables_created: List[str] = [] | ||
self.container = SqlServerContainer(user=MSSQL_USER, password=MSSQL_PASSWORD) | ||
# self.container = fixture_request.getfixturevalue("mssql_container") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove all the commented line here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
sdk/python/feast/infra/offline_stores/contrib/mssql_offline_store/tests/data_source.py
Show resolved
Hide resolved
sdk/python/feast/type_map.py
Outdated
pa_type_as_str = str(pa_type).lower() | ||
if pa_type_as_str.startswith("timestamp"): | ||
if "tz=" in pa_type_as_str: | ||
return "DATETIME2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why this is upper case and not lower case like everything else?
docs/reference/data-sources/mssql.md
Outdated
|
||
## Description | ||
|
||
MsSql data sources are Microsoft sql table sources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: SQL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't look fixed to me? (Microsoft sql table sources)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sorry, I realized I hadn't pushed my changes last night.
sdk/python/feast/infra/offline_stores/contrib/mssql_offline_store/tests/data_source.py
Show resolved
Hide resolved
sdk/python/feast/infra/offline_stores/contrib/mssql_offline_store/mssqlserver_source.py
Show resolved
Hide resolved
|
||
## Description | ||
|
||
The MsSql offline store provides support for reading [MsSQL Sources](../data-sources/mssql.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call out synapse or sql server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/tutorials/azure/README.md
Outdated
The objective of this tutorial is to build a model that predicts if a driver will complete a trip based on a number of features ingested into Feast. During this tutorial you will: | ||
|
||
1. Deploy the infrastructure for a feature store (using an ARM template) | ||
1. Register features into a central feature registry hosted on Blob Storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these numbers seem wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
sdk/python/feast/infra/offline_stores/contrib/mssql_offline_store/mssql.py
Show resolved
Hide resolved
sdk/python/feast/infra/offline_stores/contrib/mssql_offline_store/tests/data_source.py
Show resolved
Hide resolved
sdk/python/feast/infra/offline_stores/contrib/mssql_offline_store/tests/data_source.py
Show resolved
Hide resolved
sdk/python/feast/infra/offline_stores/contrib/mssql_offline_store/tests/data_source.py
Show resolved
Hide resolved
return f"{self.project_name}_{destination_name}" | ||
|
||
def teardown(self): | ||
self.container.stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you always use the fixture as I suggest above, then this shouldn't be needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
sdk/python/feast/infra/offline_stores/contrib/mssql_offline_store/tests/data_source.py
Show resolved
Hide resolved
entity_df[entity_df_event_timestamp_col], utc=True | ||
).fillna(pandas.Timestamp.now()) | ||
|
||
# TODO: figure out how to deal with entity dataframes that are strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you throw an exception here? e.g. some not implemented error so users know they can't pass a SQL string for MsSQL to generate the entity dataframe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
docs/reference/data-sources/mssql.md
Outdated
|
||
## Description | ||
|
||
MsSql data sources are Microsoft sql table sources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't look fixed to me? (Microsoft sql table sources)
docs/tutorials/azure/README.md
Outdated
## 6. Running Feast Azure Tutorials locally without Azure workspace | ||
|
||
* If you are on a free tier instance, you will not be able to deploy the azure deployment because the azure workspace requires VCPUs and the free trial subscription does not have a quota. | ||
* The workaround is to remove the `Microsoft.MachineLearningServices/workspaces/computes` resource from `fs_snapse_azure_deploy.json` and setting up the environment locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo? assuming should be synapse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the catch! fixed
35b123d
to
452379f
Compare
|
||
@pytest.fixture(scope="session") | ||
def mssql_container(): | ||
container = SqlServerContainer(user=MSSQL_USER, password=MSSQL_PASSWORD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass in the azure-sql-edge image as mentioned above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't realize you actually wanted to use this image. Updated.
project, | ||
test_repo_config, | ||
repo_dir_name: Path, | ||
fixture_request: Optional[pytest.FixtureRequest], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you shouldn't need to pass a fixture_request actually. the test environment already has a datasourcecreator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
de40738
to
f998d8b
Compare
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
f998d8b
to
3d42093
Compare
Signed-off-by: Danny Chiao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: adchia, kevjumba The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: Danny Chiao <[email protected]>
/lgtm |
# [0.24.0](v0.23.0...v0.24.0) (2022-08-25) ### Bug Fixes * Check if on_demand_feature_views is an empty list rather than None for snowflake provider ([#3046](#3046)) ([9b05e65](9b05e65)) * FeatureStore.apply applies BatchFeatureView correctly ([#3098](#3098)) ([41be511](41be511)) * Fix Feast Java inconsistency with int64 serialization vs python ([#3031](#3031)) ([4bba787](4bba787)) * Fix feature service inference logic ([#3089](#3089)) ([4310ed7](4310ed7)) * Fix field mapping logic during feature inference ([#3067](#3067)) ([cdfa761](cdfa761)) * Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([0702310](0702310)) * Fix Java helm charts to work with refactored logic. Fix FTS image ([#3105](#3105)) ([2b493e0](2b493e0)) * Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([bfae6ac](bfae6ac)) * Fix release workflow to release 0.24.0 ([#3138](#3138)) ([a69aaae](a69aaae)) * Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([b26566d](b26566d)) * Fixing Web UI, which fails for the SQL registry ([#3028](#3028)) ([64603b6](64603b6)) * Force Snowflake Session to Timezone UTC ([#3083](#3083)) ([9f221e6](9f221e6)) * Make infer dummy entity join key idempotent ([#3115](#3115)) ([1f5b1e0](1f5b1e0)) * More explicit error messages ([#2708](#2708)) ([e4d7afd](e4d7afd)) * Parse inline data sources ([#3036](#3036)) ([c7ba370](c7ba370)) * Prevent overwriting existing file during `persist` ([#3088](#3088)) ([69af21f](69af21f)) * Register BatchFeatureView in feature repos correctly ([#3092](#3092)) ([b8e39ea](b8e39ea)) * Return an empty infra object from sql registry when it doesn't exist ([#3022](#3022)) ([8ba87d1](8ba87d1)) * Teardown tables for Snowflake Materialization testing ([#3106](#3106)) ([0a0c974](0a0c974)) * UI error when saved dataset is present in registry. ([#3124](#3124)) ([83cf753](83cf753)) * Update sql.py ([#3096](#3096)) ([2646a86](2646a86)) * Updated snowflake template ([#3130](#3130)) ([f0594e1](f0594e1)) ### Features * Add authentication option for snowflake connector ([#3039](#3039)) ([74c75f1](74c75f1)) * Add Cassandra/AstraDB online store contribution ([#2873](#2873)) ([feb6cb8](feb6cb8)) * Add Snowflake materialization engine ([#2948](#2948)) ([f3b522b](f3b522b)) * Adding saved dataset capabilities for Postgres ([#3070](#3070)) ([d3253c3](d3253c3)) * Allow passing repo config path via flag ([#3077](#3077)) ([0d2d951](0d2d951)) * Contrib azure provider with synapse/mssql offline store and Azure registry store ([#3072](#3072)) ([9f7e557](9f7e557)) * Custom Docker image for Bytewax batch materialization ([#3099](#3099)) ([cdd1b07](cdd1b07)) * Feast AWS Athena offline store (again) ([#3044](#3044)) ([989ce08](989ce08)) * Implement spark offline store `offline_write_batch` method ([#3076](#3076)) ([5b0cc87](5b0cc87)) * Initial Bytewax materialization engine ([#2974](#2974)) ([55c61f9](55c61f9)) * Refactor feature server helm charts to allow passing feature_store.yaml in environment variables ([#3113](#3113)) ([85ee789](85ee789))
Hi, I am wondering if it is possible to use this for a self-hosted MSSQL Server database as an Offline Store? |
What this PR does / why we need it:
Moving https://github.com/Azure/feast-azure feast_azure_provider package into feast as contrib. Does not move the kubernetes cluster configuration.
Which issue(s) this PR fixes:
Fixes #