Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Updating protos to separate transformation #4018

Merged
merged 18 commits into from
Mar 24, 2024

Conversation

franciscojavierarceo
Copy link
Member

@franciscojavierarceo franciscojavierarceo commented Mar 17, 2024

What this PR does / why we need it:

This PR updates the OnDemandFeatureView and StreamingFeatureView protobuf objectss into separate FeatureTransformation protobuf objects explicitly. This is a much cleaner way for machine learning engineers to understand the transformations.

This PR will do the following:

  1. Create the new FeatureTransformation.proto file and object
  2. Deprecate UserDefinedFunction in the OnDemandFeatureView.proto and StreamFeatureView.proto files
  3. Store the new FeatureTransformationProto in the OnDemandFeatureViewSpec and StreamFeatureViewSpecProto objects so that when writes are executed the new protos are stored alongside the old protos
  4. Read from both the old UserDefinedFunction proto and the new FeatureTransformation proto and handle the case when UserDefinedFunction is present and FeatureTransformation is missing
  5. Add warnings that the UserDefinedFunction will be deprecated
  6. Update the UI to use the newly structured object

We will make this release with dual writes and the "coalesce-like" read so that the behavior is backwards compatible and also enables a safe roll out for users.

We will deprecate the UserDefinedFunction before releasing to Feast 1.0.0.

Which issue(s) this PR fixes:

Fixes #

@franciscojavierarceo franciscojavierarceo changed the title feat: updating protos to separate transformation feat: Updating protos to separate transformation Mar 17, 2024
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
@tokoko
Copy link
Collaborator

tokoko commented Mar 18, 2024

@franciscojavierarceo I like the split, but this will break the existing proto layout for everyone, won't it? factoring out oneof in a separate message type, especially when there are already fields in OnDemandFeatureView with higher numerical positions won't be backwards-compatible. @HaoXuAI What do you think?

@franciscojavierarceo
Copy link
Member Author

@tokoko yes, this is a breaking change.

The refactor is fine it's really bumping the enum value of user_defined_function to 1. I think it's worth the long term clarity, otherwise we'd have to start counting from 5 and have a weird comment.

I think it's worth it but I can change the counter if you feel strongly. We can add to the release notes that this is a breaking change.

@franciscojavierarceo
Copy link
Member Author

Eh, it's not worth the inconvenience to users. I made the protos backwards compatible.

@tokoko
Copy link
Collaborator

tokoko commented Mar 18, 2024

I didn't just mean the numbering. I think introducing FeatureTransformation-typed field simply can't be made in a backwards-compatible way no matter the numbering. if previously encoded proto message kept UserDefinedFunction field in position number 5, now it's part of another field in position number 17. We won't read previously saved protos correctly. Having said that, I don't have strong feelings about it, we can make a breaking change, it just might be too much of a headache for people already using it.

@franciscojavierarceo franciscojavierarceo marked this pull request as ready for review March 18, 2024 12:06
@franciscojavierarceo
Copy link
Member Author

Yeah I guess it's really not avoidable unless I create a new field and then deprecate the other one. I think we should just cut over.

@tokoko
Copy link
Collaborator

tokoko commented Mar 18, 2024

Just to understand this better, do we have a good practical justification for the breaking change though? For example, any idea why we would want to have a separate FeatureTransformation message, any additional fields we may want to add there in the future except oneof? Or is the primary goal to reuse that message type in StreamFeatureView?

@@ -61,6 +60,7 @@ message OnDemandFeatureViewSpec {

// Owner of the on demand feature view.
string owner = 8;
string mode = 9;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of mode field here since this is already a breaking change anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my team at Affirm :)

Copy link
Collaborator

@tokoko tokoko Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember that, but I'm a little confused. If you care about proto encoding compatibility, this change will break your protos anyway, with or without mode field. If all you care about is user-facing API (in other words mode parameter in on_demand_feature_view decorator) we can do that without a redundant mode field in the proto. mode parameter in on_demand_feature_view would simply determine which one of oneof fields will be populated. Maybe I'm missing something...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point but it does make API interface similar to stream feature view, which I prefer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not disputing the mode field in the API. I agree we should do that as part of native python PR. All I'm saying is we don't need a mode field in proto for that to be possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i gues @tokoko means to handle mode in Python only. either way it works. Explicitly adding it in the proto does help understanding the Feature View schema.

Copy link
Member Author

@franciscojavierarceo franciscojavierarceo Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly adding it in the proto does help understanding the Feature View schema.

+1

@franciscojavierarceo
Copy link
Member Author

franciscojavierarceo commented Mar 18, 2024

Just to understand this better, do we have a good practical justification for the breaking change though?

To decouple Transformation and Serving explicitly and share that code across Stream and On Demand

For example, any idea why we would want to have a separate FeatureTransformation message, any additional fields we may want to add there in the future except oneof?

This PR makes an explicit FeatureTransformation message where transformation is a child object and we could add additional metadata to it in the future (e.g., the commit sha).

Or is the primary goal to reuse that message type in StreamFeatureView?

This is a primary goal that's a consequences of the goal of decoupling.

@tokoko
Copy link
Collaborator

tokoko commented Mar 18, 2024

@franciscojavierarceo thanks, StreamFeatureView argument sounds solid to me. P.S. Interesting if we can do this somehow without breaking StreamFeatureView proto encoding too?

@etirelli
Copy link
Collaborator

@franciscojavierarceo similar question: why not do the change preserving backward compatibility (i.e., deprecating the field, changing load function to load the new attribute, changing the save function to erase the old one). Eventually, as we reach 1.0, we can remove the deprecated fields to clean things up.

@franciscojavierarceo
Copy link
Member Author

@etirelli I'll update the protos to make them backwards compatible and deprecate them.

Copy link
Collaborator

@HaoXuAI HaoXuAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I know it was a painful process. but it's great we kicked off the implementation of transformation. 👍

@tokoko
Copy link
Collaborator

tokoko commented Mar 22, 2024

@franciscojavierarceo thanks, looks good to me as well at first glance, but pretty complicated to actually make sure that it will be backwards-compatible, so please give me a couple days to go through. P.S. I don't know how feasible it is right now but maybe we can also come up with a way to check proto compatibility against the latest version of feast on pypi.

oneof transformation {
UserDefinedFunctionV2 user_defined_function = 5;
OnDemandSubstraitTransformationV2 on_demand_substrait_transformation = 6;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good time to rethink the naming here. My first suggestion was to rename the field (not message type) to on_demand_pandas_transformation instead of user_defined_function. But on second thought, since we are also aiming to reuse this in StreamFeatureViews, I think protos should no longer be called OnDemand... What do you think? I'm thinking of something like this:

UserDefinedFunctionV2 pandas_transformation = 1;
SubstraitTransformationV2 substrait_transformation = 2;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning on doing that in a follow up PR to not add too much complexity here.

@franciscojavierarceo
Copy link
Member Author

@tokoko I need to actually test the backwards compatibility so it's not quite done yet. I need to adjust the registry diff and add tests but most of the behavior is here. I'll post in the channel once ready again. Thanks for keeping an eye on it @tokoko @HaoXuAI!

@tokoko
Copy link
Collaborator

tokoko commented Mar 22, 2024

@franciscojavierarceo sorry about that, looks like we jumped the gun.

Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
@@ -49,9 +50,11 @@ message OnDemandFeatureViewSpec {
map<string, OnDemandSource> sources = 4;

oneof transformation {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasn't made it to an official release yet, so can't we remove transformation oneof here? just leave UserDefinedFunction user_defined_function = 5 [deprecated = true]. It would also enable you to call the new field transformation which sounds less awkward (FeatureTransformationV2 transformation = 10;) and cut down on overall code size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to do that in a separate PR as a follow up. I'll probably end up with 4 by the end. I want to leave this one as is though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to have separate clean PRs than one massive one imo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tokoko lmk if you're good with this. I have the next PR coming soon where I'll rename things and the subsequent will do the PythonTransformation type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@franciscojavierarceo yeah, sounds good to me. Let's just not make a new release until we get all these ironed out.

Copy link
Collaborator

@tokoko tokoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this and worry about naming in the next one. I also wonder if we could add some sort of a test for compatibility... generate protos with latest released version of feast on pypi (maybe in docker) and test if that still works. Easier said than done, though... 😄

@franciscojavierarceo
Copy link
Member Author

Since the proto enums didn't change it's backwards compatible, I also had a unit test do this explicitly to confirm the behavior.



@pytest.mark.filterwarnings("ignore:udf and udf_string parameters are deprecated")
def test_from_proto_backwards_compatable_udf():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tokoko see here

@franciscojavierarceo franciscojavierarceo merged commit c58ef74 into master Mar 24, 2024
16 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Apr 16, 2024
# [0.36.0](v0.35.0...v0.36.0) (2024-04-16)

### Bug Fixes

* Add __eq__, __hash__ to SparkSource for correct comparison ([#4028](#4028)) ([e703b40](e703b40))
* Add conn.commit() to Postgresonline_write_batch.online_write_batch ([#3904](#3904)) ([7d75fc5](7d75fc5))
* Add missing __init__.py to embedded_go ([#4051](#4051)) ([6bb4c73](6bb4c73))
* Add missing init files in infra utils ([#4067](#4067)) ([54910a1](54910a1))
* Added registryPath parameter documentation in WebUI reference ([#3983](#3983)) ([5e0af8f](5e0af8f)), closes [#3974](#3974) [#3974](#3974)
* Adding missing init files in materialization modules ([#4052](#4052)) ([df05253](df05253))
* Allow trancated timestamps when converting ([#3861](#3861)) ([bdd7dfb](bdd7dfb))
* Azure blob storage support in Java feature server ([#2319](#2319)) ([#4014](#4014)) ([b9aabbd](b9aabbd))
* Bugfix for grabbing historical data from Snowflake with array type features. ([#3964](#3964)) ([1cc94f2](1cc94f2))
* Bytewax materialization engine fails when loading feature_store.yaml ([#3912](#3912)) ([987f0fd](987f0fd))
* CI unittest warnings ([#4006](#4006)) ([0441b8b](0441b8b))
* Correct the returning class proto type of StreamFeatureView to StreamFeatureViewProto instead of FeatureViewProto. ([#3843](#3843)) ([86d6221](86d6221))
* Create index only if not exists during MySQL online store update ([#3905](#3905)) ([2f99a61](2f99a61))
* Disable minio tests in workflows on master and nightly ([#4072](#4072)) ([c06dda8](c06dda8))
* Disable the Feast Usage feature by default. ([#4090](#4090)) ([b5a7013](b5a7013))
* Dump repo_config by alias ([#4063](#4063)) ([e4bef67](e4bef67))
* Extend SQL registry config with a sqlalchemy_config_kwargs key ([#3997](#3997)) ([21931d5](21931d5))
* Feature Server image startup in OpenShift clusters ([#4096](#4096)) ([9efb243](9efb243))
* Fix copy method for StreamFeatureView ([#3951](#3951)) ([cf06704](cf06704))
* Fix for materializing entityless feature views in Snowflake ([#3961](#3961)) ([1e64c77](1e64c77))
* Fix type mapping spark ([#4071](#4071)) ([3afa78e](3afa78e))
* Fix typo as the cli does not support shortcut-f option. ([#3954](#3954)) ([dd79dbb](dd79dbb))
* Get container host addresses from testcontainers ([#3946](#3946)) ([2cf1a0f](2cf1a0f))
* Handle ComplexFeastType to None comparison ([#3876](#3876)) ([fa8492d](fa8492d))
* Hashlib md5 errors in FIPS for python 3.9+ ([#4019](#4019)) ([6d9156b](6d9156b))
* Making the query_timeout variable as optional int because upstream is considered to be optional ([#4092](#4092)) ([fd5b620](fd5b620))
* Move gRPC dependencies to an extra ([#3900](#3900)) ([f93c5fd](f93c5fd))
* Prevent spamming pull busybox from dockerhub ([#3923](#3923)) ([7153cad](7153cad))
* Quickstart notebook example ([#3976](#3976)) ([b023aa5](b023aa5))
* Raise error when not able read of file source spark source ([#4005](#4005)) ([34cabfb](34cabfb))
* remove not use input parameter in spark source ([#3980](#3980)) ([7c90882](7c90882))
* Remove parentheses in pull_latest_from_table_or_query ([#4026](#4026)) ([dc4671e](dc4671e))
* Remove proto-plus imports ([#4044](#4044)) ([ad8f572](ad8f572))
* Remove unnecessary dependency on mysqlclient ([#3925](#3925)) ([f494f02](f494f02))
* Restore label check for all actions using pull_request_target ([#3978](#3978)) ([591ba4e](591ba4e))
* Revert mypy config ([#3952](#3952)) ([6b8e96c](6b8e96c))
* Rewrite Spark materialization engine to use mapInPandas ([#3936](#3936)) ([dbb59ba](dbb59ba))
* Run feature server w/o gunicorn on windows ([#4024](#4024)) ([584e9b1](584e9b1))
* SqlRegistry _apply_object update statement ([#4042](#4042)) ([ef62def](ef62def))
* Substrait ODFVs for online ([#4064](#4064)) ([26391b0](26391b0))
* Swap security label check on the PR title validation job to explicit permissions instead ([#3987](#3987)) ([f604af9](f604af9))
* Transformation server doesn't generate files from proto ([#3902](#3902)) ([d3a2a45](d3a2a45))
* Trino as an OfflineStore Access Denied when BasicAuthenticaion ([#3898](#3898)) ([49d2988](49d2988))
* Trying to import pyspark lazily to avoid the dependency on the library ([#4091](#4091)) ([a05cdbc](a05cdbc))
* Typo Correction in Feast UI Readme ([#3939](#3939)) ([c16e5af](c16e5af))
* Update actions/setup-python from v3 to v4 ([#4003](#4003)) ([ee4c4f1](ee4c4f1))
* Update typeguard version to >=4.0.0 ([#3837](#3837)) ([dd96150](dd96150))
* Upgrade sqlalchemy from 1.x to 2.x regarding PVE-2022-51668. ([#4065](#4065)) ([ec4c15c](ec4c15c))
* Use CopyFrom() instead of __deepycopy__() for creating a copy of protobuf object. ([#3999](#3999)) ([5561b30](5561b30))
* Using version args to install the correct feast version ([#3953](#3953)) ([b83a702](b83a702))
* Verify the existence of Registry tables in snowflake before calling CREATE sql command. Allow read-only user to call feast apply. ([#3851](#3851)) ([9a3590e](9a3590e))

### Features

* Add duckdb offline store ([#3981](#3981)) ([161547b](161547b))
* Add Entity df in format of a Spark Dataframe instead of just pd.DataFrame or string for SparkOfflineStore ([#3988](#3988)) ([43b2c28](43b2c28))
* Add gRPC Registry Server ([#3924](#3924)) ([373e624](373e624))
* Add local tests for s3 registry using minio ([#4029](#4029)) ([d82d1ec](d82d1ec))
* Add python bytes to array type conversion support proto ([#3874](#3874)) ([8688acd](8688acd))
* Add python client for remote registry server ([#3941](#3941)) ([42a7b81](42a7b81))
* Add Substrait-based ODFV transformation ([#3969](#3969)) ([9e58bd4](9e58bd4))
* Add support for arrays in snowflake ([#3769](#3769)) ([8d6bec8](8d6bec8))
* Added delete_table to redis online store ([#3857](#3857)) ([03dae13](03dae13))
* Adding support for Native Python feature transformations for ODFVs ([#4045](#4045)) ([73bc853](73bc853))
* Bumping requirements ([#4079](#4079)) ([1943056](1943056))
* Decouple transformation types from ODFVs ([#3949](#3949)) ([0a9fae8](0a9fae8))
* Dropping Python 3.8 from local integration tests and integration tests ([#3994](#3994)) ([817995c](817995c))
* Dropping python 3.8 requirements files from the project. ([#4021](#4021)) ([f09c612](f09c612))
* Dropping the support for python 3.8 version from feast ([#4010](#4010)) ([a0f7472](a0f7472))
* Dropping unit tests for Python 3.8 ([#3989](#3989)) ([60f24f9](60f24f9))
* Enable Arrow-based columnar data transfers  ([#3996](#3996)) ([d8d7567](d8d7567))
* Enable Vector database and retrieve_online_documents API ([#4061](#4061)) ([ec19036](ec19036))
* Kubernetes materialization engine written based on bytewax ([#4087](#4087)) ([7617bdb](7617bdb))
* Lint with ruff ([#4043](#4043)) ([7f1557b](7f1557b))
* Make arrow primary interchange for offline ODFV execution ([#4083](#4083)) ([9ed0a09](9ed0a09))
* Pandas v2 compatibility ([#3957](#3957)) ([64459ad](64459ad))
* Pull duckdb from contribs, add to CI ([#4059](#4059)) ([318a2b8](318a2b8))
* Refactor ODFV schema inference ([#4076](#4076)) ([c50a9ff](c50a9ff))
* Refactor registry caching logic into a separate class ([#3943](#3943)) ([924f944](924f944))
* Rename OnDemandTransformations to Transformations ([#4038](#4038)) ([9b98eaf](9b98eaf))
* Revert updating dependencies so that feast can be run on 3.11. ([#3968](#3968)) ([d3c68fb](d3c68fb)), closes [#3958](#3958)
* Rewrite ibis point-in-time-join w/o feast abstractions ([#4023](#4023)) ([3980e0c](3980e0c))
* Support s3gov schema by snowflake offline store during materialization ([#3891](#3891)) ([ea8ad17](ea8ad17))
* Update odfv test ([#4054](#4054)) ([afd52b8](afd52b8))
* Update pyproject.toml to use Python 3.9 as default ([#4011](#4011)) ([277b891](277b891))
* Update the Pydantic from v1 to v2 ([#3948](#3948)) ([ec11a7c](ec11a7c))
* Updating dependencies so that feast can be run on 3.11. ([#3958](#3958)) ([59639db](59639db))
* Updating protos to separate transformation ([#4018](#4018)) ([c58ef74](c58ef74))

### Reverts

* Reverting bumping requirements ([#4081](#4081)) ([1ba65b4](1ba65b4)), closes [#4079](#4079)
* Verify the existence of Registry tables in snowflake… ([#3907](#3907)) ([c0d358a](c0d358a)), closes [#3851](#3851)
@tokoko tokoko deleted the odfv-proto-change branch July 16, 2024 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants