
Refactor Python SDK to remove v1 concepts #1023

Merged
merged 34 commits into from
Oct 8, 2020


terryyylim
Member

What this PR does / why we need it:
In upcoming releases, we'll gradually shift to a different architecture, as defined in this RFC. This PR is the first in a series that cleans up older Feast concepts in the codebase.

The following changes will be included in subsequent PRs, likely once the new retrieval method has been implemented.

  • helm charts
  • load test
  • docker-compose test

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

The Python SDK `ingest` method now supports ingesting data from a user-specified batch source (Parquet file or BigQuery table) and writing it to the specified FeatureTable.

@terryyylim
Member Author

/retest

@terryyylim terryyylim changed the title WIP: Refactor Python SDK to remove v1 concepts Refactor Python SDK to remove v1 concepts Oct 5, 2020
sdk/python/feast/client.py (outdated)
name="alltypes",
entities=["alltypes_id"],
features=[
Feature(name="float_feature", dtype=ValueType.FLOAT).to_proto(),
Member

Why do we have to call to_proto() here? Are users expected to do this as well?

Member Author

Removed.
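
One way to keep to_proto() out of user code is to have the table accept native Feature objects and do the conversion internally. The following is a minimal sketch of that idea, not the actual Feast implementation: the classes below are hypothetical stand-ins, and a plain dict stands in for the protobuf message.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ValueType(Enum):
    # Hypothetical subset of value types, for illustration only.
    FLOAT = 1
    INT64 = 2


@dataclass
class Feature:
    name: str
    dtype: ValueType

    def to_proto(self) -> Dict[str, str]:
        # A dict stands in for the real protobuf message (assumption).
        return {"name": self.name, "value_type": self.dtype.name}


@dataclass
class FeatureTable:
    name: str
    entities: List[str]
    features: List[Feature] = field(default_factory=list)

    def to_proto(self) -> Dict[str, object]:
        # Conversion happens once, inside the SDK, not at the call site.
        return {
            "name": self.name,
            "entities": list(self.entities),
            "features": [f.to_proto() for f in self.features],
        }


# The caller passes native objects only; no to_proto() in user code.
table = FeatureTable(
    name="alltypes",
    entities=["alltypes_id"],
    features=[Feature(name="float_feature", dtype=ValueType.FLOAT)],
)
spec = table.to_proto()
```

With this shape, serialization stays an internal detail that is triggered only when the table is sent to the registry.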

date_partition_column="date_partition_col",
)

stream_source = DataSource(
Member

We should be using specific sources like FileSource here, not the underlying base class. Otherwise the API is too low-level.

Member Author

In the protos, the different DataSources are identified by their options here. So in the SDK I exposed only DataSource and made the different options configurable via the native classes FileOptions, BigQueryOptions, KafkaOptions, and KinesisOptions.

Member Author

Is our preference here to create native classes FileSource, BigQuerySource, KafkaSource, and KinesisSource? I felt that mirroring how the protos are defined would be a more standardized approach for the codebase.

Member

The Source-based approach (as in the RFC) is a more abstracted view. It lets us define a source-specific constructor, so users don't have to specify the type; we can set that for them. We can also take input arguments that are specific to each source, and we don't require users to define FileOptions, which is basically the same as having FileSource at the end of the day. So it's better to have a single constructor instead of two.

Member Author

Agreed, that would be cleaner. I've updated the FeatureTable class to expose only FileSource, BigQuerySource, KafkaSource, and KinesisSource instead of the respective native options classes.
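
For readers following the thread, the difference can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the field names, the type strings, and the dict standing in for the proto are all assumptions. The point is that each source-specific constructor fixes the source type itself and takes only the arguments relevant to that source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DataSource:
    """Hypothetical base class; users would not construct it directly."""

    timestamp_column: str

    def to_proto(self) -> Dict[str, object]:
        raise NotImplementedError


@dataclass
class FileSource(DataSource):
    file_format: str
    file_url: str

    def to_proto(self) -> Dict[str, object]:
        # The source type is implied by the class, so the user never sets it.
        return {
            "type": "BATCH_FILE",  # illustrative type name (assumption)
            "timestamp_column": self.timestamp_column,
            "file_options": {
                "file_format": self.file_format,
                "file_url": self.file_url,
            },
        }


@dataclass
class KafkaSource(DataSource):
    bootstrap_servers: str
    topic: str

    def to_proto(self) -> Dict[str, object]:
        return {
            "type": "STREAM_KAFKA",  # illustrative type name (assumption)
            "timestamp_column": self.timestamp_column,
            "kafka_options": {
                "bootstrap_servers": self.bootstrap_servers,
                "topic": self.topic,
            },
        }


# One constructor per source; no separate options object and no explicit type.
batch_source = FileSource(
    timestamp_column="ts_col",
    file_format="parquet",
    file_url="file:///tmp/example",  # placeholder path
)
stream_source = KafkaSource(
    timestamp_column="ts_col",
    bootstrap_servers="localhost:9092",  # placeholder address
    topic="example_topic",
)
```

The options still end up nested under a source-specific key in the serialized form, which mirrors the proto layout described earlier in the thread while keeping the user-facing API to a single call.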

Signed-off-by: Terence <[email protected]>
This reverts commit d567937eaf80190cde59128c19af4644c810e7d9.

This reverts commit 7e74e9069f97af9c0e108aba8f4bd1197ba5c3ed.

@terryyylim terryyylim force-pushed the v1-clean-up branch 3 times, most recently from e144e7d to ef014ad October 8, 2020 03:12
@feast-ci-bot
Collaborator

feast-ci-bot commented Oct 8, 2020

@terryyylim: The following tests failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
test-end-to-end-batch | a3f83fbe0f1ac5f602a2c6cec02618ce68738279 | link | /test test-end-to-end-batch
test-end-to-end-batch-dataflow | a3f83fbe0f1ac5f602a2c6cec02618ce68738279 | link | /test test-end-to-end-batch-dataflow


@terryyylim
Member Author

/retest

features: Union[FeatureV2, List[FeatureV2]],
batch_source: Optional[DataSource] = None,
stream_source: Optional[DataSource] = None,
entities: List[str],
Member

Shouldn't this be Entity?

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terryyylim, woop


@woop
Member

woop commented Oct 8, 2020

/lgtm

@woop woop added this to the v0.8.0 milestone Oct 8, 2020
@feast-ci-bot feast-ci-bot merged commit 25b796f into feast-dev:master Oct 8, 2020

4 participants