feat: BigTable online store #3140

Merged
merged 24 commits into from
Oct 5, 2022

Conversation

chhabrakadabra
Collaborator

@chhabrakadabra chhabrakadabra commented Aug 25, 2022

What this PR does / why we need it:

BigTable online-store implementation.

Which issue(s) this PR fixes:

Fixes #

Remaining tasks

  • Write documentation for the BigTable online store
  • Update the implementation to be in line with recommendations from Google
  • Clean up the implementation and complete all the TODOs in here
  • Integrate this with the Feast test suite
  • Rebuild all the requirements files

Member

@achals achals left a comment

looks good! Just a couple of early comments. Happy to help with testing setup for this

@adchia adchia changed the title Initial implementation of BigTable online store. Initial implementation of BigTable online store (WIP) Sep 2, 2022
@chhabrakadabra chhabrakadabra changed the title Initial implementation of BigTable online store (WIP) feat: BigTable online store (WIP) Sep 27, 2022
@codecov-commenter

codecov-commenter commented Sep 29, 2022

Codecov Report

Base: 67.31% // Head: 57.95% // Decreases project coverage by -9.36% ⚠️

Coverage data is based on head (3c104d8) compared to base (532d8a1).
Patch coverage: 36.04% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3140      +/-   ##
==========================================
- Coverage   67.31%   57.95%   -9.37%     
==========================================
  Files         179      215      +36     
  Lines       16324    18062    +1738     
==========================================
- Hits        10989    10467     -522     
- Misses       5335     7595    +2260     
Flag Coverage Δ
integrationtests ?
unittests 57.95% <36.04%> (?)

Impacted Files Coverage Δ
sdk/python/feast/repo_config.py 76.40% <ø> (-5.27%) ⬇️
setup.py 0.00% <0.00%> (ø)
sdk/python/feast/infra/online_stores/bigtable.py 31.46% <31.46%> (ø)
...n/feature_repos/universal/online_store/bigtable.py 58.33% <58.33%> (ø)
...ts/integration/feature_repos/repo_configuration.py 56.59% <100.00%> (-31.12%) ⬇️
...sts/integration/registration/test_universal_cli.py 20.20% <0.00%> (-79.80%) ⬇️
...ts/integration/offline_store/test_offline_write.py 26.08% <0.00%> (-73.92%) ⬇️
...fline_store/test_universal_historical_retrieval.py 28.75% <0.00%> (-71.25%) ⬇️
...dk/python/tests/integration/e2e/test_validation.py 27.55% <0.00%> (-69.30%) ⬇️
...ests/integration/e2e/test_python_feature_server.py 31.34% <0.00%> (-68.66%) ⬇️
... and 175 more

chhabrakadabra and others added 15 commits September 29, 2022 21:08
Currently focusing on just getting the tests running locally. I've only
built the python3.8 requirements.

Signed-off-by: Abhin Chhabra <[email protected]>
This was recommended by the BigTable dev team. Details of this layout
will be added to the documentation in a future commit.

Signed-off-by: Abhin Chhabra <[email protected]>
- If a row is empty when fetching data, don't process it more.
- If a task in the threadpool fails, bubble up that failure.
- If a `created_ts` is not available, use an empty string. `None` does
  not automatically serialize to bytes.

Signed-off-by: Abhin Chhabra <[email protected]>
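
The second and third bugfixes above can be illustrated with a minimal, standard-library sketch. The helper names `run_all` and `encode_created_ts` are hypothetical and are not taken from the Feast code base. The sketch only shows the general pattern: calling `Future.result()` so that a worker's exception propagates to the caller, and mapping a missing timestamp to an empty string before encoding it to bytes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_all(fn: Callable[[T], R], items: Iterable[T], max_workers: int = 4) -> List[R]:
    """Run fn over items in a thread pool and surface any worker failure.

    Hypothetical helper for illustration only.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, item) for item in items]
        # Future.result() re-raises an exception raised inside the worker,
        # so a failed task is not silently dropped.
        return [future.result() for future in futures]


def encode_created_ts(created_ts: Optional[str]) -> bytes:
    """Encode an optional timestamp string; None becomes an empty byte string."""
    return (created_ts or "").encode("utf-8")
```

Without the `result()` call, an exception raised in a submitted task stays stored on its future and the caller never sees it.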
As per feedback on the PR.

Signed-off-by: Abhin Chhabra <[email protected]>
Provide the GCP project and the bigtable instance ID for the tests to
connect to.

Signed-off-by: Abhin Chhabra <[email protected]>
This is BigTable's table name length limit and it's causing test failures.

Signed-off-by: Abhin Chhabra <[email protected]>
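
One possible way to keep generated table names within such a limit is sketched below. This is an assumption-laden illustration, not the approach taken in this PR: the function name `bounded_table_name`, the `project.feature_view` naming scheme, and the hash-suffix strategy are all hypothetical, and the limit is treated as "at most 50 characters".

```python
import hashlib

# Limit cited in the commit message; treated here as an inclusive maximum (assumption).
MAX_TABLE_NAME_LENGTH = 50


def bounded_table_name(project: str, feature_view: str,
                       max_len: int = MAX_TABLE_NAME_LENGTH) -> str:
    """Return a table name no longer than max_len characters.

    Hypothetical helper: short names pass through unchanged, long names are
    truncated and suffixed with a short hash of the full name so that two
    long names sharing a prefix remain distinguishable.
    """
    name = f"{project}.{feature_view}"
    if len(name) <= max_len:
        return name
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    # Reserve room for the underscore separator and the digest.
    return f"{name[: max_len - len(digest) - 1]}_{digest}"
```

The hash suffix keeps the mapping deterministic, so the same project and feature view always resolve to the same table.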
- Fetch all the rows in one bigtable fetch.
- Get only the columns that are necessary (using a column regex filter).

Signed-off-by: Abhin Chhabra <[email protected]>
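
As a rough sketch of the column-filter idea above, the helper below builds a single regular expression that matches only the requested feature columns. The name `column_regex` is hypothetical and the code is not taken from this PR. It assumes that column qualifiers are plain feature names and that the resulting bytes would be handed to a column-qualifier regex filter (for example `ColumnQualifierRegexFilter` in `google.cloud.bigtable.row_filters`), with all row keys collected into one read request. The escaping uses Python's `re.escape`, which is assumed to be acceptable to Bigtable's RE2 syntax for typical feature names.

```python
import re
from typing import Iterable


def column_regex(feature_names: Iterable[str]) -> bytes:
    """Build an anchored alternation matching exactly the given column names.

    Hypothetical helper for illustration only.
    """
    # Escape each name so characters such as "." are matched literally.
    alternatives = "|".join(re.escape(name) for name in feature_names)
    return f"^({alternatives})$".encode("utf-8")
```

Combining one request for all row keys with a filter like this avoids both per-row round trips and the transfer of columns the caller did not ask for.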
The latest rebuilding of requirements has upgraded the `moto` library
past the `4.0.0` release, which has a couple of breaking changes.
Specifically, the `mock_dynamodb2` decorator has been deprecated. See
https://github.com/spulec/moto/blob/master/CHANGELOG.md#400 for more
details.

The actual PR (getmoto/moto#4919) mentions that
it's because the `mock_dynamodb` decorator is now equivalent to the
`mock_dynamodb2` decorator.

Signed-off-by: Abhin Chhabra <[email protected]>
This matches the GCP docs.

Signed-off-by: Abhin Chhabra <[email protected]>
Closely mirrors the docs for the other online stores.

Signed-off-by: Abhin Chhabra <[email protected]>
It looks like the bigtable client will just skip over non-existent row
keys.

Signed-off-by: Abhin Chhabra <[email protected]>
@chhabrakadabra chhabrakadabra changed the title feat: BigTable online store (WIP) feat: BigTable online store Oct 3, 2022
docs/specs/online_store_format.md (review thread resolved)
docs/reference/online-stores/bigtable.md (outdated, review thread resolved)
Collaborator

@adchia adchia left a comment

/lgtm

@adchia
Collaborator

adchia commented Oct 5, 2022

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, chhabrakadabra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [adchia,chhabrakadabra]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Feedback from Danny mentioned that Bigtable should be able to store
multiple versions of the same key and fetch the latest at read time.
This makes sense and means that concurrent writes should work just fine.

Signed-off-by: Abhin Chhabra <[email protected]>
@chhabrakadabra
Collaborator Author

@adchia I forgot to add the signature in the last commit. I guess I'll have to ask you to lgtm this once more please. 🙏

Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: Danny Chiao <[email protected]>
@adchia
Collaborator

adchia commented Oct 5, 2022

/lgtm

Signed-off-by: Danny Chiao <[email protected]>
@feast-ci-bot feast-ci-bot removed the lgtm label Oct 5, 2022
@adchia
Collaborator

adchia commented Oct 5, 2022

/lgtm

@feast-ci-bot feast-ci-bot merged commit 6bc91c2 into feast-dev:master Oct 5, 2022
@chhabrakadabra chhabrakadabra deleted the bigtable_online_store branch October 5, 2022 18:29
kevjumba pushed a commit that referenced this pull request Oct 6, 2022
# [0.26.0](v0.25.0...v0.26.0) (2022-10-06)

### Bug Fixes

* Add `X-Trino-Extra-Credential` header and remove user override ([#3246](#3246)) ([164e666](164e666))
* Add postgres to the feature server Dockerfile to fix helm chart flow ([#3261](#3261)) ([6f6cbb7](6f6cbb7))
* Add stream feature view in the Web UI ([#3257](#3257)) ([1f70b3a](1f70b3a))
* Build dockerfile correctly ([#3239](#3239)) ([a2dc0d0](a2dc0d0))
* Configuration to stop coercion of tz for entity_df ([#3255](#3255)) ([97b7ab9](97b7ab9))
* Enable users to upgrade a batch source into a push source ([#3213](#3213)) ([1b312fb](1b312fb))
* Fix docker image for feature-server ([#3272](#3272)) ([eff01d1](eff01d1))
* Fix Feast UI release process to update the feast-ui package  ([#3267](#3267)) ([a9d48d0](a9d48d0))
* Return 422 on bad push source name ([#3214](#3214)) ([b851e01](b851e01))
* Stream feature view meta undefined created_timestamp issue ([#3266](#3266)) ([12e1a8f](12e1a8f))
* Stream feature view not shown in the UI ([#3251](#3251)) ([e713dda](e713dda))
* Udf in stream feature view UI shows pickled data ([#3268](#3268)) ([0728117](0728117))
* Update snowflake materialization messages ([#3230](#3230)) ([a63d440](a63d440))
* Updated quickstart notebook to patch an incorrect reference to an outdated featureview name ([#3271](#3271)) ([b9b9c54](b9b9c54))
* Use configured user in env var instead of "user" for Trino ([#3254](#3254)) ([532d8a1](532d8a1))

### Features

* Add mysql as online store ([#3190](#3190)) ([cb8db84](cb8db84))
* Add possibility to define feature_store.yaml path with env variable ([#3231](#3231)) ([95fdb19](95fdb19))
* Add request_timeout setting for cassandra online store adapter ([#3256](#3256)) ([da20757](da20757))
* Add tag description in Field in the Feast UI ([#3258](#3258)) ([086f279](086f279))
* Adding billing_project_id in BigQueryOfflineStoreConfig ([#3253](#3253)) ([f80f05f](f80f05f))
* BigTable online store ([#3140](#3140)) ([6bc91c2](6bc91c2))
* Filter subset features in postgres [#3174](#3174) ([#3203](#3203)) ([b48d36b](b48d36b))
franciscojavierarceo pushed a commit to franciscojavierarceo/feast that referenced this pull request Oct 18, 2022
* Initial implementation of BigTable online store.

Signed-off-by: Abhin Chhabra <[email protected]>

* Attempt to run bigtable integration tests.

Currently focusing on just getting the tests running locally. I've only
built the python3.8 requirements.

Signed-off-by: Abhin Chhabra <[email protected]>

* Got the BigTable tests running in local containers

Signed-off-by: Abhin Chhabra <[email protected]>

* Set serialization version when computing entity ID

Signed-off-by: Abhin Chhabra <[email protected]>

* Switch to the recommended layout in bigtable.

This was recommended by the BigTable dev team. Details of this layout
will be added to the documentation in a future commit.

Signed-off-by: Abhin Chhabra <[email protected]>

* Minor bugfixes.

- If a row is empty when fetching data, don't process it more.
- If a task in the threadpool fails, bubble up that failure.
- If a `created_ts` is not available, use an empty string. `None` does
  not automatically serialize to bytes.

Signed-off-by: Abhin Chhabra <[email protected]>

* Move BigTable online store out of contrib

As per feedback on the PR.

Signed-off-by: Abhin Chhabra <[email protected]>

* Attempt to run integration tests in CI.

Provide the GCP project and the bigtable instance ID for the tests to
connect to.

Signed-off-by: Abhin Chhabra <[email protected]>

* Delete tables for entity-less feature views.

Signed-off-by: Abhin Chhabra <[email protected]>

* Table names should be smaller than 50 characters

This is BigTable's table name length limit and it's causing test failures.

Signed-off-by: Abhin Chhabra <[email protected]>

* Optimize bigtable reads.

- Fetch all the rows in one bigtable fetch.
- Get only the columns that are necessary (using a column regex filter).

Signed-off-by: Abhin Chhabra <[email protected]>

* dynamodb: switch to `mock_dynamodb`

The latest rebuilding of requirements has upgraded the `moto` library
past the `4.0.0` release, which has a couple of breaking changes.
Specifically, the `mock_dynamodb2` decorator has been deprecated. See
https://github.com/spulec/moto/blob/master/CHANGELOG.md#400 for more
details.

The actual PR (getmoto/moto#4919) mentions that
it's because the `mock_dynamodb` decorator is now equivalent to the
`mock_dynamodb2` decorator.

Signed-off-by: Abhin Chhabra <[email protected]>

* minor: rename `BigTable` to `Bigtable`

This matches the GCP docs.

Signed-off-by: Abhin Chhabra <[email protected]>

* Wrote some Bigtable documentation.

Closely mirrors the docs for the other online stores.

Signed-off-by: Abhin Chhabra <[email protected]>

* Bugfix: Deal with missing row keys.

It looks like the bigtable client will just skip over non-existent row
keys.

Signed-off-by: Abhin Chhabra <[email protected]>

* Fix linting issues.

Signed-off-by: Abhin Chhabra <[email protected]>

* Generate requirements files.

- As of version `1.49`, the various python packages in the [grpc
  repo](https://github.com/grpc/grpc/tree/master/src/python) require
  `protobuf>=4.21.3`. Unfortunately, this is incompatible with all
  versions of `tensorflow-metadata` (see [this
  issue](tensorflow/metadata#37)). And since
  `piptools` doesn't backtrack during dependency resolution, the
  requirement files cannot be regenerated without adding an upper limit
  on these grpc libraries directly in `setup.py`.
- The previous attempt to upgrade usages of the `mock_dynamodb2`
  decorator to the newest version failed. Since I'm not an expert in
  dynamodb, it made sense to just cap the test tool to the version
  already being used in CI.

Signed-off-by: Abhin Chhabra <[email protected]>

* Don't bother materializing created timestamp.

Had a discussion with Danny about whether it's useful to copy this
column. He agreed that there's no value to storing this in the online
store.

Signed-off-by: Abhin Chhabra <[email protected]>

* Remove `tensorflow-metadata`.

Turns out that this dependency is not required. We removed all
references to it in [this
PR](feast-dev#2063), but did not remove it
from `setup.py`. Removing it has caused many of the restrictions imposed
in previous commits to be unnecessary.

Signed-off-by: Abhin Chhabra <[email protected]>

* Minor fix to Bigtable documentation.

Feedback from Danny mentioned that Bigtable should be able to store
multiple versions of the same key and fetch the latest at read time.
This makes sense and means that concurrent writes should work just fine.

Signed-off-by: Abhin Chhabra <[email protected]>

* update roadmap docs

Signed-off-by: Danny Chiao <[email protected]>

* Fix roadmap doc

Signed-off-by: Danny Chiao <[email protected]>

* Change link to point to roadmap page

Signed-off-by: Danny Chiao <[email protected]>

* change order in roadmap

Signed-off-by: Danny Chiao <[email protected]>

Signed-off-by: Abhin Chhabra <[email protected]>
Signed-off-by: Abhin Chhabra <[email protected]>
Signed-off-by: Danny Chiao <[email protected]>
Co-authored-by: Danny Chiao <[email protected]>