Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Msudhir/add vector update functionality #14

Merged
merged 30 commits into from
Aug 11, 2023

Conversation

Manisha4
Copy link
Collaborator

@Manisha4 Manisha4 commented Aug 7, 2023

What this PR does / why we need it:

Added updated functionality to MilvusOnlineStore. The update functionality is able to delete and create a collection.
Relevant tests added as well

Which issue(s) this PR fixes:

Fixes #

adchia and others added 26 commits July 3, 2023 12:29
update Catalog argument in athena connector

Signed-off-by: Gyumin Lee <[email protected]>
Co-authored-by: Gyumin Lee <[email protected]>
ensure correct precedence with the two operators

Signed-off-by: Ben Fletcher <[email protected]>
…east-dev#3677)

Bumps [tough-cookie](https://github.com/salesforce/tough-cookie) from 4.0.0 to 4.1.3.
- [Release notes](https://github.com/salesforce/tough-cookie/releases)
- [Changelog](https://github.com/salesforce/tough-cookie/blob/master/CHANGELOG.md)
- [Commits](salesforce/tough-cookie@v4.0.0...v4.1.3)

---
updated-dependencies:
- dependency-name: tough-cookie
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [tough-cookie](https://github.com/salesforce/tough-cookie) from 4.0.0 to 4.1.3.
- [Release notes](https://github.com/salesforce/tough-cookie/releases)
- [Changelog](https://github.com/salesforce/tough-cookie/blob/master/CHANGELOG.md)
- [Commits](salesforce/tough-cookie@v4.0.0...v4.1.3)

---
updated-dependencies:
- dependency-name: tough-cookie
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…east-dev#3630)

* sql.py data_sources.data_source_name String(255)

Extend the limit of the data_source_name field from 50 to 255.

Signed-off-by: Ross Donnachie <[email protected]>
…east-dev#3680)

feat: Optimize bytes processed when retrieving entity df schema to 0

Signed-off-by: Hai Nguyen <[email protected]>
…tore.plan() on python (feast-dev#3640)

* fix! KeyError: __dummy on entityless fv

Signed-off-by: williamfoschiera <[email protected]>

* fix! join_keys typing.

Signed-off-by: williamfoschiera <[email protected]>

---------

Signed-off-by: williamfoschiera <[email protected]>
Co-authored-by: williamfoschiera <[email protected]>
Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.1.1 to 7.2.4.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md)
- [Commits](protobufjs/protobuf.js@protobufjs-v7.1.1...protobufjs-v7.2.4)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…east-dev#3675)

Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.1.2 to 7.2.4.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md)
- [Commits](protobufjs/protobuf.js@protobufjs-v7.1.2...protobufjs-v7.2.4)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [semver](https://github.com/npm/node-semver) from 6.3.0 to 6.3.1.
- [Release notes](https://github.com/npm/node-semver/releases)
- [Changelog](https://github.com/npm/node-semver/blob/v6.3.1/CHANGELOG.md)
- [Commits](npm/node-semver@v6.3.0...v6.3.1)

---
updated-dependencies:
- dependency-name: semver
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…-dev#3679)

Bumps [semver](https://github.com/npm/node-semver) from 6.3.0 to 6.3.1.
- [Release notes](https://github.com/npm/node-semver/releases)
- [Changelog](https://github.com/npm/node-semver/blob/v6.3.1/CHANGELOG.md)
- [Commits](npm/node-semver@v6.3.0...v6.3.1)

---
updated-dependencies:
- dependency-name: semver
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.47.0 to 1.53.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.47.0...v1.53.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# [0.32.0](feast-dev/feast@v0.31.0...v0.32.0) (2023-07-17)

### Bug Fixes

* Added generic Feature store Creation for CLI ([feast-dev#3618](feast-dev#3618)) ([bf740d2](feast-dev@bf740d2))
* Broken non-root path with projects-list.json ([feast-dev#3665](feast-dev#3665)) ([4861af0](feast-dev@4861af0))
* Clean up snowflake to_spark_df() ([feast-dev#3607](feast-dev#3607)) ([e8e643e](feast-dev@e8e643e))
* Entityless fv breaks with `KeyError: __dummy` applying feature_store.plan() on python ([feast-dev#3640](feast-dev#3640)) ([ef4ef32](feast-dev@ef4ef32))
* Fix scan datasize to 0 for inference schema ([feast-dev#3628](feast-dev#3628)) ([c3dd74e](feast-dev@c3dd74e))
* Fix timestamp consistency in push api ([feast-dev#3614](feast-dev#3614)) ([9b227d7](feast-dev@9b227d7))
* For SQL registry, increase max data_source_name length to 255 ([feast-dev#3630](feast-dev#3630)) ([478caec](feast-dev@478caec))
* Implements connection pool for postgres online store ([feast-dev#3633](feast-dev#3633)) ([059509a](feast-dev@059509a))
* Manage redis pipe's context ([feast-dev#3655](feast-dev#3655)) ([48e0971](feast-dev@48e0971))
* Missing Catalog argument in athena connector ([feast-dev#3661](feast-dev#3661)) ([f6d3caf](feast-dev@f6d3caf))
* Optimize bytes processed when retrieving entity df schema to 0 ([feast-dev#3680](feast-dev#3680)) ([1c01035](feast-dev@1c01035))

### Features

* Add gunicorn for serve with multiprocess ([feast-dev#3636](feast-dev#3636)) ([4de7faf](feast-dev@4de7faf))
* Use string as a substitute for unregistered types during schema inference ([feast-dev#3646](feast-dev#3646)) ([c474ccd](feast-dev@c474ccd))
* Add fully-qualified-table-name Redshift prop

Signed-off-by: Robin Neufeld <[email protected]>

* pre-commit

Signed-off-by: Robin Neufeld <[email protected]>

* Docstring

Signed-off-by: Robin Neufeld <[email protected]>

* Test fully_qualified_table_name

Signed-off-by: Robin Neufeld <[email protected]>

* Simplify logic

Signed-off-by: Robin Neufeld <[email protected]>

* pre-commit

Signed-off-by: Robin Neufeld <[email protected]>

* pre-commit

Signed-off-by: Robin Neufeld <[email protected]>

* Test offline_write_batch

Signed-off-by: Robin Neufeld <[email protected]>

* Bump to trigger CI

Signed-off-by: Robin Neufeld <[email protected]>

* another bump for ci

Signed-off-by: Robin Neufeld <[email protected]>

---------

Signed-off-by: Robin Neufeld <[email protected]>
…SA role (feast-dev#3696)

Add aws-sts dependency in java sdk

Signed-off-by: harmeet-singh-discovery <[email protected]>
…e-functionality"

This reverts commit 8487678, reversing
changes made to 0578b9b.
collection_available = utility.has_collection(table_to_keep.name)
try:
if collection_available:
logger.error(f"Collection {table_to_keep.name} already exists.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make this log statement on "info" level. If a collection exists it is not an error and is expected.



@dataclass
class MockFeatureView:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it really necessary to use a mock? Can you use a VectorFeatureView instead?


# Assert that logger.info was called with expected messages
expected_log_calls = [
mocker.call("Closing the connection to Milvus"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verification through logs is not a good practice in unit tests. Look into methods of "connections" like get_connection_addr() https://github.com/milvus-io/pymilvus/blob/master/pymilvus/orm/connections.py#L396 to use for the assert statements

assert len(utility.list_collections()) == 1
assert utility.has_collection(self.collection_to_write) is True
assert (
f"Collection {self.collection_to_write} has been created successfully."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to assert a specific log statement here

partial=None,
)

assert f"Collection {self.collection_to_write} already exists." in caplog.text
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert with list_collections() == 1 instead


online_store_creator.teardown()
MilvusOnlineStore().update(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using the milvus online store here makes this test dependent on update() working as expected. Instead creating a collection via the pymilvus sdk directly would be better here to simulate that a collection already exists.

if collection_available:
logger.error(f"Collection {table_to_keep.name} already exists.")
else:
Collection(name=table_to_keep.name, schema=table_to_keep.schema)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just catching this. FeatureView schemas are not the same as the schema that Milvus expects. We will need to write logic to translate that.

]
)

MilvusOnlineStore().update(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I would also rather create the collection first through the pymilvus sdk so that we do not introduce tests that are interdependent.

assert utility.has_collection(self.collection_to_delete) is False
assert len(utility.list_collections()) == 0
assert (
f"Collection {self.collection_to_delete} has been deleted successfully."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not assert log-statements. If the wording changes one has to go and update the tests.

)

assert (
f"Collection {self.collection_to_delete} does not exist or is already deleted."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check with assert connections.has_connection(self.collection_to_delete) instead

f"Collection {table_to_delete.name} has been deleted successfully."
)
else:
return logger.error(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove return statement

field_name = field.name
description = field.tags.get("description")
is_primary = boolean_mapping_from_string.get(field.tags.get("is_primary"))
dimension = field.tags.get("dimension")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dimensions have to be retrieved from VectorFeatureView. Also, the name of the column holding the vector is defined separately in VectorFeatureView ("vector_field")

is_primary = boolean_mapping_from_string.get(field.tags.get("is_primary"))
dimension = field.tags.get("dimension")

if dimension is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

@@ -50,6 +50,7 @@ class VectorFeatureView(BaseFeatureView):

# inheriting from FeatureView wouldn't work due to issue with conflicting proto classes
# therefore using composition instead
name: str
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feature_view already has an attribute name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did not reflect and threw an error when I tried to add name


def setup_method(self, milvus_online_setup):
# Ensuring that the collections created are dropped before the tests are run
with PymilvusConnectionContext():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ok to re-use MilvusConnectionManager in order to not duplicate code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep the test not dependent on any functionality in Milvus online store. This context manager is a very simple open and close connection, purely for testing purposes. Will not add in username, password, etc.

tags={
"is_primary": "False",
"description": "float32",
"dimension": "128",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about it this is actually a great suggestion to use a tag to pass along the number of dimensions. I think we can do likewise with index algorithm and eliminate the need for a VectorFeatureView altogether. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thats a good idea

)

# Here we want to open and add a collection using pymilvus directly and close the connection, we need to check if the collection count remains 1 and exists.
with PymilvusConnectionContext():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the embedded Milvus actually persist data after it is stopped?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does

partial=None,
)

assert "Collection abc does not exist or is already deleted." in caplog.text
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I'd rather assert against Milvus not having any collection. The update() method should really have a return value, but that is an issue in Feast itself

Copy link
Collaborator

@michaelbackes michaelbackes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work!

@Manisha4 Manisha4 merged commit 2c7b7b1 into master Aug 11, 2023
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.