Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net: Bug: IVectorStore::GetCollection throws away HasNamedVectors #9520

Closed
lilhoser opened this issue Nov 4, 2024 · 3 comments
Closed

.Net: Bug: IVectorStore::GetCollection throws away HasNamedVectors #9520

lilhoser opened this issue Nov 4, 2024 · 3 comments
Assignees
Labels
bug Something isn't working .NET Issue or Pull requests regarding .NET code

Comments

@lilhoser
Copy link

lilhoser commented Nov 4, 2024

Describe the bug
When retrieving an existing collection, the GetCollection API creates a new QdrantVectorStoreRecordCollectionOptions object that passes the caller's VectorStoreRecordDefinition parameter but does not consider HasNamedVectors which is a crucial collection option for Qdrant:

var recordCollection = new QdrantVectorStoreRecordCollection<TRecord>(this._qdrantClient, name, new QdrantVectorStoreRecordCollectionOptions<TRecord>() { VectorStoreRecordDefinition = vectorStoreRecordDefinition });

Source

This also throws away other options that might have been specified when the caller created the collection, such as PointStructCustomMapper.

As a result, the default value of HasNamedVectors is used, which is false. Later vector searches that try to search on a specific named vector causes an exception:

One or more errors occurred. (Call to vector store failed.)

To Reproduce
Steps to reproduce the behavior:

  1. Create a collection:
var collection = new QdrantVectorStoreRecordCollection<MyRecord>(
    new QdrantClient("localhost"),
    "mycollection",
    new()
    {
        HasNamedVectors = true,
        PointStructCustomMapper = new Mapper(),
        VectorStoreRecordDefinition =RecordDefinition
    });
await collection.CreateCollectionAsync();
  1. Use IVectorStore.GetCollection to retrieve that collection - notice that the retrieved collection has HasNamedVectors set to false even though it was created with true:
var store = m_Kernel.GetRequiredService<IVectorStore>();
var collection = store.GetCollection<ulong, MyRecord>("mycollection", RecordDefinition);
  1. Issue a vector search:
var embedding = m_Kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var searchVector = await embedding.GenerateEmbeddingAsync(query);
var searchOptions = new VectorSearchOptions
{
    VectorPropertyName = nameof(RecordDefinition.MyEmbedding)
};
var collection = store.GetCollection<ulong, MyRecord>("mycollection", RecordDefinition);
var searchResult = await collection.VectorizedSearchAsync(searchVector, searchOptions);

Expected behavior
GetCollection should allow the caller to pass in the QdrantVectorStoreRecordCollectionOptions it used to create the collection, or better yet, this should not be a required parameter at all and SK should figure out how to retrieve the collection instead of creating new parameters that alters the behavior of the collection.

Platform

  • OS: Windows 11
  • IDE: VS 2022
  • Language: C#
@lilhoser lilhoser added the bug Something isn't working label Nov 4, 2024
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code triage labels Nov 4, 2024
@github-actions github-actions bot changed the title .NET: Bug: IVectorStore::GetCollection throws away HasNamedVectors .Net: Bug: IVectorStore::GetCollection throws away HasNamedVectors Nov 4, 2024
@westey-m
Copy link
Contributor

westey-m commented Nov 4, 2024

@lilhoser, thanks for filing a bug, I've created a PR to fix the issue where HasNamedVectors was not passed from VectorStore To Collection. With regards to your design suggestion:

GetCollection should allow the caller to pass in the QdrantVectorStoreRecordCollectionOptions it used to create the collection

Since GetCollection is part of the abstraction and doesn't know anything about Qdrant it cannot be modified to allow Qdrant options be provided. There is of course the option of adding overloads of GetCollection that are only on the concrete implementations e.g. QdrantVectorStore and not on IVectorStore, which would allow passing Qdrant options, but this would live outside of the abstraction. We have been considering this and if there is interest we can certainly add these.

this should not be a required parameter at all and SK should figure out how to retrieve the collection instead of creating new parameters that alters the behavior of the collection

While it is possible to make a request to Qdrant on first use, each time a collection object was created, to determine whether it is using named or unnamed vectors, it isn't good to subject all consumers to this overhead. This also doesn't address the scenario where a developer wants to use a specific mode, and are using SK to create the collection. In this case there isn't a way to know which mode the developer prefers without the developer specifying it.

github-merge-queue bot pushed a commit that referenced this issue Nov 4, 2024
…ion in GetCollection. (#9523)

### Motivation and Context

We allow users to provide a setting to indicate whether they are using a
collection which has named vectors or not when using Qdrant.

#9520

### Description

- Fixing a bug where the has named vectors setting wasn't passed from
the vector store to the collection in GetCollection.

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄
@lilhoser
Copy link
Author

lilhoser commented Nov 4, 2024

Appreciate those details!

It's unfortunate that to get a collection object out of Qdrant, you have to specify its configuration (QdrantVectorStoreRecordCollectionOptions), because when/if this class ever changes, all places that instantiate this class will need to be updated to propagate correct values to the new field(s).

Perhaps the overload is the way to go (a generic GetCollection<T> ?), but I'm not quite familiar enough with some of the SK abstractions to offer better suggestions.

@westey-m
Copy link
Contributor

westey-m commented Nov 5, 2024

Created a feature for adding concrete implementation overloads #9537

@westey-m westey-m closed this as completed Nov 5, 2024
@github-project-automation github-project-automation bot moved this from Sprint: In Review to Sprint: Done in Semantic Kernel Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working .NET Issue or Pull requests regarding .NET code
Projects
Archived in project
Development

No branches or pull requests

3 participants