Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(firestore): Adding distance threshold and result field #10802

Merged
merged 8 commits into from
Sep 11, 2024

Conversation

bhshkh
Copy link
Contributor

@bhshkh bhshkh commented Sep 1, 2024

The proposed changes in current PR can be used like below:

client.Collection(collName).
    FindNearest("path", []float32{5, 6, 7}, limit, DistanceMeasureEuclidean, firestore.FindNearestOptions{
				DistanceThreshold:   Ptr(20.0),
				DistanceResultField: resultField,
			}).
   

Two new fields have been added to FindNearest proto:

// Optional. Optional name of the field to output the result of the vector
// distance calculation. Must conform to [document field
// name][google.firestore.v1.Document.fields] limitations.
DistanceResultField string `protobuf:"bytes,5,opt,name=distance_result_field,json=distanceResultField,proto3" json:"distance_result_field,omitempty"`
// Optional. Option to specify a threshold for which no less similar
// documents will be returned. The behavior of the specified
// `distance_measure` will affect the meaning of the distance threshold.
// Since DOT_PRODUCT distances increase when the vectors are more similar,
// the comparison is inverted.
//
// For EUCLIDEAN, COSINE: WHERE distance <= distance_threshold
// For DOT_PRODUCT: WHERE distance >= distance_threshold
DistanceThreshold *wrapperspb.DoubleValue `protobuf:"bytes,6,opt,name=distance_threshold,json=distanceThreshold,proto3" json:"distance_threshold,omitempty"`

  1. DistanceResultField
  2. DistanceThreshold

'DistanceThreshold' is included in 'FindNearestOptions' as a pointer to a float64 (*float64) rather than a simple float64. This distinction allows us to differentiate between cases where the value is explicitly set and cases where it defaults to 0.

If vectors are like below and the query vector is {3.0, 1.0, 2.0}

docs := []CoffeeBean{
        {
	        Name:           "Kahawa coffee beans",
	        Description:    "Information about the Kahawa coffee beans.",
	        EmbeddingField: []float32{1.0, 2.0, 3.0},
	        Color:          "red",
        },
        {
	        Name:           "Onyx coffee beans",
	        Description:    "Information about the Onyx coffee beans.",
	        EmbeddingField: []float32{4.0, 5.0, 6.0},
	        Color:          "brown",
        },
        {
	        Name:           "SleepyOwl coffee beans",
	        Description:    "Information about the Kahawa coffee beans.",
	        EmbeddingField: []float32{3.0, 1.0, 2.0},
	        Color:          "red",
        },
}

then:
-- distance threshold 0 returns:

map[color:red description:Information about the Kahawa coffee beans. embedding_field:[3 1 2] name:SleepyOwl coffee beans]

-- distance threshold unset returns:

map[color:red description:Information about the Kahawa coffee beans. embedding_field:[3 1 2] name:SleepyOwl coffee beans]
map[color:red description:Information about the Kahawa coffee beans. embedding_field:[1 2 3] name:Kahawa coffee beans]
map[color:brown description:Information about the Onyx coffee beans. embedding_field:[4 5 6] name:Onyx coffee beans]

@product-auto-label product-auto-label bot added the api: firestore Issues related to the Firestore API. label Sep 1, 2024
@bhshkh bhshkh requested a review from jba September 1, 2024 16:03
@bhshkh bhshkh changed the title feat(firestore): Adding Vector search distance threshold and result f… feat(firestore): Adding distance threshold and result field Sep 1, 2024
@bhshkh bhshkh force-pushed the feature/fs-vs-threshold branch from 62ad34e to ab60ebc Compare September 1, 2024 16:08
@bhshkh bhshkh marked this pull request as ready for review September 1, 2024 16:11
@bhshkh bhshkh requested review from a team as code owners September 1, 2024 16:11
@jba
Copy link
Contributor

jba commented Sep 3, 2024

I believe a distance threshold of 0 is the same as unset, since distances can never be negative.
So I think using a float64 field is OK.

@bhshkh
Copy link
Contributor Author

bhshkh commented Sep 3, 2024

I believe a distance threshold of 0 is the same as unset, since distances can never be negative. So I think using a float64 field is OK.

@jba It yields different results:

If vectors are like below and the query vector is {3.0, 1.0, 2.0}

docs := []CoffeeBean{
        {
	        Name:           "Kahawa coffee beans",
	        Description:    "Information about the Kahawa coffee beans.",
	        EmbeddingField: []float32{1.0, 2.0, 3.0},
	        Color:          "red",
        },
        {
	        Name:           "Onyx coffee beans",
	        Description:    "Information about the Onyx coffee beans.",
	        EmbeddingField: []float32{4.0, 5.0, 6.0},
	        Color:          "brown",
        },
        {
	        Name:           "SleepyOwl coffee beans",
	        Description:    "Information about the Kahawa coffee beans.",
	        EmbeddingField: []float32{3.0, 1.0, 2.0},
	        Color:          "red",
        },
}

then:
-- distance threshold 0 returns:

map[color:red description:Information about the Kahawa coffee beans. embedding_field:[3 1 2] name:SleepyOwl coffee beans]

-- distance threshold unset returns:

map[color:red description:Information about the Kahawa coffee beans. embedding_field:[3 1 2] name:SleepyOwl coffee beans]
map[color:red description:Information about the Kahawa coffee beans. embedding_field:[1 2 3] name:Kahawa coffee beans]
map[color:brown description:Information about the Onyx coffee beans. embedding_field:[4 5 6] name:Onyx coffee beans]

@jba
Copy link
Contributor

jba commented Sep 4, 2024

OK, but I still think we should put them in the options struct. They are actually options for FindNearest, that is the right place for them.
We can make the pointer stuff easier with a helper function. Here's what we did in the Gemini client: https://pkg.go.dev/github.com/google/[email protected]/genai#Ptr.

@bhshkh
Copy link
Contributor Author

bhshkh commented Sep 4, 2024

OK, but I still think we should put them in the options struct. They are actually options for FindNearest, that is the right place for them. We can make the pointer stuff easier with a helper function. Here's what we did in the Gemini client: https://pkg.go.dev/github.com/google/[email protected]/genai#Ptr.

This is awesome! Thanks for sharing.
Updated

@bhshkh bhshkh enabled auto-merge (squash) September 5, 2024 03:24
@jba
Copy link
Contributor

jba commented Sep 5, 2024

Looks pretty good. We should not submit until this whole feature goes GA.

@bhshkh bhshkh added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Sep 5, 2024
firestore/integration_test.go Outdated Show resolved Hide resolved
},
{
desc: "FindNearest threshold and resultField",
vq: collRef.FindNearest(queryField, []float64{1, 2, 3}, 3, DistanceMeasureEuclidean, &FindNearestOptions{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be an example for this in examples_test.go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why we maintain examples_test.go. Don't the sample snippets in golang-samples repo serve the same purpose ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally there'd be one source of truth, but now they serve different purposes. The ones in example_test.go appear in pkgsite: https://pkg.go.dev/cloud.google.com/go/firestore#pkg-examples. The samples are linked from the GCP documentation.

firestore/query.go Outdated Show resolved Hide resolved
firestore/query.go Outdated Show resolved Hide resolved
firestore/query.go Show resolved Hide resolved
firestore/integration_test.go Show resolved Hide resolved
firestore/integration_test.go Outdated Show resolved Hide resolved
firestore/query_test.go Outdated Show resolved Hide resolved
firestore/query_test.go Outdated Show resolved Hide resolved
firestore/query_test.go Outdated Show resolved Hide resolved
@bhshkh bhshkh force-pushed the feature/fs-vs-threshold branch from 9c4c7c3 to 0e47cc5 Compare September 10, 2024 19:30
@bhshkh bhshkh force-pushed the feature/fs-vs-threshold branch from 0e47cc5 to aaf18bd Compare September 10, 2024 19:32
@bhshkh bhshkh removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Sep 10, 2024
@bhshkh
Copy link
Contributor Author

bhshkh commented Sep 10, 2024

Removed do not merge label, since feature is now GA

@bhshkh bhshkh requested a review from jba September 10, 2024 19:44
Copy link
Contributor

@daniel-sanche daniel-sanche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


// DistanceResultField specifies name of the document field to output the result of
// the vector distance calculation.
// If the field already exists in the document, its value get overwritten with the distance calculation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/get/gets/

@bhshkh bhshkh disabled auto-merge September 11, 2024 19:56
@bhshkh bhshkh enabled auto-merge (squash) September 11, 2024 19:58
@bhshkh bhshkh merged commit e9a551e into googleapis:main Sep 11, 2024
8 checks passed
@bhshkh bhshkh deleted the feature/fs-vs-threshold branch September 11, 2024 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: firestore Issues related to the Firestore API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants