ES|QL Add initial support for semantic_text field type #113920
Hi @ioanatia, I've created a changelog YAML for you.
8537aaa
to
36e4c7d
Compare
Pinging @elastic/es-search-relevance (Team:Search Relevance)
@@ -43,6 +44,7 @@ public static ElasticsearchCluster localCluster(ElasticsearchCluster remoteClust
     .setting("cluster.remote.connections_per_cluster", "1")
     .shared(true)
     .setting("cluster.routing.rebalance.enable", "none")
+    .plugin("inference-service-test")
This plugin is used for testing semantic_text. It's an inference service that can create sparse or dense embeddings. The embeddings have no actual "semantic" meaning since they are not produced by a model, but they are supposed to be deterministic.
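To illustrate what "deterministic but not semantic" could look like, here is a minimal sketch. It is not the implementation in the inference-service-test plugin: the class name `MockSparseEmbedding` and the hashing scheme are assumptions made up for this example. Each token gets a weight derived from its hash, so the same input always yields the same embedding without any model involved.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; the real test plugin may compute embeddings differently.
final class MockSparseEmbedding {
    private MockSparseEmbedding() {}

    // Maps each whitespace-separated token to a pseudo-weight in [0, 1) derived from its hash.
    // String.hashCode() is specified by the Java language, so the result is stable across runs.
    static Map<String, Float> embed(String input) {
        Map<String, Float> weights = new LinkedHashMap<>();
        for (String token : input.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input produces a single empty token; skip it
            }
            int bucket = Math.floorMod(token.hashCode(), 1000);
            weights.put(token, bucket / 1000.0f);
        }
        return weights;
    }

    public static void main(String[] args) {
        // The weights carry no meaning; they only need to be reproducible.
        System.out.println(embed("live long and prosper"));
    }
}
```

Because the weights depend only on the input text, tests that index the same CSV data can assert on stable results.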
Request request = new Request("PUT", "_inference/sparse_embedding/test_sparse_inference");
request.setJsonEntity("""
    {
      "service": "test_service",
This is the inference service from the inference-service-test test plugin.
This one's probably worth javadoc to explain that it's for the semantic text fields.
It kinda surprises me that this is the first capability conditional in the test infra. How expensive is this service endpoint? Conditionally creating it is of course fine, but the test infra would be simpler if it was unconditionally registered.
The test endpoint is quite lightweight. We could always register it.
javadoc was added
Makes sense to me. @not-napoleon might want to take a look because this is beginning to follow your lead.
@@ -14,7 +14,7 @@

 public class EsqlSpecIT extends EsqlSpecTestCase {
     @ClassRule
-    public static ElasticsearchCluster cluster = Clusters.testCluster(spec -> {});
+    public static ElasticsearchCluster cluster = Clusters.testCluster(spec -> spec.plugin("inference-service-test"));
Will everything need this plugin? Should it be in testCluster?
I think only this one needs it - it's the only one for multi_node that extends from EsqlSpecTestCase that loads the CSV data sets.
LGTM
semantic_text support, yay! 💯
Request request = new Request("PUT", "_inference/sparse_embedding/test_sparse_inference"); | ||
request.setJsonEntity(""" | ||
{ | ||
"service": "test_service", |
The test endpoint is quite lightweight. We could always register it.
There are some issues with loading this plugin for multi-cluster and mixed-versions when one of the cluster nodes is on 8.16.0:
multi-node and single-node should continue to work just fine. The checks
OK, mixed-node testing will be avoided until the following issue is resolved as a follow-up: #115166 (given that mixed mode will only be relevant when this PR is backported).
@elasticmachine update branch
This PR has had a number of review iterations. It's a great first step towards support for semantic search in ES|QL. LGTM 👍
@elasticmachine update branch
@elasticmachine update branch
@elasticmachine test this please
@elasticmachine update branch
Thank you folks, especially @fang-xing-esql and @ChrisHegarty for your help on this PR!
💔 Backport failed
The backport operation could not be completed due to the following error:
You can use sqren/backport to manually backport by running
* Add initial support for semantic_text field type
* Update docs/changelog/113920.yaml
* More tests and fixes
* Use mock inference service
* Fix tests
* Spotless
* Fix mixed-cluster and multi-clusters tests
* sort
* Attempt another fix for bwc tests
* Spotless
* Fix merge
* Attempt another fix
* Don't load the inference-service-test plugin for mixed versions/clusters
* Add more tests, address review comments
* trivial
* revert
* post-merge fix block loader
* post-merge fix compile
* add mixed version testing
* whitespace
* fix MultiClusterSpecIT
* add more fields to mapping
* Revert mixed version testing
* whitespace

---------

Co-authored-by: ChrisHegarty <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
…5256) * Add initial support for semantic_text field type * Update docs/changelog/113920.yaml * More tests and fixes * Use mock inference service * Fix tests * Spotless * Fix mixed-cluster and multi-clusters tests * sort * Attempt another fix for bwc tests * Spotless * Fix merge * Attempt another fix * Don't load the inference-service-test plugin for mixed versions/clusters * Add more tests, address review comments * trivial * revert * post-merge fix block loader * post-merge fix compile * add mixed version testing * whitespace * fix MultiClusterSpecIT * add more fields to mapping * Revert mixed version testing * whitespace --------- Co-authored-by: ChrisHegarty <[email protected]> Co-authored-by: Elastic Machine <[email protected]>
Support is added behind a feature flag. We could not simply use EsqlCapabilities since that's not available in esql-core.
Right now we have no support in existing functions.
I followed the approach for adding initial support for date_nanos, which was also added behind a feature flag. That allowed for incremental progress, rather than adding support for everything in one PR: #110205
With this PR we will return semantic_text fields as part of the results, and it will also allow us to refer to semantic_text fields in the match function (to run semantic search):
relates: #115103
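
The feature-flag approach described in this PR description can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: the class `FeatureFlaggedTypes`, the property name, and the list of base types are all hypothetical. The idea is that the new type is only exposed when the flag is switched on, so partial support can be merged without affecting users by default.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of gating a new data type behind a flag; not the actual ES|QL code.
final class FeatureFlaggedTypes {
    private FeatureFlaggedTypes() {}

    // Assumed property name, chosen for this example only.
    static final String FLAG_PROPERTY = "example.semantic_text_feature_flag_enabled";

    // The flag is off unless the system property is explicitly set to "true".
    static boolean semanticTextEnabled() {
        return Boolean.parseBoolean(System.getProperty(FLAG_PROPERTY, "false"));
    }

    // Returns the types a query may reference; semantic_text appears only when the flag is on.
    static Set<String> supportedTypes() {
        Set<String> types = new TreeSet<>(Set.of("keyword", "long", "text"));
        if (semanticTextEnabled()) {
            types.add("semantic_text");
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(supportedTypes());
    }
}
```

Keeping the check in one place means later PRs can add function support incrementally and the flag can be removed in a single change once the feature is complete.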