copy_to and multifields support for semantic_text #1

carlosdelest

Adds copy_to and multifields support to semantic_text.

This needs to be merged after elastic#106560 is merged, as it is based on it.

Changes:

* Iterate on the source fields for calculating inference
* Allow inference to be applied from multiple responses to a single field
* Parser needs to check field type for multifields
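
To illustrate the first two changes, here is a minimal sketch of gathering the inference inputs for a semantic_text field from several source fields (the field itself plus any fields that copy_to it). This is not the code in this PR: `CopyToInputsSketch`, `collectInputs` and the flat `Map` document model are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the PR's implementation.
final class CopyToInputsSketch {

    /**
     * For each target inference field, iterate over its source fields and collect
     * every text value present in the document. A target with several sources ends
     * up with several inputs, so several inference responses map to one field.
     */
    static Map<String, List<String>> collectInputs(
        Map<String, List<String>> sourceFieldsByTarget,
        Map<String, String> doc
    ) {
        Map<String, List<String>> inputs = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : sourceFieldsByTarget.entrySet()) {
            List<String> texts = new ArrayList<>();
            for (String sourceField : entry.getValue()) {
                String value = doc.get(sourceField);
                if (value != null) {
                    texts.add(value);
                }
            }
            // Targets with no input in this document are left out entirely.
            if (texts.isEmpty() == false) {
                inputs.put(entry.getKey(), texts);
            }
        }
        return inputs;
    }
}
```

With sources `["semantic", "title"]` for the target `semantic`, a document containing both fields yields two inputs for that single target, in source-field order.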

rockdaboot and others added 30 commits March 19, 2024 18:33
* WIP Support ENRICH MATCH on TEXT

* Disallow KEYWORD from range enrich

The ingest processor does not support this, and there is no keyword_range type to complement the numerical, date and ip range types.

* Revert: Disallow KEYWORD from range enrich

We allow using KEYWORD to range match against ip_range.

* Update docs/changelog/106435.yaml

* Improve changelog entry

* Added yaml test for ENRICH on TEXT fields

* Allow TEXT for range, so text matches IP-range (plus test)
Packaging tests have several files that may be useful in debugging
failures. Additionally, we sometimes have assertions for which we want
to catch them and emit additional debugging info. This commit guards
the common ways that Elasticsearch is started and assertions are run
with dumping all debug information available.
The shutdown integration tests test scenarios across multiple nodes.
When checking if a shard is moved off a node that is shutting down, the
shard migration status may not yet have been updated. This commit adds a
busy wait to ensure the status has time to update before failing the
test.

closes elastic#77488
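
For readers unfamiliar with the busy-wait pattern mentioned above, a stand-alone sketch follows. The actual test relies on the Elasticsearch test framework; `BusyWait` and `waitUntil` here are hypothetical names used only for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; the real tests use the test framework's own busy-wait utility.
final class BusyWait {

    /**
     * Polls the condition, sleeping with exponential backoff between attempts,
     * until it holds or the timeout elapses. Returns whether the condition held.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                Thread.sleep(Math.min(sleepMillis, remainingMillis));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
            sleepMillis = Math.min(sleepMillis * 2, 100);
        }
    }
}
```

A test would then fail only if `waitUntil(() -> shardHasMoved(), timeout)` returns false, where `shardHasMoved` stands in for whatever status check the test performs.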
NodeShutdownIT.testStalledShardMigrationProperlyDetected has been muted
for a couple years. It apparently reproduced when the failure first
started, but no longer reproduces on main. This commit re-enables the
test and closes the test issue. We can open a new issue with any
subsequent failure.

closes elastic#77456
This makes a couple of changes to regex processing in the compute
engine:
1. Process utf-8 strings directly. This should save a ton of time.
2. Snip the `toString` output if it is too big - I chose 64kb of
   strings.
3. I changed the formatting of the automaton to a slightly customized
   `dot` output. Because automata are graphs. Everyone knows it. And
   they are a lot easier to read as graphs. `dot` is easy to convert
   into a graph.
4. I implement `EvaluatorMapper` for regex operations which is pretty
   standard for the rest of our operations.
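
As a rough illustration of points 2 and 3, the sketch below renders a list of transitions as `dot` text and snips the result when it exceeds a size limit. It is not the code from this commit: `DotSketch`, `Transition` and the `...` truncation marker are assumptions made for the example, and the limit is counted in characters for simplicity.

```java
import java.util.List;

// Hypothetical illustration; the commit uses its own customized dot output.
final class DotSketch {

    /** One labelled edge of the automaton graph. */
    static final class Transition {
        final int from;
        final int to;
        final String label;

        Transition(int from, int to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    /** Renders the transitions as a dot digraph, snipped to at most maxChars characters. */
    static String toDot(List<Transition> transitions, int maxChars) {
        StringBuilder sb = new StringBuilder("digraph Automaton {\n");
        for (Transition t : transitions) {
            sb.append("  ").append(t.from).append(" -> ").append(t.to)
                .append(" [label=\"").append(t.label).append("\"];\n");
        }
        sb.append("}\n");
        if (sb.length() <= maxChars) {
            return sb.toString();
        }
        // Snip oversized output and mark the truncation.
        return sb.substring(0, maxChars) + "...";
    }
}
```

Calling `DotSketch.toDot(transitions, 64 * 1024)` would mirror the 64kb limit mentioned above, under the assumption that the limit applies to the rendered string.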
This modifies the ESQL test infrastructure to generate more of the
documentation for functions. It generates the *Description* section, the
*Examples* section, and the *Parameters* section as separate files so we
can use them as needed. It also generates a `layout` file that's just
a guess as to how to render the whole thing. In some cases it'll work
and we can use that instead of hand maintaining a "top level"
description file for the function.

Most newly generated files are unused. We have to choose to pick them up
by replacing the sections we were manually maintaining with an include
of the generated section. Or by replacing the entire hand maintained
file with the generated top level file.

Relates to elastic#104247
…tic#106505)

The distributions already have correct permissions set on native
libraries copied to them. However, the build itself to extract the
native libs relies on the upstream file permissions. This commit sets
explicit permissions on the copy task which extracts native libraries.
)

Since mrjars may use preview apis, forbidden apis must know about any
preview apis from the jdk. However, we do not run forbidden apis with
the preview enabled flag, nor in a separate jvm, so it does not know
about these classes. Thus we ignore missing classes on source sets added
by the mrjar plugin.

This commit configures all sourcesets added by mrjar plugin to ignore
forbidden apis missing classes.
The task for updating cluster state with nodes seen by shutdown was
previously switched to use batched tasks. However, the task is never
marked as complete, which leads to the tasks piling up. This commit
marks the task as complete and re-enables a test that appears to succeed
now.

closes elastic#76689
When we use `ROW` in ESQL we pick a random data set by just iterating
the `Map`. It's random. Yay! And some of them don't work in this place.
This just picks one that we know works.

Closes elastic#106501
* Working tests

* Adding more tests

* Adding comment

* Switching to micros and addressing feedback

* Removing nanos and adding test for bug fix

---------

Co-authored-by: Elastic Machine <[email protected]>
If we proceed without waiting for pages, we might cancel the main 
request before starting the data-node request. As a result, the exchange
sinks on data-nodes won't be removed until the inactive_timeout elapses,
which is longer than the assertBusy timeout.

Closes elastic#106443
Add missing _to_ in sentence

(cherry picked from commit 40a9155)

Co-authored-by: Aaron Hanusa <[email protected]>
The scope here is to expose a method (Realms#getRealmRef) that can be used
to retrieve the realm domain assignments for any realm id.
+ Add esql as rest test dependency for ml/native-multi-node-tests to work around the mixed testClusters/TestCluster nodes (so all have the esql plugin installed)
This commit exposes the query transport action to improve usage. While one can perform all operations prior to this change, it has been suggested that adding the action would improve the symmetry of the API by allowing e.g.

client().execute(builder.action(), builder.request()).actionGet(30, SECONDS);
* Add links to text_expansion in ELSER tutorial

* Apply suggestions from code review

Co-authored-by: Liam Thompson <[email protected]>

---------

Co-authored-by: Liam Thompson <[email protected]>
Test tweaks for serverless:

* Valid application name in API key tests
* Move from `cluster.health` to `info` call in roles test (the call is just used to check that a user with a cluster privilege is indeed able to execute the test)

Closes: ES-7987
jonathan-buttner and others added 28 commits March 27, 2024 21:10
I realized I forgot to add some namedwritables to our registry. I've
forgotten this multiple times.

Any ideas how we can improve this so we get failures if we forget in the
future?
Add a new optional request option, `with_profile_uid`,
to the Get and Query API Key Information endpoints,
to return the profile uid of each API key's owner user.

Closes elastic#98939
Will restore the assert on the metric in a follow-up PR.

Related to elastic#106834
In this PR we introduce the APIs that expose the global retention configuration and allow users to take advantage of it.

These APIs are protected by newly introduced dedicated privileges:

* `manage_data_stream_global_retention` or higher, which allows all operations on the global retention configuration.
* `monitor_data_stream_retention` or higher, which allows the retrieval of the global retention configuration.

This PR is the final PR that makes the global retention available for our users.
For now, skip tests when the flaky hdfs cluster cannot be started. This allows further investigation
without bothering others, and keeps the pipeline green.
…ndex pattern (elastic#106815)

* Update KibanaOwnedReservedRoleDescriptors.java

* replaced all with read, delete_index
Regular feature names are extracted together with historical features during feature metadata extraction.
Based on this, feature checks in tests are validated to use only known features to prevent tests from
being silently disabled due to an invalid or misspelled feature name.
---------

Co-authored-by: Lorenzo Dematte <[email protected]>
…elds (elastic#106862)

The SearchExecutionContext supports the notion of allowed fields, provided via a specific setter method.
However, fields are only filtered in the getFieldType method. There needs to be consistency between getMatchingFieldNames and getFieldType.
In fact there are places in the code where getMatchingFieldNames is called to resolve field name patterns, and later getFieldType is called
on each of the resolved fields. If the former resolves to one field that we can't retrieve a field type for, that is unexpected and to be considered a bug.

In addition, this commit adds consistency for getAllFields: this is only called by field caps, hence a different codepath that does not seem to set allowed fields
for now, but it's important for the context to provide consistency around fields access, especially for methods that are as broad as getAllFields,
despite their currently very specific usage.

This surfaced as we are trying to move fetching of the `_ignored` field to use value fetchers, which use a search execution context and resolve the field type,
whereas until now they are retrieved directly via StoredFieldsPhase and completely bypass such check.

This commit also adds a test that was missing around verifying that SearchExecutionContext applies the allowedFields predicate when provided.
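
The consistency requirement can be shown with a toy context that applies the same allowed-fields predicate in both lookups. This is a simplified sketch and not SearchExecutionContext itself: `ToyFieldContext`, the string-valued field types and the trailing-`*` pattern matching are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for a search execution context.
final class ToyFieldContext {
    private final Map<String, String> fieldTypes;
    private Predicate<String> allowedFields = name -> true;

    ToyFieldContext(Map<String, String> fieldTypes) {
        this.fieldTypes = fieldTypes;
    }

    void setAllowedFields(Predicate<String> allowedFields) {
        this.allowedFields = allowedFields;
    }

    /** Returns the field type, or null if the field is unknown or not allowed. */
    String getFieldType(String name) {
        return allowedFields.test(name) ? fieldTypes.get(name) : null;
    }

    /** Resolves a pattern to field names, applying the same predicate as getFieldType. */
    Set<String> getMatchingFieldNames(String pattern) {
        Set<String> result = new TreeSet<>();
        for (String name : fieldTypes.keySet()) {
            if (matchesPattern(pattern, name) && allowedFields.test(name)) {
                result.add(name);
            }
        }
        return result;
    }

    // Only exact names and a single trailing '*' are supported in this sketch.
    private static boolean matchesPattern(String pattern, String name) {
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }
}
```

Because both methods consult the predicate, every name returned by `getMatchingFieldNames` is guaranteed to resolve through `getFieldType`, which is the invariant the commit describes.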
…ce metadata in `IndexMetadata` (elastic#106743)

This change refactors the integration of the field inference metadata in IndexMetadata. Instead of partial diffs, the new class simply sends the entire object as diff if it has changed.
This PR also renames the fields and methods related to the inference fields consistently.
The inference phase (in the transport shard bulk action) is also changed so that inference is not called if:

* The document contains a value for the inference input.
* The document also contains a value for the inference results of that field (in the _inference map).

If the document contains no value for the inference input but an inference result for that field, it is marked as failed.
---------

Co-authored-by: carlosdelest <[email protected]>
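
The rules for the inference phase described above can be summarized as a small decision function. This is a sketch of the stated behaviour, not the filter's actual code: `InferenceDecisionSketch` and its enum are hypothetical, running inference when only the input is present is implied rather than stated, and the case where neither input nor result is present is not covered by the description, so treating it as a no-op is an assumption.

```java
// Hypothetical summary of the described rules; not the actual filter code.
final class InferenceDecisionSketch {

    enum Decision { RUN_INFERENCE, SKIP_INFERENCE, MARK_FAILED, NOTHING_TO_DO }

    static Decision decide(boolean hasInput, boolean hasResult) {
        if (hasInput && hasResult) {
            // Input and precomputed result are both present: do not call inference.
            return Decision.SKIP_INFERENCE;
        }
        if (hasInput == false && hasResult) {
            // A result without its input is inconsistent: mark the item as failed.
            return Decision.MARK_FAILED;
        }
        if (hasInput) {
            return Decision.RUN_INFERENCE;
        }
        // Assumption: with neither input nor result there is nothing to do.
        return Decision.NOTHING_TO_DO;
    }
}
```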
Stop and Start error messages include the reason for the error followed
by the suggestion to use force=true.  This may cause the suggestion to
be hidden by the reason, so we will move the reason after the
suggestion.

Close elastic#106819
Like Block#filter, Block#expand should return the specific type of the 
original block, rather than a generic block type. For instance, the
expanded block of an IntBlock should also be an IntBlock. I encountered
a situation where I had to cast the expanded block.
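
The change relies on Java's covariant return types. The toy hierarchy below shows the idea; `ToyBlock`, `ToyIntBlock` and `ToyIntArrayBlock` are hypothetical stand-ins and not the compute engine's Block classes, and the expand semantics (one position per value) are simplified for the example.

```java
// Hypothetical toy types illustrating covariant returns; not the real Block API.
interface ToyBlock {
    int positionCount();

    ToyBlock expand();
}

interface ToyIntBlock extends ToyBlock {
    int[] valuesAt(int position);

    @Override
    ToyIntBlock expand(); // narrowed return type, so callers need no cast
}

final class ToyIntArrayBlock implements ToyIntBlock {
    private final int[][] values; // values[position] holds that position's values

    ToyIntArrayBlock(int[][] values) {
        this.values = values;
    }

    @Override
    public int positionCount() {
        return values.length;
    }

    @Override
    public int[] valuesAt(int position) {
        return values[position];
    }

    @Override
    public ToyIntBlock expand() {
        int total = 0;
        for (int[] positionValues : values) {
            total += positionValues.length;
        }
        // Give every value its own position.
        int[][] expanded = new int[total][];
        int next = 0;
        for (int[] positionValues : values) {
            for (int value : positionValues) {
                expanded[next++] = new int[] { value };
            }
        }
        return new ToyIntArrayBlock(expanded);
    }
}
```

With this shape, `ToyIntBlock expanded = intBlock.expand();` compiles without a cast.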
…Input` (elastic#106794)

There are loads of scenarios where we create very small slices (as in less
than buffer size) from inputs that already have these bytes buffered
(BKDReader#packedIndex, for example).
We can save considerable memory as well as potential IO to disk or,
worse yet, the blob store by just slicing the buffer if possible.
Outside of the case of slicing and never reading from the slice,
this should always save memory.
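
A minimal sketch of the idea, using `java.nio.ByteBuffer` rather than the actual input classes touched by this commit: if the requested range is already inside the buffered bytes, return a view over the buffer instead of opening a new slice that would allocate its own buffer and read again. `BufferSliceSketch`, `trySliceFromBuffer` and its parameters are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of slicing from an existing buffer.
final class BufferSliceSketch {

    /**
     * Returns a view over file range [offset, offset + length) if it is fully
     * contained in the buffered bytes, or null if the caller must fall back to
     * a regular slice that reads from storage.
     *
     * @param buffer      bytes currently buffered, valid from index 0 to buffer.limit()
     * @param bufferStart file offset of the first buffered byte
     */
    static ByteBuffer trySliceFromBuffer(ByteBuffer buffer, long bufferStart, long offset, int length) {
        long relative = offset - bufferStart;
        if (length < 0 || relative < 0 || relative + length > buffer.limit()) {
            return null;
        }
        // duplicate() shares the underlying bytes, so no copy is made.
        ByteBuffer view = buffer.duplicate();
        view.limit((int) relative + length);
        view.position((int) relative);
        return view.slice();
    }
}
```

The returned view shares memory with the original buffer, which is where the saving comes from in this sketch.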
FieldName does not make much sense as an abstract class with a single private subclass.
Also, the base implementation holds most of the fields that the subclass relies on to do its job.
They can be unified into a single class
This adds an OPTIONS clause to FROM, allowing users to specify search or index
resolution options, such as preference, allow_no_indices or
ignore_unavailable.
…elastic#106624)

Assert using greaterThanOrEqualTo to allow for additional scheduled background threads to 
appear in collected measurements after the thread pool stats have already been pulled, 
e.g. this could be the case for the cluster coordination thread pool.
…06881)

* Update 8.13 release notes with known issue

* revert unintended

* reword

* reword

* reword
…copy-to-support-inference

# Conflicts:
#	server/src/test/java/org/elasticsearch/index/mapper/FieldTypeLookupTests.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/cluster/metadata/SemanticTextClusterMetadataTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilterTests.java
#	x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/10_semantic_text_inference.yml
jimczi pushed a commit that referenced this pull request Nov 6, 2024