forked from elastic/elasticsearch
copy_to and multifields support for semantic_text #1
Closed
carlosdelest wants to merge 277 commits into jimczi:register_semantic_text from carlosdelest:carlosdelest/semantic-text-copy-to-support-inference
* WIP Support ENRICH MATCH on TEXT
* Disallow KEYWORD from range enrich
  The ingest processor does not support this, and there is no keyword_range type to complement the numerical, date and ip range types.
* Revert: Disallow KEYWORD from range enrich
  We allow using KEYWORD to range match against ip_range.
* Update docs/changelog/106435.yaml
* Improve changelog entry
* Added yaml test for ENRICH on TEXT fields
* Allow TEXT for range, so text matches IP-range (plus test)
Packaging tests have several files that may be useful in debugging failures. Additionally, we sometimes have assertions that we want to catch so we can emit additional debugging info. This commit wraps the common ways that Elasticsearch is started and that assertions are run so that all available debug information is dumped on failure.
The shutdown integration tests test scenarios across multiple nodes. When checking if a shard is moved off a node that is shutting down, the shard migration status may not yet have been updated. This commit adds a busy wait to ensure the status has time to update before failing the test. closes elastic#77488
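A minimal sketch of the busy-wait idea, using a hypothetical `BusyWait.waitUntil` helper rather than the test framework's real `assertBusy` (whose signature is not shown here): the condition is polled until it holds or a timeout elapses, and the test only fails after the timeout.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not the Elasticsearch test framework API.
final class BusyWait {
    private BusyWait() {}

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true as soon as the condition holds, false if it never held in time.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: only now should the test fail
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

A test would pass the "shard has moved off the shutting-down node" check as the condition and assert on the returned value.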
NodeShutdownIT.testStalledShardMigrationProperlyDetected has been muted for a couple of years. It apparently reproduced when the failure first started, but it no longer reproduces on main. This commit re-enables the test and closes the test issue. We can open a new issue for any subsequent failure. closes elastic#77456
This makes four changes to regex processing in the compute engine:
1. Process UTF-8 strings directly. This should save a ton of time.
2. Snip the `toString` output if it is too big; I chose 64kb of strings.
3. I changed the formatting of the automaton to a slightly customized `dot` output, because automata are graphs. Everyone knows it. And they are a lot easier to read as graphs, and `dot` is easy to convert into a graph.
4. I implemented `EvaluatorMapper` for regex operations, which is pretty standard for the rest of our operations.
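The "snip if too big" change can be sketched as follows. This is an illustration only: the `Snip` class, the marker text, and interpreting "64kb of strings" as 64 * 1024 characters are all assumptions, not the compute engine's code.

```java
// Hypothetical helper illustrating truncation of oversized toString output.
final class Snip {
    // Assumption: the 64kb limit is measured in chars, not encoded bytes.
    static final int DEFAULT_LIMIT = 64 * 1024;

    private Snip() {}

    /** Returns s unchanged if it fits, otherwise its first `limit` chars plus a marker. */
    static String snip(String s, int limit) {
        if (s.length() <= limit) {
            return s;
        }
        int end = limit;
        // Avoid cutting a surrogate pair (one code point) in half.
        if (end > 0 && Character.isHighSurrogate(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end) + "...[" + (s.length() - end) + " chars snipped]";
    }
}
```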
This modifies the ESQL test infrastructure to generate more of the documentation for functions. It generates the *Description* section, the *Examples* section, and the *Parameters* section as separate files so we can use them as needed. It also generates a `layout` file that's just a guess as to how to render the whole thing. In some cases it'll work and we can use that instead of hand-maintaining a "top level" description file for the function. Most newly generated files are unused. We have to choose to pick them up, either by replacing the sections we were manually maintaining with an include of the generated section, or by replacing the entire hand-maintained file with the generated top-level file. Relates to elastic#104247
…tic#106505) The distributions already have correct permissions set on the native libraries copied into them. However, the build step that extracts the native libs relies on the upstream file permissions. This commit sets explicit permissions on the copy task that extracts the native libraries.
) Since mrjars may use preview APIs, forbidden-apis must know about any preview APIs from the JDK. However, we do not run forbidden-apis with the preview-enabled flag, nor in a separate JVM, so it does not know about these classes. Thus we ignore missing classes on source sets added by the mrjar plugin. This commit configures all source sets added by the mrjar plugin to ignore forbidden-apis missing classes.
The task for updating cluster state with nodes seen by shutdown was previously switched to use batched tasks. However, the task is never marked as complete, which leads to the tasks piling up. This commit marks the task as complete and re-enables a test that appears to succeed now. closes elastic#76689
When we use `ROW` in ESQL we pick a random data set by just iterating the `Map`. It's random. Yay! And some of them don't work in this place. This just picks one that we know works. Closes elastic#106501
* Working tests
* Adding more tests
* Adding comment
* Switching to micros and addressing feedback
* Removing nanos and adding test for bug fix
---------
Co-authored-by: Elastic Machine <[email protected]>
If we proceed without waiting for pages, we might cancel the main request before starting the data-node request. As a result, the exchange sinks on data-nodes won't be removed until the inactive_timeout elapses, which is longer than the assertBusy timeout. Closes elastic#106443
Add missing _to_ in sentence (cherry picked from commit 40a9155) Co-authored-by: Aaron Hanusa <[email protected]>
The scope here is to expose a method (Realms#getRealmRef) that can be used to retrieve the realm domain assignments for any realm id.
An empty read is [short-circuited](https://github.com/elastic/elasticsearch/blob/e8039b9ecb2451752ac5377c44a6a0c662087a9f/modules/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3BlobContainer.java#L115-L116) without going to the blob store. To actually exercise the S3 blob store, a ranged read must read at least one byte. This PR ensures that. Resolves: elastic#105958
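A minimal sketch of choosing a random range that is never empty, assuming a blob of at least one byte. The `NonEmptyRange` class is hypothetical and is not the repository test code.

```java
import java.util.Random;

// Hypothetical test helper: picks a byte range that always covers at least one byte,
// so a ranged read cannot be short-circuited before reaching the blob store.
final class NonEmptyRange {
    final int position;
    final int length;

    private NonEmptyRange(int position, int length) {
        this.position = position;
        this.length = length;
    }

    static NonEmptyRange random(Random random, int blobLength) {
        if (blobLength < 1) {
            throw new IllegalArgumentException("blob must contain at least one byte");
        }
        int position = random.nextInt(blobLength);  // 0 .. blobLength - 1
        int maxLength = blobLength - position;      // always >= 1
        int length = 1 + random.nextInt(maxLength); // 1 .. maxLength, never 0
        return new NonEmptyRange(position, length);
    }
}
```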
+ Add esql as rest test dependency for ml/native-multi-node-tests to work around the mixed testClusters/TestCluster nodes (so all have the esql plugin installed)
This commit exposes the query transport action to improve usability. Although all operations could already be performed before this change, it was suggested that adding the action would improve the symmetry of the API, for example by allowing client().execute(builder.action(), builder.request()).actionGet(30, SECONDS);
* Add links to text_expansion in ELSER tutorial
* Apply suggestions from code review
  Co-authored-by: Liam Thompson <[email protected]>
---------
Co-authored-by: Liam Thompson <[email protected]>
Test tweaks for serverless:
* Use a valid application name in API key tests
* Move from `cluster.health` to `info` call in roles test (the call is just used to check that a user with a cluster privilege is indeed able to execute the test)
Closes: ES-7987
I realized I forgot to add some NamedWriteables to our registry. I've forgotten this multiple times. Any ideas how we can improve this so we get failures if we forget in the future?
Add a new optional request option, `with_profile_uid`, to the Get and Query API Key Information endpoints, to return the profile UID of each API key's owner. Closes elastic#98939
Will restore the assert on the metric in a follow-up PR. Related to elastic#106834
In this PR we introduce the APIs that expose the global retention configuration and allow users to take advantage of it. These APIs are protected by newly introduced dedicated privileges:
* `manage_data_stream_global_retention` or higher, which allows all operations on the global retention configuration.
* `monitor_data_stream_retention` or higher, which allows retrieval of the global retention configuration.
This PR is the final PR that makes global retention available to our users.
For now, skip tests when the flaky HDFS cluster cannot be started. This lets us investigate further without bothering others and keeps the pipeline green.
…onse (elastic#106858) Add missing getter
…ndex pattern (elastic#106815)
* Update KibanaOwnedReservedRoleDescriptors.java
* Replaced `all` with `read`, `delete_index`
Regular feature names are extracted together with historical features during feature metadata extraction. Based on this, feature checks in tests are validated to use only known features, which prevents tests from being silently disabled due to an invalid or misspelled feature name.
---------
Co-authored-by: Lorenzo Dematte <[email protected]>
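A minimal sketch of validating feature checks against the set of known features. The `FeatureCheck` class and its error message are hypothetical; the point is only that an unknown name fails loudly instead of silently skipping the test.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the Elasticsearch test framework API.
final class FeatureCheck {
    private FeatureCheck() {}

    /** Throws if any requested feature name is not among the extracted known features. */
    static void validate(Collection<String> requested, Set<String> knownFeatures) {
        Set<String> unknown = new TreeSet<>(); // sorted for a stable error message
        for (String name : requested) {
            if (!knownFeatures.contains(name)) {
                unknown.add(name);
            }
        }
        if (!unknown.isEmpty()) {
            throw new IllegalArgumentException("Unknown feature(s) " + unknown + "; check for typos");
        }
    }
}
```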
…elds (elastic#106862) The SearchExecutionContext supports the notion of allowed fields, provided via a specific setter method. However, fields were only filtered in the getFieldType method. getMatchingFieldNames and getFieldType need to be consistent: there are places in the code where getMatchingFieldNames is called to resolve field name patterns, and getFieldType is later called on each of the resolved fields. If the former resolves a field for which the latter cannot return a field type, that is unexpected and should be considered a bug.
In addition, this commit makes getAllFields consistent. It is only called by field caps, a different code path that does not seem to set allowed fields for now, but it's important for the context to provide consistent field access, especially for methods as broad as getAllFields, despite their currently very specific usage.
This surfaced as we are trying to move fetching of the `_ignored` field to value fetchers, which use a search execution context and resolve the field type; until now the field was retrieved directly via StoredFieldsPhase, which bypasses this check completely. This commit also adds a previously missing test verifying that SearchExecutionContext applies the allowedFields predicate when provided.
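A heavily simplified sketch of the consistency requirement. `FieldContext` is a hypothetical stand-in, not SearchExecutionContext's real API; it only shows every accessor applying the same allowed-fields predicate, so any name returned by pattern resolution also has a field type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical stand-in: every field accessor applies the same allowedFields predicate.
final class FieldContext {
    private final Map<String, String> fieldTypes; // field name -> type name
    private Predicate<String> allowedFields = name -> true; // everything allowed by default

    FieldContext(Map<String, String> fieldTypes) {
        this.fieldTypes = new LinkedHashMap<>(fieldTypes);
    }

    void setAllowedFields(Predicate<String> allowedFields) {
        this.allowedFields = allowedFields;
    }

    /** Returns the field's type, or null if the field is unknown or not allowed. */
    String getFieldType(String name) {
        return allowedFields.test(name) ? fieldTypes.get(name) : null;
    }

    /** Resolves a pattern in which '*' matches any sequence, returning only allowed fields. */
    Set<String> getMatchingFieldNames(String pattern) {
        StringBuilder regex = new StringBuilder();
        String[] parts = pattern.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        Pattern compiled = Pattern.compile(regex.toString());
        Set<String> matches = new TreeSet<>();
        for (String name : fieldTypes.keySet()) {
            if (compiled.matcher(name).matches() && allowedFields.test(name)) {
                matches.add(name);
            }
        }
        return matches;
    }

    /** Returns every allowed field, using the same predicate as the other accessors. */
    Set<String> getAllFields() {
        Set<String> all = new TreeSet<>();
        for (String name : fieldTypes.keySet()) {
            if (allowedFields.test(name)) {
                all.add(name);
            }
        }
        return all;
    }
}
```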
…ce metadata in `IndexMetadata` (elastic#106743) This change refactors the integration of the field inference metadata in IndexMetadata. Instead of partial diffs, the new class simply sends the entire object as the diff whenever it has changed. This PR also renames the fields and methods related to inference fields consistently. The inference phase (in the transport shard bulk action) is also changed so that inference is not called if:
* the document contains a value for the inference input, and
* the document also contains a value for the inference results of that field (in the _inference map).
If the document contains no value for the inference input but does contain an inference result for that field, it is marked as failed.
---------
Co-authored-by: carlosdelest <[email protected]>
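The per-field rules above can be sketched as a small decision function. The enum and class names are hypothetical and do not come from ShardBulkInferenceActionFilter. The source does not describe the case with neither an input nor a result; this sketch assumes there is simply nothing to do (`NO_OP`).

```java
// Hypothetical names for illustration only.
enum InferenceDecision { RUN_INFERENCE, SKIP, FAIL, NO_OP }

final class InferenceRules {
    private InferenceRules() {}

    static InferenceDecision decide(boolean hasInput, boolean hasInferenceResult) {
        if (hasInput && hasInferenceResult) {
            return InferenceDecision.SKIP; // results were supplied along with the input
        }
        if (hasInferenceResult) {
            return InferenceDecision.FAIL; // results without any input are rejected
        }
        if (hasInput) {
            return InferenceDecision.RUN_INFERENCE;
        }
        return InferenceDecision.NO_OP; // assumption: neither input nor result, nothing to do
    }
}
```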
Stop and Start error messages include the reason for the error followed by the suggestion to use force=true. This may cause the suggestion to be hidden by the reason, so the reason is now placed after the suggestion. Closes elastic#106819
Like Block#filter, Block#expand should return the specific type of the original block, rather than a generic block type. For instance, the expanded block of an IntBlock should also be an IntBlock. I encountered a situation where I had to cast the expanded block.
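In Java this is expressed with a covariant return type. The interfaces below are hypothetical, simplified stand-ins, not the ESQL compute engine's `Block` API; `expand` is modeled as flattening multivalued positions into one value per position, and every position is assumed to hold at least one value.

```java
import java.util.Arrays;

// Hypothetical, simplified types for illustration only.
interface Block {
    int getPositionCount();

    Block expand();
}

interface IntBlock extends Block {
    int getInt(int position);

    @Override
    IntBlock expand(); // covariant return type: callers get an IntBlock, no cast needed
}

final class ArrayIntBlock implements IntBlock {
    private final int[][] positions; // each position holds one or more values

    ArrayIntBlock(int[][] positions) {
        this.positions = positions;
    }

    @Override
    public int getPositionCount() {
        return positions.length;
    }

    @Override
    public int getInt(int position) {
        return positions[position][0]; // first value at this position
    }

    /** Flattens multivalued positions so each value gets its own position. */
    @Override
    public IntBlock expand() {
        int[][] expanded = Arrays.stream(positions)
            .flatMapToInt(Arrays::stream)
            .mapToObj(v -> new int[] { v })
            .toArray(int[][]::new);
        return new ArrayIntBlock(expanded);
    }
}
```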
…Input` (elastic#106794) There are many scenarios where we create very small slices (smaller than the buffer size) from an input that already has these bytes buffered (BKDReader#packedIndex, for example). We can save considerable memory, as well as potential IO to disk or, worse yet, to the blob store, by just slicing the buffer when possible. Except when a slice is created and never read from, this should always save memory.
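A minimal sketch of the "slice the buffer if possible" idea using `java.nio.ByteBuffer`. `BufferedSource` is hypothetical and is not the actual index-input class; a byte array stands in for disk or the blob store, and a counter records how often storage is hit.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch for illustration only.
final class BufferedSource {
    private final byte[] storage;    // stands in for a file on disk or a blob
    private final ByteBuffer buffer; // bytes already read from storage
    private final int bufferStart;   // storage offset of buffer index 0
    int storageReads = 1;            // trips to storage; the initial fill counts as one

    BufferedSource(byte[] storage, int bufferStart, int bufferLength) {
        this.storage = storage;
        this.bufferStart = bufferStart;
        this.buffer = ByteBuffer.wrap(storage, bufferStart, bufferLength).slice();
    }

    /** Returns a read-only view of storage[offset, offset + length). */
    ByteBuffer slice(int offset, int length) {
        int end = offset + length;
        if (offset >= bufferStart && end <= bufferStart + buffer.capacity()) {
            // Fast path: the range is already buffered, so share those bytes (no copy, no I/O).
            ByteBuffer view = buffer.duplicate();
            view.position(offset - bufferStart);
            view.limit(end - bufferStart);
            return view.slice().asReadOnlyBuffer();
        }
        // Slow path: go back to storage and copy the range.
        storageReads++;
        return ByteBuffer.wrap(Arrays.copyOfRange(storage, offset, end)).asReadOnlyBuffer();
    }
}
```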
FieldName does not make much sense as an abstract class with a single private subclass. Also, the base implementation holds most of the fields that the subclass relies on to do its job, so the two can be unified into a single class.
This adds an OPTIONS clause to FROM, allowing users to specify search or index resolution options such as preference, allow_no_indices, or ignore_unavailable.
…elastic#106624) Assert using greaterThanOrEqualTo to allow for additional scheduled background threads to appear in collected measurements after the thread pool stats have already been pulled, e.g. this could be the case for the cluster coordination thread pool.
…06881)
* Update 8.13 release notes with known issue
* revert unintended
* reword
* reword
* reword
…copy-to-support-inference
# Conflicts:
# server/src/test/java/org/elasticsearch/index/mapper/FieldTypeLookupTests.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java
# x-pack/plugin/inference/src/test/java/org/elasticsearch/cluster/metadata/SemanticTextClusterMetadataTests.java
# x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilterTests.java
# x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/10_semantic_text_inference.yml
jimczi pushed a commit that referenced this pull request on Nov 6, 2024:
…sion (#1…" (elastic#115827) This reverts commit 32dee6a.
Adds copy_to and multifields support to semantic_text.
This needs to be merged after elastic#106560, as it is based on it.
Changes:
* Iterate over the source fields when calculating inference
* Allow inference results from multiple responses to be applied to a single field
* The parser needs to check the field type for multifields
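A minimal sketch of the first two changes, under stated assumptions: `MultiSourceInference`, the map shapes, and the stand-in "model" function are all hypothetical, not the inference plugin's API. Each semantic_text field gathers inputs from all of its source fields (itself plus any fields that `copy_to` it), and every response for those inputs is kept under that single field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch for illustration only.
final class MultiSourceInference {
    private MultiSourceInference() {}

    /**
     * For each target field, collects the string values of all its source fields in order.
     * Non-string values are ignored in this sketch.
     */
    static Map<String, List<String>> collectInputs(
            Map<String, List<String>> sourceFieldsByTarget, Map<String, Object> document) {
        Map<String, List<String>> inputs = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : sourceFieldsByTarget.entrySet()) {
            List<String> values = new ArrayList<>();
            for (String sourceField : entry.getValue()) {
                Object value = document.get(sourceField);
                if (value instanceof String) {
                    values.add((String) value);
                }
            }
            if (!values.isEmpty()) {
                inputs.put(entry.getKey(), values);
            }
        }
        return inputs;
    }

    /** Runs a stand-in model on every input and keeps all responses under the target field. */
    static <R> Map<String, List<R>> infer(Map<String, List<String>> inputsByField, Function<String, R> model) {
        Map<String, List<R>> results = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : inputsByField.entrySet()) {
            List<R> responses = new ArrayList<>();
            for (String input : entry.getValue()) {
                responses.add(model.apply(input));
            }
            results.put(entry.getKey(), responses); // several responses for one field
        }
        return results;
    }
}
```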