Fast refresh indices should use search shards #113478

kingherc · 2024-09-24T15:38:05Z

Fast refresh indices should now behave like non fast refresh indices in how they execute (m)gets and searches. I.e., they should use the search shards.

For BWC, we define a new transport version. We expect search shards to be upgraded first, before promotable shards. Until the cluster is fully upgraded, the promotable shards (whether upgraded or not) will still receive and execute gets/searches locally.

Relates ES-9573
Relates ES-9579

elasticsearchmachine · 2024-09-30T14:35:32Z

Pinging @elastic/es-distributed (Team:Distributed)

arteam

LGTM! Looking forward to simplifying things with ES-9563 after this PR is successfully rolled out.

henningandersen

I have a few comments/questions.

.../org/elasticsearch/action/admin/indices/refresh/TransportUnpromotableShardRefreshAction.java

server/src/main/java/org/elasticsearch/action/get/TransportGetAction.java

henningandersen · 2024-10-03T09:54:11Z

server/src/main/java/org/elasticsearch/action/support/replication/PostWriteRefresh.java

-                    // Fast refresh indices do not depend on the unpromotables being refreshed
-                    boolean fastRefresh = IndexSettings.INDEX_FAST_REFRESH_SETTING.get(indexShard.indexSettings().getSettings());
-                    if (location != null && (indexShard.routingEntry().isSearchable() == false && fastRefresh == false)) {
+                    if (location != null && indexShard.routingEntry().isSearchable() == false) {


This fixes it for future refreshes after the indexing node is upgraded. But it does not guarantee immediate availability of the latest state on the search node. So we risk some seconds of non-realtime GET requests going backwards during such an upgrade? I think real-time GET requests will be saved by the wait-for generation, is that also your understanding?
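To make the effect of the diff above concrete, here is a minimal sketch of the two predicates side by side. The class and method names are hypothetical stand-ins, and the booleans replace `location != null`, `indexShard.routingEntry().isSearchable()`, and the `INDEX_FAST_REFRESH_SETTING` lookup from the diff. Going by the removed code comment, the condition appears to gate whether the write depends on the unpromotable shards being refreshed.

```java
// Hypothetical stand-in for the condition changed in PostWriteRefresh.
// Names are illustrative only and are not part of Elasticsearch.
final class UnpromotableRefreshCheck {

    // Condition before the change: fast refresh indices never took this branch.
    static boolean beforeChange(boolean hasLocation, boolean primaryIsSearchable, boolean fastRefresh) {
        return hasLocation && (primaryIsSearchable == false && fastRefresh == false);
    }

    // Condition after the change: the fast refresh setting is no longer consulted.
    static boolean afterChange(boolean hasLocation, boolean primaryIsSearchable) {
        return hasLocation && primaryIsSearchable == false;
    }

    public static void main(String[] args) {
        // A fast refresh index whose primary routing entry is not searchable:
        System.out.println("before: " + beforeChange(true, false, true));
        System.out.println("after:  " + afterChange(true, false));
    }
}
```

For an index without fast refresh, both predicates agree, so only fast refresh indices change behavior.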

server/src/main/java/org/elasticsearch/index/cache/bitset/BitsetFilterCache.java

server/src/main/java/org/elasticsearch/action/get/TransportShardMultiGetAction.java

kingherc

Thanks @henningandersen for the feedback! Feel free to review again.

kingherc · 2024-10-03T13:21:31Z

The reasoning here is that this code runs on the primary/indexing node, and that the indexing node will be upgraded after the search nodes.

> But it does not guarantee immediate availability of the latest state on the search node.

Doesn't our upgrade process guarantee that, since search nodes are upgraded first?

> So we risk some seconds of non-realtime GET requests going backwards during such an upgrade?

* A non-realtime GET coordinated by an old search node will go to the primary to execute.
* A non-realtime GET coordinated by a new search node, with an old primary node, will go to the primary to execute.
* A non-realtime GET coordinated by a new search node on a fully upgraded cluster will be executed on the search node, as is done for non-fast-refresh indices, which should be fine as well.

Not sure I see when/why it might go backwards?

> I think real-time GET requests will be saved by the wait-for generation, is that also your understanding?

* A real-time GET coordinated by an old search node will go to the primary to execute.
* A real-time GET coordinated by a new search node, with an old primary node, will go to the primary to execute.
* A real-time GET coordinated by a new search node on a fully upgraded cluster will be executed on the search node, as is done for non-fast-refresh indices, which should use wait-for generation if necessary.

Please tell me if you see any corner cases I might have missed or not considered. It might be useful to think about the above combinations for searches/mgets as well, but I believe it should be a similar story for them.
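The combinations listed above can be written down as a small decision table. This is only a sketch of the reasoning in this comment, not the actual routing code: the class, enum, and method names are hypothetical, and "upgraded" stands for "runs a version that knows the new transport version".

```java
// Hypothetical sketch of where a GET on a fast refresh index is executed
// during a rolling upgrade. Names are illustrative, not Elasticsearch APIs.
final class FastRefreshGetRouting {

    enum Target { PRIMARY, SEARCH_SHARD }

    // The same routing applies to real-time and non-realtime GETs in the cases above.
    static Target route(boolean coordinatorUpgraded, boolean primaryUpgraded) {
        if (coordinatorUpgraded && primaryUpgraded) {
            // Fully upgraded: behave like a non-fast-refresh index.
            return Target.SEARCH_SHARD;
        }
        // Old coordinator, or new coordinator with an old primary: execute on the primary.
        return Target.PRIMARY;
    }

    // Only a real-time GET executed on a search shard may need wait-for generation.
    static boolean mayNeedWaitForGeneration(boolean realtime, Target target) {
        return realtime && target == Target.SEARCH_SHARD;
    }

    public static void main(String[] args) {
        for (boolean coordinator : new boolean[] { false, true }) {
            for (boolean primary : new boolean[] { false, true }) {
                System.out.println("coordinatorUpgraded=" + coordinator
                    + " primaryUpgraded=" + primary + " -> " + route(coordinator, primary));
            }
        }
    }
}
```

The old-coordinator/new-primary combination is not expected under the stated upgrade order (search nodes first); the sketch simply sends it to the primary as well.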

henningandersen

Looks good, main issue remaining is the BitsetFilterCache.

henningandersen · 2024-10-04T09:54:18Z

I think you are right that it works out. The upgrade will force a relocation, which forces a flush, bringing things back into order. Thanks.

…ast-refresh-rco

Fix bitset filter cache loading in Stateless (#114191)

As noted during the review of PR elastic#113478, the bitset filter cache was wrongly loaded eagerly only for fast refresh indices on index nodes. However, it should be loaded eagerly for any index that can be searched. This PR fixes this.
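As a rough illustration of the fix described above, the sketch below contrasts the old and new eager-loading decisions. It is written purely from the wording of this description: the class, methods, and boolean inputs are hypothetical, and the real check in `BitsetFilterCache` may consult additional settings that are not modeled here.

```java
// Hypothetical sketch of the eager-loading decision for the bitset filter cache.
// Names and inputs are illustrative only, not the actual BitsetFilterCache code.
final class EagerBitsetLoadingSketch {

    // Before the fix: eager loading only for fast refresh indices on index nodes.
    static boolean loadEagerlyBefore(boolean onIndexNode, boolean fastRefresh) {
        return onIndexNode && fastRefresh;
    }

    // After the fix: eager loading wherever the index can be searched.
    static boolean loadEagerlyAfter(boolean canBeSearched) {
        return canBeSearched;
    }

    public static void main(String[] args) {
        // A searchable copy of an index without fast refresh, not on an index node:
        System.out.println("before: " + loadEagerlyBefore(false, false));
        System.out.println("after:  " + loadEagerlyAfter(true));
    }
}
```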

henningandersen

LGTM.

elasticsearchmachine · 2024-10-07T18:16:18Z

💚 Backport successful

| Status | Branch | Result |
|--------|--------|--------|
| ✅ | 8.x | |

ywangd · 2024-10-08T03:48:04Z

IIUC, it is not absolutely necessary to backport this PR to 8.16, since the change affects serverless only and serverless works on the main branch only? I am trying to understand the reason here in case it applies to any future work. Thanks!

kingherc · 2024-10-08T08:33:25Z

Hi @ywangd, I think I backported it because I saw that the transport versions are still on the 8 major version, so it made sense in my mind that this should be backported. But no, it was indeed not necessary to backport it. It has nothing to do with any future work, nor does it affect stateful.

ywangd · 2024-10-08T12:57:26Z

Thanks for the explanation. 🙏

Data stream aliases mirror this, treating all backing indices of the data streams they correspond to as their data component. > > The failure component is only supported by data streams and data stream aliases. The failure component of these abstractions refer to the data streams' failure stores. Indices and index aliases do not have a failure component. For more details and examples see https://github.com/elastic/elasticsearch/pull/113144. All this work has been cherry picked from there. **Purpose of this PR** This PR replaces the indices options boolean constructor with the builders. The goal is to give me and the reviewer a very narrowly scoped change in which we can ensure we did not make any mistakes during the conversion. It will also slightly reduce the change list in https://github.com/elastic/elasticsearch/pull/113144/files. * [ML] Add sentence overlap option to the sentence chunking settings (#114461) * ESQL: Improve error message in test (#114524) Improve an error message in the test for `profile`ing the ordinals-based grouping operator. It's failed in the past with a rather cryptic error message. This will either keep it passing fully or give us a better error message when it does fail. Closes #114380 * Mute org.elasticsearch.xpack.esql.optimizer.PhysicalPlanOptimizerTests testPushSpatialIntersectsEvalToSource {default} #114627 * Mute org.elasticsearch.xpack.esql.optimizer.PhysicalPlanOptimizerTests testPushWhereEvalToSource {default} #114628 * Fixing test failure for #114556 (#114617) * AwaitsFixes for #114625 * Revert "AwaitsFixes for #114625" This reverts commit d37c1c636bfc87df0906a6427e824f6bca722245. The automuter got there first. * SQL: Remove dependency on `org.elasticsearch.Version` (#112094) This removes SQL's use of `org.elasticsearch.Version` class and usages replaced by `SqlVersion`. All the currently considered released versions (`7.0.0` to `8.16.0`) have been declared as `SqlVersion` instances. These are still tested against. 
The last "known release" (`8.16.0`) is considered the "server compatibility version" and all clients at or past this release are compatible with the server; notably, they can be also on a newer version than server's. Clients released before this "server compatibility version" respect existing compatibility requirements (must/can be older up to one major lower, but past `7.7.0`). The "server compatibility version" will not be updated with newer stack releases (at least not until #112745 is addressed). Fixes #102689 * Replace "::<type>" casts to explicit casting functions (#114639) Fixes https://github.com/elastic/elasticsearch/issues/114613 Those tests were added with `::<type>` casts, which don't work in older versions. As they aren't testing anything around those casts, I'm replacing them with `TO_<TYPE>()` functions to let them work everywhere. * Second parsing pass tracks array scopes properly (#114621) * Change exception type when timing out waiting for specific seqno in fleet search api. (#114526) Without this change request fails with `ElasticsearchTimeoutException` if waiting for seqno times out. This results in a 500 status code. With this change the `SearchTimeoutException` is used which results in a 504 status code. This is a more appropriate response code for time-outs. Closes #114395 * Allow stored source in logsdb and tsdb (#114454) * ESQL: Retry test on 403 (#114450) Retry the async test when you get a 403 - that could be because security has not yet booted. We should have permission to fetch everything. * CCS metadata is opt-in in ESQL JSON responses (#114437) Since Kibana only needs CCS metadata in ESQL responses from certain well-defined locations, we are making CCS metadata opt-in. 
This feature is patterned after ESQL profiling, where you specify "profile": true in the ESQL body: if you asked for it, it will always be present in the response (it will be written to the .async-search index and you can’t turn it off in later async-search requests against this particular query ID), and if you didn’t ask for it at the beginning, it will never be present (it will NOT be written to the .async-search index when it is persisted). The new option is "include_ccs_metadata": true/false. * ESQL: Speed up grouping by bytes (#114021) This speeds up grouping by bytes valued fields (keyword, text, ip, and wildcard) when the input is an ordinal block:
```
bytes_refs 22.213 ± 0.322 -> 19.848 ± 0.205 ns/op (*maybe* real, maybe noise. still good)
ordinal didn't exist -> 2.988 ± 0.011 ns/op
```
I see this as 20ns -> 3ns, an 85% speed up. We never had the ordinals branch before so I'm expecting the same performance there - about 20ns per op. This also speeds up grouping by a pair of byte valued fields:
```
two_bytes_refs 83.112 ± 42.348 -> 46.521 ± 0.386 ns/op
two_ordinals 83.531 ± 23.473 -> 8.617 ± 0.105 ns/op
```
The speed up is much better when the fields are ordinals because hashing bytes is comparatively slow. I believe the ordinals case is quite common. I've run into it in quite a few profiles. * ESQL: Test partially filtered aggs (#114510) Tests for partially filtered aggs. It uses the existing aggs tests and adds junk rows that are filtered away. That way we don't have to add new testing assertions to each class - we just can reuse the existing assertions. * Mute org.elasticsearch.xpack.inference.integration.ModelRegistryIT testGetModel #114657 * [ES|QL] Add hypot function (#114382) Adds a hypotenuse function * Replace cloud-ess docker image with wolfi-ess (#114413) * Replace cloud-ess docker image with wolfi-ess We just replaced the existing implementation of cloud-ess with what was wolfi-ess which is a wolfi based ess image. 
The cloud image itself will be removed in a future commit; it was not used anywhere * Switch to test cloud docker image instead of default docker in packaging pr tests. This adds way more coverage than the default docker image which is also barely touched * Initial InstrumenterTests (#114422) * Initial InstrumenterTests * Assert on instrumentation method arguments * Unmute test that does not exist anymore (#114655) Closes #111631. * Support IPinfo database configurations (#114548) * Move tests out of geo ip processor tests (#114656) * [Inference API] Introduce Update API to change some aspects of existing inference endpoints (#114457) * Refactor IPinfoIpDataLookupsTests tests (and others) (#114667) * Introduce `index.mapping.source.mode` setting to override `_source.mode` (#114433) * feature: introduce index.mapping.source.mode setting Introduce a new `index.mapping.source.mode` setting which will be used to override the mapping level `_source.mode`. For now the mapping level setting will stay and be deprecated later with another PR. The setting always takes precedence. When not defined, the index mode is used and can be overridden by the _source.mode mapping level definition. * Add feature flag for subobjects auto (#114616) * Avoid throw exception in SyntheticSourceIndexSettingsProvider (#114479) Co-authored-by: Nhat Nguyen <[email protected]> * Add a callback for onConnectionClosed to MockTransportService (#114564) The callback is added to allow inserting additional behaviour such as delay when handling closed connection. 
* Add ESQL match function (#113374) * Add alias event.dataset -> data_stream.dataset (#114642) * [ML] Feature flag default configs (#114660) * Renovate Bot PRs should run ci checks (#114699) * Simplify NodeShutdownShardsIT (#114583) We no longer need to manually reroute after registering node shutdown in test since https://github.com/elastic/elasticsearch/pull/103251 * Add generated code changes for HypotEvaluator (#114697) * ES|QL: Restrict sorting for _source and counter field types (#114638) * Preserve thread context when waiting for segment generation in RTG (#114623) Closes ES-9778 * Fix failing tests after PR clash (#114625) Two PRs conflicted without github or CI noticing. The first added these tests, and the second modified their behaviour. Both went green in CI and both were merged within an hour of each other. * PR that added the tests: * https://github.com/elastic/elasticsearch/pull/112938 * merged 14:13CET * PR that changed the behaviour of these tests: * https://github.com/elastic/elasticsearch/pull/114411 * merged 14:48CET * Guard second doc parsing pass with index setting (#114649) * Guard second doc parsing pass with index setting * add test * updates * updates * merge * Fix termStats posting usage (#114644) * Node shutdown test integration test (#114582) This change adds a test case that verifies that the node can be shutdown while hosting an index with 0-1 or 0-all auto-expand configuration. 
* Update docker.elastic.co/wolfi/chainguard-base:latest Docker digest to 277ebb4 (main) (#114409) * Update docker.elastic.co/wolfi/chainguard-base:latest Docker digest to 277ebb4 * Tweak renovate replace pattern --------- Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com> Co-authored-by: Rene Groeschke <[email protected]> * Mute org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT test {yaml=reference/rest-api/usage/line_38} #113694 * Introduce CRUD APIs for data stream options (#113945) In this PR we introduce two endpoints, PUT and GET, to manage the data stream options and consequently the failure store configuration on the data stream level. This means that we can manage the failure store of existing data streams. The APIs look like:
```
# Enable/disable
PUT _data_stream/my-data-stream/_options
{
  "failure_store": {
    "enabled": true
  }
}

# Remove existing configuration
DELETE _data_stream/my-data-stream/_options

# Retrieve
GET _data_stream/my-data-stream/_options
{
  "failure_store": {
    "enabled": true
  }
}
```
Future work: - Document the new APIs - Convert `DataStreamOptionsIT.java` to a yaml test. 
* Expands semantic_text tutorial with hybrid search (#114398) * Creates a new page for the hybrid search tutorial * Update docs/reference/search/search-your-data/semantic-text-hybrid-search Co-authored-by: István Zoltán Szabó <[email protected]> * Adds search response example --------- Co-authored-by: István Zoltán Szabó <[email protected]> * Add ResolvedExpression wrapper (#114592) **Introduction** > In order to make adoption of failure stores simpler for all users, we are introducing a new syntactical feature to index expression resolution: The selector. > > Selectors, denoted with a :: followed by a recognized suffix will allow users to specify which component of an index abstraction they would like to operate on within an API call. In this case, an index abstraction is a concrete index, data stream, or alias; Any abstraction that can be resolved to a set of indices/shards. We define a component of an index abstraction to be some searchable unit of the index abstraction. > > To start, we will support two components: data and failures. Concrete indices are their own data components, while the data component for index aliases are all of the indices contained therein. For data streams, the data component corresponds to their backing indices. Data stream aliases mirror this, treating all backing indices of the data streams they correspond to as their data component. > > The failure component is only supported by data streams and data stream aliases. The failure component of these abstractions refer to the data streams' failure stores. Indices and index aliases do not have a failure component. For more details and examples see https://github.com/elastic/elasticsearch/pull/113144. 
All this work has been cherry picked from there. **Purpose of this PR** This PR is introducing a wrapper around the resolved expression that used to be a `String` to create the base on which the selectors are going to be added. The current PR is just a refactoring and does not and should not change any existing behaviour. * Update IndexSettingProvider#getAdditionalIndexSettings() signature (#114150) With logsdb another index mode is available, the isTimeSeries parameter is limiting. Instead, we should just push down the index mode from template to index settings provider. Follow up from #113451 Relates to #113583 * Fix Max Score Propagation in RankDocsQuery (#114716) Fix rank doc query when some segments have no ranked docs * [ML] Switch default chunking strategy to sentence (#114453) * Don't close/recreate adaptive allocations metrics (#114721) * Simplify `XContent` output of epoch times (#114491) Today the overloads of `XContentBuilder#timeField` do two rather different things: one formats an object as a `String` representation of a time (where the object is either an unambiguous time object or else a `long`) and the other formats only a `long` as one or two fields depending on the `?human` flag. This is trappy in a number of ways: - `long` means an absolute (epoch) time, but sometimes folks will mistakenly use this for time intervals too. - `long` means only milliseconds, there is no facility to specify a different unit. - the dependence on the `?human` flag in exactly one of the overloads is kinda weird. This commit removes the confusion by dropping support for considering a `Long` as a valid representation of a time at all, and instead requiring callers to either convert it into a proper time object or else call a method that is explicitly expecting an epoch time in milliseconds. 
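The ambiguity behind the `XContent` change above (a bare `long` can be mistaken for either an absolute epoch time or a time interval) can be shown with plain `java.time` types. This is a minimal sketch only: `EpochTimeSketch` and its two methods are hypothetical names, and it does not use or reproduce the `XContentBuilder` API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration only; not the XContentBuilder API.
final class EpochTimeSketch {

    /** Interprets the value explicitly as an absolute time: milliseconds since the epoch. */
    static String formatEpochMillis(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    /** Interprets the value explicitly as an interval measured in milliseconds. */
    static String formatInterval(long millis) {
        return Duration.ofMillis(millis).toString();
    }

    public static void main(String[] args) {
        long raw = 90_000L; // the same raw long, two very different meanings
        System.out.println(formatEpochMillis(raw)); // a point in time shortly after the epoch
        System.out.println(formatInterval(raw));    // a 90 second interval
    }
}
```

Converting the `long` into a proper time object at the call site makes the intended meaning visible in the type, which is the design direction the commit describes.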
* Clarify use of special values for publish addresses (#114551) Special values like `0.0.0.0` may resolve to multiple IP addresses just like hostnames, so the same considerations apply when using such values as a publish address. This commit spells this case out in the docs and cleans up the nearby wording a little. * [ML] Pick best model variant for the default elser endpoint (#114690) * [ML] Ignore unrecognized openai sse fields (#114715) Azure / Llama sends back fields we do not expect - rewriting the parser to better handle unknown fields (by dropping them). * [ML] Send mid-stream errors to users (#114549) If apache sends an error mid stream, forward it to the user rather than the now-ignored listener. * Add a query rules tester API call (#114168) * Add a query rules tester API call * Update docs/changelog/114168.yaml * Wrap client call in async with origin * Remove unused param * PR feedback * Remove redundant test * CI workaround - add ent-search as ml dependency so it can find node features * Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/10_basic/Test using the deprecated elasticsearch_version field results in a warning} #114748 * Mute org.elasticsearch.xpack.eql.EqlRestIT testIndexWildcardPatterns #114749 * Refactor merge scheduling code to allow overrides (#114547) This code refactors how the merge scheduler is configured to allow different engine implementations to configure different merge schedulers. * [DOCS] ES|QL: Adding a tip to the WHERE documentation (#114050) * Adding a tip to make null field behavior more apparent. 
* Update docs/reference/esql/processing-commands/where.asciidoc Co-authored-by: Andrei Stefan <[email protected]> * Update docs/reference/esql/processing-commands/where.asciidoc Rephrasing for clarity Co-authored-by: Liam Thompson <[email protected]> --------- Co-authored-by: Andrei Stefan <[email protected]> Co-authored-by: Liam Thompson <[email protected]> * Mute org.elasticsearch.xpack.eql.EqlRestIT testBadRequests #114752 * Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/20_standard_index/enrich stats REST response structure} #114753 * Remove PushTopNToSource support for ExchangeExec (#114637) This appears to be dead code, so we're removing it. * Test StDistance multivalue consistency and fixed two CartesianPoint bugs (#114729) * Fix Synthetic Source Handling for `bit` Type in `dense_vector` Field (#114407) **Description:** This PR addresses the issue described in [#114402](https://github.com/elastic/elasticsearch/issues/114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of docvalue. This mismatch causes an array out-of-bounds exception when reconstructing the document. **Changes:** - Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions. - Added a YAML test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`. 
**Related Issues:** - Closes [#114402](https://github.com/elastic/elasticsearch/issues/114402) - Introduced in [#110059](https://github.com/elastic/elasticsearch/pull/110059) * Mute org.elasticsearch.xpack.rank.rrf.RRFRankClientYamlTestSuiteIT test {yaml=rrf/800_rrf_with_text_similarity_reranker_retriever/explain using rrf retriever and text-similarity} #114757 * only return deprecation warning for elser service (#114507) Co-authored-by: Elastic Machine <[email protected]> * [ML] Stream Google Completion (#114596) Google supports SSE for chat completion and sends the same payload as their non-streaming calls, so we can reuse the SSE parser with our existing parse function. The downside is that Google requires a different URI, so we refactored away from the visitor pattern to allow a different URI to be created and set at request time rather than at model instantiation time. * Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/30_tsdb_index/enrich documents over _bulk} #114761 * Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/20_standard_index/enrich documents over _bulk via an alias} #114763 * Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/10_basic/Test enrich crud apis} #114766 * [ML] Dynamically get of num allocations (#114636) * Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/20_standard_index/enrich documents over _bulk} #114768 * Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/50_data_stream/enrich documents over _bulk via a data stream} #114769 * Mute org.elasticsearch.xpack.eql.EqlRestValidationIT testDefaultIndicesOptions #114771 * Mute org.elasticsearch.xpack.enrich.EnrichIT testEnrichSpecialTypes #114773 * Mute org.elasticsearch.xpack.security.operator.OperatorPrivilegesIT testEveryActionIsEitherOperatorOnlyOrNonOperator #102992 * Mute org.elasticsearch.xpack.enrich.EnrichIT testDeleteExistingPipeline #114775 * Support IPinfo databases in the ip_location processor (#114735) 
* [ML] Default inference endpoint for the multilingual-e5-small model (#114683) * [ML] Stream Bedrock Completion (#114732) Notes: - Adds a new API to the chatCompletionRequest to invoke the Bedrock Stream API - Create a StreamingChatProcessor that subscribes to streaming results from bedrock and handles the parsing on another thread. - There was no good way (that I could see) to extend the Provider-based CompletionRequestEntity, so they have been flattened into one RequestEntity that can be shared between ConverseRequest and ConverseStreamRequest. * Adding new bbq index types behind a feature flag (#114439) Adds new index types, bbq_hnsw and bbq_flat, which utilize the better binary quantization formats. A 32x reduction in memory, with nice recall properties. * Add IndicesMetrics instead of IndicesService to toClose (#114782) The same line already exists in [L543](https://github.com/ywangd/elasticsearch/blob/9f4a7927bdc366f8ca98c4652ac7d1102d9430f5/server/src/main/java/org/elasticsearch/node/Node.java#L543). It should have no practical impact since AbstractLifecycleComponent#close short-circuits if its lifecycle is already closed. The original code meant to close IndicesMetrics. This PR adds it. Relates: #113737 * Mute org.elasticsearch.test.rest.ClientYamlTestSuiteIT org.elasticsearch.test.rest.ClientYamlTestSuiteIT #114787 * Mute org.elasticsearch.xpack.inference.rest.ServerSentEventsRestActionListenerTests testNoStream #114788 * Mute org.elasticsearch.xpack.eql.EqlRestValidationIT testAllowNoIndicesOption #114789 * Mute org.elasticsearch.xpack.eql.EqlStatsIT testEqlRestUsage #114790 * ESQL: Add skips to tests that were added retroactively (#114727) Skip some csv tests that cannot be used in bwc tests before 8.13/8.14. * Remove snapshot build restriction for match and qstr functions (#114482) * ESQL: Introduce per agg filter (#113735) Add support for aggregation scoped filters that work dynamically on the data in each group. 
| STATS success = COUNT(*) WHERE 200 <= code AND code < 300, redirect = COUNT(*) WHERE 300 <= code AND code < 400, client_err = COUNT(*) WHERE 400 <= code AND code < 500, server_err = COUNT(*) WHERE 500 <= code AND code < 600, total_count = COUNT(*) Implementation-wise, the base AggregateFunction has been extended to allow a filter to be passed in. This is required to incorporate the filter as part of the aggregate equality/identity, which would fail with the filter as an external component. As part of the process, the serialization of the existing aggregations had to be fixed so that AggregateFunction implementations delegate to their parent first. * Skip spatial.AirportsSortCityName before 8.13 (#114795) Fix https://github.com/elastic/elasticsearch/issues/114767. TopN didn't work in this scenario on old versions. * Remove unused v7-only APIs (#114733) Removes several REST endpoints that only existed in the now-inaccessible v7-compatible mode. * Mute org.elasticsearch.xpack.eql.EqlRestIT testUnicodeChars #114791 * [TestFix] ExplainLifecycleIT testStepInfoPreservedOnAutoRetry failing (#114294) * Extend timeout of test and add logging on fail * Unmute unstable test * Switch to using logger for output Keeps the forbiddenApis check happy * Switch to using assertion messages to display To display debug info * Adjust logic of previous step info preservation Add additional checks to ensure previous step info can't be cleared when auto retrying, only updated with new info. Also added logic to ensure previous step info is cleared when transitioning to a new action * Undo accidentally added lines from merge * Remove v7 compat from `{PUT,DELETE} /_snapshot/${REPO}` APIs (#114726) This exception mangling only existed for v7 API compatibility and is no longer needed. 
* [TEST] Migrated ccs-unavailable-clusters QA tests (#114764) Ccs-unavailable-clusters QA tests migrated to the new REST testing framework, using 'elasticsearch.internal-java-rest-test' Gradle plugin * Add documentation for passthrough field type (#114720) * Guard second doc parsing pass with index setting * add test * updates * updates * merge * Add documentation for passthrough field type * Apply suggestions from code review Co-authored-by: Felix Barnsteiner <[email protected]> * updates * updates * Update docs/reference/mapping/types/passthrough.asciidoc Co-authored-by: Felix Barnsteiner <[email protected]> * address comment * address comment * Update docs/reference/mapping/types/passthrough.asciidoc Co-authored-by: Felix Barnsteiner <[email protected]> * address comment --------- Co-authored-by: Felix Barnsteiner <[email protected]> * Remove all v7-only REST endpoints (#114765) These endpoints were deprecated in v7 and are unsupported in v8 so we can remove them entirely in v9. * Remove unused `ChunkedToXContent#toXContentChunkedV7` (#114728) We don't support the v7 REST API in v9, so this commit removes the now-unused `ChunkedToXContent#toXContentChunkedV7` method. It also introduces a similar `ChunkedToXContent#toXContentChunkedV8` method for implementations to use for v8 REST API compatibility. * [ML] Wait for allocation on scale up from 0 (#114719) * Mute org.elasticsearch.xpack.rank.rrf.RRFRetrieverBuilderNestedDocsIT testRRFExplainWithNamedRetrievers #114820 * Fix minor formatting issue (#114815) The list with two options doesn't get rendered as a list, due to the snippet in between. 
https://www.elastic.co/guide/en/elasticsearch/reference/master/passthrough.html#passthrough-conflicts

* [DOCS] Fix User agent processor properties (#112518)
* Mute org.elasticsearch.ingest.geoip.HttpClientTests org.elasticsearch.ingest.geoip.HttpClientTests #112618
* Mute org.elasticsearch.xpack.remotecluster.RemoteClusterSecurityWithApmTracingRestIT testTracingCrossCluster #112731
* [EIS] Validate EIS Gateway URL if set (#114600)
* #111893 Add Warnings For Missing Index Templates (#114589)
  * Add data stream template validation to snapshot restore
  * Add data stream template validation to data stream promotion endpoint
  * Add new assertion for response headers

    Add a new assertion to synchronously execute a request and check the response contains a specific warning header
  * Test for warning header on snapshot restore

    When missing templates
  * Test for promotion warnings
  * Add documentation for the potential error states
  * PR changes
  * Spotless reformatting
  * Add logic to look in snapshot global metadata

    This checks if the snapshot contains a matching template for the DS
  * Comment on test cleanup to explain it was copied
  * Removed cluster service field
* Mute org.elasticsearch.xpack.enrich.EnrichIT testImmutablePolicy #114839
* Introduce utils for _really_ stashing the thread context (#114786)

  `ThreadContext#stashContext` does not yield a completely fresh context: it preserves headers related to tracing the original request. That may be appropriate in many situations, but sometimes we really do want to detach processing entirely from the original task. This commit introduces new utilities to do that.
* [TEST] Fix ccs-unavailable-clusters QA tests build (#114833)

  Properly use `configureEach` on the task configuration to postpone the tasks creation and configuration in the build process
* Mark Data Stream Lifecycle APIs to stable (#114780)

  Data Stream Lifecycle has GA'ed in 8.14, so we can safely mark these as stable.
* Remove all replaced-in-v8 REST endpoints (#114800)

  These endpoints were deprecated in v7 and are replaced in v8 with different endpoints so we can remove the v7 endpoint names in v9.
* Set min number of allocations for ElasticSearchInternalService to 0 (#114829)
  * Set min number of allocations for ElasticSearchInternalService to 0
  * Updating IT tests with new min allocations value

  ---------

  Co-authored-by: Elastic Machine <[email protected]>
* Updating queries used in rrf with text similarity tests (#114838)
* Fix bbq index feature exposure for testing & remove feature flag (#114832)

  We actually don't need a cluster feature, a capability added if the feature flag is enabled is enough for testing.

  closes https://github.com/elastic/elasticsearch/issues/114787
* Allow synthetic source and disabled source for standard indices (#114817)

  When using the index.mapping.source.mode setting we need to make sure that it takes precedence and that is used also when standard index mode is used. Without this patch we always return stored source if _source.mode is not used and the setting is.

  Relates #114433
* ESQL: Fix grammar changes around per agg filtering (#114848)

  Remove dev flag left in grammar for agg filtering

  Related to #113735
* [ML] Create an ml node inference endpoint referencing an existing deployment (#114750)
* Support multi-valued fields in compute engine for ST_DISTANCE (#114836)

  In #112063 we added support for multivalued fields to the compute engine for ST_INTERSECTS and relatives, but not for ST_DISTANCE. In #114729 it was discovered that, at least for the common case of a field and a constant, this support was not needed due to ST_DISTANCE being re-written to ST_INTERSECTS. However, for many other cases, like ST_DISTANCE used on the coordinator node, or between two fields, this lack of support would result in null values. This PR fixes those cases, making sure ST_DISTANCE uses the Block-Builder approach similar to what was done for ST_INTERSECTS et al.
* Ensuring consistent ordering for inner hits in collapse test for rrf (#114740)
* Revert "[ML] Dynamically get of num allocations (#114636)" (#114861)

  This reverts commit 8040fbb0d05401d40ea856f0a4982e8aaab48340.
* Mute org.elasticsearch.license.LicensingTests org.elasticsearch.license.LicensingTests #114865
* Retry throttled snapshot deletions (#113237)

  Closes ES-8562
* Make mapping a distinct concept in logsdb data generation (#114370)
* Download IPinfo ip location databases (#114847)
* Revert "[EIS] Validate EIS Gateway URL if set (#114600)" (#114867)

  This reverts commit 39168e139d98b2eacade007fcd616715a6106c10.
* [ML] Unmute MlJobIT tests (#114553)

  A large number (almost the entirety) of tests in the `MlJobIT` tests suite have been muted. In all cases the cause of failure of the tests is the same, persistent tasks for `cluster:admin/xpack/ml/job/close` and `cluster:admin/xpack/ml/job/close[n]` have been detected as present after the test case has completed.

  Examination of the tests show that the majority of them do not call either `close` directly or indirectly, indicating that the root cause lies with some previous test. As the `close` task inherits the default timeout of half an hour, an instance of it lingering about can cause a lot of damage to subsequent tests.

  The approach taken in this PR is to call the `_task/_cancel` endpoint after every test execution in the `MlJobIT` suite as the final operation. This should restrict the impact of the lingering `close` task to the test responsible, and the reduction in noise should permit better identification of the culprit.

  Closes #105239, #113581, #113046, #112729, #113528, #112701, #113742, #113370, #112823, #112088, #112212, #112730, #113654, #113655, #112381, #113477, #112382, #113651, #112510
* Fixing randomization issue for RRFRetrieverBuilderNestedDocsIT (#114859)
* Mute org.elasticsearch.datastreams.logsdb.qa.LogsDbVersusLogsDbReindexedIntoStandardModeChallengeRestIT testTermsQuery #114873
* Mute org.elasticsearch.xpack.enrich.EnrichIT testDeleteIsCaseSensitive #114840
* Remove dead branches for v7 REST API (#114850)

  In v9 the `getRestApiVersion()` method on `RestRequest`, `XContentBuilder` and `XContentParser` can never return `V_7`, so we can replace all the expressions of the form `$x$.getRestApiVersion() == V_7` with `false`. This commit does that, and then refactors away the resulting dead code using (largely) automated transformations.
* Removing tech-preview header and updating documentation for retrievers and RRF (#114810)
* Retry `S3BlobContainer#getRegister` on all exceptions (#114813)

  S3 register reads are subject to the regular client retry policy, but in practice we see failures of these reads sometimes for errors that are transient but for which the SDK does not retry. This commit adds another layer of retries to these reads.

  Relates ES-9721
* OTel mappings: avoid metrics to be rejected when attributes are malformed (#114856)
* #104411 Add warning headers for ingest pipelines containing special characters (#114837)
  * Add logs and headers

    For pipeline creation when name is invalid
  * Fix YAML tests and add YAML test for warnings
  * Update docs/changelog/114837.yaml
  * Changelog entry
  * Changelog entry
  * Update docs/changelog/114837.yaml
  * Changelog entry
* [Failure store - selector syntax] Replace failureOptions with selector options internally. (#114812)

  **Introduction**

  > In order to make adoption of failure stores simpler for all users, we are introducing a new syntactical feature to index expression resolution: The selector.
  >
  > Selectors, denoted with a :: followed by a recognized suffix will allow users to specify which component of an index abstraction they would like to operate on within an API call. In this case, an index abstraction is a concrete index, data stream, or alias; Any abstraction that can be resolved to a set of indices/shards. We define a component of an index abstraction to be some searchable unit of the index abstraction.
  >
  > To start, we will support two components: data and failures. Concrete indices are their own data components, while the data component for index aliases are all of the indices contained therein. For data streams, the data component corresponds to their backing indices. Data stream aliases mirror this, treating all backing indices of the data streams they correspond to as their data component.
  >
  > The failure component is only supported by data streams and data stream aliases. The failure component of these abstractions refer to the data streams' failure stores. Indices and index aliases do not have a failure component.

  For more details and examples see https://github.com/elastic/elasticsearch/pull/113144. All this work has been cherry picked from there.

  **Purpose of this PR**

  This PR is replacing the `FailureStoreOptions` with the `SelectorOptions`, there shouldn't be any perceivable change to the user since we kept the query parameter "failure_store" for now. It will be removed in the next PR which will introduce the parsing of the expressions.

  _The current PR is just a refactoring and does not and should not change any existing behaviour._
* Mute org.elasticsearch.packaging.test.EnrollmentProcessTests test20DockerAutoFormCluster #114885
* Document _cat/indices behavior when encountering source only indices (#114884)

  Closes https://github.com/elastic/elasticsearch/issues/114546
* Inline `MockTransportService#getLocalDiscoNode()` (#114883)

  This method just delegates to `getLocalNode()`, we may as well call the more widely-used method with the shorter name directly.
* Better DataType string checks (#114863)
  * Use DataType.isString
  * Add DataType.stringTypes()
  * Fix shouldHideSignature check
* Fix NPE in AdaptiveAllocationsScalerService (#114880)
  * Fix NPE in AdaptiveAllocationsScalerService
  * Update docs/changelog/114880.yaml
  * Delete docs/changelog/114880.yaml
* ESQL: Fix MvPercentileTests precision issues (#114844)

  Fixes https://github.com/elastic/elasticsearch/issues/114588
  Fixes https://github.com/elastic/elasticsearch/issues/114587
  Fixes https://github.com/elastic/elasticsearch/issues/114586
  Fixes https://github.com/elastic/elasticsearch/issues/114585
  Fixes https://github.com/elastic/elasticsearch/issues/113008
  Fixes https://github.com/elastic/elasticsearch/issues/113007
  Fixes https://github.com/elastic/elasticsearch/issues/113006
  Fixes https://github.com/elastic/elasticsearch/issues/113005

  Fixed the long precision issue by allowing a +/-1 range. Also made a minor refactor to simplify using different matchers for different types.
* Remove the min_compatible_shard_node option and associated classes (#114713)

  Any similar functionality in the future should use capabilities instead
* Fixes flaky ST_CENTROID_AGG tests (#114892)

  Even with Kahan summation, we were occasionally getting floating point differences at the 14th decimal point, well beyond anything a GIS use case would care about.
* Fix ST_CENTROID_AGG when no records are aggregated (#114888)

  This was returning an invalid result `POINT(NaN NaN)` and now instead returns `null`.
* Mute org.elasticsearch.test.rest.ClientYamlTestSuiteIT test {yaml=cluster.stats/30_ccs_stats/cross-cluster search stats search} #114902
* Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/40_synthetic_source/enrich documents over _bulk} #114825
* Reset array scope tracking for nested objects (#114891)
  * Reset array scope tracking for nested objects
  * update
  * update
  * update
* ESQL: adapt to new range in ToDatetimeTests (#114605)

  Two tests shared the same name in `ToDatetimeTests`, so that needed fixing. But then also the ranges in the masked test needed adjusting after the change that added the masking test.

  Fixes #108093
* Fix setOnce in EmbeddingRequestChunker (#114900)
* [DOCS] Adds link to tutorial and API docs to trained model autoscaling. (#114904)
* Mute org.elasticsearch.xpack.inference.DefaultEndPointsIT testInferDeploysDefaultElser #114913
* Inject the `host.name` field mapping only if required for `logsdb` index mode (#114573)

  Here we check for the existence of a `host.name` field in index sort settings when the index mode is `logsdb` and decide to inject the field in the mapping depending on whether it exists or not. By default `host.name` is required for sorting in LogsDB. This reduces the chances for errors at mapping or template composition time as a result of injecting the `host.name` field only if strictly required. A user who wants to override index sort settings without including a `host.name` field would be able to do so without finding an addi…
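
The ST_CENTROID_AGG entry in the commit list above mentions Kahan summation and small residual floating-point differences. As a rough illustration only, here is a minimal Java sketch of compensated summation together with a relative-tolerance comparison. It is not the code from #114892; the class name `KahanSumSketch`, its methods, and the `1e-12` tolerance are all hypothetical and chosen for this example.

```java
// Hypothetical helper for illustration; not the Elasticsearch implementation.
final class KahanSumSketch {

    /** Sums the values using Kahan (compensated) summation. */
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double compensation = 0.0; // running estimate of the low-order bits lost so far
        for (double value : values) {
            double corrected = value - compensation; // re-apply the previously lost bits
            double next = sum + corrected;           // low-order bits of corrected may be lost here
            compensation = (next - sum) - corrected; // recover what was just lost
            sum = next;
        }
        return sum;
    }

    /** Plain left-to-right summation, for comparison. */
    static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum;
    }

    /** Compares two doubles using a relative tolerance instead of exact equality. */
    static boolean closeTo(double expected, double actual, double relativeTolerance) {
        double scale = Math.max(Math.abs(expected), Math.abs(actual));
        return Math.abs(expected - actual) <= relativeTolerance * scale;
    }

    public static void main(String[] args) {
        double[] values = new double[1_000_000];
        java.util.Arrays.fill(values, 0.1);
        System.out.println("naive: " + naiveSum(values));
        System.out.println("kahan: " + kahanSum(values));
        System.out.println("within tolerance: " + closeTo(100_000.0, kahanSum(values), 1e-12));
    }
}
```

Compensated summation reduces accumulated rounding error but does not make results bit-for-bit reproducible across different summation orders, which is why a test would typically compare with a tolerance rather than exact equality.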

kingherc added the >non-issue, :Distributed Indexing/Engine, and Team:Distributed (Obsolete) labels Sep 24, 2024

kingherc self-assigned this Sep 24, 2024

elasticsearchmachine added the v9.0.0 label Sep 24, 2024

kingherc force-pushed the non-issue/ES-9573-fast-refresh-rco branch 3 times, most recently from 3f87579 to f1ff18a Compare September 25, 2024 16:21

kingherc marked this pull request as ready for review September 30, 2024 14:35

kingherc requested review from arteam and henningandersen September 30, 2024 14:35

kingherc requested a review from Tim-Brooks September 30, 2024 14:38

arteam approved these changes Oct 1, 2024

kingherc requested a review from pxsalehi October 2, 2024 08:30

henningandersen reviewed Oct 3, 2024

kingherc force-pushed the non-issue/ES-9573-fast-refresh-rco branch from f1ff18a to 40df9da Compare October 3, 2024 13:30

elasticsearchmachine added the serverless-linked label Oct 3, 2024

kingherc commented Oct 3, 2024

kingherc requested review from henningandersen and JVerwolf October 3, 2024 13:30

kingherc force-pushed the non-issue/ES-9573-fast-refresh-rco branch from 8f85a6c to da342b0 Compare October 4, 2024 08:39

kingherc added the auto-backport-and-merge and v8.16.0 labels Oct 4, 2024

henningandersen reviewed Oct 4, 2024

kingherc requested a review from original-brownbear October 4, 2024 10:03

Revert comment

e11e283

mark-vieira added the auto-backport label and removed the auto-backport-and-merge label Oct 4, 2024

Merge remote-tracking branch 'kingherc/main' into non-issue/ES-9573-fast-refresh-rco

28781a5

kingherc mentioned this pull request Oct 7, 2024

Fix bitset filter cache loading in Stateless #114191

Merged

henningandersen approved these changes Oct 7, 2024

server/src/main/java/org/elasticsearch/index/cache/bitset/BitsetFilterCache.java

Merge remote-tracking branch 'kingherc/main' into non-issue/ES-9573-fast-refresh-rco

9ec611e

kingherc merged commit 4990276 into elastic:main Oct 7, 2024
16 checks passed

kingherc mentioned this pull request Oct 7, 2024

[8.x] Fast refresh indices should use search shards (#113478) #114259

Merged

kingherc mentioned this pull request Oct 17, 2024

Revert fast refresh using search shards #115019

Merged

carlosdelest mentioned this pull request Oct 18, 2024

Change synonyms index auto-expand replicas to 0-1 #115078

Closed

kingherc mentioned this pull request Nov 12, 2024

Fast refresh indices to use search shards #116658

Merged

elasticsearchmachine commented Sep 30, 2024

arteam left a comment

henningandersen left a comment

henningandersen Oct 3, 2024

kingherc Oct 3, 2024

henningandersen Oct 4, 2024

kingherc left a comment

kingherc Oct 3, 2024

henningandersen left a comment

henningandersen Oct 4, 2024

henningandersen left a comment

elasticsearchmachine commented Oct 7, 2024

ywangd commented Oct 8, 2024

kingherc commented Oct 8, 2024

ywangd commented Oct 8, 2024

elasticsearchmachine commented Oct 7, 2024

💚 Backport successful
