[CI] MlJobIT testDeleteJobAfterMissingAliases failing #112823

Closed
elasticsearchmachine opened this issue Sep 12, 2024 · 3 comments
Labels
:ml Machine learning needs:risk Requires assignment of a risk label (low, medium, blocker) Team:ML Meta label for the ML team >test-failure Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

Build Scans:

Reproduction Line:

./gradlew ":x-pack:plugin:ml:qa:native-multi-node-tests:javaRestTest" --tests "org.elasticsearch.xpack.ml.integration.MlJobIT.testDeleteJobAfterMissingAliases" -Dtests.seed=7941FCE8814A3939 -Dtests.locale=pis -Dtests.timezone=Asia/Ashkhabad -Druntime.java=22

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: 2 active tasks found:
cluster:admin/xpack/ml/job/close    mDq4nhjfQ8GdURPgCC5tAA:20219 -                            transport  1726153078332 14:57:58 5.8m        127.0.0.1 javaRestTest-2 
cluster:admin/xpack/ml/job/close[n] mDq4nhjfQ8GdURPgCC5tAA:20225 mDq4nhjfQ8GdURPgCC5tAA:20219 transport  1726153078338 14:57:58 5.8m        127.0.0.1 javaRestTest-2 
 expected:<0> but was:<2>
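The leftover tasks in the failure message are printed as a whitespace-aligned listing that resembles `_cat/tasks` output. As a sketch, the Java below parses such rows so a test helper could tell the parent `close` task from its `close[n]` child. The column order is read off the message above and is an assumption, and `TaskRow` and `TaskListingParser` are hypothetical names, not part of the Elasticsearch test framework.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of the task listing; only the columns used below are kept. */
record TaskRow(String action, String taskId, String parentTaskId, String runningTime) {

    /** A task without a parent is shown with "-" in the parent column. */
    boolean isRoot() {
        return "-".equals(parentTaskId);
    }
}

final class TaskListingParser {

    private TaskListingParser() {}

    /**
     * Parses whitespace-separated rows, assuming the column order seen in the
     * failure message: action, task_id, parent_task_id, type, start_time,
     * timestamp, running_time, ip, node.
     */
    static List<TaskRow> parse(String listing) {
        List<TaskRow> rows = new ArrayList<>();
        for (String line : listing.split("\n")) {
            String trimmed = line.strip();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length < 9) {
                throw new IllegalArgumentException("unexpected row: " + line);
            }
            rows.add(new TaskRow(cols[0], cols[1], cols[2], cols[6]));
        }
        return rows;
    }
}
```

Applied to the two rows above, the `close[n]` row's parent_task_id equals the `close` row's task_id, so both entries belong to a single close request.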

Issue Reasons:

  • [main] 2 failures in test testDeleteJobAfterMissingAliases (0.2% fail rate in 904 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

@elasticsearchmachine elasticsearchmachine added :ml Machine learning >test-failure Triaged test failures from CI labels Sep 12, 2024
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 2 failures in test testDeleteJobAfterMissingAliases (0.2% fail rate in 904 executions)

Build Scans:

@elasticsearchmachine elasticsearchmachine added Team:ML Meta label for the ML team needs:risk Requires assignment of a risk label (low, medium, blocker) labels Sep 12, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/ml-core (Team:ML)

edsavage added a commit that referenced this issue Oct 16, 2024
A large number (almost the entirety) of tests in the `MlJobIT` test suite have been muted. In all cases the cause of failure is the same: persistent tasks for `cluster:admin/xpack/ml/job/close` and `cluster:admin/xpack/ml/job/close[n]` are detected as still present after the test case has completed.

Examination of the tests shows that most of them do not call `close`, either directly or indirectly, indicating that the root cause lies with some previous test. As the `close` task inherits the default timeout of half an hour, a lingering instance of it can cause a lot of damage to subsequent tests.

The approach taken in this PR is to call the `_tasks/_cancel` endpoint as the final operation after every test execution in the `MlJobIT` suite. This should restrict the impact of a lingering `close` task to the test responsible, and the reduction in noise should make it easier to identify the culprit.

Closes #105239, #113581, #113046, #112729, #113528, #112701, #113742, #113370, #112823, #112088, #112212, #112730, #113654, #113655, #112381, #113477, #112382, #113651, #112510
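The documented `POST _tasks/_cancel` endpoint accepts an `actions` filter with wildcard patterns, which is one way to express the per-test cleanup described above. The sketch below only builds the request path and shows which action names a `close*` pattern would cover. `TaskCleanup` and its simplified `matches` helper are hypothetical, and the actual `MlJobIT` change may issue the request differently.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class TaskCleanup {

    /** Action pattern intended to cover the coordinating close task and its per-node child. */
    static final String ML_CLOSE_ACTIONS = "cluster:admin/xpack/ml/job/close*";

    private TaskCleanup() {}

    /** Path and query string for POST _tasks/_cancel, filtered by action pattern. */
    static String cancelPath(String actionPattern) {
        return "/_tasks/_cancel?actions=" + URLEncoder.encode(actionPattern, StandardCharsets.UTF_8);
    }

    /** Simplified wildcard match in which '*' matches any run of characters. */
    static boolean matches(String pattern, String action) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i])); // literal text between wildcards
        }
        return Pattern.matches(regex.toString(), action);
    }
}
```

The `close*` pattern covers both the coordinating action and its `[n]` child, but it would also cover any other action that happens to share that prefix.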
gitworkflows added a commit to gsoc2/elasticsearch that referenced this issue Oct 16, 2024
* Avoid leaking blackholed register ops in tests (#114287)

Today when we reboot a node in a test case derived from
`AbstractCoordinatorTestCase` we lose the contents of
`blackholedRegisterOperations`, but it's important that these operations
_eventually_ run. With this commit we copy these operations over into
the new node.

* Mute org.elasticsearch.xpack.esql.qa.single_node.RestEsqlIT testProfileOrdinalsGroupingOperator {SYNC} #114380

* Skip storing ignored source for single-element leaf arrays (#113937)

* Minimize storing array source

* restrict to fields

* revert changes for `addIgnoredFieldFromContext`

* fix test

* spotless

* count nulls

* Mute org.elasticsearch.xpack.inference.services.cohere.CohereServiceTests testInfer_StreamRequest #114385

* Add mappings for OTel event body (#114332)

Also changes mappings from body_* to body.*

* Revert "Fix BWC for file-settings based role mappings (#113900)" and related  (#114326)

Revert "Fix BWC for file-settings based role mappings (#113900)" and related changes. Reverted commits:

- 763764c7fac0d5738534e632d7da327711a272d0
- bc8f9dc7f3882a461d4b89d69c7554a4cb3858ac
- ce07060dce69f961c0906079529e91c7dd7d4b48

This is due to a bug in the above fix. We will reintroduce a pared-down version of the fix in a subsequent PR.

* Update forcemerge.asciidoc (#114377)

As per request https://github.com/elastic/elasticsearch/pull/114315#issuecomment-2400521895 doing the PR on the main branch.

* Avoid noisy errors in testSyntheticSourceKeepArrays (#114391)

* Minimize storing array source

* restrict to fields

* revert changes for `addIgnoredFieldFromContext`

* fix test

* spotless

* count nulls

* Avoid noisy errors in testSyntheticSourceKeepArrays

* update

* update

* update

* update

* Mute org.elasticsearch.index.mapper.extras.ScaledFloatFieldMapperTests testSyntheticSourceKeepArrays #114406

* Entitlements for System.exit (#114015)

* Entitlements for System.exit

* Respond to Simon's comments

* Rename trampoline -> bridge

* Require exactly one bridge jar

* Use Type helpers to generate descriptor strings

* Various cleanup from PR comments

* Remove null "receiver" for static methods

* Use List<Type> instead of voidDescriptor

* Clarifying comment

* Whoops, getMethod

* SuppressForbidden System.exit

* Spotless

* Use embedded provider plugin to keep ASM off classpath

* Oops... forgot the punchline

* Move ASM license to impl

* Use ProviderLocator and simplify bridgeJar logic

* Avoid eager resolution of configurations during task configuration

* Remove compile-time dependency agent->bridge

---------

Co-authored-by: Mark Vieira <[email protected]>

* Re-enable ScaledFloatFieldMapperTests.testSyntheticSourceKeepArrays (#114408)

* Default enable cluster state role mapper (#114337)

This PR default-enables cluster-state role mappings as the first part of the mitigation for a regression in ECK introduced by https://github.com/elastic/elasticsearch/pull/107410. 

Prior to this PR, cluster-state role mappings were written to cluster-state, but not read from it. 

With this PR, cluster-state role mappings will be read and used to assign roles to users, i.e. in user role resolution. 

However, they will not be included in the output of the [Get role mappings API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-get-role-mapping.html) yet. Exposing them via API is a target for a follow-up fix.

Relates: ES-9628
Supersedes: https://github.com/elastic/elasticsearch/pull/113900

* Ensure green step in synonyms rule yaml test (#114400)

Fixes test issue serverless 2922.

* Unmute SecureHdfsSearchableSnapshotsIT

* Mute org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT test {p0=synonyms/60_synonym_rule_get/Synonym set not found} #114432

* Timeout on buildkite artifact upload but do not fail the build (#114430)

This works around an issue we currently see on Windows CI boxes, where we
run into timeouts in this step of our builds.

* Unmute many tests (#114431)

These look to have been muted due to suite timeouts that we've since
fixed. Let's try running these again.

Closes #109687
Closes #112144
Closes #112624
Closes #113315
Closes #113316
Closes #113327
Closes #113340

* [ML] Remove threading from tests (#113212)

We are getting InterruptedExceptions ~1% of the time when running these
tests, but we can remove the threading from this test and still verify
the one-by-one behavior.

Fix #112471

Co-authored-by: Elastic Machine <[email protected]>

* Mute org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT test {p0=synonyms/60_synonym_rule_get/Get a synonym rule} #114443

* Mute org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT test {p0=synonyms/60_synonym_rule_get/Synonym rule not found} #114444

* Bump default timeout for test suites on Windows to 60 minutes (#114428)

Co-authored-by: Elastic Machine <[email protected]>

* Adding chunking settings to GoogleVertexAiService, AzureAiStudioService, and AlibabaCloudSearchService (#113981)

* Adding chunking settings to GoogleVertexAiService, AzureAiStudioService, and AlibabaCloudSearchService

* Update docs/changelog/113981.yaml

* Updating AlibabaService chunkedInfer to handle sparse embedding task types

---------

Co-authored-by: Elastic Machine <[email protected]>

* Remove ccs_telemetry feature flag (#113825)

This removes the `ccs_telemetry` feature flag and instead introduces an
undocumented setting, `search.ccs.collect_telemetry`, which defaults to `true`.
It enables CCS search telemetry collection and
`_cluster/stats?include_remote=true`, and can be disabled if it causes
any problems.

* (Doc+) Link API doc to parent object - part2 (#113541)

* (Doc+) Cross-link CAT APIs to parent object

---------

Co-authored-by: Lisa Cawley <[email protected]>
Co-authored-by: shainaraskas <[email protected]>

* Remove type param from `BaseNodesRequest` (#114399)

This type parameter is only needed so that the `.timeout(TimeValue)`
method returns a request of the right type, but this still requires an
unchecked cast. Yet there's no real need to return anything from this
method, we can just use a regular setter. This commit does that.
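The two shapes of request class can be contrasted in a small sketch. The class names and `setTimeout` are hypothetical, and a `long` in milliseconds stands in for `TimeValue` to keep the example self-contained.

```java
/** Before: a self-type parameter exists only so the fluent setter returns the subclass. */
abstract class SelfTypedNodesRequest<R extends SelfTypedNodesRequest<R>> {
    private long timeoutMillis;

    @SuppressWarnings("unchecked")
    final R timeout(long millis) {
        this.timeoutMillis = millis;
        return (R) this; // the unchecked cast needed to return the subclass type
    }

    final long timeoutMillis() {
        return timeoutMillis;
    }
}

final class OldStyleRequest extends SelfTypedNodesRequest<OldStyleRequest> {}

/** After: no type parameter; the timeout is set through a plain setter. */
abstract class PlainNodesRequest {
    private long timeoutMillis;

    final void setTimeout(long millis) {
        this.timeoutMillis = millis;
    }

    final long timeoutMillis() {
        return timeoutMillis;
    }
}

final class NewStyleRequest extends PlainNodesRequest {}
```

The self-typed version only exists to make chained calls return the subclass type; once callers stop chaining, the plain setter removes both the type parameter and the unchecked cast.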

* LogsDB `host` and `timestamp` mappings tests (#114001)

Here we are testing mappings of `host` and `timestamp` fields as they are
used as default fields to sort on when using LogsDB. LogsDB uses a
`host.name` field mapped as a `keyword` and a `@timestamp` field, required
by data streams. Some mappings throw errors as a result of incompatibilities
when trying to merge object fields. Such errors are expected.

* Mute org.elasticsearch.xpack.inference.InferenceRestIT test {p0=inference/30_semantic_text_inference/Calculates embeddings using the default ELSER 2 endpoint} #114412

* Prevent flattening of ordered and unordered interval sources (#114234)

This PR applies a temporary patch to fix an issue with ordered and unordered interval sources.
The flattening applied in Lucene modifies the final gap, which prevents valid queries from matching.
The fix already exists in Lucene but will only be released in Lucene 10.x later this year.
Since the bug prevents combining ordered and unordered intervals with gaps, this change applies
a workaround so that the bug is fixed in Elasticsearch 8.x.

Relates #113554

* Mute org.elasticsearch.xpack.inference.InferenceRestIT test {p0=inference/40_semantic_text_query/Query a field that uses the default ELSER 2 endpoint} #114376

* [ML] Stream Anthropic Completion (#114321)

Enable chat completion streaming responses for Anthropic's server-sent
events.

Co-authored-by: Elastic Machine <[email protected]>

* ESQL: Delay construction of warnings (#114368)

Delay construction of `Warnings` until they are needed to save memory
when evaluating many many many expressions. Most expressions won't use
warnings at all and there isn't any need to make registering warnings
super duper fast. So let's make the construction lazy to save a little
memory. It's like 200 bytes per expression which isn't much, but it's
possible to have thousands of expressions in a single query. Abusive,
but possible.

This also consolidates all `Warnings` usages to a single `Warnings`
class. We had two. We don't need two.
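The lazy construction can be sketched as a field that stays `null` until the first warning is registered. `Warnings` here is a hypothetical stand-in with a made-up API, not the ESQL class, and the evaluator assumes it is used from a single thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the warnings collector attached to an expression. */
final class Warnings {
    private final String source;
    private final List<String> messages = new ArrayList<>();

    Warnings(String source) {
        this.source = source;
    }

    void registerException(Exception e) {
        messages.add(source + ": " + e.getMessage());
    }

    List<String> messages() {
        return messages;
    }
}

/** Evaluator that allocates its Warnings only when the first warning is registered. */
final class LazyWarningsEvaluator {
    private final String source;
    private Warnings warnings; // null until a warning is actually needed

    LazyWarningsEvaluator(String source) {
        this.source = source;
    }

    private Warnings warnings() {
        if (warnings == null) {
            warnings = new Warnings(source); // not thread-safe; assumes single-threaded evaluation
        }
        return warnings;
    }

    /** Parses an int, or records a warning and returns null when parsing fails. */
    Integer evalParseInt(String value) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            warnings().registerException(e);
            return null;
        }
    }

    boolean warningsAllocated() {
        return warnings != null;
    }

    List<String> warningMessages() {
        return warnings == null ? List.of() : warnings.messages();
    }
}
```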

* Ensure that all rewriteable are called in retrievers (#114366)

This PR ensures that all retrievers apply the rewrite to all of their rewriteables.
Rewriting eagerly at the retriever level ensures that we don't rewrite the same query multiple times
when compound retrievers are used.

* Return `_ignored_source` only if explicitly requested via `stored_fields` or `fields` (#114145)

We do not want to return `_ignored_source` with every search hit. Other than being expensive
(especially when there are many ignored fields), it also exposes some implementation details. Anyway, it might
still be useful to have the ability to retrieve it if necessary, at least for debugging purposes. For this
reason we require that it is explicitly requested using `stored_fields` or `fields` and we do not return
it by default or if requested via a wildcard `*` in `stored_fields` or `fields`.
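The selection rule can be illustrated with a simplified matcher in which a pattern is either an exact field name or a prefix ending in `*`. The class and method names are hypothetical, and the sketch assumes that any wildcard pattern, not only `*`, should skip `_ignored_source`, which goes slightly beyond what the text states.

```java
import java.util.ArrayList;
import java.util.List;

final class StoredFieldSelection {

    static final String IGNORED_SOURCE = "_ignored_source";

    private StoredFieldSelection() {}

    /** A pattern ending in '*' matches by prefix; "*" alone matches every field. */
    static boolean matchesWildcard(String pattern, String field) {
        return pattern.endsWith("*") && field.startsWith(pattern.substring(0, pattern.length() - 1));
    }

    /**
     * Returns the fields selected by the requested names or patterns, in the order of
     * availableFields. _ignored_source is only returned when requested by exact name.
     */
    static List<String> select(List<String> requested, List<String> availableFields) {
        List<String> selected = new ArrayList<>();
        for (String field : availableFields) {
            boolean explicit = requested.contains(field);
            boolean byWildcard = requested.stream().anyMatch(p -> matchesWildcard(p, field));
            if (explicit || (byWildcard && !IGNORED_SOURCE.equals(field))) {
                selected.add(field);
            }
        }
        return selected;
    }
}
```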

* IPinfo privacy detection support  (#114456)

* [ML] Filter empty task settings objects from the API response (#114389)

Inference endpoints that do not define task settings, or that have
no defaults, return an empty `task_settings` object. Filter this object from
the response.

* Azure: Explain why we don't use batch delete (#114379)

* Mute org.elasticsearch.search.retriever.StandardRetrieverBuilderParsingTests testRewrite #114466

* Mute org.elasticsearch.search.retriever.RankDocsRetrieverBuilderTests testRewrite #114467

* Improve performance of Int3Hash#removeAndAdd (#114383)

* Mute org.elasticsearch.xpack.logsdb.LogsdbTestSuiteIT org.elasticsearch.xpack.logsdb.LogsdbTestSuiteIT #114471

* Add telemetry for retrievers (#114109)

* Update wolfi image and fix breaking change (#114390)

* Actually add `terminate` docs page (#114440)

A docs page for the `terminate` processor was added in
https://github.com/elastic/elasticsearch/pull/114157, but the change
to include it in the outer processor reference page was omitted. This
change corrects that oversight.

* Refactor change point detection (#114289)

* Move change detection code to separate class

* Uniformize ChangeDetector and SpikeAndDipDetector

* Separate ChangeDetectorTests and ChangePointAggregatorTests.

* Public entrypoint for change point detection

* Move p-value computation to ChangeDetector

* Move main entrypoint to a separate file

* Fix synonyms CI tests timeout (#114476)

* Use synonym index alias, add timeout

* Unmute tests

* Clean up factory retention settings from elasticsearch (#114396)

This removes the possibility for a plugin to provide factory retention settings. Factory retention settings have been deprecated and completely replaced by #111972.

Note: this feature is not in use. If someone wants to set global retention they can use the cluster settings as defined in #111972.

* Mute org.elasticsearch.packaging.test.DockerTests test022InstallPluginsFromLocalArchive #111063

* Reduce double and float precision requirements on rest CSV tests (#114313)

Fixes https://github.com/elastic/elasticsearch-serverless/issues/2837

The failing value is `5.801464200000001`, which rounds to `5.8014642`. However, `5.8014642` is rounded to `5.801464199`.
With a precision of 7, both are truncated to `5.801464`.

Not the most elegant solution, but it works for this case, which may be quite an edge case.
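The arithmetic above can be checked with `BigDecimal`, truncating (`RoundingMode.DOWN`) to a given number of significant digits. This is a sketch that assumes the comparison works on decimal strings; `CsvPrecision` is a hypothetical helper, not the CSV test framework's actual comparison code.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

final class CsvPrecision {

    private CsvPrecision() {}

    /** Truncates a decimal string to the given number of significant digits. */
    static BigDecimal truncate(String value, int significantDigits) {
        return new BigDecimal(value).round(new MathContext(significantDigits, RoundingMode.DOWN));
    }

    /** True when both values are equal after truncation to the same precision. */
    static boolean equalAtPrecision(String expected, String actual, int significantDigits) {
        // compareTo ignores scale differences, unlike BigDecimal.equals
        return truncate(expected, significantDigits).compareTo(truncate(actual, significantDigits)) == 0;
    }
}
```

At 7 significant digits both `5.801464200000001` and `5.801464199` become `5.801464`, while at 8 digits they differ (`5.8014642` versus `5.8014641`).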

* Fix standard retriever rewrite (#114480)

Closes #114466

* [Build] Add AGPL license to open source poms (#114403)

Aftermath of coming back to open source licensing

* Give the kibana system user permission to read security entities (#114363)

* Give the kibana system user .entities read permissions

* Update docs/changelog/114363.yaml

---------

Co-authored-by: Elastic Machine <[email protected]>

* Mute org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT test {yaml=reference/esql/esql-across-clusters/line_196} #114488

* Mute org.elasticsearch.gradle.internal.PublishPluginFuncTest org.elasticsearch.gradle.internal.PublishPluginFuncTest #114492

* Handle InternalSendException inline for non-forking handlers (#114375)

When TransportService fails to send a transport action, it can complete
the listener's `onFailure` with the `generic` executor. If the listener
is a `PlainActionFuture` and also waits to be completed with a `generic`
thread, it will trip the `assertCompleteAllowed` assertion. 

https://github.com/elastic/elasticsearch/blob/fb482f863d5430702b19bd3dd23e9d8652f12ddd/server/src/main/java/org/elasticsearch/transport/TransportService.java#L1062-L1064

With this PR, we no longer fork to the generic thread pool and instead
just handle the exception inline with the current thread. The expectation
is that the downstream handler should take care of potential stack overflow
issues. This is similar to what is done in #109236.

* ESQL: Use less memory in listener (#114358)

Use less memory in the top level listener by fetching the output
attributes from the plan before starting rather than after finishing.
The plan *can* be very large so let's not hold on to it longer than we
have to.

* Speed up XPackRestIT a little (#114425)

This speeds up all of the `profiling` tests in `XPackRestIT` by
replacing a "wait for refresh" with a "refresh as fast as you can".
It'll only be a few seconds of speed up, but it's something.

While I'm here, I'm re-enabling one of our tests that doesn't seem to be
causing the slowdown.

Closes #113340

* Mute org.elasticsearch.xpack.inference.DefaultElserIT testInferCreatesDefaultElser #114503

* [ES|QL] Named parameter for field names and field name patterns (#112905)

* named parameters for field name and pattern

* [CI] Fix PublishPluginFuncTest (#114511)

* Improve performance of LongObjectPagedHashMap#removeAndAdd and ObjectObjectPagedHashMap#removeAndAdd (#114280)

* Mute org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT test {p0=synonyms/60_synonym_rule_get/Synonym set not found} #114432

* Fix enum switch case error in AlibabaSearchService (#114504)

Co-authored-by: Elastic Machine <[email protected]>

* Improve handling of failure to create persistent task (#114386)

Today if creating a persistent task fails with an exception then we
submit a cluster state update to fail the task but until that update
executes we will retry the failing task creation and cluster state
submission on all other cluster state updates that change the persistent
tasks metadata.

With this commit we register a placeholder task on the executing node to
block further attempts to create it until the cluster state update is
processed.
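The placeholder idea can be sketched as a per-node map in which a failed creation leaves an entry behind until the failure has been recorded. All names are hypothetical and the cluster state update itself is omitted; the sketch only shows how the placeholder suppresses repeated creation attempts.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-node bookkeeping for starting persistent tasks. */
final class PersistentTaskStarter {

    enum LocalState { RUNNING, FAILED_PLACEHOLDER }

    private final Map<String, LocalState> localTasks = new HashMap<>();
    private int creationAttempts;

    /**
     * Called for every cluster state update that still assigns the task to this node.
     * An existing entry, including a failure placeholder, blocks a new attempt.
     */
    synchronized void onClusterStateUpdate(String taskId, boolean creationWillFail) {
        if (localTasks.containsKey(taskId)) {
            return;
        }
        creationAttempts++;
        if (creationWillFail) {
            // Register the placeholder; the real flow would also submit a cluster
            // state update that marks the task as failed.
            localTasks.put(taskId, LocalState.FAILED_PLACEHOLDER);
        } else {
            localTasks.put(taskId, LocalState.RUNNING);
        }
    }

    /** Called once the cluster state update recording the failure has been processed. */
    synchronized void onFailureProcessed(String taskId) {
        localTasks.remove(taskId, LocalState.FAILED_PLACEHOLDER);
    }

    synchronized int creationAttempts() {
        return creationAttempts;
    }
}
```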

* No longer require logs@settings component template to enable logsdb by default. (#114501)

This change also opts APM logs out of logsdb.

* Mute org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT test {p0=synonyms/60_synonym_rule_get/Get a synonym rule} #114443

* Mute org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT test {p0=synonyms/60_synonym_rule_get/Synonym rule not found} #114444

* Verify Maxmind database types in the geoip processor (#114527)

* [ML] Stream Azure Completion (#114464)

Includes both Azure AI Studio and Azure Open AI.
Both streaming responses are processed using Open AI's SSE format.

* Add chunking settings configuration to ElasticsearchService/ELSER (#114429)

* Add chunking settings configuration to ElasticsearchService/ELSER

* Update docs/changelog/114429.yaml

---------

Co-authored-by: Elastic Machine <[email protected]>

* Adding support for registered country fields for maxmind geoip databases (#114521)

Co-authored-by: Joe Gallo <[email protected]>

* Update "Securing Clients and integrations" to include Fleet (#113731)

* Add link to NO_COPIES allocation explain message (#113656)

* tweaked no-valid-shard-copies message

* untweaked misformatting in allocation explain asciidoc

* [ML] Mute tests using mock web server for streaming (#114542)

Relates #114385

* [ML] Upgrade to AWS SDK v2 (#114309)

- Replaced AWS 1.12.740 with 2.28.13
- Removed `aws-java-sdk*` and its transitive dependencies.
- Added `awssdk:bedrockruntime` as an `implementation` dependency; all transitive
  dependencies are added as `api`, matching their `compile` scope in
  Maven.
- Added `awssdk:netty-nio-client` as our client implementation, since
  our v1 integration is using the respective Async client.
- Added netty packages as `runtimeOnly` since they are only used during
  runtime.
- Replaced AWS's use of SLF4J-1.7 with our declaration of SLF4J-2.x,
  since SLF4J includes backwards-compatible bindings.
- Migrated all references from the v1 package (`com.amazonaws`) to the
  v2 package (`software.amazon.awssdk`).

Notable changes in the SDK:
- `*Result` objects are renamed to `*Response` objects.
- Objects are now immutable and require Builders to set fields.
- Getters no longer have the `get*` prefix, e.g. `getModelId()` is now
  `modelId()`.
- `Future` has been replaced with `CompletableFuture`.
- There is no longer a need to invoke the `IdleConnectionReaper`, this
  is now done when the client is closed.
- Builders have a consumer mutation pattern for modifying many fields at
  once.

Security changes:
- The underlying Builder objects always check to see if the
  `.aws/credentials` and `.aws/config` files exist, even if they are not
  used, so our `plugin-security.policy` now allows reading these files.
- The Builder always checks for the `http.proxyHost` property before
  defaulting to the hardcoded Bedrock URL.

Resolve #110590

* Updating tests to account for rewriting nested retrievers (#114502)

* Mute org.elasticsearch.datastreams.logsdb.qa.LogsDbVersusLogsDbReindexedIntoStandardModeChallengeRestIT testTermsAggregation #114554

* [TEST] Add coverage for field caps and ES|QL to LogsDB QA testing (#114505)

* Add coverage for field caps and ES|QL to LogsDB QA testing

* address comments

* address comments

* address comments

* Mute org.elasticsearch.datastreams.logsdb.qa.LogsDbVersusLogsDbReindexedIntoStandardModeChallengeRestIT testTermsQuery #114563

* Ensure clean thread context in `MasterService` (#114512)

`ThreadContext#stashContext` doesn't guarantee to give a clean thread
context, but it's important we don't allow the callers' thread contexts
to leak into the cluster state update. This commit captures the desired
thread context at startup rather than using `stashContext` when forking
the processor.

* Fix TDigestState.read CB leaks (#114303)

Closes https://github.com/elastic/elasticsearch/issues/114194

Fixes `TDigestState.read()` CB leak on error on `.reserve()`.

* Fix `LogsdbTestSuiteIT` unexpected warning (#114481)

Just expect the warning when adding the template.

* Fix bitset filter cache loading in Stateless (#114191)

As noted while reviewing PR #113478, the bitset filter cache
was wrongly eagerly loaded only for fast-refresh indices on index
nodes. However, it should be eagerly loaded for any index that
can be searched. This PR fixes this.

* Fix deployment_stats.state for target_allocation_count=0 (#114570)

* Additional index settings provider validation (#113838)

Fail if an index settings provider adds a setting that was added by another index settings provider.
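A minimal sketch of such a check, assuming each provider contributes a map of setting keys to values. `SettingsProviderMerge` and `Provider` are hypothetical, and the setting keys in the usage are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SettingsProviderMerge {

    /** Hypothetical provider: a name plus the settings it contributes. */
    record Provider(String name, Map<String, String> settings) {}

    private SettingsProviderMerge() {}

    /** Combines provider output, rejecting a key that an earlier provider already added. */
    static Map<String, String> merge(List<Provider> providers) {
        Map<String, String> merged = new LinkedHashMap<>();
        Map<String, String> addedBy = new LinkedHashMap<>();
        for (Provider provider : providers) {
            for (Map.Entry<String, String> setting : provider.settings().entrySet()) {
                String previous = addedBy.putIfAbsent(setting.getKey(), provider.name());
                if (previous != null) {
                    throw new IllegalArgumentException(
                        "setting [" + setting.getKey() + "] added by [" + provider.name()
                            + "] was already added by [" + previous + "]"
                    );
                }
                merged.put(setting.getKey(), setting.getValue());
            }
        }
        return merged;
    }
}
```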

* ES|QL: Add support for cached strings in plan serialization (#112929)

* Improve exception message for bad environment variable placeholders in settings (#114552)

Closes #110858

* Unmute ComparisonTests (#114248)

Closes https://github.com/elastic/elasticsearch/issues/111721

The test was muted some time ago. I'm unmuting it because:
- I couldn't reproduce it. The test uses an accuracy delta for assertions, so I
  executed it _in a loop_ in case there was some edge case with it, but no
  luck.
- The classes have been heavily changed since then.

* Simplifying TextSimilarityRankBuilder to operate through the standard QueryPhase (#114567)

* Enable pushing Sort/Filter by ReferenceAttribute down to Lucene, and thereby optimize Sort by ST_DISTANCE (#112938)

The ST_DISTANCE function added in #108764 was optimized for lucene pushdown in a series of followup PRs, but this did not include sorting by distance. Now this is resolved, for two key scenarios, both known to be valued by users:

* Sorting by distance:
    `FROM index | EVAL distance=ST_DISTANCE(field, literal) | SORT distance`
* Sorting and filtering by distance:
    `FROM index | EVAL distance=ST_DISTANCE(field, literal) | WHERE distance < literal | SORT distance`

The key changes required to make this work:
* Add to the EsQueryExec the appropriate sort->_geo_distance sort type
* Enhance PushTopNToSource to understand how to pushdown the sort even when there is an EVAL in between the FROM and the SORT (between the TopNExec and the EsQueryExec in the physical plan).
* Enhance PushFiltersToSource to understand how to pushdown the filter even when there is an EVAL in between the FROM and the WHERE (between the Filter and the EsQueryExec in the physical plan).

A useful bonus feature of this additional EVAL intelligence is that other, non-spatial cases are now also pushed down. In particular EVALs that are simple aliases are considered and pushed down, for both filtering and sorting.

Local benchmark results are very approximate, but show massive improvements for distanceSort and distanceFilterSort, which relate to the two cases listed above.

| Benchmark | Query DSL | ESQL before this PR | ESQL after this PR | Comments |
|---|---|---|---|---|
| distanceFilter | 10 | 5 | 5 | Optimized in #109972 |
| distanceEvalFilter | 10 | 10000 | 1500 | Still slow due to unnecessary EVAL |
| distanceSort | 150 | 12000 | 160 | |
| distanceFilterSort | 20 | 10000 | 24 | |

NOTE: This enables pushing down sorting by any ReferenceAttribute that either refers to a sortable FieldAttribute, or to an StDistance function that itself refers to a suitable FieldAttribute of geo_point type.

---------

Co-authored-by: Alexander Spies <[email protected]>

* Verify `CancellableFanOut` items processed before completion (#114595)

Enhances `CancellableFanOutTests#testConcurrency` to ensure that all the
per-item response-handling methods have finished executing before
`onCompletion` is called.

* Mute org.elasticsearch.datastreams.logsdb.qa.LogsDbVersusLogsDbReindexedIntoStandardModeChallengeRestIT testMatchAllQuery #114607

* Fix dim validation for bit element_type (#114533)

A silly bug has reared its ugly head. Apparently, our dimension
validations depend on JSON parsing order, which is not good.

So, this commit adjusts the dim validations so that they are an actual
validation step, instead of something that happens during parsing.

Additionally, I found that our custom formats were not overriding
`getMaxDimensions` correctly. Typically, and in production, this isn't
that big of a deal, but I have found it useful to do this for other
testing purposes (so that we don't have to rely on the per-field codec
for more direct and advanced testing).
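Validating after all parameters have been read removes the dependency on key order. The sketch below assumes, for illustration, that `bit` vectors need `dims` to be a multiple of 8. The class, enum, and parameter handling are hypothetical and much simpler than the real dense vector mapper.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class VectorMappingSketch {

    enum ElementType { FLOAT, BYTE, BIT }

    record Params(ElementType elementType, int dims) {}

    private VectorMappingSketch() {}

    /** Collects every parameter first, in document order, and validates only afterwards. */
    static Params parse(List<Map.Entry<String, String>> paramsInDocumentOrder) {
        ElementType type = ElementType.FLOAT; // assumed default for this sketch
        Integer dims = null;
        for (Map.Entry<String, String> param : paramsInDocumentOrder) {
            switch (param.getKey()) {
                case "element_type" -> type = ElementType.valueOf(param.getValue().toUpperCase(Locale.ROOT));
                case "dims" -> dims = Integer.parseInt(param.getValue());
                default -> throw new IllegalArgumentException("unknown parameter [" + param.getKey() + "]");
            }
        }
        if (dims == null) {
            throw new IllegalArgumentException("[dims] is required");
        }
        validate(type, dims);
        return new Params(type, dims);
    }

    /** Runs once all parameters are known, so key order cannot change the outcome. */
    static void validate(ElementType type, int dims) {
        if (dims <= 0) {
            throw new IllegalArgumentException("[dims] must be positive");
        }
        if (type == ElementType.BIT && dims % 8 != 0) {
            throw new IllegalArgumentException("[dims] must be a multiple of 8 for bit vectors, got " + dims);
        }
    }
}
```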

* ESQL: Push down filters even in case of renames in Evals (#114411)

Optimize queries like
    `... | EVAL b = a, c = b | WHERE c > 2`
to
    `... | WHERE a > 2 | EVAL b = a, c = b`
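The rewrite relies on resolving each alias back to the attribute it renames. A minimal sketch that handles only pure renames and ignores shadowing; `EvalRenameResolver` is a hypothetical name and the filter is represented as a plain string.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class EvalRenameResolver {

    private EvalRenameResolver() {}

    /**
     * Maps each EVAL name to the attribute it ultimately refers to. Input is in
     * declaration order (name -> referenced name), so a later rename can point at an
     * earlier one. Only pure renames are handled; shadowing is ignored.
     */
    static Map<String, String> resolve(LinkedHashMap<String, String> renames) {
        Map<String, String> resolved = new HashMap<>();
        for (Map.Entry<String, String> rename : renames.entrySet()) {
            String target = rename.getValue();
            resolved.put(rename.getKey(), resolved.getOrDefault(target, target));
        }
        return resolved;
    }

    /** Rewrites a "field op literal" filter against the underlying attribute. */
    static String rewriteFilter(String field, String op, String literal, Map<String, String> resolved) {
        return resolved.getOrDefault(field, field) + " " + op + " " + literal;
    }
}
```

For `b = a, c = b`, both `b` and `c` resolve to `a`, so `WHERE c > 2` can be evaluated as `WHERE a > 2` before the `EVAL`.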

* [Failure store - selector syntax] Refactor IndicesOptions builder (#114597)

**Introduction**

> In order to make adoption of failure stores simpler for all users, we are introducing a new syntactical feature to index expression resolution: the selector.
>
> Selectors, denoted with a `::` followed by a recognized suffix, will allow users to specify which component of an index abstraction they would like to operate on within an API call. In this case, an index abstraction is a concrete index, data stream, or alias; any abstraction that can be resolved to a set of indices/shards. We define a component of an index abstraction to be some searchable unit of the index abstraction.
>
> To start, we will support two components: data and failures. Concrete indices are their own data components, while the data component for index aliases are all of the indices contained therein. For data streams, the data component corresponds to their backing indices. Data stream aliases mirror this, treating all backing indices of the data streams they correspond to as their data component.
>
> The failure component is only supported by data streams and data stream aliases. The failure component of these abstractions refers to the data streams' failure stores. Indices and index aliases do not have a failure component.

For more details and examples see
https://github.com/elastic/elasticsearch/pull/113144. All this work has
been cherry picked from there.
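Splitting an expression into its abstraction name and selector might look like the sketch below. Treating a bare name as the data component is an assumption, the names are hypothetical, and the check that only data streams and data stream aliases have a failure component is left out because it needs cluster metadata.

```java
final class SelectorParser {

    enum Component { DATA, FAILURES }

    record Selection(String abstraction, Component component) {}

    private SelectorParser() {}

    /** Splits "name::suffix"; a bare name is treated as the data component (assumption). */
    static Selection parse(String expression) {
        int separator = expression.lastIndexOf("::");
        if (separator < 0) {
            return new Selection(expression, Component.DATA);
        }
        String name = expression.substring(0, separator);
        String suffix = expression.substring(separator + 2);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("missing index abstraction in [" + expression + "]");
        }
        return switch (suffix) {
            case "data" -> new Selection(name, Component.DATA);
            case "failures" -> new Selection(name, Component.FAILURES);
            default -> throw new IllegalArgumentException("unrecognized selector [" + suffix + "] in [" + expression + "]");
        };
    }
}
```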

**Purpose of this PR**

This PR replaces the indices options boolean constructor with
the builders. The goal is to give me and the reviewer a very narrowly
scoped change in which we can ensure we did not make any mistakes during the
conversion. It will also reduce the change list in
https://github.com/elastic/elasticsearch/pull/113144/files a bit.

* [ML] Add sentence overlap option to the sentence chunking settings (#114461)

* ESQL: Improve error message in test (#114524)

Improve an error message in the test for `profile`ing the ordinals-based
grouping operator. It's failed in the past with a rather cryptic error
message. This will either keep it passing fully or give us a better
error message when it does fail.

Closes #114380

* Mute org.elasticsearch.xpack.esql.optimizer.PhysicalPlanOptimizerTests testPushSpatialIntersectsEvalToSource {default} #114627

* Mute org.elasticsearch.xpack.esql.optimizer.PhysicalPlanOptimizerTests testPushWhereEvalToSource {default} #114628

* Fixing test failure for #114556 (#114617)

* AwaitsFixes for #114625

* Revert "AwaitsFixes for #114625"

This reverts commit d37c1c636bfc87df0906a6427e824f6bca722245.

The automuter got there first.

* SQL: Remove dependency on `org.elasticsearch.Version` (#112094)

This removes SQL's use of `org.elasticsearch.Version` class and usages replaced by `SqlVersion`.

All the currently considered released versions (`7.0.0` to `8.16.0`) have been declared as `SqlVersion` instances. These are still tested against.

The last "known release" (`8.16.0`) is considered the "server compatibility version" and all clients at or past this release are compatible with the server; notably, they can be also on a newer version than server's. 
Clients released before this "server compatibility version" respect existing compatibility requirements (must/can be older up to one major lower, but past `7.7.0`).
The "server compatibility version" will not be updated with newer stack releases (at least not until #112745 is addressed).

Fixes #102689

* Replace "::<type>" casts to explicit casting functions (#114639)

Fixes https://github.com/elastic/elasticsearch/issues/114613

Those tests were added with `::<type>` casts, which don't work in older versions.

As they aren't testing anything around those casts, I'm replacing them with `TO_<TYPE>()` functions to let them work everywhere.

* Second parsing pass tracks array scopes properly (#114621)

* Change exception type when timing out waiting for specific seqno in fleet search api. (#114526)

Without this change, the request fails with `ElasticsearchTimeoutException` if waiting for the seqno times out. This results in a 500 status code.

With this change the `SearchTimeoutException` is used which results in a 504 status code. This is a more appropriate response code for time-outs.

Closes #114395

* Allow stored source in logsdb and tsdb (#114454)

* ESQL: Retry test on 403 (#114450)

Retry the async test when you get a 403 - that could be because security
has not yet booted. We should have permission to fetch everything.

* CCS metadata is opt-in in ESQL JSON responses (#114437)

Since Kibana only needs CCS metadata in ESQL responses from certain well-defined locations,
we are making CCS metadata opt-in. This feature is patterned after ESQL profiling, where
you specify "profile": true in the ESQL body. If you ask for it at the beginning, it will always be
present in the response (it will be written to the .async-search index, and you can't turn it off in later
async-search requests against this particular query ID). If you don't ask for it at the beginning, it will
never be present (it will NOT be written to the .async-search index when the response is persisted).

The new option is "include_ccs_metadata": true/false.

* ESQL: Speed up grouping by bytes (#114021)

This speeds up grouping by bytes valued fields (keyword, text, ip, and
wildcard) when the input is an ordinal block:
```
    bytes_refs 22.213 ± 0.322 -> 19.848 ± 0.205 ns/op (*maybe* real, maybe noise. still good)
       ordinal didn't exist   ->  2.988 ± 0.011 ns/op
```
I see this as 20ns -> 3ns, an 85% speedup. We never hit the ordinals
branch before, so I'd expect ordinal input to have performed the same as
bytes_refs there: about 20ns per op.

This also speeds up grouping by a pair of byte valued fields:
```
two_bytes_refs 83.112 ± 42.348  -> 46.521 ± 0.386 ns/op
  two_ordinals 83.531 ± 23.473  ->  8.617 ± 0.105 ns/op
```
The speed up is much better when the fields are ordinals because hashing
bytes is comparatively slow.

I believe the ordinals case is quite common. I've run into it in quite a
few profiles.
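Here is why ordinal input is cheaper, shown as a hypothetical Java sketch. It is not the ESQL block-hash code. Each distinct dictionary entry is hashed once, and every row then finds its group through a plain array lookup on its ordinal. `String` stands in for the bytes values, and `OrdinalGrouping`/`groupIds` are made-up names.

```java
import java.util.Map;

final class OrdinalGrouping {
    /**
     * Assigns a group id to every row, where each row is an ordinal into {@code dictionary}.
     * {@code groups} maps already-seen values to their group ids and is updated in place.
     */
    static int[] groupIds(String[] dictionary, int[] ordinals, Map<String, Integer> groups) {
        // Hash each dictionary entry once (the expensive part for bytes values).
        int[] dictToGroup = new int[dictionary.length];
        for (int i = 0; i < dictionary.length; i++) {
            dictToGroup[i] = groups.computeIfAbsent(dictionary[i], k -> groups.size());
        }
        // Per row, only a cheap array read is needed.
        int[] out = new int[ordinals.length];
        for (int row = 0; row < ordinals.length; row++) {
            out[row] = dictToGroup[ordinals[row]];
        }
        return out;
    }
}
```

When many rows share each dictionary entry, the per-row work shrinks to an array read. That is the intuition behind the ordinal numbers above.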

* ESQL: Test partially filtered aggs (#114510)

Tests for partially filtered aggs. It uses the existing aggs tests and
adds junk rows that are filtered away. That way we don't have to add new
testing assertions to each class - we can just reuse the existing
assertions.

* Mute org.elasticsearch.xpack.inference.integration.ModelRegistryIT testGetModel #114657

* [ES|QL] Add hypot function (#114382)

Adds a hypotenuse function

* Replace cloud-ess docker image with wolfi-ess (#114413)

* Replace cloud-ess docker image with wolfi-ess
   We just replaced the existing implementation of cloud-ess with what was wolfi-ess, a Wolfi-based ESS image.
   The cloud image itself will be removed in a future commit; it was not used anywhere.

* Switch to test cloud docker image instead of default docker in packaging pr tests.
  This adds much more coverage than the default docker image, which is also barely touched.

* Initial InstrumenterTests (#114422)

* Initial InstrumenterTests

* Assert on instrumentation method arguments

* Unmute test that does not exist anymore (#114655)

Closes #111631.

* Support IPinfo database configurations (#114548)

* Move tests out of geo ip processor tests (#114656)

* [Inference API] Introduce Update API to change some aspects of existing inference endpoints (#114457)

* Refactor IPinfoIpDataLookupsTests tests (and others) (#114667)

* Introduce `index.mapping.source.mode` setting to override `_source.mode` (#114433)

* feature: introduce index.mapping.source.mode setting

Introduce a new `index.mapping.source.mode` setting which will be used
to override the mapping level `_source.mode`. For now the mapping
level setting will stay and be deprecated later with another PR.

The setting always takes precedence. When it is not defined,
the default for the index mode is used, which can be overridden by the
`_source.mode` mapping level definition.
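The resolution order described above can be sketched as follows. This is a hypothetical illustration, not the actual mapper code. The enum names are made up, and so are the per-index-mode defaults (synthetic source for logsdb and time_series, stored source for standard); both are assumptions of the sketch.

```java
import java.util.Optional;

final class SourceModeResolution {
    enum SourceMode { STORED, SYNTHETIC, DISABLED }
    enum IndexMode { STANDARD, LOGSDB, TIME_SERIES }

    // Assumption: logsdb and time_series default to synthetic source, standard to stored.
    static SourceMode defaultFor(IndexMode indexMode) {
        return indexMode == IndexMode.STANDARD ? SourceMode.STORED : SourceMode.SYNTHETIC;
    }

    static SourceMode resolve(Optional<SourceMode> indexSetting, Optional<SourceMode> mappingSourceMode, IndexMode indexMode) {
        // 1. index.mapping.source.mode setting, 2. _source.mode in the mapping, 3. index-mode default
        return indexSetting.or(() -> mappingSourceMode).orElseGet(() -> defaultFor(indexMode));
    }
}
```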

* Add feature flag for subobjects auto (#114616)

* Avoid throw exception in SyntheticSourceIndexSettingsProvider (#114479)

Co-authored-by: Nhat Nguyen <[email protected]>

* Add a callback for onConnectionClosed to MockTransportService (#114564)

The callback is added to allow inserting additional behaviour, such as
a delay when handling a closed connection.

* Add ESQL match function (#113374)

* Add alias event.dataset -> data_stream.dataset (#114642)

* [ML] Feature flag default configs (#114660)

* Renovate Bot PRs should run ci checks (#114699)

* Simplify NodeShutdownShardsIT (#114583)

We no longer need to manually reroute after registering node shutdown in test
since https://github.com/elastic/elasticsearch/pull/103251

* Add generated code changes for HypotEvaluator (#114697)

* ES|QL: Restrict sorting for _source and counter field types (#114638)

* Preserve thread context when waiting for segment generation in RTG (#114623)

Closes ES-9778

* Fix failing tests after PR clash (#114625)

Two PRs conflicted without GitHub or CI noticing. The first added these
tests, and the second modified their behaviour. Both went green in CI
and both were merged within an hour of each other.

* PR that added the tests:
  * https://github.com/elastic/elasticsearch/pull/112938
  * merged 14:13CET
* PR that changed the behaviour of these tests:
  * https://github.com/elastic/elasticsearch/pull/114411
  * merged 14:48CET

* Guard second doc parsing pass with index setting (#114649)

* Guard second doc parsing pass with index setting

* add test

* updates

* updates

* merge

* Fix termStats posting usage (#114644)

* Node shutdown test integration test (#114582)

This change adds a test case that verifies that the node
can be shut down while hosting an index with a 0-1 or 0-all
auto-expand configuration.

* Update docker.elastic.co/wolfi/chainguard-base:latest Docker digest to 277ebb4 (main) (#114409)

* Update docker.elastic.co/wolfi/chainguard-base:latest Docker digest to 277ebb4
* Tweak renovate replace pattern

---------

Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
Co-authored-by: Rene Groeschke <[email protected]>

* Mute org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT test {yaml=reference/rest-api/usage/line_38} #113694

* Introduce CRUD APIs for data stream options (#113945)

In this PR we introduce PUT, GET and DELETE endpoints to manage the data
stream options and consequently the failure store configuration on the
data stream level. This means that we can manage the failure store of
existing data streams.

The APIs look like:

```
# Enable/disable 
PUT _data_stream/my-data-stream/_options
{
  "failure_store": {
    "enabled": true
  }
}

# Remove existing configuration
DELETE _data_stream/my-data-stream/_options

# Retrieve 
GET _data_stream/my-data-stream/_options
{
  "failure_store": {
    "enabled": true
  }
}
```

Future work:

- Document the new APIs
- Convert `DataStreamOptionsIT.java` to a yaml test.

* Expands semantic_text tutorial with hybrid search (#114398)

* Creates a new page for the hybrid search tutorial

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Adds search response example

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

* Update docs/reference/search/search-your-data/semantic-text-hybrid-search

Co-authored-by: István Zoltán Szabó <[email protected]>

---------

Co-authored-by: István Zoltán Szabó <[email protected]>

* Add ResolvedExpression wrapper (#114592)

**Introduction**

> In order to make adoption of failure stores simpler for all users, we
> are introducing a new syntactical feature to index expression
> resolution: The selector.
>
> Selectors, denoted with a `::` followed by a recognized suffix, will
> allow users to specify which component of an index abstraction they
> would like to operate on within an API call. In this case, an index
> abstraction is a concrete index, data stream, or alias; any abstraction
> that can be resolved to a set of indices/shards. We define a component
> of an index abstraction to be some searchable unit of the index
> abstraction.
>
> To start, we will support two components: data and failures. Concrete
> indices are their own data components, while the data component for
> index aliases is all of the indices contained therein. For data streams,
> the data component corresponds to their backing indices. Data stream
> aliases mirror this, treating all backing indices of the data streams
> they correspond to as their data component.
>
> The failure component is only supported by data streams and data
> stream aliases. The failure component of these abstractions refers to
> the data streams' failure stores. Indices and index aliases do not have
> a failure component.

For more details and examples see
https://github.com/elastic/elasticsearch/pull/113144. All this work has
been cherry picked from there.

**Purpose of this PR**

This PR introduces a wrapper around the resolved expression, which
used to be a plain `String`, to lay the groundwork for adding the
selectors.

The current PR is just a refactoring and does not and should not change
any existing behaviour.
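The hypothetical record below shows roughly where the wrapper is heading. It pairs a resource name with an optional selector parsed from a `::` suffix. `ExpressionWithSelector` is a made-up name. The recognized selectors are assumed to be the literal suffixes `data` and `failures`, based on the two components described above. This PR itself only wraps the `String`.

```java
record ExpressionWithSelector(String resource, String selector) {
    private static final String SEPARATOR = "::";

    /** Splits "name::selector"; the selector is null when no suffix is given. */
    static ExpressionWithSelector parse(String expression) {
        int idx = expression.lastIndexOf(SEPARATOR);
        if (idx < 0) {
            return new ExpressionWithSelector(expression, null); // plain expression, no selector
        }
        String suffix = expression.substring(idx + SEPARATOR.length());
        // Assumption: only the two components named above are recognized.
        if (!suffix.equals("data") && !suffix.equals("failures")) {
            throw new IllegalArgumentException("unrecognized selector [" + suffix + "]");
        }
        return new ExpressionWithSelector(expression.substring(0, idx), suffix);
    }
}
```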

* Update IndexSettingProvider#getAdditionalIndexSettings() signature (#114150)

With logsdb, another index mode is available, so the isTimeSeries parameter is too limiting. Instead, we should just push down the index mode from the template to the index settings provider.

Follow up from #113451
Relates to #113583

* Fix Max Score Propagation in RankDocsQuery (#114716)

Fix rank doc query when some segments have no ranked docs

* [ML] Switch default chunking strategy to sentence (#114453)

* Don't close/recreate adaptive allocations metrics (#114721)

* Simplify `XContent` output of epoch times (#114491)

Today the overloads of `XContentBuilder#timeField` do two rather
different things: one formats an object as a `String` representation of
a time (where the object is either an unambiguous time object or else a
`long`) and the other formats only a `long` as one or two fields
depending on the `?human` flag.

This is trappy in a number of ways:

- `long` means an absolute (epoch) time, but sometimes folks will
  mistakenly use this for time intervals too.

- `long` means only milliseconds, there is no facility to specify a
  different unit.

- the dependence on the `?human` flag in exactly one of the overloads is
  kinda weird.

This commit removes the confusion by dropping support for considering a
`Long` as a valid representation of a time at all, and instead requiring
callers to either convert it into a proper time object or else call a
method that is explicitly expecting an epoch time in milliseconds.
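The hypothetical Java helper below makes the distinction concrete. It is not the `XContentBuilder` API. Its name and signature state explicitly that it takes epoch milliseconds. It always writes the raw value and adds a formatted timestamp only when human-readable output is requested. The field names are assumptions, as is the ISO-8601 format produced by `Instant#toString`.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

final class EpochMillisFields {
    /** Input is explicitly an absolute epoch time in milliseconds, never an interval. */
    static Map<String, Object> epochMillis(String rawName, String readableName, long epochMillis, boolean human) {
        Map<String, Object> fields = new LinkedHashMap<>();
        if (human) {
            // Only the human-readable variant depends on the ?human flag.
            fields.put(readableName, Instant.ofEpochMilli(epochMillis).toString());
        }
        fields.put(rawName, epochMillis);
        return fields;
    }
}
```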

* Clarify use of special values for publish addresses (#114551)

Special values like `0.0.0.0` may resolve to multiple IP addresses just
like hostnames, so the same considerations apply when using such values
as a publish address. This commit spells this case out in the docs and
cleans up the nearby wording a little.

* [ML] Pick best model variant for the default elser endpoint (#114690)

* [ML] Ignore unrecognized openai sse fields (#114715)

Azure / Llama send back fields we do not expect, so this rewrites the parser
to better handle unknown fields (by dropping them).

* [ML] Send mid-stream errors to users (#114549)

If Apache sends an error mid-stream, forward it to the user rather than
the now-ignored listener.

* Add a query rules tester API call (#114168)

* Add a query rules tester API call

* Update docs/changelog/114168.yaml

* Wrap client call in async with origin

* Remove unused param

* PR feedback

* Remove redundant test

* CI workaround - add ent-search as ml dependency so it can find node features

* Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/10_basic/Test using the deprecated elasticsearch_version field results in a warning} #114748

* Mute org.elasticsearch.xpack.eql.EqlRestIT testIndexWildcardPatterns #114749

* Refactor merge scheduling code to allow overrides (#114547)

This code refactors how the merge scheduler is configured to allow
different engine implementations to configure different merge schedulers.

* [DOCS] ES|QL: Adding a tip to the WHERE documentation (#114050)

* Adding a tip to make null field behavior more apparent.

* Update docs/reference/esql/processing-commands/where.asciidoc

Co-authored-by: Andrei Stefan <[email protected]>

* Update docs/reference/esql/processing-commands/where.asciidoc

Rephrasing for clarity

Co-authored-by: Liam Thompson <[email protected]>

---------

Co-authored-by: Andrei Stefan <[email protected]>
Co-authored-by: Liam Thompson <[email protected]>

* Mute org.elasticsearch.xpack.eql.EqlRestIT testBadRequests #114752

* Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/20_standard_index/enrich stats REST response structure} #114753

* Remove PushTopNToSource support for ExchangeExec (#114637)

This appears to be dead code, so we're removing it.

* Test StDistance multivalue consistency and fixed two CartesianPoint bugs (#114729)

* Fix Synthetic Source Handling for `bit` Type in `dense_vector` Field (#114407)

**Description:**

This PR addresses the issue described in [#114402](https://github.com/elastic/elasticsearch/issues/114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause was that the `bit` type was not properly accounted for, leading to an array 8 times larger than the number of bytes actually stored in the doc value. This mismatch causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added a YAML test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**

- Closes [#114402](https://github.com/elastic/elasticsearch/issues/114402)
- Introduced in [#110059](https://github.com/elastic/elasticsearch/pull/110059)
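A minimal sketch of the length calculation behind the fix. It assumes `bit` vectors pack 8 dimensions per stored byte and therefore require `dims` to be a multiple of 8. `DenseVectorDocValues` and `decodedLength` are hypothetical names, not the mapper's code.

```java
final class DenseVectorDocValues {
    enum ElementType { FLOAT, BYTE, BIT }

    /** Number of array elements to allocate when reading one vector back from doc values. */
    static int decodedLength(ElementType type, int dims) {
        if (type == ElementType.BIT) {
            if (dims % Byte.SIZE != 0) {
                throw new IllegalArgumentException("bit vectors need dims that are a multiple of 8, got " + dims);
            }
            return dims / Byte.SIZE; // each stored byte packs 8 dimensions
        }
        return dims; // one element per dimension
    }
}
```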

* Mute org.elasticsearch.xpack.rank.rrf.RRFRankClientYamlTestSuiteIT test {yaml=rrf/800_rrf_with_text_similarity_reranker_retriever/explain using rrf retriever and text-similarity} #114757

* only return deprecation warning for elser service (#114507)

Co-authored-by: Elastic Machine <[email protected]>

* [ML] Stream Google Completion (#114596)

Google supports SSE for chat completion and sends the same payload as
their non-streaming calls, so we can reuse the SSE parser with our
existing parse function.

The downside is that Google requires a different URI, so we refactored away
from the visitor pattern to allow a different URI to be created and set
at request time rather than at model instantiation time.

* Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/30_tsdb_index/enrich documents over _bulk} #114761

* Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/20_standard_index/enrich documents over _bulk via an alias} #114763

* Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/10_basic/Test enrich crud apis} #114766

* [ML] Dynamically get of num allocations (#114636)

* Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/20_standard_index/enrich documents over _bulk} #114768

* Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/50_data_stream/enrich documents over _bulk via a data stream} #114769

* Mute org.elasticsearch.xpack.eql.EqlRestValidationIT testDefaultIndicesOptions #114771

* Mute org.elasticsearch.xpack.enrich.EnrichIT testEnrichSpecialTypes #114773

* Mute org.elasticsearch.xpack.security.operator.OperatorPrivilegesIT testEveryActionIsEitherOperatorOnlyOrNonOperator #102992

* Mute org.elasticsearch.xpack.enrich.EnrichIT testDeleteExistingPipeline #114775

* Support IPinfo databases in the ip_location processor (#114735)

* [ML] Default inference endpoint for the multilingual-e5-small model (#114683)

* [ML] Stream Bedrock Completion (#114732)

Notes:
- Adds a new API to the chatCompletionRequest to invoke the Bedrock
  Stream API
- Creates a StreamingChatProcessor that subscribes to streaming results
  from bedrock and handles the parsing on another thread.
- There was no good way (that I could see) to extend the Provider-based
  CompletionRequestEntity, so they have been flattened into one
  RequestEntity that can be shared between ConverseRequest and
  ConverseStreamRequest.

* Adding new bbq index types behind a feature flag (#114439)

New index types, bbq_hnsw and bbq_flat, which use the Better Binary Quantization (BBQ) formats. They give a 32x reduction in memory, with nice recall properties.
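The 32x figure matches the step from one 32-bit float to one bit per dimension. Below is a back-of-the-envelope check in Java with hypothetical helper names. It ignores any per-vector metadata that the real BBQ formats may store.

```java
final class QuantizedVectorMemory {
    /** Raw float32 storage: 4 bytes per dimension. */
    static long float32Bytes(int dims) {
        return (long) dims * Float.BYTES;
    }

    /** One bit per dimension, rounded up to whole bytes (per-vector metadata ignored). */
    static long oneBitBytes(int dims) {
        return (dims + Byte.SIZE - 1) / Byte.SIZE;
    }
}
```

For 1024 dimensions, this gives 4096 bytes versus 128 bytes per vector.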

* Add IndicesMetrics instead of IndicesService to toClose (#114782)

The same line already exists in
[L543](https://github.com/ywangd/elasticsearch/blob/9f4a7927bdc366f8ca98c4652ac7d1102d9430f5/server/src/main/java/org/elasticsearch/node/Node.java#L543).
It should have no practical impact since AbstractLifecycleComponent#close
short-circuits if its lifecycle is already closed. The original code
meant to close IndicesMetrics. This PR adds it.

Relates: #113737
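The short-circuit mentioned above is the usual idempotent-close pattern. The minimal, hypothetical Java sketch below (not `AbstractLifecycleComponent` itself) shows why closing the same component twice is harmless.

```java
final class LifecycleComponentSketch implements AutoCloseable {
    private boolean closed = false;
    private int doCloseCalls = 0;

    @Override
    public synchronized void close() {
        if (closed) {
            return; // already closed: a second close() is a no-op
        }
        closed = true;
        doClose();
    }

    private void doClose() {
        doCloseCalls++; // stand-in for actually releasing resources
    }

    synchronized int doCloseCalls() {
        return doCloseCalls;
    }
}
```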

* Mute org.elasticsearch.test.rest.ClientYamlTestSuiteIT org.elasticsearch.test.rest.ClientYamlTestSuiteIT #114787

* Mute org.elasticsearch.xpack.inference.rest.ServerSentEventsRestActionListenerTests testNoStream #114788

* Mute org.elasticsearch.xpack.eql.EqlRestValidationIT testAllowNoIndicesOption #114789

* Mute org.elasticsearch.xpack.eql.EqlStatsIT testEqlRestUsage #114790

* ESQL: Add skips to tests that were added retroactively (#114727)

Skip some csv tests that cannot be used in bwc tests before 8.13/8.14.

* Remove snapshot build restriction for match and qstr functions (#114482)

* ESQL: Introduce per agg filter (#113735)

Add support for aggregation scoped filters that work dynamically on the
data in each group.

| STATS
    success = COUNT(*) WHERE 200 <= code AND code < 300,
   redirect = COUNT(*) WHERE 300 <= code AND code < 400,
 client_err = COUNT(*) WHERE 400 <= code AND code < 500,
 server_err = COUNT(*) WHERE 500 <= code AND code < 600,
 total_count = COUNT(*)

Implementation-wise, the base AggregateFunction has been extended to
accept a filter. This is required to make the filter part of the
aggregate's equality/identity, which would not work with the filter as
an external component.

As part of the process, the serialization of the existing aggregations
had to be fixed so that AggregateFunction implementations delegate to
their parent first.
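Semantically, each filtered aggregation only sees the rows of its group that pass its own `WHERE`. For a single group, the `STATS` example above boils down to the following Java sketch. `FilteredCounts` and `countWhere` are hypothetical names, and the compute engine is not implemented this way.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

final class FilteredCounts {
    /** COUNT(*) WHERE filter, evaluated over the rows of one group. */
    static long countWhere(int[] statusCodes, IntPredicate filter) {
        return Arrays.stream(statusCodes).filter(filter).count();
    }
}
```

For example, `countWhere(codes, c -> 200 <= c && c < 300)` plays the role of `success`, and `countWhere(codes, c -> true)` plays the role of the unfiltered `total_count`.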

* Skip spatial.AirportsSortCityName before 8.13 (#114795)

Fix https://github.com/elastic/elasticsearch/issues/114767.

TopN didn't work in this scenario on old versions.

* Remove unused v7-only APIs (#114733)

Removes several REST endpoints that only existed in the now-inaccessible
v7-compatible mode.

* Mute org.elasticsearch.xpack.eql.EqlRestIT testUnicodeChars #114791

* [TestFix] ExplainLifecycleIT testStepInfoPreservedOnAutoRetry failing (#114294)

* Extend timeout of test and add logging on fail

* Unmute unstable test

* Switch to using logger for output

Keeps the forbiddenApis check happy

* Switch to using assertion messages to display

To display debug info

* Adjust logic of previous step info preservation

Add additional checks to ensure previous step info can't be cleared
when auto retrying, only updated with new info.

Also added logic to ensure previous step info is cleared when
transitioning to a new action

* Undo accidentally added lines from merge

* Remove v7 compat from `{PUT,DELETE} /_snapshot/${REPO}` APIs (#114726)

This exception mangling only existed for v7 API compatibility and is no
longer needed.

* [TEST] Migrated ccs-unavailable-clusters QA tests (#114764)

The ccs-unavailable-clusters QA tests have been migrated to the new REST testing
framework, using the 'elasticsearch.internal-java-rest-test' Gradle plugin.

* Add documentation for passthrough field type (#114720)

* Guard second doc parsing pass with index setting

* add test

* updates

* updates

* merge

* Add documentation for passthrough field type

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <[email protected]>

* updates

* updates

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <[email protected]>

* address comment

* address comment

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <[email protected]>

* address comment

---------

Co-authored-by: Felix Barnsteiner <[email protected]>

* Remove all v7-only REST endpoints (#114765)

These endpoints were deprecated in v7 and are unsupported in v8 so we
can remove them entirely in v9.

* Remove unused `ChunkedToXContent#toXContentChunkedV7` (#114728)

We don't support the v7 REST API in v9, so this commit removes the
now-unused `ChunkedToXContent#toXContentChunkedV7` method. It also
introduces a similar `ChunkedToXContent#toXContentChunkedV8` method for
implementations to use for v8 REST API compatibility.

* [ML] Wait for allocation on scale up from 0 (#114719)

* Mute org.elasticsearch.xpack.rank.rrf.RRFRetrieverBuilderNestedDocsIT testRRFExplainWithNamedRetrievers #114820

* Fix minor formatting issue (#114815)

The list with two options doesn't get rendered as a list, due to the
snippet in between.

https://www.elastic.co/guide/en/elasticsearch/reference/master/passthrough.html#passthrough-conflicts

* [DOCS] Fix User agent processor properties (#112518)

* Mute org.elasticsearch.ingest.geoip.HttpClientTests org.elasticsearch.ingest.geoip.HttpClientTests #112618

* Mute org.elasticsearch.xpack.remotecluster.RemoteClusterSecurityWithApmTracingRestIT testTracingCrossCluster #112731

* [EIS] Validate EIS Gateway URL if set (#114600)

* #111893 Add Warnings For Missing Index Templates (#114589)

* Add data stream template validation

to snapshot restore

* Add data stream template validation

to data stream promotion endpoint

* Add new assertion for response headers

Add a new assertion to synchronously execute a request and check the
response contains a specific warning header

* Test for warning header on snapshot restore

When missing templates

* Test for promotion warnings

* Add documentation for the potential error states

* PR changes

* Spotless reformatting

* Add logic to look in snapshot global metadata

This checks if the snapshot contains a matching template for the DS

* Comment on test cleanup to explain it was copied

* Removed cluster service field

* Mute org.elasticsearch.xpack.enrich.EnrichIT testImmutablePolicy #114839

* Introduce utils for _really_ stashing the thread context (#114786)

`ThreadContext#stashContext` does not yield a completely fresh context:
it preserves headers related to tracing the original request. That may
be appropriate in many situations, but sometimes we really do want to
detach processing entirely from the original task. This commit
introduces new utilities to do that.

* [TEST] Fix ccs-unavailable-clusters QA tests build (#114833)

Properly use `configureEach` on the task configuration to postpone the
task creation and configuration in the build process

* Mark Data Stream Lifecycle APIs to stable (#114780)

Data Stream Lifecycle went GA in 8.14, so we can safely mark these as
stable.

* Remove all replaced-in-v8 REST endpoints (#114800)

These endpoints were deprecated in v7 and are replaced in v8 with
different endpoints so we can remove the v7 endpoint names in v9.

* Set min number of allocations for ElasticSearchInternalService to 0 (#114829)

* Set min number of allocations for ElasticSearchInternalService to 0

* Updating IT tests with new min allocations value

---------

Co-authored-by: Elastic Machine <[email protected]>

* Updating queries used in rrf with text similarity tests (#114838)

* Fix bbq index feature exposure for testing & remove feature flag (#114832)

We actually don't need a cluster feature, a capability added if the
feature flag is enabled is enough for testing.

closes https://github.com/elastic/elasticsearch/issues/114787

* Allow synthetic source and disabled source for standard indices (#114817)

When using the index.mapping.source.mode setting, we need to make sure
that it takes precedence and that it is also used when the standard index
mode is used. Without this patch, we always returned stored source if
_source.mode was not used but the setting was.

Relates #114433

* ESQL: Fix grammar changes around per agg filtering (#114848)

Remove dev flag left in grammar for agg filtering
Related to #113735

* [ML] Create an ml node inference endpoint referencing an existing deployment (#114750)

* Support multi-valued fields in compute engine for ST_DISTANCE (#114836)

In #112063 we added support for multivalued fields to the compute engine for ST_INTERSECTS and relatives, but not for ST_DISTANCE. In #114729 it was discovered that, at least for the common case of a field and a constant, this support was not needed due to ST_DISTANCE being re-written to ST_INTERSECTS. However, for many other cases, like ST_DISTANCE used on the coordinator node, or between two fields, this lack of support would result in null values.

This PR fixes those cases, making sure ST_DISTANCE uses the Block-Builder approach similar to what was done for ST_INTERSECTS et al.

* Ensuring consistent ordering for inner hits in collapse test for rrf (#114740)

* Revert "[ML] Dynamically get of num allocations (#114636)" (#114861)

This reverts commit 8040fbb0d05401d40ea856f0a4982e8aaab48340.

* Mute org.elasticsearch.license.LicensingTests org.elasticsearch.license.LicensingTests #114865

* Retry throttled snapshot deletions (#113237)

Closes ES-8562

* Make mapping a distinct concept in logsdb data generation (#114370)

* Download IPinfo ip location databases (#114847)

* Revert "[EIS] Validate EIS Gateway URL if set (#114600)" (#114867)

This reverts commit 39168e139d98b2eacade007fcd616715a6106c10.

* [ML] Unmute MlJobIT tests (#114553)

A large number (almost the entirety) of tests in the `MlJobIT` tests suite have been muted. In all cases the cause of failure of the tests is the same, persistent tasks for `cluster:admin/xpack/ml/job/close` and `cluster:admin/xpack/ml/job/close[n]` have been detected as present after the test case has completed.

Examination of the tests shows that the majority of them do not call `close`, either directly or indirectly, indicating that the root cause lies with some previous test. As the `close` task inherits the default timeout of half an hour, an instance of it lingering about can cause a lot of damage to subsequent tests.

The approach taken in this PR is to call the `_tasks/_cancel` endpoint as the final operation after every test execution in the `MlJobIT` suite. This should restrict the impact of a lingering `close` task to the test responsible, and the reduction in noise should permit better identification of the culprit.

Closes #105239, #113581, #113046, #112729, #113528, #112701, #113742, #113370, #112823, #112088, #112212, #112730, #113654, #113655, #112381, #113477, #112382, #113651, #112510

* Fixing randomization issue for RRFRetrieverBuilderNestedDocsIT (#114859)

* Mute org.elasticsearch.datastreams.logsdb.qa.LogsDbVersusLogsDbReindexedIntoStandardModeChallengeRestIT testTermsQuery #114873

* Mute org.elasticsearch.xpack.enrich.EnrichIT testDeleteIsCaseSensitive #114840

* Remove dead branches for v7 REST API (#114850)

In v9 the `getRestApiVersion()` method on `RestRequest`,
`XContentBuilder` and `XContentParser` can never return `V_7`, so we can
replace all the expressions of the form `$x$.getRestApiVersion() == V_7`
with `false`. This commit does that, and then refactors away the
resulting dead code using (largely) automated transformations.

* Removing tech-preview header and updating documentation for retrievers and RRF (#114810)

* Retry `S3BlobContainer#getRegister` on all exceptions (#114813)

S3 register reads are subject to the regular client retry policy, but in
practice these reads sometimes fail with transient errors that the SDK
does not retry. This commit adds another layer of retries to these reads.

Relates ES-9721
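A minimal sketch of such an extra retry layer. It assumes failures surface as unchecked exceptions. `RetryOnAnyException` is a hypothetical name. The sketch retries immediately with no backoff, which real code would likely add.

```java
import java.util.function.Supplier;

final class RetryOnAnyException {
    /** Runs {@code action} up to {@code maxAttempts} times, retrying on any unchecked exception. */
    static <T> T retry(Supplier<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // retry regardless of the exception type
            }
        }
        throw last; // every attempt failed: surface the most recent failure
    }
}
```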

* OTel mappings: avoid metrics to be rejected when  attributes are malformed (#114856)

* #104411 Add warning headers for ingest pipelines containing special characters (#114837)

* Add logs and headers

For pipeline creation when name is invalid

* Fix YAML tests and add YAML test for warnings

* Update docs/changelog/114837.yaml

* Changelog entry

* Changelog entry

* Update docs/changelog/114837.yaml

* Changelog entry

* [Failure store - selector syntax] Replace failureOptions with selector options internally. (#114812)

**Introduction**

> In order to make adoption of failure stores simpler for all users, we
> are introducing a new syntactical feature to index expression
> resolution: The selector.
>
> Selectors, denoted with a `::` followed by a recognized suffix, will
> allow users to specify which component of an index abstraction they
> would like to operate on within an API call. In this case, an index
> abstraction is a concrete index, data stream, or alias; any abstraction
> that can be resolved to a set of indices/shards. We define a component
> of an index abstraction to be some searchable unit of the index
> abstraction.
>
> To start, we will support two components: data and failures. Concrete
> indices are their own data components, while the data component for
> index aliases is all of the indices contained therein. For data streams,
> the data component corresponds to their backing indices. Data stream
> aliases mirror this, treating all backing indices of the data streams
> they correspond to as their data component.
>
> The failure component is only supported by data streams and data
> stream aliases. The failure component of these abstractions refers to
> the data streams' failure stores. Indices and index aliases do not have
> a failure component.

For more details and examples see
https://github.com/elastic/elasticsearch/pull/113144. All this work has
been cherry picked from there.

**Purpose of this PR**

This PR replaces the `FailureStoreOptions` with the
`SelectorOptions`. There shouldn't be any perceivable change to the user,
since we kept the query parameter "failure_store" for now. It will be
removed in the next PR, which will introduce the parsing of the
expressions.

_The current PR is just a refactoring and does not and should not change
any existing behaviour._

* Mute org.elasticsearch.packaging.test.EnrollmentProcessTests test20DockerAutoFormCluster #114885

* Document _cat/indices behavior when encountering source only indices (#114884)

Closes https://github.com/elastic/elasticsearch/issues/114546

* Inline `MockTransportService#getLocalDiscoNode()` (#114883)

This method just delegates to `getLocalNode()`, we may as well call the
more widely-used method with the shorter name directly.

* Better DataType string checks (#114863)

* Use DataType.isString

* Add DataType.stringTypes()

* Fix shouldHideSignature check

* Fix NPE in AdaptiveAllocationsScalerService (#114880)

* Fix NPE in AdaptiveAllocationsScalerService

* Update docs/changelog/114880.yaml

* Delete docs/changelog/114880.yaml

* ESQL: Fix MvPercentileTests precision issues (#114844)

Fixes https://github.com/elastic/elasticsearch/issues/114588
Fixes https://github.com/elastic/elasticsearch/issues/114587
Fixes https://github.com/elastic/elasticsearch/issues/114586
Fixes https://github.com/elastic/elasticsearch/issues/114585
Fixes https://github.com/elastic/elasticsearch/issues/113008
Fixes https://github.com/elastic/elasticsearch/issues/113007
Fixes https://github.com/elastic/elasticsearch/issues/113006
Fixes https://github.com/elastic/elasticsearch/issues/113005

Fixed the long precision issue by allowing a +/-1 range.

Also made a minor refactor to simplify using different matchers for different types.
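The idea behind the fix can be sketched as a type-dependent comparison. Long results accept a +/-1 range, doubles use a tolerance, and everything else must match exactly. The real test is Java. This Python helper, `values_match`, is hypothetical, and the relative tolerance for doubles is an assumption.

```python
import math


def values_match(expected, actual, rel_tol=1e-9):
    """Pick a comparison based on the type of the expected value."""
    if isinstance(expected, int) and not isinstance(expected, bool):
        # Integral (long) results: accept anything within +/-1 of the expected value.
        return isinstance(actual, int) and abs(expected - actual) <= 1
    if isinstance(expected, float):
        # Floating-point results: relative tolerance instead of exact equality.
        return math.isclose(expected, actual, rel_tol=rel_tol)
    # Everything else (strings, booleans, ...) must match exactly.
    return expected == actual
```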

* Remove the min_compatible_shard_node option and associated classes (#114713)

Any similar functionality in the future should use capabilities instead.

* Fixes flaky ST_CENTROID_AGG tests (#114892)

Even with Kahan summation, we were occasionally getting floating point differences at the 14th decimal point, well beyond anything a GIS use case would care about.
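To illustrate why a tolerance is still needed, the sketch below pairs compensated (Kahan) summation with an assertion that compares coordinates within an absolute tolerance instead of exactly. It computes the centroid as a plain arithmetic mean of x/y coordinates, which is a simplifying assumption. The helper names and the `1e-10` tolerance are also hypothetical rather than taken from the actual tests.

```python
import math


def kahan_sum(values):
    """Compensated (Kahan) summation: tracks the low-order bits lost by each addition."""
    total = 0.0
    compensation = 0.0
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; subtracting `adjusted`
        # leaves the rounding error, which is fed back into the next step.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


def centroid(points):
    """Mean of x and y over a non-empty list of (x, y) pairs."""
    count = len(points)
    return (kahan_sum(x for x, _ in points) / count,
            kahan_sum(y for _, y in points) / count)


def assert_centroid_close(expected, actual, abs_tol=1e-10):
    """Compare coordinates with an absolute tolerance instead of exact equality."""
    for e, a in zip(expected, actual):
        assert math.isclose(e, a, rel_tol=0.0, abs_tol=abs_tol), f"{a} != {e} (tolerance {abs_tol})"
```

Kahan summation reduces rounding error but does not remove it entirely. Results can still differ in the last few digits, for example when values are combined in a different order. An assertion with a small tolerance ignores differences far below GIS precision.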

* Fix ST_CENTROID_AGG when no records are aggregated (#114888)

This was returning an invalid result `POINT(NaN NaN)` and now instead returns `null`.
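The behavioral change amounts to guarding the empty case before dividing. Here is a minimal Python sketch, with Python's `None` standing in for `null`. The function name and the WKT-style string output are illustrative assumptions.

```python
def centroid_or_none(points):
    """Return None (null) instead of POINT(NaN NaN) when no rows were aggregated."""
    count = 0
    sum_x = 0.0
    sum_y = 0.0
    for x, y in points:
        count += 1
        sum_x += x
        sum_y += y
    if count == 0:
        # A naive sum / count in IEEE double arithmetic (as in Java) would give NaN here.
        return None
    return f"POINT({sum_x / count} {sum_y / count})"
```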

* Mute org.elasticsearch.test.rest.ClientYamlTestSuiteIT test {yaml=cluster.stats/30_ccs_stats/cross-cluster search stats search} #114902

* Mute org.elasticsearch.xpack.enrich.EnrichRestIT test {p0=enrich/40_synthetic_source/enrich documents over _bulk} #114825

* Reset array scope tracking for nested objects (#114891)

* Reset array scope tracking for nested objects

* update

* update

* update

* ESQL: adapt to new range in ToDatetimeTests (#114605)

Two tests shared the same name in `ToDatetimeTests`, so that needed
fixing. But then also the ranges in the masked test needed adjusting
after the change that added the masking test.

Fixes #108093

* Fix setOnce in EmbeddingRequestChunker (#114900)

* [DOCS] Adds link to tutorial and API docs to trained model autoscaling. (#114904)

* Mute org.elasticsearch.xpack.inference.DefaultEndPointsIT testInferDeploysDefaultElser #114913

* Inject the `host.name` field mapping only if required for `logsdb` index mode (#114573)

Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an addi…
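The decision described above can be sketched as a check on the index settings. Injection applies only when the index mode is `logsdb` and `host.name` appears among the sort fields. `index.mode` and `index.sort.field` are real Elasticsearch index settings. However, the flat-dict shape, the function name, and the default sort fields `["host.name", "@timestamp"]` are assumptions made for illustration.

```python
# Assumption: the default LogsDB sort, used when no sort fields are configured.
DEFAULT_LOGSDB_SORT_FIELDS = ["host.name", "@timestamp"]


def should_inject_host_name(settings):
    """Decide whether to add a `host.name` mapping for a new index.

    `settings` is a flat dict of index settings, e.g. {"index.mode": "logsdb"}.
    """
    if settings.get("index.mode") != "logsdb":
        return False
    sort_fields = settings.get("index.sort.field", DEFAULT_LOGSDB_SORT_FIELDS)
    if isinstance(sort_fields, str):
        sort_fields = [sort_fields]  # a single sort field may be given as a string
    return "host.name" in sort_fields
```

A user who overrides the sort with, say, only `@timestamp` gets no injected `host.name` field. Under the assumed defaults, a plain `logsdb` index still gets one.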
@edsavage
Contributor

Fixed by #114553

georgewallace pushed a commit to georgewallace/elasticsearch that referenced this issue Oct 25, 2024
A large number (almost the entirety) of tests in the `MlJobIT` test suite have been muted. In all cases the cause of failure is the same: persistent tasks for `cluster:admin/xpack/ml/job/close` and `cluster:admin/xpack/ml/job/close[n]` were detected as still present after the test case had completed.

Examination of the tests shows that the majority of them call `close` neither directly nor indirectly, indicating that the root cause lies with some previous test. As the `close` task inherits the default timeout of half an hour, a lingering instance of it can cause a lot of damage to subsequent tests.

The approach taken in this PR is to call the `_tasks/_cancel` endpoint as the final operation after every test execution in the `MlJobIT` suite. This should restrict the impact of a lingering `close` task to the test responsible for it, and the reduction in noise should make it easier to identify the culprit.

Closes elastic#105239, elastic#113581, elastic#113046, elastic#112729, elastic#113528, elastic#112701, elastic#113742, elastic#113370, elastic#112823, elastic#112088, elastic#112212, elastic#112730, elastic#113654, elastic#113655, elastic#112381, elastic#113477, elastic#112382, elastic#113651, elastic#112510
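The teardown step described in the commit can be pictured with a small Python sketch. The real change lives in the Java `MlJobIT` suite. This sketch only shows the shape of a task management cancel request and how lingering close tasks appear in `_cat/tasks`-style output such as the failure message above. Filtering the cancel with `actions=cluster:admin/xpack/ml/job/close*` is an assumption, since the PR may cancel more broadly. `send(method, path)` is a hypothetical stand-in for the test's REST client.

```python
from urllib.parse import urlencode

# Assumption: restrict cancellation to the ML close actions seen in the failures.
ML_CLOSE_ACTIONS = "cluster:admin/xpack/ml/job/close*"


def cancel_path(actions=ML_CLOSE_ACTIONS):
    """Build the task management cancel request, filtered by action name."""
    return "/_tasks/_cancel?" + urlencode({"actions": actions})


def lingering_tasks(cat_tasks_output, prefix="cluster:admin/xpack/ml/job/close"):
    """Return action names from `_cat/tasks`-style text that start with `prefix`."""
    actions = []
    for line in cat_tasks_output.splitlines():
        columns = line.split()
        if columns and columns[0].startswith(prefix):
            actions.append(columns[0])
    return actions


def after_each_test(send):
    """Final teardown step: cancel any close task the test left behind.

    `send(method, path)` is a hypothetical stand-in for the test's REST client.
    """
    send("POST", cancel_path())
```

Running the cancel as the last teardown step scopes any damage to the test that leaked the task. Subsequent tests then start without a half-hour-long `close` task still registered.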