Merge branch '8.x' into backport/8.x/pr-119507
szabosteve authored Jan 6, 2025
2 parents 4ce8c60 + dbd0e59 commit 4a07d60
Showing 80 changed files with 1,906 additions and 789 deletions.
@@ -31,6 +31,7 @@
import org.gradle.api.artifacts.Configuration;
import org.gradle.api.artifacts.dsl.DependencyHandler;
import org.gradle.api.artifacts.type.ArtifactTypeDefinition;
import org.gradle.api.file.FileCollection;
import org.gradle.api.plugins.JavaPluginExtension;
import org.gradle.api.provider.Provider;
import org.gradle.api.specs.Specs;
@@ -89,8 +90,8 @@ public void apply(Project project) {
Map<String, TaskProvider<?>> versionTasks = versionTasks(project, "destructiveDistroUpgradeTest", buildParams.getBwcVersions());
TaskProvider<Task> destructiveDistroTest = project.getTasks().register("destructiveDistroTest");

Configuration examplePlugin = configureExamplePlugin(project);

Configuration examplePluginConfiguration = configureExamplePlugin(project);
FileCollection examplePluginFileCollection = examplePluginConfiguration;
List<TaskProvider<Test>> windowsTestTasks = new ArrayList<>();
Map<ElasticsearchDistributionType, List<TaskProvider<Test>>> linuxTestTasks = new HashMap<>();

@@ -103,9 +104,9 @@ public void apply(Project project) {
t2 -> distribution.isDocker() == false || dockerSupport.get().getDockerAvailability().isAvailable()
);
addDistributionSysprop(t, DISTRIBUTION_SYSPROP, distribution::getFilepath);
addDistributionSysprop(t, EXAMPLE_PLUGIN_SYSPROP, () -> examplePlugin.getSingleFile().toString());
addDistributionSysprop(t, EXAMPLE_PLUGIN_SYSPROP, () -> examplePluginFileCollection.getSingleFile().toString());
t.exclude("**/PackageUpgradeTests.class");
}, distribution, examplePlugin.getDependencies());
}, distribution, examplePluginConfiguration.getDependencies());

if (distribution.getPlatform() == Platform.WINDOWS) {
windowsTestTasks.add(destructiveTask);
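The refactoring in this hunk looks like a Gradle configuration-cache fix (an assumption on my part, not stated in the commit): a `Configuration` must not be captured inside a task action, but its `FileCollection` view may be, since `Configuration` implements `FileCollection`. A minimal sketch of the pattern, with hypothetical names that are not from this commit:

```java
// Illustrative fragment of a Gradle plugin's apply(Project) body.
// A Configuration is not safe to reference from a task action when the
// configuration cache is enabled; narrowing the reference to FileCollection
// lets the action lambda capture only the cache-safe file view.
Configuration pluginConf = project.getConfigurations().create("examplePlugin");
FileCollection pluginFiles = pluginConf; // narrowed reference, safe to capture

project.getTasks().register("useExamplePlugin", task -> {
    // Configuration-time wiring may still use the Configuration directly,
    // e.g. pluginConf.getDependencies() for task dependency setup.
    task.doLast(t -> {
        // Execution-time work touches only the FileCollection view.
        System.out.println(pluginFiles.getSingleFile());
    });
});
```

This matches the shape of the diff above: the `Configuration` is kept for `getDependencies()`, while the lambda passed to the test task resolves files through the `FileCollection` reference.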
3 changes: 2 additions & 1 deletion distribution/docker/build.gradle
@@ -127,7 +127,7 @@ ext.expansions = { Architecture architecture, DockerBase base ->
'bin_dir' : base == DockerBase.IRON_BANK ? 'scripts' : 'bin',
'build_date' : buildDate,
'config_dir' : base == DockerBase.IRON_BANK ? 'scripts' : 'config',
'git_revision' : buildParams.gitRevision,
'git_revision' : buildParams.gitRevision.get(),
'license' : base == DockerBase.IRON_BANK ? 'Elastic License 2.0' : 'Elastic-License-2.0',
'package_manager' : base.packageManager,
'docker_base' : base.name().toLowerCase(),
@@ -553,6 +553,7 @@ subprojects { Project subProject ->
inputs.file("${parent.projectDir}/build/markers/${buildTaskName}.marker")
executable = 'docker'
outputs.file(tarFile)
outputs.doNotCacheIf("Build cache is disabled for export tasks") { true }
args "save",
"-o",
tarFile,
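Two small behaviors show up in the `build.gradle` hunks above: `buildParams.gitRevision` is a lazy Gradle `Provider`, so `.get()` realizes it at the point where the expansions map needs a plain string, and `outputs.doNotCacheIf` opts the docker-export task out of the build cache (presumably because the exported tarball is large and cheap to regenerate — that rationale is my assumption). A hedged Groovy sketch with invented task and variable names:

```groovy
// Illustrative only -- names do not come from the Elasticsearch build.
def revisionProvider = providers.provider { 'abc123' } // stand-in for buildParams.gitRevision
def revision = revisionProvider.get() // realize the Provider where a String is required

tasks.register('exportImage') {
    def tarFile = layout.buildDirectory.file('image.tar')
    outputs.file(tarFile)
    // Opt this task's output out of the build cache, mirroring the diff above.
    outputs.doNotCacheIf('Build cache is disabled for export tasks') { true }
    doLast {
        println "would run: docker save -o ${tarFile.get().asFile} my-image"
    }
}
```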
6 changes: 6 additions & 0 deletions docs/changelog/119054.yaml
@@ -0,0 +1,6 @@
pr: 119054
summary: "[Security Solution] allows `kibana_system` user to manage .reindexed-v8-*\
\ Security Solution indices"
area: Authorization
type: enhancement
issues: []
6 changes: 6 additions & 0 deletions docs/changelog/119476.yaml
@@ -0,0 +1,6 @@
pr: 119476
summary: Fix TopN row size estimate
area: ES|QL
type: bug
issues:
- 106956
5 changes: 5 additions & 0 deletions docs/changelog/119495.yaml
@@ -0,0 +1,5 @@
pr: 119495
summary: Add mapping for `event_name` for OTel logs
area: Data streams
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/119516.yaml
@@ -0,0 +1,5 @@
pr: 119516
summary: "Fix: do not let `_resolve/cluster` hang if remote is unresponsive"
area: Search
type: bug
issues: []
39 changes: 24 additions & 15 deletions docs/reference/data-management.asciidoc
Expand Up @@ -6,29 +6,26 @@
--
The data you store in {es} generally falls into one of two categories:

* Content: a collection of items you want to search, such as a catalog of products
* Time series data: a stream of continuously-generated timestamped data, such as log entries

Content might be frequently updated,
* *Content*: a collection of items you want to search, such as a catalog of products
* *Time series data*: a stream of continuously-generated timestamped data, such as log entries
*Content* might be frequently updated,
but the value of the content remains relatively constant over time.
You want to be able to retrieve items quickly regardless of how old they are.

Time series data keeps accumulating over time, so you need strategies for
*Time series data* keeps accumulating over time, so you need strategies for
balancing the value of the data against the cost of storing it.
As it ages, it tends to become less important and less-frequently accessed,
so you can move it to less expensive, less performant hardware.
For your oldest data, what matters is that you have access to the data.
It's ok if queries take longer to complete.

To help you manage your data, {es} offers you:

* <<index-lifecycle-management, {ilm-cap}>> ({ilm-init}) to manage both indices and data streams and it is fully customisable, and
* <<data-stream-lifecycle, Data stream lifecycle>> which is the built-in lifecycle of data streams and addresses the most
common lifecycle management needs.
To help you manage your data, {es} offers you the following options:

preview::["The built-in data stream lifecycle is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but this feature is not subject to the support SLA of official GA features."]
* <<index-lifecycle-management, {ilm-cap}>>
* <<data-stream-lifecycle, Data stream lifecycle>>
* {curator-ref-current}/about.html[Elastic Curator]

**{ilm-init}** can be used to manage both indices and data streams and it allows you to:
**{ilm-init}** can be used to manage both indices and data streams. It allows you to do the following:

* Define the retention period of your data. The retention period is the minimum time your data will be stored in {es}.
Data older than this period can be deleted by {es}.
@@ -38,12 +35,24 @@ Data older than this period can be deleted by {es}.
for your older indices while reducing operating costs and maintaining search performance.
* Perform <<async-search-intro, asynchronous searches>> of data stored on less-performant hardware.

**Data stream lifecycle** is less feature rich but is focused on simplicity, so it allows you to easily:
**Data stream lifecycle** is less feature rich but is focused on simplicity. It allows you to do the following:

* Define the retention period of your data. The retention period is the minimum time your data will be stored in {es}.
Data older than this period can be deleted by {es} at a later time.
* Improve the performance of your data stream by performing background operations that will optimise the way your data
stream is stored.
* Improve the performance of your data stream by performing background operations that will optimise the way your data stream is stored.
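For readers of this page, retention with the data stream lifecycle boils down to a single call. A minimal sketch, in the docs' own console style (the data stream name is made up):

[source,console]
----
PUT _data_stream/my-data-stream/_lifecycle
{
  "data_retention": "7d"
}
----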

**Elastic Curator** is a tool that allows you to manage your indices and snapshots using user-defined filters and predefined actions. If ILM provides the functionality to manage your index lifecycle, and you have at least a Basic license, consider using ILM in place of Curator. Many stack components make use of ILM by default. {curator-ref-current}/ilm.html[Learn more].

NOTE: <<xpack-rollup,Data rollup>> is a deprecated Elasticsearch feature that allows you to manage the amount of data that is stored in your cluster, similar to the downsampling functionality of {ilm-init} and data stream lifecycle. This feature should not be used for new deployments.

[TIP]
====
{ilm-init} is not available on {es-serverless}.
In an {ecloud} or self-managed environment, ILM lets you automatically transition indices through data tiers according to your performance needs and retention requirements. This allows you to balance hardware costs with performance. {es-serverless} eliminates this complexity by optimizing your cluster performance for you.
Data stream lifecycle is an optimized lifecycle tool that lets you focus on the most common lifecycle management needs, without unnecessary hardware-centric concepts like data tiers.
====
--

include::ilm/index.asciidoc[]
18 changes: 18 additions & 0 deletions docs/reference/data-store-architecture.asciidoc
@@ -0,0 +1,18 @@
= Data store architecture

[partintro]
--

{es} is a distributed document store. Instead of storing information as rows of columnar data, {es} stores complex data structures that have been serialized as JSON documents. When you have multiple {es} nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately
from any node.

The topics in this section provide information about the architecture of {es} and how it stores and retrieves data:

* <<nodes-shards,Nodes and shards>>: Learn about the basic building blocks of an {es} cluster, including nodes, shards, primaries, and replicas.
* <<docs-replication,Reading and writing documents>>: Learn how {es} replicates read and write operations across shards and shard copies.
* <<shard-allocation-relocation-recovery,Shard allocation, relocation, and recovery>>: Learn how {es} allocates and balances shards across nodes.
--
include::nodes-shards.asciidoc[]
include::docs/data-replication.asciidoc[leveloffset=-1]
include::modules/shard-ops.asciidoc[]
6 changes: 1 addition & 5 deletions docs/reference/docs.asciidoc
@@ -7,9 +7,7 @@
For the most up-to-date API details, refer to {api-es}/group/endpoint-document[Document APIs].
--

This section starts with a short introduction to {es}'s <<docs-replication,data
replication model>>, followed by a detailed description of the following CRUD
APIs:
This section describes the following CRUD APIs:

.Single document APIs
* <<docs-index_>>
@@ -24,8 +22,6 @@ APIs:
* <<docs-update-by-query>>
* <<docs-reindex>>

include::docs/data-replication.asciidoc[]

include::docs/index_.asciidoc[]

include::docs/get.asciidoc[]
2 changes: 1 addition & 1 deletion docs/reference/docs/data-replication.asciidoc
@@ -1,6 +1,6 @@

[[docs-replication]]
=== Reading and Writing documents
=== Reading and writing documents

[discrete]
==== Introduction
38 changes: 19 additions & 19 deletions docs/reference/high-availability.asciidoc
@@ -3,28 +3,28 @@

[partintro]
--
Your data is important to you. Keeping it safe and available is important
to {es}. Sometimes your cluster may experience hardware failure or a power
loss. To help you plan for this, {es} offers a number of features
to achieve high availability despite failures.
Your data is important to you. Keeping it safe and available is important to Elastic. Sometimes your cluster may experience hardware failure or a power loss. To help you plan for this, {es} offers a number of features to achieve high availability despite failures. Depending on your deployment type, you might need to provision servers in different zones or configure external repositories to meet your organization's availability needs.

* With proper planning, a cluster can be
<<high-availability-cluster-design,designed for resilience>> to many of the
things that commonly go wrong, from the loss of a single node or network
connection right up to a zone-wide outage such as power loss.
* *<<high-availability-cluster-design,Design for resilience>>*
+
Distributed systems like Elasticsearch are designed to keep working even if some of their components have failed. An Elasticsearch cluster can continue operating normally if some of its nodes are unavailable or disconnected, as long as there are enough well-connected nodes to take over the unavailable node's responsibilities.
+
If you're designing a smaller cluster, you might focus on making your cluster resilient to single-node failures. Designers of larger clusters must also consider cases where multiple nodes fail at the same time.
// need to improve connections to ECE, EC hosted, ECK pod/zone docs in the child topics

* You can use <<xpack-ccr,{ccr}>> to replicate data to a remote _follower_
cluster which may be in a different data centre or even on a different
continent from the leader cluster. The follower cluster acts as a hot
standby, ready for you to fail over in the event of a disaster so severe that
the leader cluster fails. The follower cluster can also act as a geo-replica
to serve searches from nearby clients.
* *<<xpack-ccr,Cross-cluster replication>>*
+
To effectively distribute read and write operations across nodes, the nodes in a cluster need good, reliable connections to each other. To provide better connections, you typically co-locate the nodes in the same data center or nearby data centers.
+
Co-locating nodes in a single location exposes you to the risk of a single outage taking your entire cluster offline. To maintain high availability, you can prepare a second cluster that can take over in case of disaster by implementing {ccr} (CCR).
+
CCR provides a way to automatically synchronize indices from a leader cluster to a follower cluster. This cluster could be in a different data center or even a different continent from the leader cluster. If the primary cluster fails, the secondary cluster can take over.
+
TIP: You can also use CCR to create secondary clusters to serve read requests in geo-proximity to your users.
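The leader/follower flow described above can be sketched as one call made against the secondary cluster, in the docs' console style (the cluster alias and index names are invented):

[source,console]
----
PUT /my-follower-index/_ccr/follow
{
  "remote_cluster": "leader-cluster",
  "leader_index": "my-leader-index"
}
----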

* The last line of defence against data loss is to take
<<snapshots-take-snapshot,regular snapshots>> of your cluster so that you can
restore a completely fresh copy of it elsewhere if needed.
* *<<snapshot-restore,Snapshots>>*
+
Take snapshots of your cluster that can be restored in case of failure.
--

include::high-availability/cluster-design.asciidoc[]

include::ccr/index.asciidoc[]
8 changes: 6 additions & 2 deletions docs/reference/index.asciidoc
@@ -66,14 +66,18 @@ include::security/index.asciidoc[]

include::watcher/index.asciidoc[]

include::ccr/index.asciidoc[leveloffset=-1]

include::data-store-architecture.asciidoc[]

include::rest-api/index.asciidoc[]

include::commands/index.asciidoc[]

include::how-to.asciidoc[]

include::troubleshooting.asciidoc[]

include::rest-api/index.asciidoc[]

include::migration/index.asciidoc[]

include::release-notes.asciidoc[]
37 changes: 25 additions & 12 deletions docs/reference/inference/inference-apis.asciidoc
@@ -41,21 +41,34 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
Now use <<semantic-search-semantic-text, semantic text>> to perform
<<semantic-search, semantic search>> on your data.

//[discrete]
//[[default-enpoints]]
//=== Default {infer} endpoints
[discrete]
[[adaptive-allocations]]
=== Adaptive allocations

Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load.

When adaptive allocations are enabled:

* The number of allocations scales up automatically when the load increases.
* Allocations scale down to a minimum of 0 when the load decreases, saving resources.

For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
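As a sketch of what enabling this looks like with the `elasticsearch` {infer} service (the endpoint name and allocation limits below are made up):

[source,console]
----
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 4
    },
    "num_threads": 1,
    "model_id": ".elser_model_2"
  }
}
----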

[discrete]
[[default-enpoints]]
=== Default {infer} endpoints

//Your {es} deployment contains some preconfigured {infer} endpoints that makes it easier for you to use them when defining `semantic_text` fields or {infer} processors.
//The following list contains the default {infer} endpoints listed by `inference_id`:
Your {es} deployment contains preconfigured {infer} endpoints, which make it easier to use them when defining `semantic_text` fields or using {infer} processors.
The following list contains the default {infer} endpoints listed by `inference_id`:

//* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
//* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)

//Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
//The API call will automatically download and deploy the model which might take a couple of minutes.
//Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
//For these models, the minimum number of allocations is `0`.
//If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
The API call will automatically download and deploy the model which might take a couple of minutes.
Default {infer} endpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
For these models, the minimum number of allocations is `0`.
If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
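A default endpoint can be referenced directly from a mapping. A minimal sketch in console style (the index and field names are invented):

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      }
    }
  }
}
----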


[discrete]