Merge branch 'main' into bcbii_test

elasticmachine authored Oct 29, 2024
2 parents 86f55fe + 2522c98 commit 6804f84
Showing 111 changed files with 1,934 additions and 633 deletions.
2 changes: 1 addition & 1 deletion .buildkite/packer_cache.sh
@@ -29,6 +29,6 @@ for branch in "${branches[@]}"; do
fi

export JAVA_HOME="$HOME/.java/$ES_BUILD_JAVA"
"checkout/${branch}/gradlew" --project-dir "$CHECKOUT_DIR" --parallel -s resolveAllDependencies -Dorg.gradle.warning.mode=none -DisCI
"checkout/${branch}/gradlew" --project-dir "$CHECKOUT_DIR" --parallel -s resolveAllDependencies -Dorg.gradle.warning.mode=none -DisCI --max-workers=4
rm -rf "checkout/${branch}"
done
5 changes: 5 additions & 0 deletions docs/changelog/114855.yaml
@@ -0,0 +1,5 @@
pr: 114855
summary: Add query rules retriever
area: Relevance
type: enhancement
issues: [ ]
5 changes: 5 additions & 0 deletions docs/changelog/115656.yaml
@@ -0,0 +1,5 @@
pr: 115656
summary: Fix stream support for `TaskType.ANY`
area: Machine Learning
type: bug
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/115715.yaml
@@ -0,0 +1,5 @@
pr: 115715
summary: Avoid `catch (Throwable t)` in `AmazonBedrockStreamingChatProcessor`
area: Machine Learning
type: bug
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/115721.yaml
@@ -0,0 +1,5 @@
pr: 115721
summary: Change Reindexing metrics unit from millis to seconds
area: Reindex
type: enhancement
issues: []
2 changes: 1 addition & 1 deletion docs/reference/cat/shards.asciidoc
@@ -33,7 +33,7 @@ For <<data-streams,data streams>>, the API returns information about the stream'
* If the {es} {security-features} are enabled, you must have the `monitor` or
`manage` <<privileges-list-cluster,cluster privilege>> to use this API.
You must also have the `monitor` or `manage` <<privileges-list-indices,index privilege>>
for any data stream, index, or alias you retrieve.
to view the full information for any data stream, index, or alias you retrieve.

[[cat-shards-path-params]]
==== {api-path-parms-title}
40 changes: 35 additions & 5 deletions docs/reference/esql/esql-kibana.asciidoc
@@ -171,14 +171,44 @@ FROM kibana_sample_data_logs
[[esql-kibana-time-filter]]
=== Time filtering

To display data within a specified time range, use the
{kibana-ref}/set-time-filter.html[time filter]. The time filter is only enabled
when the indices you're querying have a field called `@timestamp`.
To display data within a specified time range, you can use the standard time filter,
custom time parameters, or a WHERE command.

If your indices do not have a timestamp field called `@timestamp`, you can limit
the time range using the <<esql-where>> command and the <<esql-now>> function.
[discrete]
==== Standard time filter
The standard {kibana-ref}/set-time-filter.html[time filter] is enabled
when the indices you're querying have a field named `@timestamp`.

[discrete]
==== Custom time parameters
If your indices do not have a field named `@timestamp`, you can use
the `?_tstart` and `?_tend` parameters to specify a time range. These parameters
work with any timestamp field and automatically sync with the {kibana-ref}/set-time-filter.html[time filter].

[source,esql]
----
FROM my_index
| WHERE custom_timestamp >= ?_tstart AND custom_timestamp < ?_tend
----

You can also use the `?_tstart` and `?_tend` parameters with the <<esql-bucket>> function
to create auto-incrementing time buckets in {esql} <<esql-kibana-visualizations,visualizations>>.
For example:

[source,esql]
----
FROM kibana_sample_data_logs
| STATS average_bytes = AVG(bytes) BY BUCKET(@timestamp, 50, ?_tstart, ?_tend)
----

This example uses `50` buckets, which is the maximum number of buckets.

[discrete]
==== WHERE command
You can also limit the time range using the <<esql-where>> command and the <<esql-now>> function.
For example, if the timestamp field is called `timestamp`, to query the last 15
minutes of data:

[source,esql]
----
FROM kibana_sample_data_logs
30 changes: 21 additions & 9 deletions docs/reference/how-to/knn-search.asciidoc
@@ -16,10 +16,11 @@ structures. So these same recommendations also help with indexing speed.
The default <<dense-vector-element-type,`element_type`>> is `float`. But this
can be automatically quantized during index time through
<<dense-vector-quantization,`quantization`>>. Quantization will reduce the
required memory by 4x, but it will also reduce the precision of the vectors and
increase disk usage for the field (by up to 25%). Increased disk usage is a
required memory by 4x, 8x, or as much as 32x, but it will also reduce the precision of the vectors and
increase disk usage for the field (by up to 25%, 12.5%, or 3.125%, respectively). Increased disk usage is a
result of {es} storing both the quantized and the unquantized vectors.
For example, when quantizing 40GB of floating point vectors an extra 10GB of data will be stored for the quantized vectors. The total disk usage amounts to 50GB, but the memory usage for fast search will be reduced to 10GB.
For example, when int8 quantizing 40GB of floating point vectors an extra 10GB of data will be stored for the quantized vectors.
The total disk usage amounts to 50GB, but the memory usage for fast search will be reduced to 10GB.

For `float` vectors with `dim` greater than or equal to `384`, using a
<<dense-vector-quantization,`quantized`>> index is highly recommended.
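
For illustration, a minimal mapping sketch that opts into `int8` quantization through the `index_options` of a `dense_vector` field could look like the following (the index and field names here are hypothetical):

[source,console]
----
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 384,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}
----
// TEST[skip:illustrative sketch with hypothetical index and field names]
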
@@ -68,12 +68,23 @@ Another option is to use <<synthetic-source,synthetic `_source`>>.
kNN search. HNSW is a graph-based algorithm which only works efficiently when
most vector data is held in memory. You should ensure that data nodes have at
least enough RAM to hold the vector data and index structures. To check the
size of the vector data, you can use the <<indices-disk-usage>> API. As a
loose rule of thumb, and assuming the default HNSW options, the bytes used will
be `num_vectors * 4 * (num_dimensions + 12)`. When using the `byte` <<dense-vector-element-type,`element_type`>>
the space required will be closer to `num_vectors * (num_dimensions + 12)`. Note that
the required RAM is for the filesystem cache, which is separate from the Java
heap.
size of the vector data, you can use the <<indices-disk-usage>> API.
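
As a quick sketch of that check (the index name is hypothetical), the disk usage API is invoked as follows; note that it requires the `run_expensive_tasks` flag:

[source,console]
----
POST /my-vector-index/_disk_usage?run_expensive_tasks=true
----
// TEST[skip:illustrative sketch with hypothetical index name]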

Here are estimates for different element types and quantization levels:
+
--
`element_type: float`: `num_vectors * num_dimensions * 4`
`element_type: float` with `quantization: int8`: `num_vectors * (num_dimensions + 4)`
`element_type: float` with `quantization: int4`: `num_vectors * (num_dimensions/2 + 4)`
`element_type: float` with `quantization: bbq`: `num_vectors * (num_dimensions/8 + 12)`
`element_type: byte`: `num_vectors * num_dimensions`
`element_type: bit`: `num_vectors * (num_dimensions/8)`
--

If utilizing HNSW, the graph must also be in memory, to estimate the required bytes use `num_vectors * 4 * HNSW.m`. The
default value for `HNSW.m` is 16, so by default `num_vectors * 4 * 16`.

Note that the required RAM is for the filesystem cache, which is separate from the Java heap.
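
As a rough worked example of these formulas: one million 384-dimension `float` vectors need about `1,000,000 * 384 * 4` bytes (roughly 1.5 GB) of off-heap memory, the same vectors quantized to `int8` need about `1,000,000 * (384 + 4)` bytes (roughly 388 MB), and an HNSW graph with the default `m` of 16 adds roughly `1,000,000 * 4 * 16` bytes (about 64 MB).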

The data nodes should also leave a buffer for other ways that RAM is needed.
For example your index might also include text fields and numerics, which also
24 changes: 12 additions & 12 deletions docs/reference/inference/inference-apis.asciidoc
@@ -35,21 +35,21 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
Now use <<semantic-search-semantic-text, semantic text>> to perform
<<semantic-search, semantic search>> on your data.

[discrete]
[[default-enpoints]]
=== Default {infer} endpoints
//[discrete]
//[[default-enpoints]]
//=== Default {infer} endpoints

Your {es} deployment contains some preconfigured {infer} endpoints that makes it easier for you to use them when defining `semantic_text` fields or {infer} processors.
The following list contains the default {infer} endpoints listed by `inference_id`:
//Your {es} deployment contains some preconfigured {infer} endpoints that makes it easier for you to use them when defining `semantic_text` fields or {infer} processors.
//The following list contains the default {infer} endpoints listed by `inference_id`:

* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
//* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
//* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)

Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
The API call will automatically download and deploy the model which might take a couple of minutes.
Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
For these models, the minimum number of allocations is `0`.
If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
//Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
//The API call will automatically download and deploy the model which might take a couple of minutes.
//Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
//For these models, the minimum number of allocations is `0`.
//If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.


[discrete]
26 changes: 2 additions & 24 deletions docs/reference/mapping/types/semantic-text.asciidoc
@@ -13,47 +13,25 @@ Long passages are <<auto-text-chunking, automatically chunked>> to smaller secti
The `semantic_text` field type specifies an inference endpoint identifier that will be used to generate embeddings.
You can create the inference endpoint by using the <<put-inference-api>>.
This field type and the <<query-dsl-semantic-query,`semantic` query>> type make it simpler to perform semantic search on your data.
If you don't specify an inference endpoint, the <<infer-service-elser,ELSER service>> is used by default.

Using `semantic_text`, you won't need to specify how to generate embeddings for your data, or how to index it.
The {infer} endpoint automatically determines the embedding generation, indexing, and query to use.

If you use the ELSER service, you can set up `semantic_text` with the following API request:

[source,console]
------------------------------------------------------------
PUT my-index-000001
{
"mappings": {
"properties": {
"inference_field": {
"type": "semantic_text"
}
}
}
}
------------------------------------------------------------

NOTE: In Serverless, you must create an {infer} endpoint using the <<put-inference-api>> and reference it when setting up `semantic_text` even if you use the ELSER service.

If you use a service other than ELSER, you must create an {infer} endpoint using the <<put-inference-api>> and reference it when setting up `semantic_text` as the following example demonstrates:

[source,console]
------------------------------------------------------------
PUT my-index-000002
{
"mappings": {
"properties": {
"inference_field": {
"type": "semantic_text",
"inference_id": "my-openai-endpoint" <1>
"inference_id": "my-elser-endpoint"
}
}
}
}
------------------------------------------------------------
// TEST[skip:Requires inference endpoint]
<1> The `inference_id` of the {infer} endpoint to use to generate embeddings.


The recommended way to use semantic_text is by having dedicated {infer} endpoints for ingestion and search.
Expand All @@ -62,7 +40,7 @@ After creating dedicated {infer} endpoints for both, you can reference them usin

[source,console]
------------------------------------------------------------
PUT my-index-000003
PUT my-index-000002
{
"mappings": {
"properties": {
13 changes: 13 additions & 0 deletions docs/reference/quickstart/full-text-filtering-tutorial.asciidoc
@@ -19,6 +19,19 @@ The goal is to create search queries that enable users to:

To achieve these goals we'll use different Elasticsearch queries to perform full-text search, apply filters, and combine multiple search criteria.

[discrete]
[[full-text-filter-tutorial-requirements]]
=== Requirements

You'll need a running {es} cluster, together with {kib} to use the Dev Tools API Console.
Run the following command in your terminal to set up a <<run-elasticsearch-locally,single-node local cluster in Docker>>:

[source,sh]
----
curl -fsSL https://elastic.co/start-local | sh
----
// NOTCONSOLE

[discrete]
[[full-text-filter-tutorial-create-index]]
=== Step 1: Create an index
15 changes: 10 additions & 5 deletions docs/reference/quickstart/getting-started.asciidoc
@@ -15,12 +15,17 @@ You can {kibana-ref}/console-kibana.html#import-export-console-requests[convert
====

[discrete]
[[getting-started-prerequisites]]
=== Prerequisites
[[getting-started-requirements]]
=== Requirements

Before you begin, you need to have a running {es} cluster.
The fastest way to get started is with a <<run-elasticsearch-locally,local development environment>>.
Refer to <<elasticsearch-intro-deploy,Run {es}>> for other deployment options.
You'll need a running {es} cluster, together with {kib} to use the Dev Tools API Console.
Run the following command in your terminal to set up a <<run-elasticsearch-locally,single-node local cluster in Docker>>:

[source,sh]
----
curl -fsSL https://elastic.co/start-local | sh
----
// NOTCONSOLE

////
[source,console]
19 changes: 15 additions & 4 deletions docs/reference/quickstart/index.asciidoc
@@ -9,23 +9,34 @@ Unless otherwise noted, these examples will use queries written in <<query-dsl,Q
== Requirements

You'll need a running {es} cluster, together with {kib} to use the Dev Tools API Console.
Get started <<run-elasticsearch-locally,locally in Docker>> , or see our <<elasticsearch-intro-deploy,other deployment options>>.
Run the following command in your terminal to set up a <<run-elasticsearch-locally,single-node local cluster in Docker>>:

[source,sh]
----
curl -fsSL https://elastic.co/start-local | sh
----
// NOTCONSOLE

Alternatively, refer to our <<elasticsearch-intro-deploy,other deployment options>>.

[discrete]
[[quickstart-list]]
== Hands-on quick starts

* <<getting-started,Basics: Index and search data using {es} APIs>>. Learn about indices, documents, and mappings, and perform a basic search using the Query DSL.
* <<full-text-filter-tutorial, Basics: Full-text search and filtering>>. Learn about different options for querying data, including full-text search and filtering, using the Query DSL.
* <<semantic-search-semantic-text, Semantic search>>: Learn how to create embeddings for your data with `semantic_text` and query using the `semantic` query.
** <<semantic-text-hybrid-search, Hybrid search with `semantic_text`>>: Learn how to combine semantic search with full-text search.
* <<bring-your-own-vectors, Bring your own dense vector embeddings>>: Learn how to ingest dense vector embeddings into {es}.

[discrete]
[[quickstart-python-links]]
== Working in Python
.Working in Python
******************
If you're interested in using {es} with Python, check out Elastic Search Labs:
* https://github.com/elastic/elasticsearch-labs[`elasticsearch-labs` repository]: Contains a range of Python https://github.com/elastic/elasticsearch-labs/tree/main/notebooks[notebooks] and https://github.com/elastic/elasticsearch-labs/tree/main/example-apps[example apps].
* https://www.elastic.co/search-labs/tutorials/search-tutorial/welcome[Tutorial]: This walks you through building a complete search solution with {es} from the ground up using Flask.
******************

include::getting-started.asciidoc[]
include::full-text-filtering-tutorial.asciidoc[]
19 changes: 2 additions & 17 deletions docs/reference/run-elasticsearch-locally.asciidoc
@@ -42,6 +42,7 @@ To set up {es} and {kib} locally, run the `start-local` script:
curl -fsSL https://elastic.co/start-local | sh
----
// NOTCONSOLE
// REVIEWED[OCT.28.2024]

This script creates an `elastic-start-local` folder containing configuration files and starts both {es} and {kib} using Docker.

Expand All @@ -50,29 +51,13 @@ After running the script, you can access Elastic services at the following endpo
* *{es}*: http://localhost:9200
* *{kib}*: http://localhost:5601

The script generates a random password for the `elastic` user, which is displayed at the end of the installation and stored in the `.env` file.
The script generates a random password for the `elastic` user, and an API key, stored in the `.env` file.

[CAUTION]
====
This setup is for local testing only. HTTPS is disabled, and Basic authentication is used for {es}. For security, {es} and {kib} are accessible only through `localhost`.
====

[discrete]
[[api-access]]
=== API access

An API key for {es} is generated and stored in the `.env` file as `ES_LOCAL_API_KEY`.
Use this key to connect to {es} with a https://www.elastic.co/guide/en/elasticsearch/client/index.html[programming language client] or the <<rest-apis,REST API>>.

From the `elastic-start-local` folder, check the connection to Elasticsearch using `curl`:

[source,sh]
----
source .env
curl $ES_LOCAL_URL -H "Authorization: ApiKey ${ES_LOCAL_API_KEY}"
----
// NOTCONSOLE

[discrete]
[[local-dev-additional-info]]
=== Learn more