Commit

Merge branch 'main' into fix_bwc_indexing_failures
elasticmachine authored Oct 28, 2024
2 parents fb29905 + da108fc commit 9c3798c

Showing 76 changed files with 626 additions and 503 deletions.
2 changes: 1 addition & 1 deletion .buildkite/packer_cache.sh
@@ -29,6 +29,6 @@ for branch in "${branches[@]}"; do
fi

export JAVA_HOME="$HOME/.java/$ES_BUILD_JAVA"
"checkout/${branch}/gradlew" --project-dir "$CHECKOUT_DIR" --parallel -s resolveAllDependencies -Dorg.gradle.warning.mode=none -DisCI
"checkout/${branch}/gradlew" --project-dir "$CHECKOUT_DIR" --parallel -s resolveAllDependencies -Dorg.gradle.warning.mode=none -DisCI --max-workers=4
rm -rf "checkout/${branch}"
done
5 changes: 5 additions & 0 deletions docs/changelog/115715.yaml
@@ -0,0 +1,5 @@
pr: 115715
summary: Avoid `catch (Throwable t)` in `AmazonBedrockStreamingChatProcessor`
area: Machine Learning
type: bug
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/115721.yaml
@@ -0,0 +1,5 @@
pr: 115721
summary: Change Reindexing metrics unit from millis to seconds
area: Reindex
type: enhancement
issues: []
40 changes: 35 additions & 5 deletions docs/reference/esql/esql-kibana.asciidoc
@@ -171,14 +171,44 @@ FROM kibana_sample_data_logs
[[esql-kibana-time-filter]]
=== Time filtering

To display data within a specified time range, use the
{kibana-ref}/set-time-filter.html[time filter]. The time filter is only enabled
when the indices you're querying have a field called `@timestamp`.
To display data within a specified time range, you can use the standard time filter,
custom time parameters, or a WHERE command.

If your indices do not have a timestamp field called `@timestamp`, you can limit
the time range using the <<esql-where>> command and the <<esql-now>> function.
[discrete]
==== Standard time filter
The standard {kibana-ref}/set-time-filter.html[time filter] is enabled
when the indices you're querying have a field named `@timestamp`.

[discrete]
==== Custom time parameters
If your indices do not have a field named `@timestamp`, you can use
the `?_tstart` and `?_tend` parameters to specify a time range. These parameters
work with any timestamp field and automatically sync with the {kibana-ref}/set-time-filter.html[time filter].

[source,esql]
----
FROM my_index
| WHERE custom_timestamp >= ?_tstart AND custom_timestamp < ?_tend
----

You can also use the `?_tstart` and `?_tend` parameters with the <<esql-bucket>> function
to create auto-incrementing time buckets in {esql} <<esql-kibana-visualizations,visualizations>>.
For example:

[source,esql]
----
FROM kibana_sample_data_logs
| STATS average_bytes = AVG(bytes) BY BUCKET(@timestamp, 50, ?_tstart, ?_tend)
----

This example uses `50` buckets, which is the maximum number of buckets.

[discrete]
==== WHERE command
You can also limit the time range using the <<esql-where>> command and the <<esql-now>> function.
For example, if the timestamp field is called `timestamp`, to query the last 15
minutes of data:

[source,esql]
----
FROM kibana_sample_data_logs
24 changes: 12 additions & 12 deletions docs/reference/inference/inference-apis.asciidoc
@@ -35,21 +35,21 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
Now use <<semantic-search-semantic-text, semantic text>> to perform
<<semantic-search, semantic search>> on your data.

[discrete]
[[default-enpoints]]
=== Default {infer} endpoints
//[discrete]
//[[default-enpoints]]
//=== Default {infer} endpoints

Your {es} deployment contains some preconfigured {infer} endpoints that make it easier for you to use them when defining `semantic_text` fields or {infer} processors.
The following list contains the default {infer} endpoints listed by `inference_id`:
//Your {es} deployment contains some preconfigured {infer} endpoints that make it easier for you to use them when defining `semantic_text` fields or {infer} processors.
//The following list contains the default {infer} endpoints listed by `inference_id`:

* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
//* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
//* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)

Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
The API call will automatically download and deploy the model which might take a couple of minutes.
Default {infer} endpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
For these models, the minimum number of allocations is `0`.
If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
//Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
//The API call will automatically download and deploy the model which might take a couple of minutes.
//Default {infer} endpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
//For these models, the minimum number of allocations is `0`.
//If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.


[discrete]
26 changes: 2 additions & 24 deletions docs/reference/mapping/types/semantic-text.asciidoc
@@ -13,47 +13,25 @@ Long passages are <<auto-text-chunking, automatically chunked>> to smaller secti
The `semantic_text` field type specifies an inference endpoint identifier that will be used to generate embeddings.
You can create the inference endpoint by using the <<put-inference-api>>.
This field type and the <<query-dsl-semantic-query,`semantic` query>> type make it simpler to perform semantic search on your data.
If you don't specify an inference endpoint, the <<infer-service-elser,ELSER service>> is used by default.

Using `semantic_text`, you won't need to specify how to generate embeddings for your data, or how to index it.
The {infer} endpoint automatically determines the embedding generation, indexing, and query to use.

If you use the ELSER service, you can set up `semantic_text` with the following API request:

[source,console]
------------------------------------------------------------
PUT my-index-000001
{
"mappings": {
"properties": {
"inference_field": {
"type": "semantic_text"
}
}
}
}
------------------------------------------------------------

NOTE: In Serverless, you must create an {infer} endpoint using the <<put-inference-api>> and reference it when setting up `semantic_text` even if you use the ELSER service.

If you use a service other than ELSER, you must create an {infer} endpoint using the <<put-inference-api>> and reference it when setting up `semantic_text` as the following example demonstrates:

[source,console]
------------------------------------------------------------
PUT my-index-000002
{
"mappings": {
"properties": {
"inference_field": {
"type": "semantic_text",
"inference_id": "my-openai-endpoint" <1>
"inference_id": "my-elser-endpoint"
}
}
}
}
------------------------------------------------------------
// TEST[skip:Requires inference endpoint]
<1> The `inference_id` of the {infer} endpoint to use to generate embeddings.
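
For illustration only, a minimal sketch of querying this field with the <<query-dsl-semantic-query,`semantic` query>>, reusing the index and field names from the example above (the query text is made up):

[source,console]
------------------------------------------------------------
// Search the semantic_text field; the same inference endpoint embeds the query text
GET my-index-000002/_search
{
  "query": {
    "semantic": {
      "field": "inference_field",
      "query": "What is semantic search?"
    }
  }
}
------------------------------------------------------------
// TEST[skip:Requires inference endpoint]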


The recommended way to use semantic_text is by having dedicated {infer} endpoints for ingestion and search.
@@ -62,7 +40,7 @@ After creating dedicated {infer} endpoints for both, you can reference them usin

[source,console]
------------------------------------------------------------
PUT my-index-000003
PUT my-index-000002
{
"mappings": {
"properties": {
13 changes: 13 additions & 0 deletions docs/reference/quickstart/full-text-filtering-tutorial.asciidoc
@@ -19,6 +19,19 @@ The goal is to create search queries that enable users to:

To achieve these goals we'll use different Elasticsearch queries to perform full-text search, apply filters, and combine multiple search criteria.

[discrete]
[[full-text-filter-tutorial-requirements]]
=== Requirements

You'll need a running {es} cluster, together with {kib} to use the Dev Tools API Console.
Run the following command in your terminal to set up a <<run-elasticsearch-locally,single-node local cluster in Docker>>:

[source,sh]
----
curl -fsSL https://elastic.co/start-local | sh
----
// NOTCONSOLE

[discrete]
[[full-text-filter-tutorial-create-index]]
=== Step 1: Create an index
15 changes: 10 additions & 5 deletions docs/reference/quickstart/getting-started.asciidoc
@@ -15,12 +15,17 @@ You can {kibana-ref}/console-kibana.html#import-export-console-requests[convert
====

[discrete]
[[getting-started-prerequisites]]
=== Prerequisites
[[getting-started-requirements]]
=== Requirements

Before you begin, you need to have a running {es} cluster.
The fastest way to get started is with a <<run-elasticsearch-locally,local development environment>>.
Refer to <<elasticsearch-intro-deploy,Run {es}>> for other deployment options.
You'll need a running {es} cluster, together with {kib} to use the Dev Tools API Console.
Run the following command in your terminal to set up a <<run-elasticsearch-locally,single-node local cluster in Docker>>:

[source,sh]
----
curl -fsSL https://elastic.co/start-local | sh
----
// NOTCONSOLE

////
[source,console]
19 changes: 15 additions & 4 deletions docs/reference/quickstart/index.asciidoc
@@ -9,23 +9,34 @@ Unless otherwise noted, these examples will use queries written in <<query-dsl,Q
== Requirements

You'll need a running {es} cluster, together with {kib} to use the Dev Tools API Console.
Get started <<run-elasticsearch-locally,locally in Docker>> , or see our <<elasticsearch-intro-deploy,other deployment options>>.
Run the following command in your terminal to set up a <<run-elasticsearch-locally,single-node local cluster in Docker>>:

[source,sh]
----
curl -fsSL https://elastic.co/start-local | sh
----
// NOTCONSOLE

Alternatively, refer to our <<elasticsearch-intro-deploy,other deployment options>>.

[discrete]
[[quickstart-list]]
== Hands-on quick starts

* <<getting-started,Basics: Index and search data using {es} APIs>>. Learn about indices, documents, and mappings, and perform a basic search using the Query DSL.
* <<full-text-filter-tutorial, Basics: Full-text search and filtering>>. Learn about different options for querying data, including full-text search and filtering, using the Query DSL.
* <<semantic-search-semantic-text, Semantic search>>: Learn how to create embeddings for your data with `semantic_text` and query using the `semantic` query.
** <<semantic-text-hybrid-search, Hybrid search with `semantic_text`>>: Learn how to combine semantic search with full-text search.
* <<bring-your-own-vectors, Bring your own dense vector embeddings>>: Learn how to ingest dense vector embeddings into {es}.

[discrete]
[[quickstart-python-links]]
== Working in Python
.Working in Python
******************
If you're interested in using {es} with Python, check out Elastic Search Labs:
* https://github.com/elastic/elasticsearch-labs[`elasticsearch-labs` repository]: Contains a range of Python https://github.com/elastic/elasticsearch-labs/tree/main/notebooks[notebooks] and https://github.com/elastic/elasticsearch-labs/tree/main/example-apps[example apps].
* https://www.elastic.co/search-labs/tutorials/search-tutorial/welcome[Tutorial]: This walks you through building a complete search solution with {es} from the ground up using Flask.
******************

include::getting-started.asciidoc[]
include::full-text-filtering-tutorial.asciidoc[]
19 changes: 2 additions & 17 deletions docs/reference/run-elasticsearch-locally.asciidoc
@@ -42,6 +42,7 @@ To set up {es} and {kib} locally, run the `start-local` script:
curl -fsSL https://elastic.co/start-local | sh
----
// NOTCONSOLE
// REVIEWED[OCT.28.2024]

This script creates an `elastic-start-local` folder containing configuration files and starts both {es} and {kib} using Docker.

Expand All @@ -50,29 +51,13 @@ After running the script, you can access Elastic services at the following endpo
* *{es}*: http://localhost:9200
* *{kib}*: http://localhost:5601

The script generates a random password for the `elastic` user, which is displayed at the end of the installation and stored in the `.env` file.
The script generates a random password for the `elastic` user, and an API key, stored in the `.env` file.

[CAUTION]
====
This setup is for local testing only. HTTPS is disabled, and Basic authentication is used for {es}. For security, {es} and {kib} are accessible only through `localhost`.
====

[discrete]
[[api-access]]
=== API access

An API key for {es} is generated and stored in the `.env` file as `ES_LOCAL_API_KEY`.
Use this key to connect to {es} with a https://www.elastic.co/guide/en/elasticsearch/client/index.html[programming language client] or the <<rest-apis,REST API>>.

From the `elastic-start-local` folder, check the connection to Elasticsearch using `curl`:

[source,sh]
----
source .env
curl $ES_LOCAL_URL -H "Authorization: ApiKey ${ES_LOCAL_API_KEY}"
----
// NOTCONSOLE

[discrete]
[[local-dev-additional-info]]
=== Learn more
@@ -21,11 +21,45 @@ This tutorial uses the <<inference-example-elser,`elser` service>> for demonstra
[[semantic-text-requirements]]
==== Requirements

This tutorial uses the <<infer-service-elser,ELSER service>> for demonstration, which is created automatically as needed.
To use the `semantic_text` field type with an {infer} service other than ELSER, you must create an inference endpoint using the <<put-inference-api>>.
To use the `semantic_text` field type, you must have an {infer} endpoint deployed in
your cluster using the <<put-inference-api>>.

NOTE: In Serverless, you must create an {infer} endpoint using the <<put-inference-api>> and reference it when setting up `semantic_text` even if you use the ELSER service.
[discrete]
[[semantic-text-infer-endpoint]]
==== Create the {infer} endpoint

Create an inference endpoint by using the <<put-inference-api>>:

[source,console]
------------------------------------------------------------
PUT _inference/sparse_embedding/my-elser-endpoint <1>
{
"service": "elser", <2>
"service_settings": {
"adaptive_allocations": { <3>
"enabled": true,
"min_number_of_allocations": 3,
"max_number_of_allocations": 10
},
"num_threads": 1
}
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> The task type is `sparse_embedding` in the path as the `elser` service will
be used and ELSER creates sparse vectors. The `inference_id` is
`my-elser-endpoint`.
<2> The `elser` service is used in this example.
<3> This setting enables and configures {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations].
Adaptive allocations make it possible for ELSER to automatically scale up or down resources based on the current load on the process.

[NOTE]
====
You might see a 502 bad gateway error in the response when using the {kib} Console.
This error usually just reflects a timeout, while the model downloads in the background.
You can check the download progress in the {ml-app} UI.
If using the Python client, you can set the `timeout` parameter to a higher value.
====

[discrete]
[[semantic-text-index-mapping]]
@@ -41,7 +75,8 @@ PUT semantic-embeddings
"mappings": {
"properties": {
"content": { <1>
"type": "semantic_text" <2>
"type": "semantic_text", <2>
"inference_id": "my-elser-endpoint" <3>
}
}
}
@@ -50,14 +85,18 @@ PUT semantic-embeddings
// TEST[skip:TBD]
<1> The name of the field to contain the generated embeddings.
<2> The field to contain the embeddings is a `semantic_text` field.
Since no `inference_id` is provided, the <<infer-service-elser,ELSER service>> is used by default.
To use a different {infer} service, you must create an {infer} endpoint first using the <<put-inference-api>> and then specify it in the `semantic_text` field mapping using the `inference_id` parameter.
<3> The `inference_id` is the inference endpoint you created in the previous step.
It will be used to generate the embeddings based on the input text.
Every time you ingest data into the related `semantic_text` field, this endpoint will be used for creating the vector representation of the text.
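
As a rough, illustrative sketch (the document text below is made up), ingesting a document into the index is enough to trigger embedding generation for the `content` field:

[source,console]
------------------------------------------------------------
// The configured inference endpoint embeds the content field at ingest time
POST semantic-embeddings/_doc
{
  "content": "Avalanche safety starts with checking the daily forecast before touring."
}
------------------------------------------------------------
// TEST[skip:TBD]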

[NOTE]
====
If you're using web crawlers or connectors to generate indices, you have to <<indices-put-mapping,update the index mappings>> for these indices to include the `semantic_text` field.
Once the mapping is updated, you'll need to run a full web crawl or a full connector sync.
This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data.
If you're using web crawlers or connectors to generate indices, you have to
<<indices-put-mapping,update the index mappings>> for these indices to
include the `semantic_text` field. Once the mapping is updated, you'll need to run
a full web crawl or a full connector sync. This ensures that all existing
documents are reprocessed and updated with the new semantic embeddings,
enabling semantic search on the updated data.
====
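
As a rough sketch only (the index name `my-crawler-index` is hypothetical), such a mapping update could look like the following, reusing the `my-elser-endpoint` endpoint created above:

[source,console]
------------------------------------------------------------
// Add a semantic_text field to the existing index mapping
PUT my-crawler-index/_mapping
{
  "properties": {
    "content": {
      "type": "semantic_text",
      "inference_id": "my-elser-endpoint"
    }
  }
}
------------------------------------------------------------
// TEST[skip:TBD]

Existing documents are not embedded retroactively, which is why the full crawl or connector sync mentioned above is still required.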

