[DOCS] Add documentation for near real-time search (#57560) (#58133)
* Adding documentation for near real-time search.

* Adding link to NRT topic and clarifying some text.

* Adding diagrams and incorporating changes from David T.
Adam Locke authored Jun 15, 2020
1 parent 5057b57 commit a537b7c
Showing 6 changed files with 39 additions and 13 deletions.
14 changes: 7 additions & 7 deletions docs/reference/docs/refresh.asciidoc
@@ -31,15 +31,15 @@ visible at some point after the request returns.

[float]
==== Choosing which setting to use

Unless you have a good reason to wait for the change to become visible always
use `refresh=false`, or, because that is the default, just leave the `refresh`
parameter out of the URL. That is the simplest and fastest choice.
// tag::refresh-default[]
Unless you have a good reason to wait for the change to become visible, always
use `refresh=false` (the default setting). The simplest and fastest choice is to omit the `refresh` parameter from the URL.

If you absolutely must have the changes made by a request visible synchronously
with the request then you must pick between putting more load on
Elasticsearch (`true`) and waiting longer for the response (`wait_for`). Here
are a few points that should inform that decision:
with the request, you must choose between putting more load on
Elasticsearch (`true`) and waiting longer for the response (`wait_for`).
// end::refresh-default[]
Here are a few points that should inform that decision:

* The more changes being made to the index the more work `wait_for` saves
compared to `true`. In the case that the index is only changed once every
Binary file added docs/reference/images/lucene-in-memory-buffer.png
7 changes: 3 additions & 4 deletions docs/reference/intro.asciidoc
@@ -9,9 +9,9 @@ the {stack}. {ls} and {beats} facilitate collecting, aggregating, and
enriching your data and storing it in {es}. {kib} enables you to
interactively explore, visualize, and share insights into your data and manage
and monitor the stack. {es} is where the indexing, search, and analysis
magic happen.
magic happens.

{es} provides real-time search and analytics for all types of data. Whether you
{es} provides near real-time search and analytics for all types of data. Whether you
have structured or unstructured text, numerical data, or geospatial data,
{es} can efficiently store and index it in a way that supports fast searches.
You can go far beyond simple data retrieval and aggregate information to discover
@@ -46,8 +46,7 @@ as JSON documents. When you have multiple {es} nodes in a cluster, stored
documents are distributed across the cluster and can be accessed immediately
from any node.

When a document is stored, it is indexed and fully searchable in near
real-time--within 1 second. {es} uses a data structure called an
When a document is stored, it is indexed and fully searchable in <<near-real-time,near real-time>>--within 1 second. {es} uses a data structure called an
inverted index that supports very fast full-text searches. An inverted index
lists every unique word that appears in any document and identifies all of the
documents each word occurs in.
6 changes: 4 additions & 2 deletions docs/reference/search/index.asciidoc
@@ -14,7 +14,7 @@ Depending on your data, you can use a query to get answers to questions like:
* What pages on my website contain a specific word or phrase?
* What processes on my server take longer than 500 milliseconds to respond?
* What users on my network ran `regsvr32.exe` within the last week?
* How many of my products have a price greater than $20?
* How many of my products have a price greater than $20?

A _search_ consists of one or more queries that are combined and sent to {es}.
Documents that match a search's queries are returned in the _hits_, or
@@ -29,11 +29,13 @@ a specific number of results.
=== In this section

* <<run-a-search>>
* <<near-real-time>>
* <<modules-cross-cluster-search>>
* <<async-search-intro>>

--

include::run-a-search.asciidoc[]
include::{es-repo-dir}/search/near-real-time.asciidoc[]
include::{es-repo-dir}/async-search.asciidoc[]
include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[]
include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[]
25 changes: 25 additions & 0 deletions docs/reference/search/near-real-time.asciidoc
@@ -0,0 +1,25 @@
[[near-real-time]]
== Near real-time search
The overview of <<documents-indices,documents and indices>> indicates that when a document is stored in {es}, it is indexed and fully searchable in _near real-time_--within 1 second. What defines near real-time search?

Lucene, the Java libraries on which {es} is based, introduced the concept of per-segment search. A _segment_ is similar to an inverted index, but the word _index_ in Lucene means "a collection of segments plus a commit point". After a commit, a new segment is added to the commit point and the buffer is cleared.

Sitting between {es} and the disk is the filesystem cache. Documents in the in-memory indexing buffer (<<img-pre-refresh,Figure 1>>) are written to a new segment (<<img-post-refresh,Figure 2>>). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, after a file is in the cache, it can be opened and read just like any other file.

[[img-pre-refresh]]
.A Lucene index with new documents in the in-memory buffer
image::images/lucene-in-memory-buffer.png["A Lucene index with new documents in the in-memory buffer"]

Lucene allows new segments to be written and opened, making the documents they contain visible to search without performing a full commit. This is a much lighter process than a commit to disk, and can be done frequently without degrading performance.

[[img-post-refresh]]
.The buffer contents are written to a segment, which is searchable, but is not yet committed
image::images/lucene-written-not-committed.png["The buffer contents are written to a segment, which is searchable, but is not yet committed"]

In {es}, this process of writing and opening a new segment is called a _refresh_. A refresh makes all operations performed on an index since the last refresh available for search. You can control refreshes through the following means:

* Waiting for the refresh interval
* Setting the <<docs-refresh,?refresh>> option
* Using the <<indices-refresh,Refresh API>> to explicitly complete a refresh (`POST _refresh`)
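
The three options above map onto three kinds of HTTP request. The sketch below is an illustration and not part of this commit: it only builds the method, path, and body for each request and sends nothing. The index name `my-index`, the document ID, the helper name `refresh_requests`, and the `30s` interval are hypothetical; adjusting the `index.refresh_interval` setting is shown as one way to change how long you wait for the periodic refresh.

```python
import json


def refresh_requests(index, doc_id, doc, interval="30s"):
    """Return (method, path, body) tuples for three ways to influence refreshes.

    Nothing is sent over the network; the tuples only show the request shapes.
    """
    return [
        # 1. Change how often the periodic refresh runs for this index.
        ("PUT", f"/{index}/_settings",
         json.dumps({"index": {"refresh_interval": interval}})),
        # 2. Index a document and wait until a refresh makes it visible.
        ("PUT", f"/{index}/_doc/{doc_id}?refresh=wait_for", json.dumps(doc)),
        # 3. Explicitly refresh the index right now.
        ("POST", f"/{index}/_refresh", None),
    ]


for method, path, body in refresh_requests("my-index", "1", {"title": "hello"}):
    print(method, path, body or "")
```

Each tuple can be passed to whichever HTTP client you already use. Option 2 trades a slower response for visibility, while option 3 puts extra load on the cluster.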

By default, {es} periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds. This is why we say that {es} has _near_ real-time search: document changes are not visible to search immediately, but will become visible within this timeframe.
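
The difference between indexed, searchable, and committed can be sketched with a toy model. This is a simplified illustration under stated assumptions, not how Lucene or {es} is implemented and not part of this commit: the class `ToyShard` and its methods are hypothetical, segments are plain tuples, and search is a substring match instead of an inverted-index lookup.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of a shard: an in-memory buffer plus searchable segments."""

    buffer: list = field(default_factory=list)    # indexed, not yet searchable
    segments: list = field(default_factory=list)  # written and opened for search
    committed: int = 0                            # segments covered by the last commit

    def index(self, doc):
        # New documents land in the in-memory indexing buffer first.
        self.buffer.append(doc)

    def refresh(self):
        # Write the buffer to a new segment and open it for search,
        # without performing a commit.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer.clear()

    def commit(self):
        # A commit leaves the buffer empty and records every existing
        # segment in the commit point.
        self.refresh()
        self.committed = len(self.segments)

    def search(self, term):
        # Only documents that live in a segment are visible to search.
        return [doc for seg in self.segments for doc in seg if term in doc]


shard = ToyShard()
shard.index("quick brown fox")
print(shard.search("fox"))  # [] because the document is still in the buffer
shard.refresh()
print(shard.search("fox"))  # ['quick brown fox']
print(shard.committed)      # 0 because the segment is searchable but not committed
```

The model leaves out the filesystem cache, the periodic one-second schedule, and the 30-second idle rule. It only shows why a document becomes visible at refresh time and not at index time.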
