Commit
* Adding documentation for near real-time search.
* Adding link to NRT topic and clarifying some text.
* Adding diagrams and incorporating changes from David T.
Adam Locke authored Jun 15, 2020
1 parent 5057b57, commit a537b7c
Showing 6 changed files with 39 additions and 13 deletions.
@@ -0,0 +1,25 @@
[[near-real-time]]
== Near real-time search
The overview of <<documents-indices,documents and indices>> indicates that when a document is stored in {es}, it is indexed and fully searchable in _near real-time_--within 1 second. What defines near real-time search?

Lucene, the Java library on which {es} is based, introduced the concept of per-segment search. A _segment_ is similar to an inverted index, but the word _index_ in Lucene means "a collection of segments plus a commit point". After a commit, a new segment is added to the commit point and the buffer is cleared.

Sitting between {es} and the disk is the filesystem cache. Documents in the in-memory indexing buffer (<<img-pre-refresh,Figure 1>>) are written to a new segment (<<img-post-refresh,Figure 2>>). The new segment is written to the filesystem cache first (which is cheap) and only later flushed to disk (which is expensive). Once a file is in the cache, though, it can be opened and read just like any other file.
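
The difference between writing to the filesystem cache and flushing to disk can be sketched in Python. This is a generic illustration built on `os.fsync`, not how Lucene or {es} writes segments; the `write_segment` helper and the `segment_0.bin` file name are hypothetical.

```python
import os
import tempfile


def write_segment(path, payload, durable=False):
    """Write bytes to `path`; optionally force them to stable storage.

    Without fsync the data typically sits in the OS filesystem cache:
    cheap to write and readable by any process, but not guaranteed to
    survive a power loss.
    """
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()                 # hand the bytes to the OS (filesystem cache)
        if durable:
            os.fsync(f.fileno())  # the expensive step: ask the OS to write to disk


# Hypothetical segment file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    segment_path = os.path.join(tmp, "segment_0.bin")
    write_segment(segment_path, b"doc-1\ndoc-2\n")  # cached only, no fsync
    with open(segment_path, "rb") as f:             # already readable like any file
        cached_contents = f.read()
```

Passing `durable=True` adds the `os.fsync` call, which is the costly part that a lightweight refresh avoids.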

[[img-pre-refresh]]
.A Lucene index with new documents in the in-memory buffer
image::images/lucene-in-memory-buffer.png["A Lucene index with new documents in the in-memory buffer"]

Lucene allows new segments to be written and opened, making the documents they contain visible to search without performing a full commit. This is a much lighter process than a commit to disk, and can be done frequently without degrading performance.

[[img-post-refresh]]
.The buffer contents are written to a segment, which is searchable, but is not yet committed
image::images/lucene-written-not-committed.png["The buffer contents are written to a segment, which is searchable, but is not yet committed"]
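
The relationship between the buffer, searchable segments, and the commit point can be pictured with a toy model. The `ToyIndex` class below is purely illustrative and assumes substring matching in place of a real inverted index; it does not reflect Lucene's actual data structures or API.

```python
class ToyIndex:
    """Toy model: in-memory buffer, searchable segments, and a commit point."""

    def __init__(self):
        self.buffer = []     # indexed but not yet searchable
        self.segments = []   # each segment is an immutable tuple of documents
        self.committed = 0   # number of segments covered by the commit point

    def index(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        # Write the buffer to a new segment and open it for search (no commit).
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def commit(self):
        # Full commit: refresh, then record every segment in the commit point.
        self.refresh()
        self.committed = len(self.segments)

    def search(self, term):
        # Only documents in opened segments are visible, never the buffer.
        return [d for seg in self.segments for d in seg if term in d]


idx = ToyIndex()
idx.index("quick brown fox")
before = idx.search("fox")   # [] : the document is still in the buffer
idx.refresh()
after = idx.search("fox")    # ["quick brown fox"] : searchable, not yet committed
```

In this sketch, `refresh` changes what `search` can see without touching `committed`, which mirrors the state shown in Figure 2.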

In {es}, this process of writing and opening a new segment is called a _refresh_. A refresh makes all operations performed on an index since the last refresh available for search. You can control refreshes through the following means:

* Waiting for the refresh interval
* Setting the <<docs-refresh,?refresh>> option
* Using the <<indices-refresh,Refresh API>> to explicitly complete a refresh (`POST _refresh`)
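
As a rough illustration of these three options, the helper below only builds the HTTP method, path, and body for each request; it sends nothing. The `refresh_requests` function and the `my-index` index name are hypothetical, and treating the first option as a change to the `index.refresh_interval` setting is an assumption made for this sketch.

```python
import json
from urllib.parse import urlencode


def refresh_requests(index, doc_id, doc, interval="1s"):
    """Return (method, path, body) tuples for three ways to control refreshes."""
    return [
        # 1. Adjust how often the periodic refresh runs for this index.
        ("PUT", f"/{index}/_settings",
         json.dumps({"index": {"refresh_interval": interval}})),
        # 2. Index a document and wait until a refresh makes it searchable.
        ("PUT", f"/{index}/_doc/{doc_id}?" + urlencode({"refresh": "wait_for"}),
         json.dumps(doc)),
        # 3. Explicitly refresh the index now.
        ("POST", f"/{index}/_refresh", None),
    ]


for method, path, body in refresh_requests("my-index", 1, {"title": "hello"}):
    print(method, path, body or "")
```

Keeping request construction separate from sending makes the three mechanisms easy to compare side by side; any HTTP client could then issue them against a cluster.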

By default, {es} refreshes indices every second, but only those indices that have received at least one search request in the last 30 seconds. This is why we say that {es} has _near_ real-time search: document changes are not visible to search immediately, but become visible within this timeframe.
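
The default rule can be restated as a small predicate. This is a simplified reading of the sentence above, not the {es} scheduler; `scheduled_refresh_runs` is a hypothetical name, and the 30-second threshold is assumed here to correspond to the `index.search.idle.after` setting.

```python
def scheduled_refresh_runs(now, last_search_at, idle_after=30.0):
    """Return True if the periodic refresh applies to an index at time `now`.

    Times are in seconds. `last_search_at` is None if the index has never
    been searched. Toy rule only: refresh while the index is not search-idle.
    """
    if last_search_at is None:
        return False
    return (now - last_search_at) <= idle_after


recently_searched = scheduled_refresh_runs(now=100.0, last_search_at=85.0)  # True
search_idle = scheduled_refresh_runs(now=100.0, last_search_at=60.0)        # False
```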