From 671027919939b8a84b1fc246d3477abb85e7db97 Mon Sep 17 00:00:00 2001
From: debadair <debadair@elastic.co>
Date: Fri, 4 Oct 2019 19:41:46 -0700
Subject: [PATCH] Reformats reindex API (#47483)

* Reformats reindex API

* Incorporated review feedback.
---
 docs/reference/docs/reindex.asciidoc | 1542 ++++++++++++--------------
 1 file changed, 719 insertions(+), 823 deletions(-)
diff --git a/docs/reference/docs/reindex.asciidoc b/docs/reference/docs/reindex.asciidoc
index 2f6d1561e67c1..009c8deb7785c 100644
--- a/docs/reference/docs/reindex.asciidoc
+++ b/docs/reference/docs/reindex.asciidoc
@@ -1,16 +1,20 @@
 [[docs-reindex]]
 === Reindex API
+++++
+<titleabbrev>Reindex</titleabbrev>
+++++
 
-IMPORTANT: Reindex requires <<mapping-source-field,`_source`>> to be enabled for
-all documents in the source index.
+Copies documents from one index to another. 
 
-IMPORTANT: Reindex does not attempt to set up the destination index.  It does
-not copy the settings of the source index.  You should set up the destination
-index prior to running a `_reindex` action, including setting up mappings, shard
-counts, replicas, etc.
+[IMPORTANT]
+=================================================
+Reindex requires <<mapping-source-field,`_source`>> to be enabled for
+all documents in the source index.
 
-The most basic form of `_reindex` just copies documents from one index to another.
-This will copy documents from the `twitter` index into the `new_twitter` index:
+You must set up the destination index before calling `_reindex`.
+Reindex does not copy the settings from the source index. 
+Mappings, shard counts, replicas, and so on must be configured ahead of time.
+=================================================
 
 [source,console]
 --------------------------------------------------
@@ -26,7 +30,7 @@ POST _reindex
 --------------------------------------------------
 // TEST[setup:big_twitter]
 
-That will return something like this:
+////
 
 [source,console-result]
 --------------------------------------------------
@@ -52,145 +56,201 @@ That will return something like this:
 --------------------------------------------------
 // TESTRESPONSE[s/"took" : 147/"took" : "$body.took"/]
 
+////
+
+[[docs-reindex-api-request]]
+==== {api-request-title}
+
+`POST /_reindex`
+
+[[docs-reindex-api-desc]]
+==== {api-description-title}
+
+Extracts the document source from the source index and indexes the documents into the destination index. 
+You can copy all documents to the destination index, or reindex a subset of the documents. 
+
 Just like <<docs-update-by-query,`_update_by_query`>>, `_reindex` gets a
 snapshot of the source index but its target must be a **different** index so
 version conflicts are unlikely. The `dest` element can be configured like the
-index API to control optimistic concurrency control. Just leaving out
-`version_type` (as above) or setting it to `internal` will cause Elasticsearch
+index API to control optimistic concurrency control. Omitting
+`version_type` or setting it to `internal` causes Elasticsearch
 to blindly dump documents into the target, overwriting any that happen to have
-the same type and id:
-
-[source,console]
---------------------------------------------------
-POST _reindex
-{
-  "source": {
-    "index": "twitter"
-  },
-  "dest": {
-    "index": "new_twitter",
-    "version_type": "internal"
-  }
-}
---------------------------------------------------
-// TEST[setup:twitter]
+the same ID.
 
-Setting `version_type` to `external` will cause Elasticsearch to preserve the
+Setting `version_type` to `external` causes Elasticsearch to preserve the
 `version` from the source, create any documents that are missing, and update
 any documents that have an older version in the destination index than they do
-in the source index:
+in the source index.
 
-[source,console]
+Setting `op_type` to `create` causes `_reindex` to only create missing
+documents in the target index. All existing documents will cause a version
+conflict. 
+
+By default, version conflicts abort the `_reindex` process. 
+To continue reindexing if there are conflicts, set the `"conflicts"` request body parameter to `proceed`. 
+In this case, the response includes a count of the version conflicts that were encountered.
+Note that the handling of other error types is unaffected by the `"conflicts"` parameter.
+
+[[docs-reindex-task-api]]
+===== Running reindex asynchronously
+
+If the request contains `wait_for_completion=false`, {es}
+performs some preflight checks, launches the request, and returns a
+<<tasks,`task`>> you can use to cancel or get the status of the task. 
+{es} creates a record of this task as a document at `.tasks/task/${taskId}`. 
+When you are done with a task, you should delete the task document so 
+{es} can reclaim the space.
+
+[[docs-reindex-many-indices]]
+===== Reindexing many indices
+If you have many indices to reindex, it is generally better to reindex them
+one at a time rather than using a glob pattern to pick up many indices. That
+way you can resume the process if there are any errors by removing the
+partially completed index and starting over at that index. It also makes
+parallelizing the process fairly simple: split the list of indices to reindex
+and run each list in parallel.
+
+One-off bash scripts seem to work nicely for this:
+
+[source,bash]
+----------------------------------------------------------------
+for index in i1 i2 i3 i4 i5; do
+  curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{
+    "source": {
+      "index": "'$index'"
+    },
+    "dest": {
+      "index": "'$index'-reindexed"
+    }
+  }'
+done
+----------------------------------------------------------------
+// NOTCONSOLE
+
+[[docs-reindex-throttle]]
+===== Throttling
+
+Set `requests_per_second` to any positive decimal number (`1.4`, `6`,
+`1000`, etc.) to throttle the rate at which `_reindex` issues batches of index
+operations. Requests are throttled by padding each batch with a wait time. 
+To disable throttling, set `requests_per_second` to `-1`.
+
+The throttling is done by waiting between batches so that the `scroll` that `_reindex`
+uses internally can be given a timeout that takes into account the padding.
+The padding time is the difference between the batch size divided by the
+`requests_per_second` and the time spent writing. By default the batch size is
+`1000`, so if `requests_per_second` is set to `500` and a batch takes 0.5 seconds to write:
+
+[source,txt]
 --------------------------------------------------
-POST _reindex
-{
-  "source": {
-    "index": "twitter"
-  },
-  "dest": {
-    "index": "new_twitter",
-    "version_type": "external"
-  }
-}
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
 --------------------------------------------------
-// TEST[setup:twitter]
 
-Settings `op_type` to `create` will cause `_reindex` to only create missing
-documents in the target index. All existing documents will cause a version
-conflict:
+Since the batch is issued as a single `_bulk` request, large batch sizes 
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth".
+
+[[docs-reindex-rethrottle]]
+===== Rethrottling
+
+The value of `requests_per_second` can be changed on a running reindex using
+the `_rethrottle` API:
 
 [source,console]
 --------------------------------------------------
-POST _reindex
-{
-  "source": {
-    "index": "twitter"
-  },
-  "dest": {
-    "index": "new_twitter",
-    "op_type": "create"
-  }
-}
+POST _reindex/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
 --------------------------------------------------
-// TEST[setup:twitter]
 
-By default, version conflicts abort the `_reindex` process. The `"conflicts"` request body
-parameter can be used to instruct `_reindex` to proceed with the next document on version conflicts.
-It is important to note that the handling of other error types is unaffected by the `"conflicts"` parameter.
-When `"conflicts": "proceed"` is set in the request body, the `_reindex` process will continue on version conflicts
-and return a count of version conflicts encountered:
+The task ID can be found using the <<tasks,tasks API>>.
+
+Just like when setting it on the Reindex API, `requests_per_second`
+can be either `-1` to disable throttling or any decimal number
+like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
+query takes effect immediately, but rethrottling that slows down the query
+takes effect after completing the current batch. This prevents scroll
+timeouts.
+
+[[docs-reindex-slice]]
+===== Slicing
+
+Reindex supports <<sliced-scroll>> to parallelize the reindexing process.
+This parallelization can improve efficiency and provide a convenient way to
+break the request down into smaller parts.
+
+NOTE: Reindexing from remote clusters does not support
+<<docs-reindex-manual-slice, manual>> or
+<<docs-reindex-automatic-slice, automatic slicing>>.
+
+[[docs-reindex-manual-slice]]
+====== Manual slicing
+Slice a reindex request manually by providing a slice ID and total number of
+slices to each request:
 
 [source,console]
---------------------------------------------------
+----------------------------------------------------------------
 POST _reindex
 {
-  "conflicts": "proceed",
   "source": {
-    "index": "twitter"
+    "index": "twitter",
+    "slice": {
+      "id": 0,
+      "max": 2
+    }
   },
   "dest": {
-    "index": "new_twitter",
-    "op_type": "create"
+    "index": "new_twitter"
   }
 }
---------------------------------------------------
-// TEST[setup:twitter]
-
-You can limit the documents by adding a query to the `source`.
-This will only copy tweets made by `kimchy` into `new_twitter`:
-
-[source,console]
---------------------------------------------------
 POST _reindex
 {
   "source": {
     "index": "twitter",
-    "query": {
-      "term": {
-        "user": "kimchy"
-      }
+    "slice": {
+      "id": 1,
+      "max": 2
     }
   },
   "dest": {
     "index": "new_twitter"
   }
 }
---------------------------------------------------
-// TEST[setup:twitter]
+----------------------------------------------------------------
+// TEST[setup:big_twitter]
 
-`index` in `source` can be a list, allowing you to copy from lots 
-of sources in one request. This will copy documents from the
-`twitter` and `blog` indices:
+You can verify this works by:
 
 [source,console]
---------------------------------------------------
-POST _reindex
+----------------------------------------------------------------
+GET _refresh
+POST new_twitter/_search?size=0&filter_path=hits.total
+----------------------------------------------------------------
+// TEST[continued]
+
+which results in a sensible `total` like this one:
+
+[source,console-result]
+----------------------------------------------------------------
 {
-  "source": {
-    "index": ["twitter", "blog"]
-  },
-  "dest": {
-    "index": "all_together"
+  "hits": {
+    "total" : {
+        "value": 120,
+        "relation": "eq"
+    }
   }
 }
---------------------------------------------------
-// TEST[setup:twitter]
-// TEST[s/^/PUT blog\/post\/post1?refresh\n{"test": "foo"}\n/]
+----------------------------------------------------------------
 
-NOTE: The Reindex API makes no effort to handle ID collisions so the last
-document written will "win" but the order isn't usually predictable so it is
-not a good idea to rely on this behavior. Instead, make sure that IDs are unique
-using a script.
+[[docs-reindex-automatic-slice]]
+====== Automatic slicing
 
-It's also possible to limit the number of processed documents by setting
-`max_docs`. This will only copy a single document from `twitter` to
-`new_twitter`:
+You can also let `_reindex` automatically parallelize using <<sliced-scroll>> to
+slice on `_uid`. Use `slices` to specify the number of slices to use:
 
 [source,console]
---------------------------------------------------
-POST _reindex
+----------------------------------------------------------------
+POST _reindex?slices=5&refresh
 {
-  "max_docs": 1,
   "source": {
     "index": "twitter"
   },
@@ -198,104 +258,80 @@ POST _reindex
     "index": "new_twitter"
   }
 }
---------------------------------------------------
-// TEST[setup:twitter]
+----------------------------------------------------------------
+// TEST[setup:big_twitter]
 
-If you want a particular set of documents from the `twitter` index you'll
-need to use `sort`. Sorting makes the scroll less efficient but in some contexts
-it's worth it. If possible, prefer a more selective query to `max_docs` and `sort`.
-This will copy 10000 documents from `twitter` into `new_twitter`:
+You can also verify this works by:
 
 [source,console]
---------------------------------------------------
-POST _reindex
-{
-  "max_docs": 10000,
-  "source": {
-    "index": "twitter",
-    "sort": { "date": "desc" }
-  },
-  "dest": {
-    "index": "new_twitter"
-  }
-}
---------------------------------------------------
-// TEST[setup:twitter]
+----------------------------------------------------------------
+POST new_twitter/_search?size=0&filter_path=hits.total
+----------------------------------------------------------------
+// TEST[continued]
 
-The `source` section supports all the elements that are supported in a
-<<search-request-body,search request>>. For instance, only a subset of the
-fields from the original documents can be reindexed using `source` filtering
-as follows:
+which results in a sensible `total` like this one:
 
-[source,console]
---------------------------------------------------
-POST _reindex
+[source,console-result]
+----------------------------------------------------------------
 {
-  "source": {
-    "index": "twitter",
-    "_source": ["user", "_doc"]
-  },
-  "dest": {
-    "index": "new_twitter"
+  "hits": {
+    "total" : {
+        "value": 120,
+        "relation": "eq"
+    }
   }
 }
---------------------------------------------------
-// TEST[setup:twitter]
-
-[[reindex-scripts]]
-Like `_update_by_query`, `_reindex` supports a script that modifies the
-document. Unlike `_update_by_query`, the script is allowed to modify the
-document's metadata. This example bumps the version of the source document:
+----------------------------------------------------------------
 
-[source,console]
---------------------------------------------------
-POST _reindex
-{
-  "source": {
-    "index": "twitter"
-  },
-  "dest": {
-    "index": "new_twitter",
-    "version_type": "external"
-  },
-  "script": {
-    "source": "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
-    "lang": "painless"
-  }
-}
---------------------------------------------------
-// TEST[setup:twitter]
+Setting `slices` to `auto` will let Elasticsearch choose the number of slices
+to use. This setting will use one slice per shard, up to a certain limit. If
+there are multiple source indices, it will choose the number of slices based
+on the index with the smallest number of shards.
 
-Just as in `_update_by_query`, you can set `ctx.op` to change the
-operation that is executed on the destination index:
+Adding `slices` to `_reindex` just automates the manual process used in the
+section above by creating sub-requests, which means it has some quirks:
 
-`noop`::
-
-Set `ctx.op = "noop"` if your script decides that the document doesn't have
-to be indexed in the destination index. This no operation will be reported
-in the `noop` counter in the <<docs-reindex-response-body, response body>>.
+* You can see these requests in the <<docs-reindex-task-api,Tasks APIs>>. These
+sub-requests are "child" tasks of the task for the request with `slices`.
+* Fetching the status of the task for the request with `slices` only contains
+the status of completed slices.
+* These sub-requests are individually addressable for things like cancelation
+and rethrottling.
+* Rethrottling the request with `slices` will rethrottle the unfinished
+sub-request proportionally.
+* Canceling the request with `slices` will cancel each sub-request.
+* Due to the nature of `slices`, each sub-request won't get a perfectly even
+portion of the documents. All documents will be addressed, but some slices may
+be larger than others. Expect larger slices to have a more even distribution.
+* Parameters like `requests_per_second` and `max_docs` on a request with
+`slices` are distributed proportionally to each sub-request. Combine that with
+the point above about distribution being uneven and you should conclude that
+using `max_docs` with `slices` might not result in exactly `max_docs` documents
+being reindexed.
+* Each sub-request gets a slightly different snapshot of the source index,
+though these are all taken at approximately the same time.
 
-`delete`::
+[[docs-reindex-picking-slices]]
+====== Picking the number of slices
 
-Set `ctx.op = "delete"` if your script decides that the document must be
- deleted from the destination index. The deletion will be reported in the
- `deleted` counter in the <<docs-reindex-response-body, response body>>.
+If slicing automatically, setting `slices` to `auto` will choose a reasonable
+number for most indices. If slicing manually or otherwise tuning
+automatic slicing, use these guidelines.
 
-Setting `ctx.op` to anything else will return an error, as will setting any
-other field in `ctx`.
+Query performance is most efficient when the number of `slices` is equal to the
+number of shards in the index. If that number is large (e.g. 500),
+choose a lower number as too many `slices` will hurt performance. Setting
+`slices` higher than the number of shards generally does not improve efficiency
+and adds overhead.
 
-Think of the possibilities! Just be careful; you are able to
-change:
+Indexing performance scales linearly across available resources with the
+number of slices.
 
- * `_id`
- * `_index`
- * `_version`
- * `_routing`
+Whether query or indexing performance dominates the runtime depends on the
+documents being reindexed and cluster resources.
 
-Setting `_version` to `null` or clearing it from the `ctx` map is just like not
-sending the version in an indexing request; it will cause the document to be
-overwritten in the target index regardless of the version on the target or the
-version type you use in the `_reindex` request.
+[[docs-reindex-routing]]
+===== Reindex routing
 
 By default if `_reindex` sees a document with routing then the routing is
 preserved unless it's changed by the script. You can set `routing` on the
@@ -339,6 +375,8 @@ POST _reindex
 --------------------------------------------------
 // TEST[s/^/PUT source\n/]
 
+
+
 By default `_reindex` uses scroll batches of 1000. You can change the
 batch size with the `size` field in the `source` element:
 
@@ -376,289 +414,278 @@ POST _reindex
 --------------------------------------------------
 // TEST[s/^/PUT source\n/]
 
-[float]
-[[reindex-from-remote]]
-==== Reindex from Remote
+[[docs-reindex-api-query-params]]
+==== {api-query-parms-title}
 
-Reindex supports reindexing from a remote Elasticsearch cluster:
+include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=timeout]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_completion]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=scroll]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=slices]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=max_docs]
+
+[[docs-reindex-api-request-body]]
+==== {api-request-body-title}
+
+`conflicts`::
+(Optional, enum) Set to `proceed` to continue reindexing even if there are conflicts. 
+Defaults to `abort`.
+
+`source`::
+`index`:::
+(Required, string) The name of the index you are copying _from_. 
+Also accepts a comma-separated list of indices to reindex from multiple sources.  
+
+`max_docs`:::
+(Optional, integer) The maximum number of documents to reindex.
+
+`query`:::
+(Optional, <<query-dsl, query object>>) Specifies the documents to reindex using the Query DSL.
+
+`remote`:::
+`host`::::
+(Optional, string) The URL for the remote instance of {es} that you want to index _from_.
+Required when indexing from remote.
+`username`::::
+(Optional, string) The username to use for authentication with the remote host.
+`password`::::
+(Optional, string) The password to use for authentication with the remote host. 
+`socket_timeout`:::: 
+(Optional, <<time-units, time units>>) The remote socket read timeout. Defaults to 30 seconds.
+`connect_timeout`:::: 
+(Optional, <<time-units, time units>>) The remote connection timeout. Defaults to 30 seconds.
+
+`size`:::
+(Optional, integer) The number of documents to index per batch. 
+Use when indexing from remote to ensure that the batches fit within the on-heap buffer, 
+which defaults to a maximum size of 100 MB. 
+
+`slice`:::
+`id`::::
+(Optional, integer) Slice ID for <<docs-reindex-manual-slice, manual slicing>>. 
+`max`::::
+(Optional, integer) Total number of slices. 
+
+`sort`:::
+(Optional, list) A comma-separated list of `<field>:<direction>` pairs to sort by before indexing.
+Use in conjunction with `max_docs` to control what documents are reindexed.
+
+`_source`:::
+(Optional, string) If `true` reindexes all source fields. 
+Set to a list to reindex select fields. 
+Defaults to `true`. 
+
+`dest`::
+`index`:::
+(Required, string) The name of the index you are copying _to_.
+
+`version_type`:::
+(Optional, enum) The versioning to use for the indexing operation.  
+Valid values: `internal`, `external`, `external_gt`, `external_gte`. 
+See <<index-version-types>> for more information.
+
+`op_type`::: 
+(Optional, enum) Set to `create` to only index documents that do not already exist (put if absent). 
+Valid values: `index`, `create`. Defaults to `index`.
+
+`script`::
+`source`::: 
+(Optional, string) The script to run to update the document source or metadata when reindexing. 
+`lang`:::
+(Optional, enum) The script language: `painless`, `expression`, `mustache`, `java`. 
+For more information, see <<modules-scripting>>.
+
+
+[[docs-reindex-api-response-body]]
+==== {api-response-body-title}
+
+`took`::
+
+(integer) The total milliseconds the entire operation took.
+
+`timed_out`::
+
+(boolean) This flag is set to `true` if any of the requests executed during the
+reindex timed out.
+
+`total`::
+
+(integer) The number of documents that were successfully processed.
+
+`updated`::
+
+(integer) The number of documents that were successfully updated.
+
+`created`::
+
+(integer) The number of documents that were successfully created.
+
+`deleted`::
+
+(integer) The number of documents that were successfully deleted.
+
+`batches`::
+
+(integer) The number of scroll responses pulled back by the reindex.
+
+`noops`::
+
+(integer) The number of documents that were ignored because the script used for
+the reindex returned a `noop` value for `ctx.op`.
+
+`version_conflicts`::
+
+(integer) The number of version conflicts that reindex hit.
+
+`retries`::
+
+(integer) The number of retries attempted by reindex. `bulk` is the number of bulk
+actions retried and `search` is the number of search actions retried.
+
+`throttled_millis`::
+
+(integer) Number of milliseconds the request slept to conform to `requests_per_second`.
+
+`requests_per_second`::
+
+(integer) The number of requests per second effectively executed during the reindex.
+
+`throttled_until_millis`::
+
+(integer) This field should always be equal to zero in a `_reindex` response. It only
+has meaning when using the <<docs-reindex-task-api, Task API>>, where it
+indicates the next time (in milliseconds since epoch) a throttled request will be
+executed again in order to conform to `requests_per_second`.
+
+`failures`::
+
+(array) Array of failures if there were any unrecoverable errors during the process. If
+this is non-empty then the request aborted because of those failures. Reindex
+is implemented using batches and any failure causes the entire process to abort
+but all failures in the current batch are collected into the array. You can use
+the `conflicts` option to prevent reindex from aborting on version conflicts.
+
+[[docs-reindex-api-example]]
+==== {api-examples-title}
+
+[[docs-reindex-select-query]]
+===== Reindex select documents with a query
+
+You can limit the documents by adding a query to the `source`.
+For example, the following request only copies tweets made by `kimchy` into `new_twitter`:
 
 [source,console]
 --------------------------------------------------
 POST _reindex
 {
   "source": {
-    "remote": {
-      "host": "http://otherhost:9200",
-      "username": "user",
-      "password": "pass"
-    },
-    "index": "source",
+    "index": "twitter",
     "query": {
-      "match": {
-        "test": "data"
+      "term": {
+        "user": "kimchy"
       }
     }
   },
   "dest": {
-    "index": "dest"
+    "index": "new_twitter"
   }
 }
 --------------------------------------------------
-// TEST[setup:host]
-// TEST[s/^/PUT source\n/]
-// TEST[s/otherhost:9200",/\${host}"/]
-// TEST[s/"username": "user",//]
-// TEST[s/"password": "pass"//]
-
-The `host` parameter must contain a scheme, host, port (e.g.
-`https://otherhost:9200`), and optional path (e.g. `https://otherhost:9200/proxy`).
-The `username` and `password` parameters are optional, and when they are present `_reindex`
-will connect to the remote Elasticsearch node using basic auth. Be sure to use `https` when
-using basic auth or the password will be sent in plain text.
-There are a range of <<reindex-ssl,settings>> available to configure the behaviour of the
- `https` connection.
+// TEST[setup:twitter]
 
-Remote hosts have to be explicitly whitelisted in elasticsearch.yml using the
-`reindex.remote.whitelist` property. It can be set to a comma delimited list
-of allowed remote `host` and `port` combinations (e.g.
-`otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*`). Scheme is
-ignored by the whitelist -- only host and port are used, for example:
+[[docs-reindex-select-sort]]
+===== Reindex select documents with sort
 
+You can limit the number of processed documents by setting `max_docs`. 
+For example, this request copies a single document from `twitter` to
+`new_twitter`:
 
-[source,yaml]
+[source,console]
 --------------------------------------------------
-reindex.remote.whitelist: "otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*"
+POST _reindex
+{
+  "max_docs": 1,
+  "source": {
+    "index": "twitter"
+  },
+  "dest": {
+    "index": "new_twitter"
+  }
+}
 --------------------------------------------------
+// TEST[setup:twitter]
 
-The whitelist must be configured on any nodes that will coordinate the reindex.
-
-This feature should work with remote clusters of any version of Elasticsearch
-you are likely to find. This should allow you to upgrade from any version of
-Elasticsearch to the current version by reindexing from a cluster of the old
-version.
-
-To enable queries sent to older versions of Elasticsearch the `query` parameter
-is sent directly to the remote host without validation or modification.
-
-NOTE: Reindexing from remote clusters does not support
-<<docs-reindex-manual-slice, manual>> or
-<<docs-reindex-automatic-slice, automatic slicing>>.
+You can use `sort` in conjunction with `max_docs` to select the documents you want to reindex. 
+Sorting makes the scroll less efficient but in some contexts it's worth it. 
+If possible, it's better to use a more selective query instead of `max_docs` and `sort`.
 
-Reindexing from a remote server uses an on-heap buffer that defaults to a
-maximum size of 100mb. If the remote index includes very large documents you'll
-need to use a smaller batch size. The example below sets the batch size to `10`
-which is very, very small.
+For example, the following request copies 10000 documents from `twitter` into `new_twitter`:
 
 [source,console]
 --------------------------------------------------
 POST _reindex
 {
+  "max_docs": 10000,
   "source": {
-    "remote": {
-      "host": "http://otherhost:9200"
-    },
-    "index": "source",
-    "size": 10,
-    "query": {
-      "match": {
-        "test": "data"
-      }
-    }
+    "index": "twitter",
+    "sort": { "date": "desc" }
   },
   "dest": {
-    "index": "dest"
+    "index": "new_twitter"
   }
 }
 --------------------------------------------------
-// TEST[setup:host]
-// TEST[s/^/PUT source\n/]
-// TEST[s/otherhost:9200/\${host}/]
+// TEST[setup:twitter]
 
-It is also possible to set the socket read timeout on the remote connection
-with the `socket_timeout` field and the connection timeout with the
-`connect_timeout` field. Both default to 30 seconds. This example
-sets the socket read timeout to one minute and the connection timeout to 10
-seconds:
+[[docs-reindex-multiple-indices]]
+===== Reindex from multiple indices
+
+The `index` attribute in `source` can be a list, allowing you to copy from lots 
+of sources in one request. This will copy documents from the
+`twitter` and `blog` indices:
 
 [source,console]
 --------------------------------------------------
 POST _reindex
 {
   "source": {
-    "remote": {
-      "host": "http://otherhost:9200",
-      "socket_timeout": "1m",
-      "connect_timeout": "10s"
-    },
-    "index": "source",
-    "query": {
-      "match": {
-        "test": "data"
-      }
-    }
+    "index": ["twitter", "blog"]
   },
   "dest": {
-    "index": "dest"
+    "index": "all_together"
   }
 }
 --------------------------------------------------
-// TEST[setup:host]
-// TEST[s/^/PUT source\n/]
-// TEST[s/otherhost:9200/\${host}/]
+// TEST[setup:twitter]
+// TEST[s/^/PUT blog\/post\/post1?refresh\n{"test": "foo"}\n/]
 
-[float]
-[[reindex-ssl]]
-===== Configuring SSL parameters
+NOTE: The Reindex API makes no effort to handle ID collisions so the last
+document written will "win" but the order isn't usually predictable so it is
+not a good idea to rely on this behavior. Instead, make sure that IDs are unique
+using a script.
 
-Reindex from remote supports configurable SSL settings. These must be
-specified in the `elasticsearch.yml` file, with the exception of the
-secure settings, which you add in the Elasticsearch keystore.
-It is not possible to configure SSL in the body of the `_reindex` request.
+[[docs-reindex-filter-source]]
+===== Reindex select fields with a source filter
 
-The following settings are supported:
+You can use source filtering to reindex a subset of the fields in the original documents.
+For example, the following request only reindexes the `user` and `_doc` fields of each document:
 
-`reindex.ssl.certificate_authorities`::
-List of paths to PEM encoded certificate files that should be trusted. 
-You cannot specify both `reindex.ssl.certificate_authorities` and
-`reindex.ssl.truststore.path`.
-
-`reindex.ssl.truststore.path`::
-The path to the Java Keystore file that contains the certificates to trust.
-This keystore can be in "JKS" or "PKCS#12" format.
-You cannot specify both `reindex.ssl.certificate_authorities` and
-`reindex.ssl.truststore.path`.
-
-`reindex.ssl.truststore.password`::
-The password to the truststore (`reindex.ssl.truststore.path`).
-This setting cannot be used with `reindex.ssl.truststore.secure_password`.
-
-`reindex.ssl.truststore.secure_password` (<<secure-settings,Secure>>)::
-The password to the truststore (`reindex.ssl.truststore.path`).
-This setting cannot be used with `reindex.ssl.truststore.password`.
-
-`reindex.ssl.truststore.type`::
-The type of the truststore (`reindex.ssl.truststore.path`).
-Must be either `jks` or `PKCS12`. If the truststore path ends in ".p12", ".pfx"
-or "pkcs12", this setting defaults to `PKCS12`. Otherwise, it defaults to `jks`.
-
-`reindex.ssl.verification_mode`::
-Indicates the type of verification to protect against man in the middle attacks
-and certificate forgery. 
-One of `full` (verify the hostname and the certificate path), `certificate`
-(verify the certificate path, but not the hostname) or `none` (perform no
-verification - this is strongly discouraged in production environments).
-Defaults to `full`.
-
-`reindex.ssl.certificate`::
-Specifies the path to the PEM encoded certificate (or certificate chain) to be
-used for HTTP client authentication (if required by the remote cluster)
-This setting requires that `reindex.ssl.key` also be set.
-You cannot specify both `reindex.ssl.certificate` and `reindex.ssl.keystore.path`.
-
-`reindex.ssl.key`::
-Specifies the path to the PEM encoded private key associated with the
-certificate used for client authentication (`reindex.ssl.certificate`).
-You cannot specify both `reindex.ssl.key` and `reindex.ssl.keystore.path`.
-
-`reindex.ssl.key_passphrase`::
-Specifies the passphrase to decrypt the PEM encoded private key
-(`reindex.ssl.key`) if it is encrypted.
-Cannot be used with `reindex.ssl.secure_key_passphrase`. 
-
-`reindex.ssl.secure_key_passphrase` (<<secure-settings,Secure>>)::
-Specifies the passphrase to decrypt the PEM encoded private key
-(`reindex.ssl.key`) if it is encrypted.
-Cannot be used with `reindex.ssl.key_passphrase`. 
-
-`reindex.ssl.keystore.path`::
-Specifies the path to the keystore that contains a private key and certificate
-to be used for HTTP client authentication (if required by the remote cluster).
-This keystore can be in "JKS" or "PKCS#12" format.
-You cannot specify both `reindex.ssl.key` and `reindex.ssl.keystore.path`.
-
-`reindex.ssl.keystore.type`::
-The type of the keystore (`reindex.ssl.keystore.path`). Must be either `jks` or `PKCS12`.
-If the keystore path ends in ".p12", ".pfx" or "pkcs12", this setting defaults 
-to `PKCS12`. Otherwise, it defaults to `jks`.
-
-`reindex.ssl.keystore.password`::
-The password to the keystore (`reindex.ssl.keystore.path`). This setting cannot be used 
-with `reindex.ssl.keystore.secure_password`.
-
-`reindex.ssl.keystore.secure_password` (<<secure-settings,Secure>>)::
-The password to the keystore (`reindex.ssl.keystore.path`).
-This setting cannot be used with `reindex.ssl.keystore.password`.
-
-`reindex.ssl.keystore.key_password`::
-The password for the key in the keystore (`reindex.ssl.keystore.path`).
-Defaults to the keystore password. This setting cannot be used with 
-`reindex.ssl.keystore.secure_key_password`.
-
-`reindex.ssl.keystore.secure_key_password` (<<secure-settings,Secure>>)::
-The password for the key in the keystore (`reindex.ssl.keystore.path`).
-Defaults to the keystore password. This setting cannot be used with 
-`reindex.ssl.keystore.key_password`.
-
-[float]
-==== URL Parameters
-
-In addition to the standard parameters like `pretty`, the Reindex API also
-supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, `timeout`,
-`scroll`, and `requests_per_second`.
-
-Sending the `refresh` url parameter will cause all indexes to which the request
-wrote to be refreshed. This is different than the Index API's `refresh`
-parameter which causes just the shard that received the new data to be
-refreshed. Also unlike the Index API it does not support `wait_for`.
-
-If the request contains `wait_for_completion=false` then Elasticsearch will
-perform some preflight checks, launch the request, and then return a `task`
-which can be used with <<docs-reindex-task-api,Tasks APIs>>
-to cancel or get the status of the task. Elasticsearch will also create a
-record of this task as a document at `.tasks/task/${taskId}`. This is yours
-to keep or remove as you see fit. When you are done with it, delete it so
-Elasticsearch can reclaim the space it uses.
-
-`wait_for_active_shards` controls how many copies of a shard must be active
-before proceeding with the reindexing. See <<index-wait-for-active-shards,here>>
-for details. `timeout` controls how long each write request waits for unavailable
-shards to become available. Both work exactly how they work in the
-<<docs-bulk,Bulk API>>. As `_reindex` uses scroll search, you can also specify
-the `scroll` parameter to control how long it keeps the "search context" alive,
-(e.g. `?scroll=10m`). The default value is 5 minutes.
-
-`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc.) and throttles the rate at which `_reindex` issues batches of index
-operations by padding each batch with a wait time. The throttling can be
-disabled by setting `requests_per_second` to `-1`.
-
-The throttling is done by waiting between batches so that the `scroll` which `_reindex`
-uses internally can be given a timeout that takes into account the padding.
-The padding time is the difference between the batch size divided by the
-`requests_per_second` and the time spent writing. By default the batch size is
-`1000`, so if the `requests_per_second` is set to `500`:
-
-[source,txt]
---------------------------------------------------
-target_time = 1000 / 500 per second = 2 seconds
-wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
---------------------------------------------------
-
-Since the batch is issued as a single `_bulk` request, large batch sizes will
-cause Elasticsearch to create many requests and then wait for a while before
-starting the next set. This is "bursty" instead of "smooth". The default value is `-1`.
-
-[float]
-[[docs-reindex-response-body]]
-==== Response body
-
-//////////////////////////
 [source,console]
 --------------------------------------------------
-POST /_reindex?wait_for_completion
+POST _reindex
 {
   "source": {
-    "index": "twitter"
+    "index": "twitter",
+    "_source": ["user", "_doc"]
   },
   "dest": {
     "index": "new_twitter"
@@ -667,227 +694,8 @@ POST /_reindex?wait_for_completion
 --------------------------------------------------
 // TEST[setup:twitter]
 
-//////////////////////////
-
-The JSON response looks like this:
-
-[source,console-result]
---------------------------------------------------
-{
-  "took": 639,
-  "timed_out": false,
-  "total": 5,
-  "updated": 0,
-  "created": 5,
-  "deleted": 0,
-  "batches": 1,
-  "noops": 0,
-  "version_conflicts": 2,
-  "retries": {
-    "bulk": 0,
-    "search": 0
-  },
-  "throttled_millis": 0,
-  "requests_per_second": 1,
-  "throttled_until_millis": 0,
-  "failures": [ ]
-}
---------------------------------------------------
-// TESTRESPONSE[s/: [0-9]+/: $body.$_path/]
-
-`took`::
-
-The total milliseconds the entire operation took.
-
-`timed_out`::
-
-This flag is set to `true` if any of the requests executed during the
-reindex timed out.
-
-`total`::
-
-The number of documents that were successfully processed.
-
-`updated`::
-
-The number of documents that were successfully updated.
-
-`created`::
-
-The number of documents that were successfully created.
-
-`deleted`::
-
-The number of documents that were successfully deleted.
-
-`batches`::
-
-The number of scroll responses pulled back by the reindex.
-
-`noops`::
-
-The number of documents that were ignored because the script used for
-the reindex returned a `noop` value for `ctx.op`.
-
-`version_conflicts`::
-
-The number of version conflicts that reindex hit.
-
-`retries`::
-
-The number of retries attempted by reindex. `bulk` is the number of bulk
-actions retried and `search` is the number of search actions retried.
-
-`throttled_millis`::
-
-Number of milliseconds the request slept to conform to `requests_per_second`.
-
-`requests_per_second`::
-
-The number of requests per second effectively executed during the reindex.
-
-`throttled_until_millis`::
-
-This field should always be equal to zero in a `_reindex` response. It only
-has meaning when using the <<docs-reindex-task-api, Task API>>, where it
-indicates the next time (in milliseconds since epoch) a throttled request will be
-executed again in order to conform to `requests_per_second`.
-
-`failures`::
-
-Array of failures if there were any unrecoverable errors during the process. If
-this is non-empty then the request aborted because of those failures. Reindex
-is implemented using batches and any failure causes the entire process to abort
-but all failures in the current batch are collected into the array. You can use
-the `conflicts` option to prevent reindex from aborting on version conflicts.
-
-[float]
-[[docs-reindex-task-api]]
-==== Works with the Task API
-
-You can fetch the status of all running reindex requests with the
-<<tasks,Task API>>:
-
-[source,console]
---------------------------------------------------
-GET _tasks?detailed=true&actions=*reindex
---------------------------------------------------
-// TEST[skip:No tasks to retrieve]
-
-The response looks like:
-
-[source,console-result]
---------------------------------------------------
-{
-  "nodes" : {
-    "r1A2WoRbTwKZ516z6NEs5A" : {
-      "name" : "r1A2WoR",
-      "transport_address" : "127.0.0.1:9300",
-      "host" : "127.0.0.1",
-      "ip" : "127.0.0.1:9300",
-      "attributes" : {
-        "testattr" : "test",
-        "portsfile" : "true"
-      },
-      "tasks" : {
-        "r1A2WoRbTwKZ516z6NEs5A:36619" : {
-          "node" : "r1A2WoRbTwKZ516z6NEs5A",
-          "id" : 36619,
-          "type" : "transport",
-          "action" : "indices:data/write/reindex",
-          "status" : {    <1>
-            "total" : 6154,
-            "updated" : 3500,
-            "created" : 0,
-            "deleted" : 0,
-            "batches" : 4,
-            "version_conflicts" : 0,
-            "noops" : 0,
-            "retries": {
-              "bulk": 0,
-              "search": 0
-            },
-            "throttled_millis": 0,
-            "requests_per_second": -1,
-            "throttled_until_millis": 0
-          },
-          "description" : "",
-          "start_time_in_millis": 1535149899665,
-          "running_time_in_nanos": 5926916792,
-          "cancellable": true,
-          "headers": {}
-        }
-      }
-    }
-  }
-}
---------------------------------------------------
-
-<1> This object contains the actual status. It is identical to the response JSON
-except for the important addition of the `total` field. `total` is the total number
-of operations that the `_reindex` expects to perform. You can estimate the
-progress by adding the `updated`, `created`, and `deleted` fields. The request
-will finish when their sum is equal to the `total` field.
-
-With the task id you can look up the task directly. The following example 
-retrieves information about the task `r1A2WoRbTwKZ516z6NEs5A:36619`:
-
-[source,console]
---------------------------------------------------
-GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619
---------------------------------------------------
-// TEST[catch:missing]
-
-The advantage of this API is that it integrates with `wait_for_completion=false`
-to transparently return the status of completed tasks. If the task is completed
-and `wait_for_completion=false` was set, it will return a
-`results` or an `error` field. The cost of this feature is the document that
-`wait_for_completion=false` creates at `.tasks/task/${taskId}`. It is up to
-you to delete that document.
-
-
-[float]
-[[docs-reindex-cancel-task-api]]
-==== Works with the Cancel Task API
-
-Any reindex can be canceled using the <<task-cancellation,Task Cancel API>>. For 
-example:
-
-[source,console]
---------------------------------------------------
-POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
---------------------------------------------------
-
-The task ID can be found using the <<tasks,Tasks API>>.
-
-Cancelation should happen quickly but might take a few seconds. The Tasks
-API will continue to list the task until it wakes to cancel itself.
-
-
-[float]
-[[docs-reindex-rethrottle]]
-==== Rethrottling
-
-The value of `requests_per_second` can be changed on a running reindex using
-the `_rethrottle` API:
-
-[source,console]
---------------------------------------------------
-POST _reindex/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
---------------------------------------------------
-
-The task ID can be found using the <<tasks,tasks API>>.
-
-Just like when setting it on the Reindex API, `requests_per_second`
-can be either `-1` to disable throttling or any decimal number
-like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
-query takes effect immediately, but rethrottling that slows down the query will
-take effect after completing the current batch. This prevents scroll
-timeouts.
-
-[float]
 [[docs-reindex-change-name]]
-==== Reindex to change the name of a field
+===== Reindex to change the name of a field
 
 `_reindex` can be used to build a copy of an index with renamed fields. Say you
 create an index containing documents that look like this:
@@ -948,276 +756,364 @@ which will return:
 --------------------------------------------------
 // TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term": 1/"_primary_term" : $body._primary_term/]
 
-[float]
-[[docs-reindex-slice]]
-==== Slicing
+[[docs-reindex-daily-indices]]
+===== Reindex daily indices
 
-Reindex supports <<sliced-scroll>> to parallelize the reindexing process.
-This parallelization can improve efficiency and provide a convenient way to
-break the request down into smaller parts.
+You can use `_reindex` in combination with <<modules-scripting-painless, Painless>> to reindex
+daily indices to apply a new template to the existing documents.
 
-NOTE: Reindexing from remote clusters does not support
-<<docs-reindex-manual-slice, manual>> or
-<<docs-reindex-automatic-slice, automatic slicing>>.
+Assuming you have indices that contain documents like:
 
-[float]
-[[docs-reindex-manual-slice]]
-===== Manual slicing
-Slice a reindex request manually by providing a slice id and total number of
-slices to each request:
+[source,console]
+----------------------------------------------------------------
+PUT metricbeat-2016.05.30/_doc/1?refresh
+{"system.cpu.idle.pct": 0.908}
+PUT metricbeat-2016.05.31/_doc/1?refresh
+{"system.cpu.idle.pct": 0.105}
+----------------------------------------------------------------
+
+The new template for the `metricbeat-*` indices is already loaded into Elasticsearch,
+but it applies only to the newly created indices. Painless can be used to reindex
+the existing documents and apply the new template.
+
+The script below extracts the date from the index name and creates a new index
+with `-1` appended. All data from `metricbeat-2016.05.31` will be reindexed
+into `metricbeat-2016.05.31-1`.
 
 [source,console]
 ----------------------------------------------------------------
 POST _reindex
 {
   "source": {
-    "index": "twitter",
-    "slice": {
-      "id": 0,
-      "max": 2
-    }
+    "index": "metricbeat-*"
   },
   "dest": {
-    "index": "new_twitter"
-  }
-}
-POST _reindex
-{
-  "source": {
-    "index": "twitter",
-    "slice": {
-      "id": 1,
-      "max": 2
-    }
+    "index": "metricbeat"
   },
-  "dest": {
-    "index": "new_twitter"
+  "script": {
+    "lang": "painless",
+    "source": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
   }
 }
 ----------------------------------------------------------------
-// TEST[setup:big_twitter]
+// TEST[continued]
 
-You can verify this works by:
+All documents from the previous metricbeat indices can now be found in the `*-1` indices.
 
 [source,console]
 ----------------------------------------------------------------
-GET _refresh
-POST new_twitter/_search?size=0&filter_path=hits.total
+GET metricbeat-2016.05.30-1/_doc/1
+GET metricbeat-2016.05.31-1/_doc/1
 ----------------------------------------------------------------
 // TEST[continued]
 
-which results in a sensible `total` like this one:
-
-[source,console-result]
-----------------------------------------------------------------
-{
-  "hits": {
-    "total" : {
-        "value": 120,
-        "relation": "eq"
-    }
-  }
-}
-----------------------------------------------------------------
+The previous method can also be used in conjunction with <<docs-reindex-change-name, changing a field name>>
+to load only the existing data into the new index and rename any fields if needed.
 
-[float]
-[[docs-reindex-automatic-slice]]
-===== Automatic slicing
+[[docs-reindex-api-subset]]
+===== Extract a random subset of an index
 
-You can also let `_reindex` automatically parallelize using <<sliced-scroll>> to
-slice on `_uid`. Use `slices` to specify the number of slices to use:
+`_reindex` can be used to extract a random subset of an index for testing:
 
 [source,console]
 ----------------------------------------------------------------
-POST _reindex?slices=5&refresh
+POST _reindex
 {
+  "max_docs": 10,
   "source": {
-    "index": "twitter"
+    "index": "twitter",
+    "query": {
+      "function_score" : {
+        "query" : { "match_all": {} },
+        "random_score" : {}
+      }
+    },
+    "sort": "_score"    <1>
   },
   "dest": {
-    "index": "new_twitter"
+    "index": "random_twitter"
   }
 }
 ----------------------------------------------------------------
 // TEST[setup:big_twitter]
 
-You can also this verify works by:
+<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
+effect unless you override the sort to `_score`.
 
-[source,console]
-----------------------------------------------------------------
-POST new_twitter/_search?size=0&filter_path=hits.total
-----------------------------------------------------------------
-// TEST[continued]
+[[reindex-scripts]]
+===== Modify documents during reindexing
 
-which results in a sensible `total` like this one:
+Like `_update_by_query`, `_reindex` supports a script that modifies the
+document. Unlike `_update_by_query`, the script is allowed to modify the
+document's metadata. This example bumps the version of the source document:
 
-[source,console-result]
-----------------------------------------------------------------
+[source,console]
+--------------------------------------------------
+POST _reindex
 {
-  "hits": {
-    "total" : {
-        "value": 120,
-        "relation": "eq"
-    }
+  "source": {
+    "index": "twitter"
+  },
+  "dest": {
+    "index": "new_twitter",
+    "version_type": "external"
+  },
+  "script": {
+    "source": "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
+    "lang": "painless"
   }
 }
-----------------------------------------------------------------
+--------------------------------------------------
+// TEST[setup:twitter]
 
-Setting `slices` to `auto` will let Elasticsearch choose the number of slices
-to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+Just as in `_update_by_query`, you can set `ctx.op` to change the
+operation that is executed on the destination index:
 
-Adding `slices` to `_reindex` just automates the manual process used in the
-section above, creating sub-requests which means it has some quirks:
+`noop`::
 
-* You can see these requests in the <<docs-reindex-task-api,Tasks APIs>>. These
-sub-requests are "child" tasks of the task for the request with `slices`.
-* Fetching the status of the task for the request with `slices` only contains
-the status of completed slices.
-* These sub-requests are individually addressable for things like cancelation
-and rethrottling.
-* Rethrottling the request with `slices` will rethrottle the unfinished
-sub-request proportionally.
-* Canceling the request with `slices` will cancel each sub-request.
-* Due to the nature of `slices` each sub-request won't get a perfectly even
-portion of the documents. All documents will be addressed, but some slices may
-be larger than others. Expect larger slices to have a more even distribution.
-* Parameters like `requests_per_second` and `max_docs` on a request with
-`slices` are distributed proportionally to each sub-request. Combine that with
-the point above about distribution being uneven and you should conclude that
-using `max_docs` with `slices` might not result in exactly `max_docs` documents
-being reindexed.
-* Each sub-request gets a slightly different snapshot of the source index,
-though these are all taken at approximately the same time.
+Set `ctx.op = "noop"` if your script decides that the document doesn't have
+to be indexed in the destination index. This no-op is reported
+in the `noops` counter in the <<docs-reindex-api-response-body, response body>>.
 
-[float]
-[[docs-reindex-picking-slices]]
-====== Picking the number of slices
+`delete`::
 
-If slicing automatically, setting `slices` to `auto` will choose a reasonable
-number for most indices. If slicing manually or otherwise tuning
-automatic slicing, use these guidelines.
+Set `ctx.op = "delete"` if your script decides that the document must be
+deleted from the destination index. The deletion will be reported in the
+`deleted` counter in the <<docs-reindex-api-response-body, response body>>.
 
-Query performance is most efficient when the number of `slices` is equal to the
-number of shards in the index. If that number is large (e.g. 500),
-choose a lower number as too many `slices` will hurt performance. Setting
-`slices` higher than the number of shards generally does not improve efficiency
-and adds overhead.
+Setting `ctx.op` to anything else will return an error, as will setting any
+other field in `ctx`.
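+
+For instance, the following sketch skips some documents instead of copying
+them. It assumes that the documents have a `user` field and that `kimchy` is a
+value you want to leave out; both are placeholders for illustration:
+
+[source,console]
+--------------------------------------------------
+POST _reindex
+{
+  "source": {
+    "index": "twitter"
+  },
+  "dest": {
+    "index": "new_twitter"
+  },
+  "script": {
+    "lang": "painless",
+    "source": "if (ctx._source.user == 'kimchy') { ctx.op = 'noop' }"
+  }
+}
+--------------------------------------------------
+// TEST[skip:illustrative sketch, field value is an assumption]
+
+Each skipped document should then be counted as a no-op in the response rather
+than as `created` or `updated`.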
 
-Indexing performance scales linearly across available resources with the
-number of slices.
+Think of the possibilities! Just be careful; you are able to
+change:
 
-Whether query or indexing performance dominates the runtime depends on the
-documents being reindexed and cluster resources.
+ * `_id`
+ * `_index`
+ * `_version`
+ * `_routing`
 
-[float]
-==== Reindexing many indices
-If you have many indices to reindex it is generally better to reindex them
-one at a time rather than using a glob pattern to pick up many indices. That
-way you can resume the process if there are any errors by removing the
-partially completed index and starting over at that index. It also makes
-parallelizing the process fairly simple: split the list of indices to reindex
-and run each list in parallel.
+Setting `_version` to `null` or clearing it from the `ctx` map is just like not
+sending the version in an indexing request; it will cause the document to be
+overwritten in the target index regardless of the version on the target or the
+version type you use in the `_reindex` request.
 
-One-off bash scripts seem to work nicely for this:
+[[reindex-from-remote]]
+==== Reindex from remote
 
-[source,bash]
-----------------------------------------------------------------
-for index in i1 i2 i3 i4 i5; do
-  curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{
-    "source": {
-      "index": "'$index'"
+Reindex supports reindexing from a remote Elasticsearch cluster:
+
+[source,console]
+--------------------------------------------------
+POST _reindex
+{
+  "source": {
+    "remote": {
+      "host": "http://otherhost:9200",
+      "username": "user",
+      "password": "pass"
     },
-    "dest": {
-      "index": "'$index'-reindexed"
+    "index": "source",
+    "query": {
+      "match": {
+        "test": "data"
+      }
     }
-  }'
-done
-----------------------------------------------------------------
-// NOTCONSOLE
+  },
+  "dest": {
+    "index": "dest"
+  }
+}
+--------------------------------------------------
+// TEST[setup:host]
+// TEST[s/^/PUT source\n/]
+// TEST[s/otherhost:9200",/\${host}"/]
+// TEST[s/"username": "user",//]
+// TEST[s/"password": "pass"//]
 
-[float]
-==== Reindex daily indices
+The `host` parameter must contain a scheme, host, port (e.g.
+`https://otherhost:9200`), and optional path (e.g. `https://otherhost:9200/proxy`).
+The `username` and `password` parameters are optional. When they are present, `_reindex`
+connects to the remote Elasticsearch node using basic auth. Be sure to use `https` when
+using basic auth, or the password will be sent in plain text.
+A range of <<reindex-ssl,settings>> is available to configure the behavior of the
+`https` connection.
 
-Notwithstanding the above advice, you can use `_reindex` in combination with
-<<modules-scripting-painless, Painless>> to reindex daily indices to apply
-a new template to the existing documents.
+Remote hosts have to be explicitly whitelisted in `elasticsearch.yml` using the
+`reindex.remote.whitelist` property. It can be set to a comma-delimited list
+of allowed remote `host` and `port` combinations (e.g.
+`otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*`). Scheme is
+ignored by the whitelist; only host and port are used. For example:
 
-Assuming you have indices consisting of documents as follows:
 
-[source,console]
-----------------------------------------------------------------
-PUT metricbeat-2016.05.30/_doc/1?refresh
-{"system.cpu.idle.pct": 0.908}
-PUT metricbeat-2016.05.31/_doc/1?refresh
-{"system.cpu.idle.pct": 0.105}
-----------------------------------------------------------------
+[source,yaml]
+--------------------------------------------------
+reindex.remote.whitelist: "otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*"
+--------------------------------------------------
 
-The new template for the `metricbeat-*` indices is already loaded into Elasticsearch,
-but it applies only to the newly created indices. Painless can be used to reindex
-the existing documents and apply the new template.
+The whitelist must be configured on any nodes that will coordinate the reindex.
 
-The script below extracts the date from the index name and creates a new index
-with `-1` appended. All data from `metricbeat-2016.05.31` will be reindexed
-into `metricbeat-2016.05.31-1`.
+This feature should work with remote clusters of any version of Elasticsearch
+you are likely to find. This should allow you to upgrade from any version of
+Elasticsearch to the current version by reindexing from a cluster of the old
+version.
+
+To enable queries sent to older versions of Elasticsearch, the `query` parameter
+is sent directly to the remote host without validation or modification.
+
+NOTE: Reindexing from remote clusters does not support
+<<docs-reindex-manual-slice, manual>> or
+<<docs-reindex-automatic-slice, automatic slicing>>.
+
+Reindexing from a remote server uses an on-heap buffer that defaults to a
+maximum size of 100mb. If the remote index includes very large documents, you'll
+need to use a smaller batch size. The example below sets the batch size to `10`,
+which is very, very small.
 
 [source,console]
-----------------------------------------------------------------
+--------------------------------------------------
 POST _reindex
 {
   "source": {
-    "index": "metricbeat-*"
+    "remote": {
+      "host": "http://otherhost:9200"
+    },
+    "index": "source",
+    "size": 10,
+    "query": {
+      "match": {
+        "test": "data"
+      }
+    }
   },
   "dest": {
-    "index": "metricbeat"
-  },
-  "script": {
-    "lang": "painless",
-    "source": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
+    "index": "dest"
   }
 }
-----------------------------------------------------------------
-// TEST[continued]
-
-All documents from the previous metricbeat indices can now be found in the `*-1` indices.
-
-[source,console]
-----------------------------------------------------------------
-GET metricbeat-2016.05.30-1/_doc/1
-GET metricbeat-2016.05.31-1/_doc/1
-----------------------------------------------------------------
-// TEST[continued]
-
-The previous method can also be used in conjunction with <<docs-reindex-change-name, changing a field name>>
-to load only the existing data into the new index and rename any fields if needed.
-
-[float]
-==== Extracting a random subset of an index
+--------------------------------------------------
+// TEST[setup:host]
+// TEST[s/^/PUT source\n/]
+// TEST[s/otherhost:9200/\${host}/]
 
-`_reindex` can be used to extract a random subset of an index for testing:
+It is also possible to set the socket read timeout on the remote connection
+with the `socket_timeout` field and the connection timeout with the
+`connect_timeout` field. Both default to 30 seconds. This example
+sets the socket read timeout to one minute and the connection timeout to 10
+seconds:
 
 [source,console]
-----------------------------------------------------------------
+--------------------------------------------------
 POST _reindex
 {
-  "max_docs": 10,
   "source": {
-    "index": "twitter",
+    "remote": {
+      "host": "http://otherhost:9200",
+      "socket_timeout": "1m",
+      "connect_timeout": "10s"
+    },
+    "index": "source",
     "query": {
-      "function_score" : {
-        "query" : { "match_all": {} },
-        "random_score" : {}
+      "match": {
+        "test": "data"
       }
-    },
-    "sort": "_score"    <1>
+    }
   },
   "dest": {
-    "index": "random_twitter"
+    "index": "dest"
   }
 }
-----------------------------------------------------------------
-// TEST[setup:big_twitter]
+--------------------------------------------------
+// TEST[setup:host]
+// TEST[s/^/PUT source\n/]
+// TEST[s/otherhost:9200/\${host}/]
 
-<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
-effect unless you override the sort to `_score`.
+[[reindex-ssl]]
+===== Configuring SSL parameters
+
+Reindex from remote supports configurable SSL settings. These must be
+specified in the `elasticsearch.yml` file, with the exception of the
+secure settings, which you add in the Elasticsearch keystore.
+It is not possible to configure SSL in the body of the `_reindex` request.
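+
+As an illustration only, a minimal `elasticsearch.yml` fragment that trusts a
+private certificate authority might look like the following. The file path is
+a placeholder, and the setting names are taken from the list below:
+
+[source,yaml]
+--------------------------------------------------
+# Hypothetical path; point this at your own PEM encoded CA certificate.
+reindex.ssl.certificate_authorities: [ "/path/to/ca.pem" ]
+# `full` is already the default; shown here only to make the choice explicit.
+reindex.ssl.verification_mode: full
+--------------------------------------------------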
+
+The following settings are supported:
+
+`reindex.ssl.certificate_authorities`::
+List of paths to PEM encoded certificate files that should be trusted. 
+You cannot specify both `reindex.ssl.certificate_authorities` and
+`reindex.ssl.truststore.path`.
+
+`reindex.ssl.truststore.path`::
+The path to the Java Keystore file that contains the certificates to trust.
+This keystore can be in "JKS" or "PKCS#12" format.
+You cannot specify both `reindex.ssl.certificate_authorities` and
+`reindex.ssl.truststore.path`.
+
+`reindex.ssl.truststore.password`::
+The password to the truststore (`reindex.ssl.truststore.path`).
+This setting cannot be used with `reindex.ssl.truststore.secure_password`.
+
+`reindex.ssl.truststore.secure_password` (<<secure-settings,Secure>>)::
+The password to the truststore (`reindex.ssl.truststore.path`).
+This setting cannot be used with `reindex.ssl.truststore.password`.
+
+`reindex.ssl.truststore.type`::
+The type of the truststore (`reindex.ssl.truststore.path`).
+Must be either `jks` or `PKCS12`. If the truststore path ends in ".p12", ".pfx"
+or "pkcs12", this setting defaults to `PKCS12`. Otherwise, it defaults to `jks`.
+
+`reindex.ssl.verification_mode`::
+Indicates the type of verification to protect against man-in-the-middle attacks
+and certificate forgery. 
+One of `full` (verify the hostname and the certificate path), `certificate`
+(verify the certificate path, but not the hostname) or `none` (perform no
+verification - this is strongly discouraged in production environments).
+Defaults to `full`.
+
+`reindex.ssl.certificate`::
+Specifies the path to the PEM encoded certificate (or certificate chain) to be
+used for HTTP client authentication (if required by the remote cluster).
+This setting requires that `reindex.ssl.key` also be set.
+You cannot specify both `reindex.ssl.certificate` and `reindex.ssl.keystore.path`.
+
+`reindex.ssl.key`::
+Specifies the path to the PEM encoded private key associated with the
+certificate used for client authentication (`reindex.ssl.certificate`).
+You cannot specify both `reindex.ssl.key` and `reindex.ssl.keystore.path`.
+
+`reindex.ssl.key_passphrase`::
+Specifies the passphrase to decrypt the PEM encoded private key
+(`reindex.ssl.key`) if it is encrypted.
+Cannot be used with `reindex.ssl.secure_key_passphrase`. 
+
+`reindex.ssl.secure_key_passphrase` (<<secure-settings,Secure>>)::
+Specifies the passphrase to decrypt the PEM encoded private key
+(`reindex.ssl.key`) if it is encrypted.
+Cannot be used with `reindex.ssl.key_passphrase`. 
+
+`reindex.ssl.keystore.path`::
+Specifies the path to the keystore that contains a private key and certificate
+to be used for HTTP client authentication (if required by the remote cluster).
+This keystore can be in "JKS" or "PKCS#12" format.
+You cannot specify both `reindex.ssl.key` and `reindex.ssl.keystore.path`.
+
+`reindex.ssl.keystore.type`::
+The type of the keystore (`reindex.ssl.keystore.path`). Must be either `jks` or `PKCS12`.
+If the keystore path ends in ".p12", ".pfx" or "pkcs12", this setting defaults 
+to `PKCS12`. Otherwise, it defaults to `jks`.
+
+`reindex.ssl.keystore.password`::
+The password to the keystore (`reindex.ssl.keystore.path`). This setting cannot be used 
+with `reindex.ssl.keystore.secure_password`.
+
+`reindex.ssl.keystore.secure_password` (<<secure-settings,Secure>>)::
+The password to the keystore (`reindex.ssl.keystore.path`).
+This setting cannot be used with `reindex.ssl.keystore.password`.
+
+`reindex.ssl.keystore.key_password`::
+The password for the key in the keystore (`reindex.ssl.keystore.path`).
+Defaults to the keystore password. This setting cannot be used with 
+`reindex.ssl.keystore.secure_key_password`.
+
+`reindex.ssl.keystore.secure_key_password` (<<secure-settings,Secure>>)::
+The password for the key in the keystore (`reindex.ssl.keystore.path`).
+Defaults to the keystore password. This setting cannot be used with 
+`reindex.ssl.keystore.key_password`.
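+
+As an illustration of how these settings combine, the following is a minimal
+`elasticsearch.yml` sketch that trusts a custom certificate authority and
+presents a PEM client certificate to the remote cluster. The file paths are
+hypothetical placeholders, not defaults.
+
+[source,yaml]
+--------------------------------------------------
+# Hypothetical paths; substitute the locations of your own files.
+reindex.ssl.certificate_authorities: [ "/path/to/ca.pem" ]
+reindex.ssl.verification_mode: full
+reindex.ssl.certificate: "/path/to/client.crt"
+reindex.ssl.key: "/path/to/client.key"
+--------------------------------------------------
+
+If the private key is encrypted, the passphrase is a secure setting and belongs
+in the Elasticsearch keystore rather than in `elasticsearch.yml`, for example by
+running `bin/elasticsearch-keystore add reindex.ssl.secure_key_passphrase`.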