Implement on-line reindexing #11240

celesteking · 2015-05-19T20:24:43Z

I'm an end user and I'm new to ES and I've been immediately faced with the problem of getting my data processed correctly.
The problem with ES is that data is analyzed and indexed first, then stored, and after that you can't change how it was processed. In Splunk, by contrast, almost everything happens at search time, so you can adjust your plans.

Currently, in order to test my new mapping params, I have to do the following (using the kopf plugin -- otherwise I'd throw ES out of the window immediately for lack of usability):

1. Have my data ready -- either stored in some safe index or in a local .json file on disk.
2. Delete the index / index template -- /_plugin/kopf/#!/cluster
3. Adjust the index appropriately and create it -- /_plugin/kopf/#!/createIndex
4. Use stream2es to copy the index from the "safe place" to the index above -- cat mydata.json | stream2es stdin --target http://blah:9200/newidx/type1
   OR
   stream2es es --source http://blah:9200/safe_idx/type1 --target http://blah:9200/newidx/type1
5. Run some query, see it failing/misbehaving, read docs, see that you need a mapping adjustment, GO TO step 2.
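
For readers who want to script this loop instead of clicking through it, here is a minimal sketch in Python (standard library only). It is not part of the original report and is not how kopf or stream2es work internally. It assumes the ES 1.x-era URL layout with index and type names, as used in the commands above, and the `_bulk` newline-delimited JSON format. The function names `plan_requests`, `bulk_body` and `send` are hypothetical helpers made up for illustration.

```python
import json
import urllib.request


def plan_requests(base_url, index, settings_and_mappings):
    """Build the two requests that drop an index and recreate it with a new mapping."""
    url = f"{base_url.rstrip('/')}/{index}"
    return [
        ("DELETE", url, None),
        ("PUT", url, json.dumps(settings_and_mappings)),
    ]


def bulk_body(hits, index, doc_type):
    """Turn search hits (dicts with _id and _source) into a _bulk payload."""
    lines = []
    for hit in hits:
        # Assumption: ES 1.x-style action line that still carries a _type.
        action = {"index": {"_index": index, "_type": doc_type, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk format is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


def send(method, url, body=None):
    """Send one request and decode the JSON reply; raises HTTPError on 4xx/5xx."""
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints what would be sent; nothing here touches a cluster.
    for method, url, body in plan_requests("http://blah:9200", "newidx", {"mappings": {}}):
        print(method, url, body)
    print(bulk_body([{"_id": "1", "_source": {"msg": "hi"}}], "newidx", "type1"), end="")
```

To actually run it against a dev cluster you would pass each planned request to `send`, then POST the output of `bulk_body` to `/_bulk` in batches. Note that `send` raises on a DELETE of an index that does not exist yet, so a real script would need to catch that case.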

Now imagine doing this 50 times, because you can't get ES to behave properly with your data...
This is very tedious for someone who expected modern easy-to-use software.

Ideally, I want to input my new mapping, press button, and let it do the reindex automatically.
Of course, this would be useful for dev environments only, since in production you'd sync the mapping changes with app schema changes.


drewr · 2015-05-20T12:43:43Z

Ancillary point, but wanted to note that Logstash 1.5.0 added an Elasticsearch input. It's more flexible than stream2es since you have the full power of Logstash's filters and other outputs.

dadoonet · 2015-05-20T12:53:17Z

+1 for what @drewr said. I just wrote a blog post about it. In case it helps: http://david.pilato.fr/blog/2015/05/20/reindex-elasticsearch-with-logstash/

clintongormley · 2015-05-25T13:50:13Z

@celesteking we absolutely want to implement a reindex API which will do as you describe, and a whole lot more besides. In fact this is a duplicate of a very old issue (#492). Implementation is blocked by the need for a task management framework (#6914) which will allow long running jobs to be paused, restarted, or cancelled.

Although both of these issues are old, this doesn't mean that they have been forgotten. Both are on the roadmap and we will get to them as soon as we can.

I'm going to close this as a duplicate of #492

clintongormley closed this as completed May 25, 2015
