Implement on-line reindexing #11240

Closed
celesteking opened this issue May 19, 2015 · 3 comments

Comments

@celesteking

I'm an end user who is new to ES, and I immediately ran into the problem of getting my data processed correctly.
The problem with ES (vs. Splunk) is that data is indexed/analyzed first, then stored, and after that you can't change it. In Splunk, almost everything happens at search time, so you can adjust your plans.

Currently, in order to test my new mapping params, I have to do the following (using the kopf plugin -- otherwise I'd have thrown ES out of the window immediately for lack of usability):

  1. Have my data ready -- either stored in some safe index or in a local .json file on disk
  2. Delete the index / index template -- /_plugin/kopf/#!/cluster
  3. Adjust the index appropriately and create it -- /_plugin/kopf/#!/createIndex
  4. Use stream2es to copy the data from the "safe place" to the index above -- cat mydata.json | stream2es stdin --target http://blah:9200/newidx/type1
    OR
    stream2es es --source http://blah:9200/safe_idx/type1 --target http://blah:9200/newidx/type1
  5. Run some query, see it failing/misbehaving, read the docs, see that you need a mapping adjustment, GO TO step 2.
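
For small test datasets, the delete / create / copy loop in steps 2 to 4 can be scripted. The following is a minimal sketch, not what kopf or stream2es do internally: it uses only the Python standard library, the plain index delete, index create, and `_bulk` endpoints, and mapping types in the URL style used above (`newidx/type1`), which newer Elasticsearch versions no longer support. The function names `bulk_lines`, `reindex_plan`, and `run`, and the idea that each document is a dict with an optional `_id` key, are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request


def bulk_lines(docs, index, doc_type):
    """Render documents as a _bulk request body (one JSON object per line)."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type}}
        if "_id" in doc:
            # Assumption: an optional "_id" key carries the document id.
            action["index"]["_id"] = doc["_id"]
        source = {k: v for k, v in doc.items() if k != "_id"}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(source, sort_keys=True))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def reindex_plan(base_url, index, doc_type, mapping, docs):
    """Build the ordered HTTP requests for one iteration, without sending them."""
    create_body = json.dumps({"mappings": {doc_type: mapping}}, sort_keys=True)
    return [
        ("DELETE", f"{base_url}/{index}", None),         # step 2: drop the old index
        ("PUT", f"{base_url}/{index}", create_body),     # step 3: create with new mapping
        ("POST", f"{base_url}/_bulk", bulk_lines(docs, index, doc_type)),  # step 4: copy
    ]


def run(plan):
    """Send each planned request in order; tolerate a missing index on DELETE."""
    for method, url, body in plan:
        data = body.encode("utf-8") if body is not None else None
        req = urllib.request.Request(
            url, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req) as resp:
                resp.read()
        except urllib.error.HTTPError as err:
            # A 404 on DELETE only means the index did not exist yet.
            if not (method == "DELETE" and err.code == 404):
                raise


# Hypothetical usage, assuming mydata.json holds one JSON document per line
# and the mapping below is only a placeholder:
#
#   with open("mydata.json") as fh:
#       docs = [json.loads(line) for line in fh if line.strip()]
#   mapping = {"properties": {"msg": {"type": "string"}}}
#   run(reindex_plan("http://blah:9200", "newidx", "type1", mapping, docs))
```

Keeping the request plan separate from `run` makes it possible to print and inspect what would be sent before touching the cluster. This sends everything in a single bulk request, so it only suits datasets that fit comfortably in memory.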

Now imagine doing this 50 times, because you can't get ES to behave properly with your data....
This is very tedious for someone who expected modern, easy-to-use software.

Ideally, I want to input my new mapping, press a button, and let it do the reindex automatically.
Of course, this would be useful for a dev environment only, since in production you'd sync the mapping changes with app schema changes.

@drewr
Contributor

drewr commented May 20, 2015

Ancillary point, but wanted to note that Logstash 1.5.0 added an Elasticsearch input. It's more flexible than stream2es since you have the full power of Logstash's filters and other outputs.

@dadoonet
Member

+1 for what @drewr said. I just wrote a blog post about it. In case it helps: http://david.pilato.fr/blog/2015/05/20/reindex-elasticsearch-with-logstash/

@clintongormley
Contributor

@celesteking we absolutely want to implement a reindex API which will do as you describe, and a whole lot more besides. In fact this is a duplicate of a very old issue (#492). Implementation is blocked by the need for a task management framework (#6914) which will allow long running jobs to be paused, restarted, or cancelled.

Although both of these issues are old, this doesn't mean that they have been forgotten. Both are on the roadmap and we will get to them as soon as we can.

I'm going to close this as a duplicate of #492
