Update by query API #2230
Comments
Hi, when will it be released? |
Hi @mdojwa Not sure when this will be included in ES. The implementation works fine, but there is one thing that we are missing. An update by query request can take a long time to complete. It would be very helpful to be able to cancel a running update by query request. This isn't implemented yet. Perhaps in the future we might have a process API, where one can see the currently running requests and cancel them via this API. |
I've packaged it (pull request #2231) as a plugin: yakaz/elasticsearch-action-updatebyquery. |
Hi, thanks for this one :) |
The plugin is now ported to ElasticSearch 0.90.0.Beta1! |
+1 |
+1 to this feature. @ofavre - Just one question here. Will all the updates be atomic or sequential? As in, could there be a situation where, say, half of the documents matching the query were updated and the rest were not, in case of a process crash or restart? |
I've just packaged the code, I didn't write it. But the pull request description states that it processes documents in batches, hence there definitely can be cases where only half of the documents are updated. |
@ofavre - Thanks Oliver. That answers my question perfectly. |
The update by query API allows all documents that match the query to be updated with a script. This feature is experimental.

The update by query works a bit differently than the delete by query. The update by query API translates the documents that match into bulk index / delete requests. After the bulk limit has been reached, the bulk requests created thus far will be executed. After the bulk requests have been executed the next batch of requests will be prepared and executed. This behavior continues until all documents that matched the query have been processed.

The bulk size can be configured with the *action.updatebyquery.bulk_size* option in the elasticsearch configuration. For example: `action.updatebyquery.bulk_size=2500`

The commit relates to issue elastic#2230

Example usage
=================================================

Index an example document:

```
curl -XPUT 'localhost:9200/twitter/tweet/1' -d '{
    "text" : {
        "message" : "you know for search"
    }
}'
```

Execute the following update by query command:

```
curl -XPOST 'localhost:9200/twitter/_update_by_query' -d '{
    "query" : {
        "term" : {
            "message" : "you"
        }
    },
    "script" : "ctx._source.field1 += 1"
}'
```

This will yield the following response:

```
{
    "ok" : true,
    "took" : 9,
    "total" : 1,
    "updated" : 1,
    "indices" : [
        {
            "twitter" : { }
        }
    ]
}
```

By default no bulk item responses are included in the response. If there are bulk item responses included in the response, the bulk response items are grouped by index and shard. This can be controlled by the `response` option.

Options
=====================================================

Additional general options in request body:

* `lang`: The script language.
* `params`: The script parameters.

Query string options
-----------------------------------------------------

* `replication`: The replication type for the delete/index operation (sync or async).
* `consistency`: The write consistency of the index/delete operation.
* `response`: What bulk response items to include in the update by query response. This can be set to the following: `none`, `failed` and `all`. Defaults to `none`. Warning: `all` can result in out of memory errors when the query results in many hits.
* `routing`: Sets the routing that will be used to route the document to the relevant shard.
* `timeout`: Timeout waiting for a shard to become available.
This is a much-needed feature for my current project, which indexes hundreds of thousands of docs and updates them on a regular basis. Is this feature available via the elasticsearch-py Python client? |
There is no chance that this unofficial (or not yet official) feature contributed through a plugin is exposed through a mainstream client library. |
Hello guys! As I understand it, I can insert JSON data into an index and query it, but I can't update a JSON record? For example, I need to change the structure of all JSON documents in one given collection, and I need to add an extra field "is_moderated = true/false". Is it possible to do this in the current implementation of Elasticsearch? If it's not possible now, please help me find a workaround for this common task. Thanks! |
You can use the Update API to update any single document. |
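For anyone landing here with the same `is_moderated` question, a rough sketch of what a single-document partial update could look like. It only builds the HTTP request and does not send it. It assumes an Elasticsearch version that still uses typed endpoints (`/{index}/{type}/{id}/_update`) and accepts a partial `doc` in the body; the helper name `build_update_request` and the host, index, type and id values are made up for illustration.

```python
import json
from urllib.request import Request


def build_update_request(host, index, doc_type, doc_id, partial_doc):
    """Build (but do not send) an Update API request that merges
    `partial_doc` into one existing document."""
    url = f"{host}/{index}/{doc_type}/{doc_id}/_update"
    body = json.dumps({"doc": partial_doc}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values: add the new field to the document with id "1".
req = build_update_request(
    "http://localhost:9200", "twitter", "tweet", "1", {"is_moderated": False}
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Since this touches one document per call, changing every document in an index this way means issuing one such request per document id.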
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
Update by query should be implemented as part of the reindex API #492 |
+1 |
+1 |
Please stop the +1 vote spamming and let GitHub know we need a voting feature. |
+1 |
+1 |
_update_by_query has landed in master: c7c8bb3 I'm not going to close this until it's backported to 2.3 though. |
great work @nik9000! |
Closing, as the _update_by_query @nik9000 mentioned has been backported to 2.3. For those finding this later, see documentation at https://www.elastic.co/guide/en/elasticsearch/reference/2.x/docs-update-by-query.html |
Has the |
No. This wasn't a port of the original, it was its own thing. The bulk
|
The update by query API allows all documents that match the query to be updated with a script. This feature is experimental.
The update by query works a bit differently than the delete by query. The update by query API translates the documents that match into bulk index / delete requests. After the bulk limit has been reached, the bulk requests created thus far will be executed. After the bulk requests have been executed the next batch of requests will be prepared and executed. This behavior continues until all documents that matched the query have been processed. The bulk size can be configured with the action.updatebyquery.bulk_size option in the elasticsearch configuration. For example:
action.updatebyquery.bulk_size=2500
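To make the batching behavior concrete, here is a minimal sketch of the loop described above. It is written in Python for illustration and is not the actual implementation from the pull request. The names `batched_update`, `execute_bulk` and `increment_field1` are hypothetical stand-ins: `execute_bulk` plays the role of sending one bulk request, and the script is modeled as a plain function.

```python
from typing import Callable, Iterable, List

Doc = dict


def batched_update(
    matching_docs: Iterable[Doc],
    script: Callable[[Doc], None],
    execute_bulk: Callable[[List[Doc]], None],
    bulk_size: int = 2500,
) -> int:
    """Apply `script` to each matching document and flush the
    resulting index requests in batches of at most `bulk_size`."""
    batch: List[Doc] = []
    updated = 0
    for doc in matching_docs:
        new_doc = dict(doc)          # copy so the input is left untouched
        script(new_doc)              # stands in for the update script
        batch.append(new_doc)
        if len(batch) >= bulk_size:  # bulk limit reached: execute what we have
            execute_bulk(batch)
            updated += len(batch)
            batch = []
    if batch:                        # final, possibly smaller, batch
        execute_bulk(batch)
        updated += len(batch)
    return updated


def increment_field1(doc: Doc) -> None:
    doc["field1"] = doc.get("field1", 0) + 1


flushed = []  # records the size of every executed batch
total = batched_update(
    ({"id": i, "field1": 0} for i in range(5)),
    increment_field1,
    lambda batch: flushed.append(len(batch)),
    bulk_size=2,
)
print(flushed, total)
```

Each batch is executed before the next one is prepared, so a crash between two batches leaves the earlier batches applied and the later ones untouched. The operation as a whole is therefore not atomic.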
Example usage
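Index an example document:
curl -XPUT 'localhost:9200/twitter/tweet/1' -d '{
    "text" : {
        "message" : "you know for search"
    }
}'
Execute the following update by query command:
curl -XPOST 'localhost:9200/twitter/_update_by_query' -d '{
    "query" : {
        "term" : {
            "message" : "you"
        }
    },
    "script" : "ctx._source.field1 += 1"
}'
This will yield the following response:
{
    "ok" : true,
    "took" : 9,
    "total" : 1,
    "updated" : 1,
    "indices" : [
        {
            "twitter" : { }
        }
    ]
}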
By default no bulk item responses are included in the response. If there are bulk item responses included in the response, the bulk response items are grouped by index and shard. This can be controlled by the `response` option.
Options:
Additional general options in request body:
* `lang`: The script language.
* `params`: The script parameters.
Query string options:
* `replication`: The replication type for the delete/index operation (sync or async).
* `consistency`: The write consistency of the index/delete operation.
* `response`: What bulk response items to include in the update by query response. This can be set to the following: `none`, `failed` and `all`. Defaults to `none`. Warning: `all` can result in out of memory errors when the query results in many hits.
* `routing`: Sets the routing that will be used to route the document to the relevant shard.
* `timeout`: Timeout waiting for a shard to become available.
This issue originates from #1607
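As an illustration of how the request body options and query string options listed above fit together, the sketch below assembles a request path and body without sending anything. The helper `build_update_by_query` is hypothetical, and the `inc` parameter and the `1m` timeout value are example values chosen here, not defaults taken from the proposal.

```python
import json
from urllib.parse import urlencode


def build_update_by_query(index, query, script, params=None, lang=None, **query_string):
    """Return (path, body) for an update by query request.

    `lang` and `params` go into the request body; everything passed
    via `query_string` (response, routing, timeout, ...) goes into the URL.
    """
    body = {"query": query, "script": script}
    if lang is not None:
        body["lang"] = lang
    if params is not None:
        body["params"] = params
    path = f"/{index}/_update_by_query"
    if query_string:
        # Sort for a stable, predictable query string.
        path += "?" + urlencode(sorted(query_string.items()))
    return path, json.dumps(body)


path, body = build_update_by_query(
    "twitter",
    {"term": {"message": "you"}},
    "ctx._source.field1 += inc",
    params={"inc": 1},      # example script parameter
    response="failed",      # only include failed bulk items in the response
    timeout="1m",           # example value
)
print(path)
print(body)
```

Requesting `response=failed` rather than `all` keeps the response small on large result sets, which is the concern behind the out of memory warning above.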