Update by query API #2230

Closed
martijnvg opened this issue Sep 3, 2012 · 67 comments
Labels: :Distributed Indexing/CRUD (a catch all label for issues around indexing, updating and getting a doc by id. Not search.), >feature

Comments

@martijnvg
Member

The update by query API allows all documents that match the query to be updated with a script. This feature is experimental.
The update by query API works a bit differently from the delete by query API. It translates the matching documents into bulk index / delete requests. Once the bulk limit has been reached, the bulk requests created so far are executed. After they have been executed, the next batch of requests is prepared and executed. This continues until all documents that matched the query have been processed. The bulk size can be configured with the action.updatebyquery.bulk_size option in the Elasticsearch configuration. For example:
action.updatebyquery.bulk_size=2500
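
The batching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code from the pull request: the function name `batched_bulk_requests` and the shape of the action dictionaries are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List


def batched_bulk_requests(matching_docs: Iterable[Dict], bulk_size: int = 2500) -> Iterator[List[Dict]]:
    """Yield lists of bulk actions, each at most bulk_size long (hypothetical helper)."""
    batch: List[Dict] = []
    for doc in matching_docs:
        # Each matching document becomes one bulk action (an index action here).
        batch.append({"index": {"_id": doc["_id"]}, "source": doc["_source"]})
        if len(batch) >= bulk_size:
            yield batch  # bulk limit reached: hand over what has been collected so far
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


docs = [{"_id": str(i), "_source": {"field1": i}} for i in range(6000)]
batch_sizes = [len(b) for b in batched_bulk_requests(docs, bulk_size=2500)]
print(batch_sizes)  # [2500, 2500, 1000]
```

With 6000 matching documents and a bulk size of 2500, the work is split into three bulk executions.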

Example usage

Index an example document:
curl -XPUT 'localhost:9200/twitter/tweet/1' -d '
{
   "text" : {
      "message" : "you know for search"
   }
}'

Execute the following update by query command:
curl -XPOST 'localhost:9200/twitter/_update_by_query' -d '
{
    "query" : {
        "term" : {
            "message" : "you"
        }
    },
    "script" : "ctx._source.field1 += 1"
}'

This will yield the following response:

{
  "ok" : true,
  "took" : 9,
  "total" : 1,
  "updated" : 1,
  "indices" : [ {
    "twitter" : { }
  } ]
}

By default, no bulk item responses are included in the response. This can be controlled with the response option. When bulk item responses are included, they are grouped by index and shard.

Options:

Additional general options in request body:

  • lang: The script language.
  • params: The script parameters.

Query string options:

  • replication: The replication type for the delete/index operation (sync or async).
  • consistency: The write consistency of the index/delete operation.
  • response: Which bulk response items to include in the update by query response. Can be set to none, failed or all. Defaults to none. Warning: all can result in out of memory errors when the query matches many documents.
  • routing: Sets the routing that will be used to route the document to the relevant shard.
  • timeout: Timeout waiting for a shard to become available.
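
To show how the request body options and the query string options fit together, here is a small Python sketch that only builds the URL and the JSON body and does not send anything. The helper name `build_update_by_query`, the `delta` script parameter and the `1m` timeout value are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode


def build_update_by_query(index, query, script, lang=None, params=None, **query_string_options):
    """Return (url, json_body) for an update by query call (hypothetical helper)."""
    body = {"query": query, "script": script}
    if lang is not None:
        body["lang"] = lang      # request body option: the script language
    if params is not None:
        body["params"] = params  # request body option: the script parameters
    url = "http://localhost:9200/%s/_update_by_query" % index
    # replication, consistency, response, routing and timeout go into the query string.
    options = {k: v for k, v in query_string_options.items() if v is not None}
    if options:
        url += "?" + urlencode(sorted(options.items()))
    return url, json.dumps(body)


url, body = build_update_by_query(
    "twitter",
    query={"term": {"message": "you"}},
    script="ctx._source.field1 += delta",  # 'delta' is an illustrative parameter name
    params={"delta": 1},
    response="failed",
    timeout="1m",
)
print(url)
print(body)
```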

This issue originates from #1607

@mdojwa

mdojwa commented Dec 8, 2012

Hi, when will it be released ?

@martijnvg
Member Author

Hi @mdojwa, I'm not sure when this will be included in ES. The implementation works fine, but there is one thing we are missing. An update by query request can take a long time to complete, so it would be very helpful to be able to cancel a running request. This isn't implemented yet. Perhaps in the future we will have a process API where one can see the currently running requests and cancel them.

ofavre pushed a commit to yakaz/elasticsearch-action-updatebyquery that referenced this issue Feb 13, 2013
@ofavre

ofavre commented Feb 13, 2013

I've packaged it (pull request #2231) as a plugin: yakaz/elasticsearch-action-updatebyquery.
Have fun.

@mdojwa

mdojwa commented Feb 13, 2013

Hi, thanks for this one :)

@ofavre

ofavre commented Mar 1, 2013

The plugin is now ported to ElasticSearch 0.90.0.Beta1!

@neogenix

neogenix commented Mar 4, 2013

+1

@Vineeth-Mohan

+1 to this feature. @ofavre - Just one question here. Will all the updates be atomic or sequential? That is, could there be a situation where, say, half of the documents matching the query were updated and the rest were not, in case of a process crash or restart?

@ofavre

ofavre commented Mar 24, 2013

I've just packaged the code, I didn't write it. But the pull request description states that it processes documents in batches, hence there can definitely be cases where only half of the documents are updated.
For small changes, ensuring that all the documents fit inside one batch may help. The batch size is controlled by action.updatebyquery.bulk_size.
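
A toy simulation makes the non-atomic behaviour concrete. It is a sketch under assumptions, not the plugin's logic: documents are plain dictionaries, the "update" is an in-memory increment, and the crash is simulated with an exception raised before a chosen batch.

```python
def apply_in_batches(docs, bulk_size, fail_on_batch=None):
    """Increment field1 batch by batch; optionally simulate a crash before a given batch."""
    for batch_no, start in enumerate(range(0, len(docs), bulk_size)):
        if batch_no == fail_on_batch:
            raise RuntimeError("simulated crash before batch %d" % batch_no)
        for doc in docs[start:start + bulk_size]:
            doc["field1"] += 1  # batches already applied are not rolled back


docs = [{"field1": 0} for _ in range(10)]
try:
    apply_in_batches(docs, bulk_size=4, fail_on_batch=1)
except RuntimeError:
    pass
updated = sum(doc["field1"] for doc in docs)
print(updated, "of", len(docs), "documents updated")  # 4 of 10 documents updated
```

If the bulk size is at least as large as the number of matching documents, there is only one batch in this model, so no later batch can be left unapplied. Whether a single bulk request can itself fail partway is a separate question that this sketch does not address.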

@Vineeth-Mohan

@ofavre - Thanks Oliver. That answers my question perfectly.

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Mar 24, 2014
The commit relates to issue elastic#2230

@parth-j-gandhi

This is a much needed feature for my current project, which indexes hundreds of thousands of docs and updates them on a regular basis. Is this feature available via the elasticsearch-py Python client?

@ofavre

ofavre commented Jun 4, 2014

There is no chance that this unofficial (or not yet official) feature, contributed through a plugin, is exposed through a mainstream client library.
Unless the client library lets you make arbitrary calls easily, you will have to fall back on urllib and the like.
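
As a rough sketch of the urllib route, the snippet below builds the HTTP request for the plugin endpoint using only the Python standard library. The host, index name and the `make_request` helper are assumptions, and the request is constructed but deliberately not sent.

```python
import json
import urllib.request


def make_request(base_url, index, body):
    """Build (but do not send) a POST request to the plugin's _update_by_query endpoint."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        "%s/%s/_update_by_query" % (base_url, index),
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = make_request(
    "http://localhost:9200",
    "twitter",
    {"query": {"term": {"message": "you"}}, "script": "ctx._source.field1 += 1"},
)
print(request.get_method(), request.full_url)

# To actually send it to a node that has the plugin installed:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```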

@1st

1st commented Jun 25, 2014

Hello guys!

As I understand it, I can insert JSON data into an index and query it, but I can't update a JSON record? For example, I need to change the structure of all JSON documents in a given collection by adding an extra field "is_moderated = true/false". Is that possible with the current implementation of Elastic Search?

If it's not possible now, please help me find a workaround for this common task. Thanks!

@ofavre

ofavre commented Jun 27, 2014

You can use the Update API to update any single document.
To update several documents at once, the good news is that the pull request associated with this issue has been packaged as a plugin: yakaz/elasticsearch-action-updatebyquery.
Using it, you can run any script you like to modify the document structure, and the documents will be re-indexed.
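
For the is_moderated case asked about above, the two request bodies might look roughly like this. It is a sketch under assumptions: the index, type and document id are borrowed from the example at the top of the issue, the partial-document form relies on the Update API's `doc` field, and the script text and the `default_value` parameter name are illustrative and depend on the script language in use.

```python
import json

# Single document, via the Update API (partial document merge).
single_url = "http://localhost:9200/twitter/tweet/1/_update"
single_body = {"doc": {"is_moderated": False}}

# Many documents, via the plugin's update by query endpoint.
bulk_url = "http://localhost:9200/twitter/_update_by_query"
bulk_body = {
    "query": {"match_all": {}},
    "script": "ctx._source.is_moderated = default_value",  # illustrative script text
    "params": {"default_value": False},
}

print(single_url, json.dumps(single_body))
print(bulk_url, json.dumps(bulk_body))
```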

@xelllee

xelllee commented Aug 4, 2015

+1

@IUnknownPtr

+1

@abhas9

abhas9 commented Aug 16, 2015

+1

@rtrujill007

+1

@chibenwa

+1

@knoxxs

knoxxs commented Aug 26, 2015

+1

@ogorun

ogorun commented Sep 16, 2015

+1

@clintongormley
Contributor

Update by query should be implemented as part of the reindex API #492

@davidchipping

+1

@xpepermint

+1

@sDaniel

sDaniel commented Oct 14, 2015

Please stop the +1 vote spamming and let GitHub know we need a voting feature.
isaacs/github#9

@KurtPreston

+1

@nik9000 nik9000 self-assigned this Nov 26, 2015
@pavel-main

+1

@nik9000
Member

nik9000 commented Mar 1, 2016

_update_by_query has landed in master: c7c8bb3

I'm not going to close this until it's backported to 2.3 though.

@lukens

lukens commented Mar 1, 2016

great work @nik9000!

@eskibars
Contributor

Closing, as the _update_by_query @nik9000 mentioned has been backported to 2.3. For those finding this later, see documentation at https://www.elastic.co/guide/en/elasticsearch/reference/2.x/docs-update-by-query.html
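
For readers arriving from the original proposal, a request against the released API might look roughly like the sketch below. It assumes the 2.x request shape, in which the script is an object with an `inline` field and `conflicts=proceed` is an optional query string parameter. The index, query and script are carried over from the example at the top of this issue, so check the linked documentation for the exact syntax of your version.

```python
import json

# Assumed 2.x-style request; nothing is sent here, the pieces are only assembled.
released_url = "http://localhost:9200/twitter/_update_by_query?conflicts=proceed"
released_body = {
    "query": {"term": {"message": "you"}},
    "script": {"inline": "ctx._source.field1 += 1"},
}

print("POST", released_url)
print(json.dumps(released_body, indent=2))
```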

@DaveVdE

DaveVdE commented Jun 17, 2016

Has the response query parameter been integrated with this feature too? I can't seem to get it to work.

@nik9000
Member

nik9000 commented Jun 17, 2016

No. This wasn't a port of the original, it was its own thing. The bulk
responses are only available for failures that abort the action. We talked
a lot about including more responses but decided that it wasn't worth it
given that they could consume too much memory.

@lcawl lcawl added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Reindex API labels Feb 13, 2018