Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose version type to Update & Delete by query #18750

Closed
wants to merge 1 commit into from

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Jun 6, 2016

This commit adds a version_type option to the request body for both Update-by-query and Delete-by-query. The option can take the value internal (default) and force. This last one can help to update or delete documents that have been created with an external version number equal to zero.

closes #16654

This commit adds a `version_type` option to the request body for both Update-by-query and Delete-by-query. The option can take the value `internal` (default) and `force`. This last one can help to update or delete documents that have been created with an external version number equal to zero.

closes elastic#16654
@tlrx
Copy link
Member Author

tlrx commented Jun 6, 2016

@nik9000 Do you want to review this one?

@nik9000
Copy link
Member

nik9000 commented Jun 6, 2016

We did a ton of talking about version_type in #15125 and I think we came to the conclusion that exposing force was too dangerous, mostly because of the versioning semantics around replicas.

The trouble with update and delete by query is that they sort of imply internal versioning - you are letting Elasticsearch pick the next version. If you were syncing from an external source and you marked that source as "no longer visible" then it should get a version bump and whatever mechanism you have to syncing it with Elasticsearch should just pick it up and sync again, deleting it based on the change in the external source.

@tlrx
Copy link
Member Author

tlrx commented Jun 7, 2016

Thanks for the precisions Nik. I understand this decision but I'm still kind of mixed about this.

Elasticsearch allows the documents to have an external version potentially starting from 0 where internal versionning starts from 1. Depending of the value of the version number, Update-By-Query and Delete-By-Query will work (in case of version > 0) or report a version conflict (for version = 0). We know the implementation details and why it works like this but from a end-user point of view this might looks like a bug.

I'm OK to not expose version_type in Update-By-Query for the reason you gave, but I don't see why one could not delete documents with external versionning and version equal to 0 using Delete-By-Query. Using the last version of the document to delete should work as expected, I think. FORCE version type has been added for such case in the Delete API and we could expose it at least in DBQ too.

Anyway, this pull request can be changed to either:
a) Expose FORCE for Delete-By-Query only so that documents with version = 0 can still be deleted without any syncing with external source
b) Do not expose version_type but add documentation about how UBQ/DBQ APis handles documents externally versioned

but we must not let the situation as it is.

@bleskes
Copy link
Contributor

bleskes commented Jun 7, 2016

The main thing that makes this complicated is that strictly speaking, delete is an operation just like any other one and therefore the only safe version types are external and internal. Under certain conditions, when people guarantee that there is no concurrency, you can use external_gte (which was added for deletes) and force (added for controlled fixing of issues with external versions where people wanted to repair their data). Strictly speaking, when you use external source, we expect people to reindex from source. Of course, sometime ES is just faster :) . I also agree it's an unfortunate side effect that delete by query doesn't work if people use external version and use version 0.

As @nik9000 already said, when we discussed the potential options with delete/update by query we decided to not get into all of this complexity. Long term, when we have seq# and they are used for replication versioning, we can expose this type of operations safely. For now I would opt for option b you describe - just doc things properly.

@clintongormley
Copy link
Contributor

I also agree it's an unfortunate side effect that delete by query doesn't work if people use external version and use version 0.

This could possibly be worked around by allowing ctx._version to be settable in an update-by-query script. Setting it to null should mean that the document is overwritten without checking the version... maybe?

@tlrx
Copy link
Member Author

tlrx commented Jun 30, 2016

Closed in favor of #19180 which documents this special case.

@tlrx tlrx closed this Jun 30, 2016
@tlrx tlrx removed the review label Jun 30, 2016
@tlrx tlrx deleted the expose-version-type-force branch June 30, 2016 13:54
@lcawl lcawl added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Reindex API labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES 2.2.0 delete by query plugin fails for data with external versioning
5 participants