Core: remove delete-by-query #10067
Comments
If delete-by-query is removed, then how would you truncate an index? Deleting and recreating the index requires knowing its metadata, which is not always possible.
Deleting the entire index is much faster than a delete-by-query of all docs in the index. But if that's not an option, I suggest you run a scan+scroll search for your query to get all hits and then issue bulk deletes for the returned ids.
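A rough sketch of that scan+scroll plus bulk-delete alternative, assuming the 1.x Java transport client; the index name, query, page size, and class name here are illustrative, not something prescribed in this issue:

```java
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;

public class ScanScrollDelete {

    /** Deletes every document matching 'query' via a scan+scroll search and bulk deletes. */
    public static void deleteByQuery(Client client, String index, QueryBuilder query) {
        // Start a scan search: hits come back in batches, unscored and unsorted.
        SearchResponse resp = client.prepareSearch(index)
                .setSearchType(SearchType.SCAN)
                .setScroll(TimeValue.timeValueMinutes(1))
                .setQuery(query)
                .setSize(500)            // hits per shard per scroll page (arbitrary choice)
                .get();

        while (true) {
            // Pull the next page of hits for this scroll id.
            resp = client.prepareSearchScroll(resp.getScrollId())
                    .setScroll(TimeValue.timeValueMinutes(1))
                    .get();
            SearchHit[] hits = resp.getHits().getHits();
            if (hits.length == 0) {
                break;                   // scroll is exhausted
            }
            // Delete this page of hits in a single bulk request.
            BulkRequestBuilder bulk = client.prepareBulk();
            for (SearchHit hit : hits) {
                bulk.add(client.prepareDelete(hit.getIndex(), hit.getType(), hit.getId()));
            }
            bulk.get();
        }
    }
}
```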
In this case I recommend just reading the metadata from the index and putting it in your create request, then deleting the index, and you are done with it.
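A hedged sketch of that truncate-by-recreate approach with the 1.x Java client; the type name is a placeholder, and only the mapping is copied here, so settings, aliases, warmers, etc. would need the same save/restore treatment:

```java
import org.elasticsearch.action.admin.indices.mapping.get.GetMappingsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.cluster.metadata.MappingMetaData;

public class TruncateIndex {

    /** "Truncates" an index by saving its mapping, deleting the index, and recreating it. */
    public static void truncate(Client client, String index, String type) throws Exception {
        // 1. Read the existing mapping for the given (placeholder) type.
        GetMappingsResponse resp = client.admin().indices()
                .prepareGetMappings(index).get();
        MappingMetaData mapping = resp.getMappings().get(index).get(type);

        // 2. Delete the old index; this drops all documents.
        client.admin().indices().prepareDelete(index).get();

        // 3. Recreate the index with the saved mapping.
        client.admin().indices().prepareCreate(index)
                .addMapping(type, mapping.source().string())
                .get();
    }
}
```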
@mikemccand I am not sure if we should do this in 1.5, but deprecating it is a no-brainer, I think?
Yeah, my plan is to deprecate in 1.5 and remove in 2.0.
I started to remove this in 2.0 but got stuck because we use delete-by-query internally to allow deleting an entire type from an index ... once we remove that in 2.0 (#8877), we can do this issue.
Since #8877 is done I went and removed all delete-by-query logic but ... then I realized: what if a user has a DBQ in the translog on a shard and upgrades to 2.0? Must we support this back-compat case? Can we "require" that users flush (clears the translog) before upgrading? If not ... I need to put back the DBQ logic for translog and Engine.
I am afraid we have to.
But we can still remove the public API right? This can just be an internal back-compat implementation detail?
Yeah ... I also need to fix the static back-compat indices to leave some DBQs in the translog and confirm on upgrade that the docs are in fact deleted ...
Maybe we can simply detect when this ("DBQ in translog on upgrade") happens and refuse to start the shard, saying that you must go back to the prior version and flush the shard?
That seems reasonable? This is an extreme edge case. If they are upgrading from 1.5+, the flush is done automatically on shutdown. And we already recommend running flush before any shutdown in general. So the only way for them to hit this is to have done a DBQ since the last time a flush happened (max 30 mins by default?), and then to have shut down without doing a flush.
Alas, the flush is only done in 1.5+ if a recovery was not also in progress when the node was shut down, because a recovery blocks flushes (separately, we need to fix the translog so it's ref-counted instead: the two operations really should be independent). Anyway, I think for this we should just keep the "replay DBQ on upgrade" path ... the DBQ code that we need to keep around to do this is very small: just Engine, Translog, IndexShard.
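For reference, the flush mentioned above can also be issued explicitly before shutdown; a minimal sketch with the Java client (flushing all indices, which commits pending operations and empties their translogs):

```java
// Flush all indices before shutting the node down.
client.admin().indices().prepareFlush().get();
```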
++ on fixing the current behavior of delete-by-query. I think we all agree on the fix, which is to do a scan search and bulk deletes instead of using delete-by-query, since that's safer and more consistent with our replication model. It might be a long-running task, but a relatively lightweight one, so I don't think we need to wait for the task management infra. Rather than removing delete-by-query in 2.0 and adding it back once we have task management, we could just go ahead and replace the current implementation with scan/delete and not deprecate it.
+1 on giving the user an alternative, albeit a slowish one, instead of removing the current implementation first and only later adding a managed one.
I agree it would be great to "delete old way and add new way" in one go...
That would be great, if it really is OK to just "be slow" sometimes (when the query deletes many docs). I guess there is precedent here: the optimize API (with wait_for_merge=true) can clearly take a very long time... Somehow, whichever node receives the DBQ request must open a client connection (to itself?), run the scan search, scroll through the hits, and make bulk delete requests. I think #10251 is an example of how to do this ... I'll add a comment on #7052 that we don't need to wait for the task management API, and block this issue on it.
This issue is blocked on #7052. |
I don't think this needs to be blocked on #7052. We can remove this now and add the other one as a follow-up. Stuff like this should not be blocked on syntactic sugar.
@mikemccand can we close this? |
Yes. |
This method is exceptionally trappy. I think we should remove it and add it back only once we can do it safely.
It secretly does a refresh with each request, which means it makes changes visible even when you might not want that yet (#3593).
It also means that if you call this while concurrently indexing, you can easily blow up the number of segments in the shard (merging can't keep up), leading to all sorts of problems, e.g. OOME (#6025), super slow indexing, etc.
Finally, it can cause inconsistent documents between primary and replica, since the query is re-executed on the replica and may match different documents.
Until just recently we failed to throttle delete-by-query when merges were falling behind (#9986).
I think we should deprecate (remove in 2.0) for now, and only once we have task management should we add it back without all these traps (#7052).