Add ways to force-merge down to 1 segment #31742

Closed
jpountz opened this issue Jul 2, 2018 · 4 comments
Labels
blocker >bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v6.5.0

Comments

jpountz commented Jul 2, 2018

Lucene just added a change to TieredMergePolicy (the default merge policy) which prevents it from force-merging an index down to 1 segment if doing so would violate the requirement that all segments be smaller than the maximum segment size (5GB by default). For instance, if you have a 23GB shard and force-merge it down to 1 segment, you end up with something like 5 or 6 segments.
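
For illustration, a minimal Lucene-level sketch of the behavior described above, assuming a local index in a hypothetical /tmp/index directory (the 5GB cap shown is just Lucene's default, not something Elasticsearch sets here):

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ForceMergeSketch {
    public static void main(String[] args) throws Exception {
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        // Lucene's default cap on merged segment size is 5GB. With the new
        // TieredMergePolicy behavior, forceMerge(1) keeps respecting this cap,
        // so a 23GB index ends up with ~5 segments instead of 1.
        mergePolicy.setMaxMergedSegmentMB(5 * 1024);

        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer())
                .setMergePolicy(mergePolicy);
        try (Directory dir = FSDirectory.open(Paths.get("/tmp/index"));
             IndexWriter writer = new IndexWriter(dir, config)) {
            // Raising the cap first (e.g. setMaxMergedSegmentMB(100 * 1024))
            // is what would restore the old single-segment outcome.
            writer.forceMerge(1);
        }
    }
}
```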

There has been feedback that the ability to force-merge to 1 segment is important for time-series data, so we discussed several ideas:

  • override Lucene's merge policy to revert to the previous behavior
  • add a new separate API to freeze indices (a poor name since "freeze" currently means something else, but you get the idea) and make clear that it should only be called on read-only data
  • keep Lucene's new behavior, document the maximum size parameter (index.merge.policy.max_merged_segment), and tell users that if they actually want to merge down to 1 segment then they first need to update that setting, as sketched after this list. Maybe we could try to detect when an index becomes read-only and automatically update that setting.
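
As a sketch of that third option, assuming the Elasticsearch low-level Java REST client and a hypothetical index called my-index (the 100gb value is purely illustrative):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class RaiseCapThenForceMerge {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {
            // Raise the merge policy cap so a single large segment is allowed again.
            Request settings = new Request("PUT", "/my-index/_settings");
            settings.setJsonEntity("{\"index.merge.policy.max_merged_segment\": \"100gb\"}");
            client.performRequest(settings);

            // Then force-merge down to a single segment.
            Request forceMerge = new Request("POST", "/my-index/_forcemerge");
            forceMerge.addParameter("max_num_segments", "1");
            client.performRequest(forceMerge);
        }
    }
}
```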

In any case, there is still the option of implementing the first idea in 6.x and only introducing the change in master, so that the break happens in a major release.

@jpountz jpountz added >bug blocker discuss :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v6.5.0 labels Jul 2, 2018
elasticmachine (Collaborator) commented

Pinging @elastic/es-distributed

jpountz commented Jul 13, 2018

This was discussed in the Search team and in Fixit Friday.

  • The problem is easier for index lifecycle management and Curator, which we maintain and could fix, e.g. to set the max merged segment size before calling _forcemerge.
  • Detecting that the index is read-only based on the timestamp of the last write is not safe.
  • @bleskes suggested respecting the max number of segments parameter if writes are blocked, and using Lucene's new default behavior otherwise (and documenting it); see the sketch after this list. This would play well with ILM, which needs to block writes anyway for the shrink operation. It isn't perfect, as it is possible to enable writing again later, but it might be good enough, especially as some upcoming changes in Lucene might make it better at reclaiming deletes on large segments in the future.
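
A minimal sketch of that suggested flow, again with the low-level Java REST client and a hypothetical my-index; whether _forcemerge then honors max_num_segments is exactly the behavior being discussed here:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class BlockWritesThenForceMerge {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {
            // Block writes, as ILM already does before the shrink step.
            Request blockWrites = new Request("PUT", "/my-index/_settings");
            blockWrites.setJsonEntity("{\"index.blocks.write\": true}");
            client.performRequest(blockWrites);

            // Under the suggestion above, max_num_segments would now be honored
            // because the index no longer accepts writes.
            Request forceMerge = new Request("POST", "/my-index/_forcemerge");
            forceMerge.addParameter("max_num_segments", "1");
            client.performRequest(forceMerge);
        }
    }
}
```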

@jpountz jpountz removed the discuss label Jul 16, 2018
@colings86 colings86 added discuss and removed discuss labels Jul 23, 2018
@colings86 colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018
pcsanwald (Contributor) commented

@jpountz can you comment on whether you think this is still a blocker for the 6.x series? It seems like either the blocker label is incorrect, or the 6.6 label is.

jpountz commented Jan 2, 2019

Thanks @pcsanwald, this issue is indeed no longer relevant; it was addressed via #32291.

@jpountz jpountz closed this as completed Jan 2, 2019