Add best_compression option for indices #8863

Closed
wants to merge 3 commits into from

rmuir
Contributor

@rmuir rmuir commented Dec 10, 2014

Upgrades Lucene to the latest version and adds support for the BEST_COMPRESSION parameter that Lucene now provides (with backwards compatibility, etc.). This option uses DEFLATE, tuned for highly compressible data.

index.codec::
The default value compresses stored data with LZ4 compression, but
this can be set to best_compression for a higher compression ratio,
at the expense of slower stored fields performance.
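
To make the trade-off described above concrete, here is a minimal sketch using only the JDK's `java.util.zip.Deflater`. It is not Lucene's stored fields code: LZ4 is not in the JDK, so `Deflater.BEST_SPEED` stands in for the fast default here (an assumption for illustration), and the class name, helper names, and sample payload are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DeflateLevels {

    // Compress `input` with DEFLATE at the given level and return the compressed size in bytes.
    public static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            // Drain the compressor chunk by chunk; only the byte count is kept.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Build a repetitive, log-like payload (hypothetical sample data).
    public static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("{\"level\":\"INFO\",\"host\":\"web-").append(i % 7)
              .append("\",\"msg\":\"request served\",\"status\":200}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        System.out.println("raw bytes:        " + data.length);
        // BEST_SPEED is only a stand-in for a fast codec such as LZ4.
        System.out.println("BEST_SPEED:       " + compressedSize(data, Deflater.BEST_SPEED));
        System.out.println("BEST_COMPRESSION: " + compressedSize(data, Deflater.BEST_COMPRESSION));
    }
}
```

On highly compressible input like this, the higher level typically produces the smaller output while spending more CPU per byte, which is the same trade-off the `index.codec` documentation describes for stored fields.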

IMO it's safest to implement this as a named codec here, because ES already has logic to handle this correctly, and because it's unrealistic to have a plethora of options to Lucene's default codec. We are practically limited in Lucene to what we can support with back compat, so I don't think we should overengineer this and add unnecessary plumbing.

See also:
https://issues.apache.org/jira/browse/LUCENE-5914
https://issues.apache.org/jira/browse/LUCENE-6089
https://issues.apache.org/jira/browse/LUCENE-6090
https://issues.apache.org/jira/browse/LUCENE-6100

@@ -245,6 +247,7 @@ public void write(Directory dir, SegmentInfo si, IOContext ioContext) throws IOE
}
}

// IF THIS TEST FAILS ON UPGRADE GO LOOK AT THE OldSIMockingCodec!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Contributor

I am sorry :)

@s1monw
Contributor

s1monw commented Dec 10, 2014

+1 I left some comments

@mikemccand
Contributor

LGTM

@rmuir
Contributor Author

rmuir commented Dec 10, 2014

I want @jpountz opinion too, when he has some time.

@jpountz
Contributor

jpountz commented Dec 10, 2014

+1 to the named codec approach. And I see that index.codec is already a live index setting so it's easy to use, great!
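
As a rough illustration of using the named codec through index settings, the sketch below only builds the JSON body that would carry `index.codec`. The class and method names are hypothetical, and how the body is sent (for example, as an index settings update) and whether the index must be closed first are assumptions that depend on the Elasticsearch version.

```java
public class CodecSettings {

    // Build a JSON settings body that selects a named codec for an index.
    // The codec name is inserted verbatim, so pass only known codec names.
    public static String codecSettingsBody(String codecName) {
        return "{\"index\":{\"codec\":\"" + codecName + "\"}}";
    }

    public static void main(String[] args) {
        // Body for switching an index to the codec added in this PR.
        System.out.println(codecSettingsBody("best_compression"));
    }
}
```

Existing segments keep the codec they were written with; only newly written or merged segments would pick up the change.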

@kimchy
Member

kimchy commented Dec 11, 2014

This is great! Now part of the time-based data story can also be an optional codec change, optimizing to reduce storage for "old" indices, potentially significantly.

@s1monw s1monw deleted the lucene_r1644303 branch December 11, 2014 09:37
@avleen

avleen commented Dec 11, 2014

This is wonderful, thank you guys!

@clintongormley clintongormley changed the title from "add best_compression option for Lucene 5.0" to "Add best_compression option for Lucene 5.0" Jun 6, 2015
@clintongormley clintongormley changed the title from "Add best_compression option for Lucene 5.0" to "Add best_compression option for indices" Jun 6, 2015
@clintongormley clintongormley added the :Distributed Indexing/Store label Jun 6, 2015
@clintongormley clintongormley added the :Distributed Indexing/Engine label and removed the :Distributed Indexing/Store label Feb 13, 2018
Labels
:Distributed Indexing/Engine, >feature, release highlight, v2.0.0-beta1
7 participants