-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add experimental support for zstd compression. #3577
Add experimental support for zstd compression. #3577
Conversation
❌ Gradle Check failure 9d3cbe0dbd8e41daef93385b6c77aa2ca6c95d6b |
Small thing so far:
|
@willyborankin I have added signature files and licenses for the dependencies. |
❌ Gradle Check failure 759642278598f5f17fe12af5a6efa052e06e9972 |
@reta the PR is labelled "experimental" just because we'd like to make it available for users together with the snapshotting compression that @willyborankin has been doing. The codecs work well and have significant performance gains. |
I really don't know anything about this stuff, deferring to @nknize, at a high level, yes. Overall this PR looks good to me. |
@mulugetam You should get it to green either way, fix DCO, etc. |
3de2f6b
to
6999a54
Compare
❌ Gradle Check failure 3de2f6b7a00dfe4f700f2d45e314f7a1ca53c7fa |
❌ Gradle Check failure 6999a5496e1955e4c632044cee86cddcdb50654b |
✅ Gradle Check success e95ae663dfd0cda0550d5bb7ffd589590b87312a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think license headers need updates, otherwise LGTM.
server/src/main/java/org/apache/lucene/codecs/experimental/LZ4CompressionMode.java
Outdated
Show resolved
Hide resolved
...er/src/main/java/org/apache/lucene/codecs/experimental/Lucene90CustomStoredFieldsFormat.java
Outdated
Show resolved
Hide resolved
❌ Gradle Check failure 1a558e751816fb87ddf54b3e529c65ab1a8c3691 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comments.
...er/src/main/java/org/apache/lucene/codecs/experimental/Lucene90CustomStoredFieldsFormat.java
Outdated
Show resolved
Hide resolved
...er/src/main/java/org/apache/lucene/codecs/experimental/Lucene90CustomStoredFieldsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/lucene/codecs/experimental/Lucene92CustomCodec.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/lucene/codecs/experimental/ZstdCompressionMode.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/lucene/codecs/experimental/ZstdCompressionMode.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/lucene/codecs/experimental/LZ4CompressionMode.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/lucene/codecs/experimental/LZ4CompressionMode.java
Outdated
Show resolved
Hide resolved
...er/src/main/java/org/apache/lucene/codecs/experimental/Lucene90CustomStoredFieldsFormat.java
Outdated
Show resolved
Hide resolved
...er/src/main/java/org/apache/lucene/codecs/experimental/Lucene90CustomStoredFieldsFormat.java
Outdated
Show resolved
Hide resolved
...er/src/main/java/org/apache/lucene/codecs/experimental/Lucene90CustomStoredFieldsFormat.java
Outdated
Show resolved
Hide resolved
IMHO I think we can add them as a new experimental feature with default behaviour as it works now. |
@willyborankin have fixed the issues your raised. Thanks :-) |
✅ Gradle Check success 9c02870d30fb33ae000402d2ace647beba069188 |
✅ Gradle Check success 566217cba00fa09825b71c0b8e2dcff414a1c371 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a small question.
server/src/main/java/org/apache/lucene/codecs/experimental/Lucene92CustomCodec.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/lucene/codecs/experimental/LZ4CompressionMode.java
Outdated
Show resolved
Hide resolved
✅ Gradle Check success 11376fadc35b0e83b90bbfe4a5c621963e0c48f9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
server/src/main/resources/org/opensearch/bootstrap/security.policy
Outdated
Show resolved
Hide resolved
Gradle Check (Jenkins) Run Completed with:
|
I'm going to merge on 🟢 |
@nknize we need you to dismiss your review/approve to merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of good movement here. Still feels open but at least it's sandboxed. LGFN...
sandbox/modules/custom-codecs/src/main/plugin-metadata/plugin-security.policy
Show resolved
Hide resolved
@dblock Only the |
Separate issue from this PR. Care to open one? |
I tagged this as backport to 2.x, watch that pass/fail/may need manual labor. |
* Add experimental support for zstd and lz4 (native) compression. Add experimental support for zstd (with and without dictionary support) and lz4 (native) compressions as discussed in: #3354. Users would be able to set the index.codec setting with the values "lz4_native" (for lz4 native), "zstd" (for zstd with dictionary), and "zstd_no_dict" (for zstd without a dictionary). Signed-off-by: Mulugeta Mammo <[email protected]> * Fix license issues, add tests for zstd, zstd_no_dict, and lz4_native compressions. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix DCO and and issues with CodecTests.java. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix forbidden api violation error for lz4-java. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix license headers. Remove and fix unnecessary fields. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix magic numbers. Use more restrictive access modifiers. Signed-off-by: Mulugeta Mammo <[email protected]> * Use protected access modifier for Zstd and LZ4 compression mode classes. Signed-off-by: Mulugeta Mammo <[email protected]> * Allow negative compression levels for zstd. Use more restrictive access modifiers. Signed-off-by: Mulugeta Mammo <[email protected]> * Use a more restrictive permission for loading zstd-jni and lz4-java libraries. Signed-off-by: Mulugeta Mammo <[email protected]> * Rename a file (follow a consistent version naming convention). Signed-off-by: Mulugeta Mammo <[email protected]> * Refactor and create a new custom-codecs sandbox module. Signed-off-by: Mulugeta Mammo <[email protected]> * Remove blank lines. Signed-off-by: Mulugeta Mammo <[email protected]> * Restore Lucene92CustomCodec to extend from FilterCodec. Signed-off-by: Mulugeta Mammo <[email protected]> * Make use of the compressionLevel argument. Signed-off-by: Mulugeta Mammo <[email protected]> * Make Lucene92CustomCodec abstract and use a package-private access modifier. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix missing JavaDoc issues. Remove unused field in PerFieldMappingPostingFormatCodec. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix lint errors. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix the description for the custom-codecs plugin. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix wildcard import and improve documentation. Signed-off-by: Mulugeta Mammo <[email protected]> * Access control exception fixed. Removed lz4-java support for now. - PRs were made to zstd-jni and lz4-java to use AccessController.doPrivileged. - The zstd-jni PR is merged since version 1.5.4-1. - The lz4-java support temporarily removed until the PR gets merged. Signed-off-by: Mulugeta Mammo <[email protected]> * Upgrade plugin to use Lucene95Codec. Rename files accordingly. - Upgrade plugin to use Lucene95Codec. Rename files accordingly. - Fix lint issue with plugin-security. - Remove thridPartyAudit that was there for supporting lz4-java. Signed-off-by: Mulugeta Mammo <[email protected]> * Add test cases for compression/decompression. Other minor changes. - add test cases for compression/decompression. - rename package. - add a CHANGELOG entry. - add more checks for signed integer arithmetic. Signed-off-by: Mulugeta Mammo <[email protected]> * Remove ES grant in plugin-security.policy. Fix minor javadoc issue. - Remove ES grant in plugin-security.policy file. - Replace @link and @see to fix javadoc error. Signed-off-by: Mulugeta Mammo <[email protected]> * Upgrade jettison version to 1.5.4. Signed-off-by: Mulugeta Mammo <[email protected]> * Update SHA for jettison 1.5.4. Signed-off-by: Mulugeta Mammo <[email protected]> --------- Signed-off-by: Mulugeta Mammo <[email protected]> Co-authored-by: Daniel (dB.) Doubrovkine <[email protected]> (cherry picked from commit f071c9b) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add experimental support for zstd and lz4 (native) compression. Add experimental support for zstd (with and without dictionary support) and lz4 (native) compressions as discussed in: opensearch-project#3354. Users would be able to set the index.codec setting with the values "lz4_native" (for lz4 native), "zstd" (for zstd with dictionary), and "zstd_no_dict" (for zstd without a dictionary). Signed-off-by: Mulugeta Mammo <[email protected]> * Fix license issues, add tests for zstd, zstd_no_dict, and lz4_native compressions. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix DCO and and issues with CodecTests.java. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix forbidden api violation error for lz4-java. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix license headers. Remove and fix unnecessary fields. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix magic numbers. Use more restrictive access modifiers. Signed-off-by: Mulugeta Mammo <[email protected]> * Use protected access modifier for Zstd and LZ4 compression mode classes. Signed-off-by: Mulugeta Mammo <[email protected]> * Allow negative compression levels for zstd. Use more restrictive access modifiers. Signed-off-by: Mulugeta Mammo <[email protected]> * Use a more restrictive permission for loading zstd-jni and lz4-java libraries. Signed-off-by: Mulugeta Mammo <[email protected]> * Rename a file (follow a consistent version naming convention). Signed-off-by: Mulugeta Mammo <[email protected]> * Refactor and create a new custom-codecs sandbox module. Signed-off-by: Mulugeta Mammo <[email protected]> * Remove blank lines. Signed-off-by: Mulugeta Mammo <[email protected]> * Restore Lucene92CustomCodec to extend from FilterCodec. Signed-off-by: Mulugeta Mammo <[email protected]> * Make use of the compressionLevel argument. Signed-off-by: Mulugeta Mammo <[email protected]> * Make Lucene92CustomCodec abstract and use a package-private access modifier. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix missing JavaDoc issues. Remove unused field in PerFieldMappingPostingFormatCodec. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix lint errors. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix the description for the custom-codecs plugin. Signed-off-by: Mulugeta Mammo <[email protected]> * Fix wildcard import and improve documentation. Signed-off-by: Mulugeta Mammo <[email protected]> * Access control exception fixed. Removed lz4-java support for now. - PRs were made to zstd-jni and lz4-java to use AccessController.doPrivileged. - The zstd-jni PR is merged since version 1.5.4-1. - The lz4-java support temporarily removed until the PR gets merged. Signed-off-by: Mulugeta Mammo <[email protected]> * Upgrade plugin to use Lucene95Codec. Rename files accordingly. - Upgrade plugin to use Lucene95Codec. Rename files accordingly. - Fix lint issue with plugin-security. - Remove thridPartyAudit that was there for supporting lz4-java. Signed-off-by: Mulugeta Mammo <[email protected]> * Add test cases for compression/decompression. Other minor changes. - add test cases for compression/decompression. - rename package. - add a CHANGELOG entry. - add more checks for signed integer arithmetic. Signed-off-by: Mulugeta Mammo <[email protected]> * Remove ES grant in plugin-security.policy. Fix minor javadoc issue. - Remove ES grant in plugin-security.policy file. - Replace @link and @see to fix javadoc error. Signed-off-by: Mulugeta Mammo <[email protected]> * Upgrade jettison version to 1.5.4. Signed-off-by: Mulugeta Mammo <[email protected]> * Update SHA for jettison 1.5.4. Signed-off-by: Mulugeta Mammo <[email protected]> --------- Signed-off-by: Mulugeta Mammo <[email protected]> Co-authored-by: Daniel (dB.) Doubrovkine <[email protected]> Signed-off-by: Valentin Mitrofanov <[email protected]>
* Add experimental support for zstd and lz4 (native) compression. Add experimental support for zstd (with and without dictionary support) and lz4 (native) compressions as discussed in: #3354. Users would be able to set the index.codec setting with the values "lz4_native" (for lz4 native), "zstd" (for zstd with dictionary), and "zstd_no_dict" (for zstd without a dictionary). * Fix license issues, add tests for zstd, zstd_no_dict, and lz4_native compressions. * Fix DCO and and issues with CodecTests.java. * Fix forbidden api violation error for lz4-java. * Fix license headers. Remove and fix unnecessary fields. * Fix magic numbers. Use more restrictive access modifiers. * Use protected access modifier for Zstd and LZ4 compression mode classes. * Allow negative compression levels for zstd. Use more restrictive access modifiers. * Use a more restrictive permission for loading zstd-jni and lz4-java libraries. * Rename a file (follow a consistent version naming convention). * Refactor and create a new custom-codecs sandbox module. * Remove blank lines. * Restore Lucene92CustomCodec to extend from FilterCodec. * Make use of the compressionLevel argument. * Make Lucene92CustomCodec abstract and use a package-private access modifier. * Fix missing JavaDoc issues. Remove unused field in PerFieldMappingPostingFormatCodec. * Fix lint errors. * Fix the description for the custom-codecs plugin. * Fix wildcard import and improve documentation. * Access control exception fixed. Removed lz4-java support for now. - PRs were made to zstd-jni and lz4-java to use AccessController.doPrivileged. - The zstd-jni PR is merged since version 1.5.4-1. - The lz4-java support temporarily removed until the PR gets merged. * Upgrade plugin to use Lucene95Codec. Rename files accordingly. - Upgrade plugin to use Lucene95Codec. Rename files accordingly. - Fix lint issue with plugin-security. - Remove thridPartyAudit that was there for supporting lz4-java. * Add test cases for compression/decompression. Other minor changes. - add test cases for compression/decompression. - rename package. - add a CHANGELOG entry. - add more checks for signed integer arithmetic. * Remove ES grant in plugin-security.policy. Fix minor javadoc issue. - Remove ES grant in plugin-security.policy file. - Replace @link and @see to fix javadoc error. * Upgrade jettison version to 1.5.4. * Update SHA for jettison 1.5.4. --------- (cherry picked from commit f071c9b) Signed-off-by: Mulugeta Mammo <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Daniel (dB.) Doubrovkine <[email protected]>
@mulugetam @dblock for 2.7.0, we should wait for ZSTD 1.5.5 update (the JNI one), they fixed rare corruption bug [1], thanks [1] https://github.com/facebook/zstd/releases/tag/v1.5.5 |
- Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577). Signed-off-by: Mulugeta Mammo <[email protected]>
- Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577). Signed-off-by: Mulugeta Mammo <[email protected]>
* Add documentation for the custom-codecs plugin. - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577). Signed-off-by: Mulugeta Mammo <[email protected]> * Update _api-reference/index-apis/create-index.md Signed-off-by: Naarcha-AWS <[email protected]> * Update _install-and-configure/plugins.md Signed-off-by: Naarcha-AWS <[email protected]> * Update _install-and-configure/plugins.md Signed-off-by: Naarcha-AWS <[email protected]> --------- Signed-off-by: Mulugeta Mammo <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]> Co-authored-by: Naarcha-AWS <[email protected]>
* Add documentation for the custom-codecs plugin. - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577). Signed-off-by: Mulugeta Mammo <[email protected]> * Update _api-reference/index-apis/create-index.md Signed-off-by: Naarcha-AWS <[email protected]> * Update _install-and-configure/plugins.md Signed-off-by: Naarcha-AWS <[email protected]> * Update _install-and-configure/plugins.md Signed-off-by: Naarcha-AWS <[email protected]> --------- Signed-off-by: Mulugeta Mammo <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]> Co-authored-by: Naarcha-AWS <[email protected]> (cherry picked from commit 817b730) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add documentation for the custom-codecs plugin. - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577). * Update _api-reference/index-apis/create-index.md * Update _install-and-configure/plugins.md * Update _install-and-configure/plugins.md --------- (cherry picked from commit 817b730) Signed-off-by: Mulugeta Mammo <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Naarcha-AWS <[email protected]>
) * Add documentation for the custom-codecs plugin. - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577). Signed-off-by: Mulugeta Mammo <[email protected]> * Update _api-reference/index-apis/create-index.md Signed-off-by: Naarcha-AWS <[email protected]> * Update _install-and-configure/plugins.md Signed-off-by: Naarcha-AWS <[email protected]> * Update _install-and-configure/plugins.md Signed-off-by: Naarcha-AWS <[email protected]> --------- Signed-off-by: Mulugeta Mammo <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]> Co-authored-by: Naarcha-AWS <[email protected]>
This PR adds experimental support for lz4 (native) and zstd (with and without dictionary) compression algorithms. The rationale for adding this support is discussed here: #3354.
Users would be able to set the compression type by setting the
index.codec
with the following values:"LZ4"
-- lz4 (native) compression."ZSTD"
-- zstd compression, with dictionary."ZSTDNODICT"
-- zstd compression, without dictionary.The core classes are packaged under
org.apache.lucene.codecs.experimental
.The compression algorithms have been tested and benchmarked using OpenSearch-Benchmark and StoredFieldsBenchmark . The results show significant gains in performance.
Signed-off-by: Mulugeta Mammo [email protected]
Description
A detailed description is available here: #3354
Issues Resolved
#3354
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.