Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rare failure from org.elasticsearch.gateway.PersistedClusterStateServiceTests.testHandlesShuffledDocuments #88471

Closed
joegallo opened this issue Jul 12, 2022 · 3 comments · Fixed by #88786
Labels
:Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team >test-failure Triaged test failures from CI

Comments

@joegallo
Copy link
Contributor

CI Link

https://gradle-enterprise.elastic.co/s/tolzhhuvjvobo/tests/:server:test/org.elasticsearch.gateway.PersistedClusterStateServiceTests/testHandlesShuffledDocuments%20%7Bseed=%5B6281127015198EC1:C2E270963902654%5D%7D?top-execution=1

Repro line

./gradlew ':server:test' --tests "org.elasticsearch.gateway.PersistedClusterStateServiceTests.testHandlesShuffledDocuments {seed=[6281127015198EC1:C2E270963902654]}" -Dtests.seed=6281127015198EC1 -Dtests.locale=ca -Dtests.timezone=Africa/Asmera -Druntime.java=18

Does it reproduce?

No

Applicable branches

master, maybe others

Failure history

No response

Failure excerpt

2> junit.framework.AssertionFailedError: Unexpected exception type, expected CorruptStateException but got java.lang.IndexOutOfBoundsException: Range [4, 4 + -3) out of bounds for length 1 |  
-- | --
  | at __randomizedtesting.SeedInfo.seed([6281127015198EC1:C2E270963902654]:0) |  
  | at org.apache.lucene.tests.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2850) |  
  | at org.apache.lucene.tests.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2831) |  
  | at org.elasticsearch.gateway.PersistedClusterStateServiceTests.testHandlesShuffledDocuments(PersistedClusterStateServiceTests.java:1149) |  
  |   |  
  | Caused by: |  
  | java.lang.IndexOutOfBoundsException: Range [4, 4 + -3) out of bounds for length 1 |  
  | at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100) |  
  | at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckFromIndexSize(Preconditions.java:118) |  
  | at java.base/jdk.internal.util.Preconditions.checkFromIndexSize(Preconditions.java:397) |  
  | at java.base/java.util.Objects.checkFromIndexSize(Objects.java:411) |  
  | at org.elasticsearch.common.bytes.BytesArray.slice(BytesArray.java:95) |  
  | at org.elasticsearch.common.compress.DeflateCompressor.uncompress(DeflateCompressor.java:208) |  
  | at org.elasticsearch.gateway.PersistedClusterStateService.uncompress(PersistedClusterStateService.java:660) |  
  | at org.elasticsearch.gateway.PersistedClusterStateService.consumeFromType(PersistedClusterStateService.java:622) |  
  | at org.elasticsearch.gateway.PersistedClusterStateService.loadOnDiskState(PersistedClusterStateService.java:551) |  
  | at org.elasticsearch.gateway.PersistedClusterStateService.loadBestOnDiskState(PersistedClusterStateService.java:458) |  
  | at org.elasticsearch.gateway.PersistedClusterStateServiceTests.loadPersistedClusterState(PersistedClusterStateServiceTests.java:1633) |  
  | at org.elasticsearch.gateway.PersistedClusterStateServiceTests.lambda$testHandlesShuffledDocuments$30(PersistedClusterStateServiceTests.java:1149) |  
  | at org.apache.lucene.tests.util.LuceneTestCase._expectThrows(LuceneTestCase.java:3003) |  
  | at org.apache.lucene.tests.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2840) |  
  | ... 2 more

I haven't been able to get a reliable reproduction line, but usually running the test 10,000 times seems to provoke it:

for i in {0..100}; do ./gradlew ':server:test' -Dtests.iters=100 --tests "org.elasticsearch.gateway.PersistedClusterStateServiceTests.testHandlesShuffledDocuments"; done
@joegallo joegallo added :Core/Infra/Core Core issues without another label >test-failure Triaged test failures from CI needs:triage Requires assignment of a team area label labels Jul 12, 2022
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Jul 12, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@joegallo
Copy link
Contributor Author

The PersistedClusterStateService test that fails is expecting a CorruptStateException, but it actually gets an IndexOutOfBoundsException. Here's why:

https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/common/compress/DeflateCompressor.java#L208

If the bytesReference passed into uncompress is corrupted enough that the length of the entire bytesReference is less than the length of the DFL\0 header, then the above linked line fails at the slice (with an IOOBE) rather than in the writeTo (with a ZipException/IOException).

Here's a little bit of WIP that could maybe help in the way towards a solution:

diff --git a/server/src/main/java/org/elasticsearch/common/compress/DeflateCompressor.java b/server/src/main/java/org/elasticsearch/common/compress/DeflateCompressor.java
index e8c9fb69e6c..daecf286b20 100644
--- a/server/src/main/java/org/elasticsearch/common/compress/DeflateCompressor.java
+++ b/server/src/main/java/org/elasticsearch/common/compress/DeflateCompressor.java
@@ -201,6 +203,17 @@ public class DeflateCompressor implements Compressor {

     @Override
     public BytesReference uncompress(BytesReference bytesReference) throws IOException {
+        if (bytesReference.length() < HEADER.length) {
+            throw new IOException(
+                String.format(
+                    Locale.ROOT,
+                    "bytesReference length [%s] is less than HEADER length [%s]",
+                    bytesReference.length(),
+                    HEADER.length
+                )
+            );
+        }
+
         final BytesStreamOutput buffer = baos.get();
         try {
             final Inflater inflater = inflaterRef.get();

@joegallo
Copy link
Contributor Author

Related to #88479, in that I ran into this while testing that PR locally.

@DJRickyB DJRickyB removed the needs:triage Requires assignment of a team area label label Jul 12, 2022
rjernst added a commit to rjernst/elasticsearch that referenced this issue Jul 25, 2022
This commit checks for an edge case in decompression with deflate where
the input is shorter than the deflate header size. Before this commit
a slice error would occur with a negative bound. With this commit an IOException is thrown.

closes elastic#88471
rjernst added a commit that referenced this issue Jul 25, 2022
This commit checks for an edge case in decompression with deflate where
the input is shorter than the deflate header size. Before this commit
a slice error would occur with a negative bound. With this commit an IOException is thrown.

closes #88471
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants