TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException #13805

ChrisHegarty · 2024-09-18T19:57:44Z

ERROR: The following test(s) have failed:
  - org.apache.lucene.codecs.lucene90.TestLucene90DocValuesFormat.testSparseDocValuesVsStoredFields (:lucene:core)
    Test output: /opt/buildkite-agent/builds/bk-agent-prod-gcp-1726674638633683811/elastic/apache-lucene-nightly/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.codecs.lucene90.TestLucene90DocValuesFormat.txt
    Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.codecs.lucene90.TestLucene90DocValuesFormat.testSparseDocValuesVsStoredFields" -Ptests.jvms=12 -Ptests.jvmargs= -Ptests.seed=175AB2293A24B66E -Ptests.nightly=true -Ptests.gui=true -Ptests.file.encoding=UTF-8 -Ptests.vectorsize=128 -Ptests.forceintegervectors=true
  - org.apache.lucene.backward_codecs.lucene80.TestBestSpeedLucene80DocValuesFormat.testSparseDocValuesVsStoredFields (:lucene:backward-codecs)
    Test output: /opt/buildkite-agent/builds/bk-agent-prod-gcp-1726674638633683811/elastic/apache-lucene-nightly/lucene/backward-codecs/build/test-results/test/outputs/OUTPUT-org.apache.lucene.backward_codecs.lucene80.TestBestSpeedLucene80DocValuesFormat.txt
    Reproduce with: gradlew :lucene:backward-codecs:test --tests "org.apache.lucene.backward_codecs.lucene80.TestBestSpeedLucene80DocValuesFormat.testSparseDocValuesVsStoredFields" -Ptests.jvms=12 -Ptests.jvmargs= -Ptests.seed=175AB2293A24B66E -Ptests.nightly=true -Ptests.gui=true -Ptests.file.encoding=UTF-8 -Ptests.vectorsize=128 -Ptests.forceintegervectors=true

 >     java.lang.ArrayIndexOutOfBoundsException: Index 3 out of bounds for length 3
   >         at __randomizedtesting.SeedInfo.seed([175AB2293A24B66E:43816E3AC0A89021]:0)
   >         at org.apache.lucene.util.packed.Packed64.get(Packed64.java:80)
   >         at org.apache.lucene.index.OrdinalMap$1.get(OrdinalMap.java:379)
   >         at org.apache.lucene.codecs.DocValuesConsumer$7$1.nextOrd(DocValuesConsumer.java:946)
   >         at org.apache.lucene.codecs.lucene90.Lucene90DocValuesConsumer$4$1.nextDoc(Lucene90DocValuesConsumer.java:808)
   >         at org.apache.lucene.codecs.lucene90.Lucene90DocValuesConsumer.writeValues(Lucene90DocValuesConsumer.java:201)
   >         at org.apache.lucene.codecs.lucene90.Lucene90DocValuesConsumer.doAddSortedNumericField(Lucene90DocValuesConsumer.java:705)
   >         at org.apache.lucene.codecs.lucene90.Lucene90DocValuesConsumer.addSortedSetField(Lucene90DocValuesConsumer.java:770)
   >         at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:853)
   >         at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:148)
   >         at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:152)
   >         at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:188)
   >         at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:314)
   >         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:149)
   >         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5292)
   >         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4758)
   >         at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6581)
   >         at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:38)
   >         at org.apache.lucene.index.IndexWriter.executeMerge(IndexWriter.java:2327)
   >         at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2322)
   >         at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:6033)
   >         at org.apache.lucene.index.IndexWriter.maybeProcessEvents(IndexWriter.java:6023)
   >         at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1562)
   >         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1847)
   >         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1487)
   >         at org.apache.lucene.tests.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:224)
   >         at org.apache.lucene.codecs.lucene90.TestLucene90DocValuesFormat.doTestSparseDocValuesVsStoredFields(TestLucene90DocValuesFormat.java:215)
   >         at org.apache.lucene.codecs.lucene90.TestLucene90DocValuesFormat.testSparseDocValuesVsStoredFields(TestLucene90DocValuesFormat.java:169)
   >         at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
   >         at java.base/java.lang.reflect.Method.invoke(Method.java:580)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
   >         at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
   >         at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   >         at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
   >         at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
   >         at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
   >         at java.base/java.lang.Thread.run(Thread.java:1583)

benwtrent · 2024-09-18T20:15:33Z

Git bisect puts the blame at: 6634b41

#13686

benwtrent · 2024-09-18T20:42:16Z

git bisect might be lying, I don't see how that PR could cause this failure :(

iverase · 2024-09-18T20:50:25Z

Probably we are not remapping the field ordinal properly when merging segments.

iverase · 2024-09-18T20:56:09Z

See here:

lucene/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PointsWriter.java

Line 248 in 6d987e1

    
           // NOTE: we cannot just use the merged fieldInfo.number (instead of resolving to

jpountz · 2024-09-19T07:47:53Z

Argh, I remember carefully checking whether this PR could cause issues due to mismatched field infos, but apparently I missed something.

rmuir · 2024-09-19T10:49:33Z

Can we just revert the change for now? it does two things at once... one of those is using field.number instead of field.name which is historically unsafe: it was always the big risk of bulk merge.

It can't be done like this in all situations, and doing it across versions like this is especially YOLO and asking for corruption IMO. This is why stored fields never bulk merge across versions:

lucene/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java

Line 684 in e4ac577

    
           || ((Lucene90CompressingStoredFieldsReader) candidate).getVersion() != VERSION_CURRENT) {
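
To illustrate the kind of guard described in this comment, here is a minimal sketch. It is not Lucene's actual merge code: the class, the enum, the `sameFieldNumbers` flag, and the version constant are all hypothetical names chosen for the example. The idea is simply that a raw bulk copy is only attempted when the source segment uses the current format version (and, as an assumption here, the same field numbering); anything else falls back to a slower document-by-document merge.

```java
/** Illustrative only: hypothetical names, not Lucene's actual merge code. */
final class BulkMergeGuardSketch {

  /** Hypothetical "current" on-disk format version. */
  static final int VERSION_CURRENT = 3;

  enum MergeStrategy { BULK_COPY, DOC_BY_DOC }

  /**
   * Chooses how to merge one source segment. Bulk copy is allowed only when
   * the field numbering matches (an assumption of this sketch) and the source
   * was written with the current format version.
   */
  static MergeStrategy chooseStrategy(boolean sameFieldNumbers, int candidateVersion) {
    if (!sameFieldNumbers || candidateVersion != VERSION_CURRENT) {
      return MergeStrategy.DOC_BY_DOC; // safe path: re-resolve everything per document
    }
    return MergeStrategy.BULK_COPY;
  }

  public static void main(String[] args) {
    System.out.println(chooseStrategy(true, VERSION_CURRENT));      // BULK_COPY
    System.out.println(chooseStrategy(true, VERSION_CURRENT - 1));  // DOC_BY_DOC
    System.out.println(chooseStrategy(false, VERSION_CURRENT));     // DOC_BY_DOC
  }
}
```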

ChrisHegarty · 2024-09-19T11:10:20Z

I'm going to try reverting 6634b41.

benwtrent · 2024-09-19T11:15:35Z

@ChrisHegarty this makes me worried about all the other changes that switch from field name to field number as well.

I am wondering if we should revert all of them, there are multiple PRs.

ChrisHegarty · 2024-09-19T11:27:29Z

The revert fixes the failures we see here and the other related test failures, seen in #13807 #13808.

rmuir · 2024-09-19T11:29:04Z

sounds like the safe bet to backout any changes messing around with fieldinfos on merge.

Sorry for the short explanation, there is a long history of super-sneaky corruption bugs like this. always happening on some corner-case such as addIndexes(reader) or across different versions, or something like that. When they happen on merge it makes debugging them especially difficult. Mixing up data across fields because of field numbers happened more than once.

This is why, if you look at bulk merge code, you see crazy sysprop escape hatches and stuff like that: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java#L492-L508

rmuir · 2024-09-19T11:30:45Z

we didn't even have bulk merge at all in lucene for a couple years because of field-number bugs like this. got bit too many times.

ChrisHegarty · 2024-09-19T11:36:45Z

I filed a meta issue to better track the reverts, #13809

jpountz · 2024-09-19T11:45:43Z

I don't mind reverting but I would also like to fix the root cause as this change only exposed an existing bug: someone is calling a doc-values producer with the wrong FieldInfo object.

rmuir · 2024-09-19T11:53:16Z

yeah it would be best to improve the tests: it is not good that it took this test, run many many times, to find it.

jpountz · 2024-09-19T11:55:44Z

I found the bug, it's the slow composite reader wrapper which is at fault here. I'll look into improving tests to detect such issues.

Separately, we may want to consider changing the DocValuesProducer API to take a String rather than a FieldInfo, like e.g. points, so that it is not tempted to trust the caller to resolve the FieldInfo object correctly.
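
A minimal sketch of what such a name-based API could look like, assuming a simplified stand-in for the producer. `NameKeyedProducerSketch` and its methods are hypothetical and are not Lucene's `DocValuesProducer`. The caller passes only the field name, and the producer resolves its own segment-local field number, so a caller holding a `FieldInfo` from a different segment cannot pass the wrong number by accident.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a hypothetical producer, not Lucene's DocValuesProducer. */
final class NameKeyedProducerSketch {

  // Segment-local mapping from field name to field number.
  private final Map<String, Integer> numberByName = new HashMap<>();
  // Internal storage keyed by the segment-local field number.
  private final Map<Integer, String> valuesByNumber = new HashMap<>();

  /** Registers a field under this segment's own field number. */
  void addField(String name, int segmentLocalNumber, String values) {
    numberByName.put(name, segmentLocalNumber);
    valuesByNumber.put(segmentLocalNumber, values);
  }

  /** Takes only the name; the field number is resolved internally. */
  String getSorted(String fieldName) {
    Integer number = numberByName.get(fieldName);
    return number == null ? null : valuesByNumber.get(number);
  }

  public static void main(String[] args) {
    NameKeyedProducerSketch segment = new NameKeyedProducerSketch();
    segment.addField("title", 0, "title-ords");
    segment.addField("category", 1, "category-ords");
    System.out.println(segment.getSorted("category")); // category-ords
    System.out.println(segment.getSorted("missing"));  // null
  }
}
```

The trade-off in this sketch is one extra name lookup per call in exchange for not depending on the caller's field numbering.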

jpountz · 2024-09-19T12:28:13Z

I found the root cause, it's here:

lucene/lucene/core/src/java/org/apache/lucene/codecs/DocValuesConsumer.java

Line 616 in e4ac577

values = docValuesProducer.getSorted(fieldInfo);

The producer is called with `fieldInfo` (the merged field info) instead of `readerFieldInfo`, which is what the other doc values types use. I'm working on tests that would have uncovered this problem.
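
The failure mode can be reproduced in miniature. The sketch below uses simplified stand-in classes (`FieldMeta`, `SegmentProducer`, and both lookup helpers are hypothetical, not Lucene types) and assumes a producer that indexes its data by field number. When the merged field info carries a different number than the source segment assigned to the same field name, passing it straight through returns another field's data; resolving the segment's own field info by name first returns the right data.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: simplified stand-ins, not Lucene's real classes. */
final class FieldNumberMismatchSketch {

  /** Minimal field metadata: a name plus a field number. */
  static final class FieldMeta {
    final String name;
    final int number;

    FieldMeta(String name, int number) {
      this.name = name;
      this.number = number;
    }
  }

  /** Hypothetical per-segment producer that looks values up by field number. */
  static final class SegmentProducer {
    private final Map<String, FieldMeta> fieldsByName = new HashMap<>();
    private final Map<Integer, String> valuesByNumber = new HashMap<>();

    void addField(String name, int number, String values) {
      fieldsByName.put(name, new FieldMeta(name, number));
      valuesByNumber.put(number, values);
    }

    /** Returns this segment's own metadata for the field, or null if absent. */
    FieldMeta fieldMeta(String name) {
      return fieldsByName.get(name);
    }

    /** Trusts the caller: uses whatever number the given metadata carries. */
    String getSorted(FieldMeta field) {
      return valuesByNumber.get(field.number);
    }
  }

  /** Buggy pattern: hands the merged metadata directly to the segment's producer. */
  static String buggyLookup(SegmentProducer producer, FieldMeta mergedField) {
    return producer.getSorted(mergedField);
  }

  /** Fixed pattern: resolves the segment's own metadata by name first. */
  static String fixedLookup(SegmentProducer producer, FieldMeta mergedField) {
    FieldMeta readerField = producer.fieldMeta(mergedField.name);
    return readerField == null ? null : producer.getSorted(readerField);
  }

  public static void main(String[] args) {
    SegmentProducer segment = new SegmentProducer();
    segment.addField("title", 0, "title-ords");
    segment.addField("category", 1, "category-ords");

    // In the merged segment, "category" happens to have number 0.
    FieldMeta merged = new FieldMeta("category", 0);

    System.out.println(buggyLookup(segment, merged)); // title-ords (wrong field)
    System.out.println(fixedLookup(segment, merged)); // category-ords
  }
}
```

In this toy version the wrong lookup silently returns another field's values; in the real merge, reading the wrong field's ordinals is a plausible way to end up with an out-of-range index like the one in the stack trace above.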

ChrisHegarty · 2024-09-19T12:38:18Z

Ok, reverts are prepared. @jpountz you wanna fix (and not revert), or revert for now?

jpountz · 2024-09-19T13:02:32Z

Give me some time to see how the fix and tests look, and let's think about whether/what to revert later on? I expect to have something by end of day. @ChrisHegarty Feel free to cut the branch in the meantime, we can backport to the 9.12 branch if necessary?

ChrisHegarty · 2024-09-19T13:05:55Z

Let's postpone the 9_12 branch cut until tomorrow, pending the outcome of this.

bugmakerrrrrr · 2024-09-19T13:31:24Z

> Separately, we may want to consider changing the DocValuesProducer API to take a String rather than a FieldInfo, like e.g. points, so that it is not tempted to trust the caller to resolve the FieldInfo object correctly.

@jpountz +1. We have encountered a related issue in NormsProducer, and I worked around it by resolving the field info inside the NormsProducer.

This improves testing of mismatched field numbers by:

- improving `AssertingDocValuesProducer` to detect mismatched field numbers,
- introducing a `MismatchedCodecReader` to actually test mismatched field numbers on `DocValuesProducer` (a `MismatchedLeafReader` wrapping a `SlowCodecReaderWrapper` doesn't work since `SlowCodecReaderWrapper` implicitly resolves the correct `FieldInfo` object),
- introducing an explicit test for mismatched field numbers in `BaseDocValuesFormatTestCase`.

These new tests uncovered a bug when merging sorted doc values, which would call the underlying doc values producer with the merged field info. Closes apache#13805

jpountz · 2024-09-19T14:11:50Z

I have a fix and tests that would have found the bug at #13812.

This improves testing of mismatched field numbers by:

- improving `AssertingDocValuesProducer` to detect mismatched field numbers,
- introducing a `MismatchedCodecReader` to actually test mismatched field numbers on `DocValuesProducer` (a `MismatchedLeafReader` wrapping a `SlowCodecReaderWrapper` doesn't work since `SlowCodecReaderWrapper` implicitly resolves the correct `FieldInfo` object),
- introducing an explicit test for mismatched field numbers for doc values, points, postings and knn vectors.

These new tests uncovered a bug when merging sorted doc values, which would call the underlying doc values producer with the merged field info. Closes #13805

ChrisHegarty added this to the 9.12.0 milestone Sep 18, 2024

benwtrent added the blocker label (a severe issue that should be resolved before the release specified in its milestone) Sep 18, 2024

ChrisHegarty mentioned this issue Sep 19, 2024

TestBestCompressionLucene80DocValuesFormat fails with ArrayIndexOutOfBoundsException #13807

Closed

ChrisHegarty mentioned this issue Sep 19, 2024

Backout changes messing around with fieldinfos on merge #13809

Closed

ChrisHegarty mentioned this issue Sep 19, 2024

Replace Map<String,Object> with IntObjectHashMap for KnnVectorsReader #13763

Merged

jpountz mentioned this issue Sep 19, 2024

Improve testing of mismatched field numbers. #13812

Merged

jpountz closed this as completed in #13812 Sep 20, 2024

jpountz closed this as completed in da1f954 Sep 20, 2024

bugmakerrrrrr mentioned this issue Oct 28, 2024

replace Map<String,Object> with IntObjectHashMap for DV producer #13961

Merged
