-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException #13805
Comments
git bisect might be lying, I don't see how that PR could cause this failure :( |
Probably we are not remapping the field ordinal properly when merging segments. |
See here: lucene/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PointsWriter.java Line 248 in 6d987e1
|
Argh, I remember carefully checking whether this PR could cause issues due to mismatched field infos, but apparently I missed something. |
Can we just revert the change for now? it does two things at once... one of those is using field.number instead of field.name which is historically unsafe: it was always the big risk of bulk merge. It can't be done like this in all situations, and doing it across versions like this is especially YOLO and asking for corruption IMO. This is why stored fields never bulk merge across versions: Line 684 in e4ac577
|
I'm going to try reverting 6634b41. |
@ChrisHegarty this makes be worried about all the other field number switch with field name things as well. I am wondering if we should revert all of them, there are multiple PRs. |
sounds like the safe bet to backout any changes messing around with fieldinfos on merge. Sorry for the short explanation, there is a long history of super-sneaky corruption bugs like this. always happening on some corner-case such as This is why, if you look at bulk merge code, you see crazy sysprop escape hatches and stuffl like that: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java#L492-L508 |
we didn't even have bulk merge at all in lucene for a couple years at all because of field-number bugs like this. got bit too many times. |
I filed a meta issue to better track the reverts, #13809 |
I don't mind reverting but I would also like to fix the root cause as this change only exposed an existing bug: someone is calling a doc-values producer with the wrong FieldInfo object. |
yeah it would be best to improve the tests: it is not good that it took this test, run many many times, to find it. |
I found the bug, it's the slow composite reader wrapper which is at fault here. I'll look into improving tests to detect such issues. Separately, we may want to consider changing the DocValuesProducer API to take a String rather than a FieldInfo, like e.g. points, so that it is not tempted to trust the caller to resolve the FieldInfo object correctly. |
I found the root cause, it's here:
fieldInfo instead of readerFieldInfo like other doc values types do. I'm working on tests that would have uncovered this problem.
|
Ok, reverts are prepared. @jpountz you wanna fix (and not revert), or revert for now? |
Give me some time to see how the fix and tests look, and let's think about whether/what to revert later on? I expect to have something by end of day. @ChrisHegarty Feel free to cut the branch in the meantime, we can backport to the 9.12 branch if necessary? |
Let's postpone the 9_12 branch cut until tomorrow, pending on the outcome of this. |
@jpountz +1, We have encountered the related issue in NormsProducer, and I worked around it by resolving the field info inside the NormsProducer. |
This improves testing of mismatched field numbers by - improving `AssertingDocValuesProducer` to detect mismatched field numbers, - introducing a `MismatchedCodecReader` to actually test mismatched field numbers on `DocValuesProducer` (a `MismatchedLeafReader` wrapping a `SlowCodecReaderWrapper` doesn't work since `SlowCodecReaderWrapper` implicitly resolves the correct `FieldInfo` object), - introducing an explicit test for mismatched field numbers in `BaseDocValuesFormatTestCase`. These new tests uncovered a bug when merging sorted doc values, which would call the underlying doc values producer with the merged field info. Closes apache#13805
I have a fix and tests that would have found the bug at #13812. |
This improves testing of mismatched field numbers by - improving `AssertingDocValuesProducer` to detect mismatched field numbers, - introducing a `MismatchedCodecReader` to actually test mismatched field numbers on `DocValuesProducer` (a `MismatchedLeafReader` wrapping a `SlowCodecReaderWrapper` doesn't work since `SlowCodecReaderWrapper` implicitly resolves the correct `FieldInfo` object), - introducing an explicit test for mismatched field numbers for doc values, points, postings and knn vectors. These new tests uncovered a bug when merging sorted doc values, which would call the underlying doc values producer with the merged field info. Closes #13805
This improves testing of mismatched field numbers by - improving `AssertingDocValuesProducer` to detect mismatched field numbers, - introducing a `MismatchedCodecReader` to actually test mismatched field numbers on `DocValuesProducer` (a `MismatchedLeafReader` wrapping a `SlowCodecReaderWrapper` doesn't work since `SlowCodecReaderWrapper` implicitly resolves the correct `FieldInfo` object), - introducing an explicit test for mismatched field numbers for doc values, points, postings and knn vectors. These new tests uncovered a bug when merging sorted doc values, which would call the underlying doc values producer with the merged field info. Closes #13805
The text was updated successfully, but these errors were encountered: