-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-9953: Make FacetResult#value accurate for LongValueFacetCounts #131
Conversation
… multi-value doc cases
@@ -429,15 +429,15 @@ public void testRandomMultiValued() throws Exception { | |||
int expectedChildCount = 0; | |||
int expectedTotalCount = 0; | |||
for (int i = 0; i < valueCount; i++) { | |||
if (values[i] != null) { | |||
if (values[i] != null && values[i].length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we force the test case to randomly add a duplicate long value in a document (Line 398) so that the test case fails with the existing code and then passes with your PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gautamworah96 . I'm not sure I fully understand the suggestion to add a duplicate long value to test the change? The current bug is that, whenever a document has multiple values for a particular field (they don't need to be the same value), we'll count that document multiple times (once for each value it contains) in FacetResult#value. So I think it should be sufficient to just make sure documents contain multiple values to trip the bug, which is happening in Line 396 in this test case. The change I made to the test case reflects the proper logic for populating FacetResult#value, and does in fact fail if my change is not present.
Does this make sense? Maybe I'm missing your point. If so, I apologize!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if a single document contains the same numeric value more than once, indexed as a SortedNumericDocValuesField
, Lucene will indeed index that number multiple times for that document, and then this facet counting implementation will count it multiple times for that one value, right?
Or does Lucene de-dup such duplicate numeric values in one document? I'm not certain :)
Ideally, it should only count that value once for that document, but it over-counts today.
However, this is a different bug than what this PR is fixing -- so let's keep this PR about the original bug of not overcounting the FacetResult.value
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if a single document contains the same numeric value more than once, indexed as a SortedNumericDocValuesField, Lucene will indeed index that number multiple times for that document, and then this facet counting implementation will count it multiple times for that one value, right?
Correct @mikemccand . I thought that this PR was for fixing that bug. I have replicated that bug here (see testLongValues)
So essentially, within a FacetResult you would want the value
count for the label
to be counted just once but that is not the case today
I'll create a JIRA for it.
Edit: Created LUCENE-9964
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, interesting @mikemccand / @gautamworah96. Right, so that's not the bug I was considering here, but that's a good one to dig into as well. Thanks for spinning off an issue @gautamworah96!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah thank you for reproducing this (separate) issue in test PR and opening a new issue @gautamworah96! Let's see if we can fix this one too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks @gsmiller!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gsmiller!
Description
LongValueFacetCounts
may produce inaccurate counts forFacetResult#value
in cases where docs are multi-valued. This fixes the bug. Note that I'm proposing this change be included in the 8x branch, so I've also included a separate pull request in the old repo: apache/lucene-solr#2491Solution
Modify
LongValueFacetCounts
to only count each new document it sees once, instead of once-per-value.Tests
Modified existing test cases in
TestLongValueFacetCounts
to reflect the correct behavior and ensured they pass.Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.