Error upper bound may be wrong when performing incremental reductions #40005
Comments
Pinging @elastic/es-analytics-geo |
ping @colings86 @polyfractal as we have chatted about this. |
IMO the best way to fix this will be to change to using |
sounds good @colings86 I can look into this. |
Actually, I am not sure I will get to this anytime soon, I marked this issue "help wanted" |
I'd like to take this. |
As I can see, there is a binary serialization/deserialization of |
@javanna, @colings86, can you answer, please? |
Are you referring to how it is serialized today? Currently,
We will indeed need to change the serialization, and worry about cross-version compatibility (clusters can be heterogeneous, so a new version might need to serialize a response to an older version and vice versa). This is done by checking the input/output stream versions and serializing appropriately. The pattern is:

    if (in.getVersion().onOrAfter(Version.V_7_3_0)) {
        this.docCountError = in.readOptionalLong();
    } else {
        this.docCountError = in.readLong();
    }

Similar for the output stream. E.g. when talking to an older node, we use the old serialization method. And when talking to newer nodes, we can use the "optional" methods to instantiate a

Here's another example, you can find these scattered around the code if you look at the input/output streams: Line 47 in 44ea7dc
|
If I read from an old version stream, there is ambiguity in |
I think we'll want to treat it as "not calculated", since that's the less-bad way to resolve the ambiguity. E.g. if we interpret it as "0 error" we might assign a no-error rate to something that was just not calculated yet and actually has a large error, leading to very incorrect error reports. (@javanna is on holiday right now, but he can confirm when he's back) |
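To make the "not calculated" reading concrete, here is a minimal standalone sketch of a version-gated round trip. It does not use the real Elasticsearch stream classes: `DocCountErrorWire`, `NULLABLE_SINCE`, the integer version numbers, and the choice to write `0` for a `null` value when talking to an older peer are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for version-aware stream serialization.
final class DocCountErrorWire {
    // Assumed cut-over version, encoded as a plain int for this sketch.
    static final int NULLABLE_SINCE = 70300;

    static void write(DataOutputStream out, int peerVersion, Long error) throws IOException {
        if (peerVersion >= NULLABLE_SINCE) {
            // New format: presence flag, then the value if present.
            out.writeBoolean(error != null);
            if (error != null) {
                out.writeLong(error);
            }
        } else {
            // Old format cannot express "absent"; assumption: fall back to 0.
            out.writeLong(error == null ? 0L : error);
        }
    }

    static Long read(DataInputStream in, int peerVersion) throws IOException {
        if (peerVersion >= NULLABLE_SINCE) {
            return in.readBoolean() ? Long.valueOf(in.readLong()) : null;
        }
        // Old format: 0 is ambiguous, so treat it as "not calculated".
        long legacy = in.readLong();
        return legacy == 0L ? null : Long.valueOf(legacy);
    }

    // Serializes and deserializes one value as if both ends spoke peerVersion.
    static Long roundTrip(Long error, int peerVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                write(out, peerVersion, error);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return read(in, peerVersion);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this sketch, a genuine `0` survives a round trip between two new nodes, while the same `0` read from an old-format stream comes back as `null`, which is the less-bad resolution described above.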
I have two questions regarding this piece of code: Lines 242 to 243 in 44ea7dc
|
I believe it is
Ordering by the term itself (
We are sorting alphabetically, so since Shard 2 starts with "B" we know it doesn't have an "A" term (otherwise it would have sent a count for "A"). Similarly, we know Shard 3 doesn't have "A", "B", or "C". When merging, we take the "top" 5 alphabetically from the returned results, meaning A through E in this case. After merging doc counts we get:
We know the counts are exact because we're not relying on doc_counts for ordering, but the total lexicographic ordering. "A" is the "top" term because it was returned and sorts to the first position, regardless of the document count. If only one shard returns that value, or all the shards return it, doesn't matter because it will sort to the "top" regardless of count. |
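The argument can be checked with a small sketch. The shard contents below are made-up numbers chosen to match the shape of the example (Shard 2 starts at "B", Shard 3 starts at "D"); `TermOrderMerge` and its methods are hypothetical names, not Elasticsearch code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of merging shard results that are ordered by term.
final class TermOrderMerge {
    // Sums doc counts per term, then keeps the first `size` terms alphabetically.
    static Map<String, Long> mergeTopByTerm(List<Map<String, Long>> shards, int size) {
        TreeMap<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> shard : shards) {
            shard.forEach((term, count) -> merged.merge(term, count, Long::sum));
        }
        TreeMap<String, Long> top = new TreeMap<>();
        for (Map.Entry<String, Long> entry : merged.entrySet()) {
            if (top.size() == size) {
                break;
            }
            top.put(entry.getKey(), entry.getValue());
        }
        return top;
    }

    // Each shard returns its own first 5 terms alphabetically (illustrative data).
    static List<Map<String, Long>> exampleShards() {
        return List.of(
            Map.of("A", 3L, "B", 1L, "C", 4L, "D", 2L, "E", 5L),
            Map.of("B", 7L, "C", 2L, "D", 1L, "E", 3L, "F", 9L),
            Map.of("D", 6L, "E", 1L, "F", 2L, "G", 8L, "H", 4L));
    }
}
```

A shard that omitted a term from the merged top 5 either does not hold that term at all or would have had to return five smaller terms, which would push the term out of the merged top 5. So every count that survives the merge is a full sum and no error bound is needed.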
Got it, thanks! |
PR is ready for review |
…43874) When performing incremental reductions, 0 value of docCountError may mean that the error was not previously calculated, or that the error was indeed previously calculated and its value was 0. We end up rejecting true values set to 0 this way. This may lead to wrong upper bound of error in result. To fix it, this PR makes docCountError nullable. null values mean that error was not calculated yet. Fixes #40005 Co-authored-by: Igor Motov <[email protected]> Co-authored-by: Elastic Machine <[email protected]>
…ons (#43874) (#76475) When performing incremental reductions, 0 value of docCountError may mean that the error was not previously calculated, or that the error was indeed previously calculated and its value was 0. We end up rejecting true values set to 0 this way. This may lead to wrong upper bound of error in result. To fix it, this PR makes docCountError nullable. null values mean that error was not calculated yet. Fixes #40005, #75667 Co-authored-by: Nikita Glashenko <[email protected]>
I'm reopening this issue because when removing an "@AwaitsFix" in CCSDuelIT in #85538 that was still pointing to this issue, I ran into a reproducible test failure that seems to point at this issue still. Removing the "@AwaitsFix" and running
on master shows that the responses to a search request with a terms aggregation on a CCS setup differ depending on whether it is run with or without minimizing roundtrips, and I think this touches incremental reductions. With the above reproduction line I get the following mismatch in the
|
When reducing terms aggs results, we check whether we already have a doc count error for a certain bucket by looking at its error (see https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/InternalTerms.java#L245): if it is greater than zero we have already calculated it, while if it is zero we assume we have not, so we ignore the value and use the doc count of the last returned bucket.
When performing incremental reductions though, 0 may mean that the error was not previously calculated, or that the error was indeed previously calculated and its value was 0. We end up rejecting true values set to 0 this way.
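A simplified model of the check described above shows where the information is lost. This is a sketch only: `DocCountErrorModel` and its two methods are hypothetical names, and the real reduction logic in `InternalTerms` does considerably more than this.

```java
// Hypothetical model of choosing a bucket's doc count error during a reduce.
final class DocCountErrorModel {
    // Sentinel scheme: 0 doubles as "not calculated", so a genuinely
    // calculated 0 is discarded and replaced by the last bucket's doc count.
    static long withZeroSentinel(long storedError, long lastBucketDocCount) {
        return storedError > 0 ? storedError : lastBucketDocCount;
    }

    // Nullable scheme: only null means "not calculated", so a calculated 0
    // from an earlier partial reduce is kept.
    static long withNullable(Long storedError, long lastBucketDocCount) {
        return storedError != null ? storedError : lastBucketDocCount;
    }
}
```

For a bucket whose error was already calculated as 0 in an earlier partial reduce, and a last returned bucket with doc count 12, the sentinel scheme reports 12 while the nullable scheme reports 0.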