-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support to first/last aggregators for numeric types during ingestion #10949
Conversation
Signed-off-by: frank chen <[email protected]>
Signed-off-by: frank chen <[email protected]>
This reverts commit 2a1e47a.
Signed-off-by: frank chen <[email protected]>
Signed-off-by: frank chen <[email protected]>
One tricky problem left is that first/last aggregators is not supported by SQL query on reindexed long/float/double columns while these aggregators work well in a native query. The type of reindexed double/float/long/string first/last columns are marked as COMPLEX in schema, and the underlying type is lost when the type is converted into
Since the underlying data type of the column is lost during SQL planning, current One way I can come up with is to define some macros such as double_latest for different data types at the SQL layer. |
Signed-off-by: frank chen <[email protected]>
Hmm, so what I have had in mind to solve this is to be able to determine whether a #10277 also added tracking of the "name" of the complex type on That said, I haven't had a look at this PR at all yet. I will try to get to it sometime soon, maybe I will have some ideas while looking over the code. |
Hi @clintropolis , Thanks for your suggestion. I'll try to solve it. |
Depending on how big of a change this is, it might be worth splitting out a separate PR to go in before this one. I'll try to think about this a bit as well. |
The name of complex type has been set in first/last aggregator And I checked the code about how
So, is it reasonable to make some changes here to pass the type name as well as its value type to |
Hi @clintropolis @suneet-s , Could you review this PR at any time you're convenient ? Since this PR is a little large, I think the SQL problem could be separated in another PR. |
Sorry, I will try to get to this soon! I think I have a similar problem to solve with complex types in a different thing I'm working on, so will be thinking about how we can deal with differences between intermediary and finalized types a bit better as well. |
If there're any ideas or progress about solving complex types, could you let me know ? I'm also working on the sql problem. |
@FrankChen021 thanks for bringing this back to the top of my radar. I will look through this over the next week or so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed about 12 files. Posting an incomplete review
import java.nio.ByteBuffer; | ||
|
||
/** | ||
* The class serializes a Pair<Long, ?> object for double/float/longFirst and double/float/longLast aggregators |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe why you chose not to also make SerializablePairLongStringSerde
not extend this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's reasonable to refactor SerializablePairLongStringSerde
to be subclass of AbstractSerializablePairSerde
, but I think this kind of change is not tightly related to this PR because this PR is already a little bit large. I think a coming PR would resolve this problem once this PR is merged.
protected abstract T toPairObject(ByteBuffer buffer); | ||
|
||
protected abstract byte[] pairToBytes(T val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadocs please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/** | ||
* The class serializes a Pair<Long, ?> object for double/float/longFirst and double/float/longLast aggregators | ||
*/ | ||
public abstract class AbstractSerializablePairSerde<T extends SerializablePair<Long, ?>> extends ComplexMetricSerde |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: change class name to AbstractSerializableLongObjectPairSerde
public abstract class AbstractSerializablePairSerde<T extends SerializablePair<Long, ?>> extends ComplexMetricSerde | |
public abstract class AbstractSerializableLongObjectPairSerde<T extends SerializablePair<Long, ?>> extends ComplexMetricSerde |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@Override | ||
public int compare(@Nullable T o1, @Nullable T o2) | ||
{ | ||
return Longs.compare(o1.lhs, o2.lhs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not correctly handle if either o1 or o2 is null. See StringFirstAggregatorFactory#VALUE_COMPARATOR
, we'll want a similar behavior here.
Would it be possible to update the integration tests that were added to surface this error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check if UT is able to check this error first because IT cases share some data sources with other cases.
|
||
protected abstract T toPairObject(ByteBuffer buffer); | ||
|
||
protected abstract byte[] pairToBytes(T val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected abstract byte[] pairToBytes(T val); | |
protected abstract byte[] pairToBytes(@Nullable T val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the new commit, null check is added to the caller of this method, so it's not necessary to declare this parameter as Nullable
} | ||
|
||
@Override | ||
public byte[] toBytes(T val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public byte[] toBytes(T val) | |
public byte[] toBytes(@Nullable T val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
@Override | ||
protected byte[] pairToBytes(SerializablePairLongFloat val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the super class says val
can be null, all these implementations should be able to handle a null val
.
I haven't dug in yet to know what this means, but this same pattern exists in all 3 implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the new commit, null check is added to the caller of this method, so no need to handle null in this method
@FrankChen021 To help with reviewing this PR, could you update the PR description to include some notes how you chose to implement the solution for this. For example, it looks like the Also, it would help if you included a description of the different scenarios you tested, and known unsupported conditions - like your comment about first last aggregators not working in SQL queries. |
@suneet-s Thanks for your review. I will address all the comments you left and update this PR later this day. |
Description of this PR has been updated. Let me know if there's anything left. |
Thanks @FrankChen021 I will take a look again this week! |
What happens when a user issues the I'm asking because it looks like #10332 added handling of complex type columns, which used to be ok because stringFirst/Last was the only type of complex column. But now that we've introduced these column types, the expected behavior is less clear. Perhaps you can add some tests to CalciteQueryTest to validate the behavior that we want users to see when they issue sql queries on these column types. |
Signed-off-by: frank chen <[email protected]>
EARLIEST/LATEST both work well for stringFirst/Last columns. They also work for none double/float/longFirst/Last columns. For double/long/floatFirst/Last columns, following exception message is returned
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FrankChen021 Thank you for this PR. I have some minor comments. do you also want to add
- unit tests for the serde
- documentation changes
@Override | ||
public int compare(@Nullable T o1, @Nullable T o2) | ||
{ | ||
return getLongObjectPairComparator().compare(o1, o2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice to not create a new comparator object for each comparison op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementations of getLongObjectPairComparator
does not create new comparator object, they hold a static comparator object.
} | ||
|
||
@Override | ||
public ObjectStrategy<T> getObjectStrategy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be cleaner to have each subclass implement getObjectStrategy() and then we need not have three abstract methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good suggestion.
processing/src/main/java/org/apache/druid/query/aggregation/SerializablePairLongLongSerde.java
Outdated
Show resolved
Hide resolved
if (pair.lhs < firstTime) { | ||
firstTime = pair.lhs; | ||
|
||
// rhs might be NULL under SQL-compatibility mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit out of my depth here. what will happen if the aggregate was stored as null in segment since sql compatibility was on in the task writing the segment. But then sql compatability is turned off when the segment data is being read. should it still be read as null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I will check it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this question, the short answer is yes. The query time processing of code is here
Line 228 in 9745d9e
return new SerializablePair<>(((Number) map.get("lhs")).longValue(), null); |
I think we should return the default value 0 for this case. What do you think @clintropolis ?
@@ -36,11 +36,16 @@ | |||
private static final int NULL_VALUE = -1; | |||
|
|||
/** | |||
* Returns whether a given value selector *might* contain SerializablePairLongString objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class may require a rename now. May be FirstLastUtils?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, selectorNeedsFoldCheck
method in StringFirstLastUtils
is now shared by long/float/doubleFirst/Last, it should be extracted out of this class.
I have not made more changes to this class file because stringFirst/Last will be refactored in a new PR to share new abstract classes provided in this PR, which means this class is also involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok. so you will make the changes in that PR. is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
AggregateCombiner floatFirstAggregateCombiner = combiningAggFactory.makeAggregateCombiner(); | ||
|
||
SerializablePair[] inputPairs = { | ||
new SerializablePair<>(5L, 134.3f), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add tests with null values as input and/or expected result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response. null related tests have been added in the latest commit
Co-authored-by: Abhishek Agarwal <[email protected]>
@Override | ||
public ObjectStrategy getObjectStrategy() | ||
{ | ||
return new ObjectStrategy<SerializablePairLongDouble>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use a static ObjectStrategy since it is stateless. It seems right now we are creating a new object for every deserialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -36,11 +36,16 @@ | |||
private static final int NULL_VALUE = -1; | |||
|
|||
/** | |||
* Returns whether a given value selector *might* contain SerializablePairLongString objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok. so you will make the changes in that PR. is that right?
@FrankChen021 - have you run any performance tests/benchmark for ingestion time rollup? That will be handy to rule out any perf bug. |
I have not. But I will. Thanks for pointing out this. |
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions. |
@FrankChen021 - This is a useful capability. would you be able to take this to completion? Perf tests will be nice but they need not block this PR. |
This issue is no longer marked as stale. |
Hi @abhishekagarwal87 , the changes in this PR was previously used in our team. I'm not working on that project now, so I don't have large chunk of time to resolve the conflicts. Also SQL functions are not supported on rollup first/last column, this is also a restriction that needs to be resolved. (See comment above #10949 (comment)) At the time this PR was created, it was not able to do that. So we use native query to bypass this problem. It would be great if you can pick up this PR. |
This pull request has been marked as stale due to 60 days of inactivity. |
This pull request/issue has been closed due to lack of activity. If you think that |
Fixes #10702
Description
This PR fixes #10702 by adding support to doubleFirst/floatFirst/longFirst and doubleLast/floatLast/longLast during ingestion phase. And also reverts #10794 to bring back the UI.
The implementation is inspired by current stringFirst/stringLast implementation, so the code looks like similar. But this PR does not refactor current stringFirst/stringLast implementation to share the code with double/float/long. That might be done in the future.
Key changed/added classes in this PR
AbstractSerializableLongObjectPairSerde
is provided to share serialization code for type of long/double/floatGenericFirstAggregateCombiner
is provided to share first aggregator code for type of long/double/floatGenericLastAggregateCombiner
is provided to share last aggregator code for type of long/double/floatWhat's not included in this PR
stringFirst
/stringLast
should also share the three base classes listed above, I will open a new PR to do this to keep changes in this PR as less as possible.Test Scenario
This PR contains UT and IT cases to cover all
doubleFirst
/doubleLast
,floatFirst
/floatLast
,longFirst
/longLast
aggregators, including:This PR has: