Add experimental support for first/last for double/float/long #10702 #14462

ankit0811 · 2023-06-21T21:58:05Z

Description

This PR revives #10949, which addresses issue #10702 by adding support for doubleLast, doubleFirst, floatLast, floatFirst, longLast and longFirst.

Numeric last/first data types are now supported during ingestion and in native queries (including MSQ).

Ingestion

Using an ingestion spec, aggregation is supported for Double/Float and Long last/first column types.
The UI change needed to support this is still pending (reverted PR) and will be taken up in a follow-up PR.

Query

Querying this data type is not yet supported in SQL because it requires some Calcite changes (to be done in a follow-up PR).
A native JSON query can be used to obtain the numeric last and first values from the ingested column.
Usage:

{
      "type": "doubleLast",
      "name": "a3",
      "fieldName": "last_double_added", // while ingesting this column is of type DoubleLast
      "timeColumn": "__time"
}

Release note

Key changed/added classes in this PR

AbstractSerializableLongObjectPairSerde abstract class to share serde between double/float/long
AbstractSerializablePairLongObjectBufferStore.java
AbstractSerializablePairLongObjectColumnHeader.java
AbstractSerializablePairLongObjectColumnSerializer.java
AbstractSerializablePairLongObjectDeltaEncodedStagedSerde.java
AbstractSerializablePairLongObjectSimpleStagedSerde.java
SerializablePairLongDoubleComplexMetricSerde for double
SerializablePairLongFloatComplexMetricSerde for float
SerializablePairLongLongComplexMetricSerde for long
GenericFirstAggregateCombiner first agg combiner to share between double/float/long
GenericLastAggregateCombiner last agg combiner to share between double/float/long
DoubleFirstAggregatorFactory, DoubleLastAggregatorFactory
FloatFirstAggregatorFactory, FloatLastAggregatorFactory
LongFirstAggregatorFactory, LongLastAggregatorFactory

Further introduces new ColumnType

serializablePairLongDouble
serializablePairLongFloat
serializablePairLongLong

Following review suggestions, we no longer use GenericIndexed. Using the pattern defined in SerializablePairLongStringComplexMetricSerde, we observed a 40% improvement in storage (these numbers are from ingesting 1 day's worth of wikiticker data).

To-Do
These will be taken up in a follow-up PR

Add ingestion support from UI for first/last types
SQL compatibility for querying (latest/earliest) the new ingestion type (today this is queryable via native JSON queries, not via SQL; making it SQL compatible needs more discussion)

This PR has:

been self-reviewed.
added documentation for new or modified features or behaviors.
a release note entry in the PR description.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added or updated version, license, or notice information in licenses.yaml
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
added integration tests.
been tested in a test Druid cluster.

processing/src/main/java/org/apache/druid/query/aggregation/first/NumericFirstAggregator.java

...ing/src/main/java/org/apache/druid/query/aggregation/first/NumericFirstBufferAggregator.java

processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastAggregator.java

...ssing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastBufferAggregator.java

...rc/main/java/org/apache/druid/query/aggregation/AbstractSerializableLongObjectPairSerde.java

abhishekagarwal87 · 2023-06-22T14:28:01Z

Thank you @ankit0811 for reviving this PR. I will review this soon.

ankit0811 · 2023-06-22T15:11:26Z

@abhishekagarwal87 I missed adding the fold logic in NumericLastVectorAggregator::aggregator
Will update the PR soon

LakshSingla · 2023-06-28T04:49:09Z

Thanks @ankit0811 for the PR!
This will also allow MSQ to use these aggregators for ingestion and querying. We should update the MSQ's limitations and the CalciteQueryTests ignored by MSQ once this PR is merged in a follow-up PR (after testing).

LakshSingla

Thanks for raising this PR!! I am relatively unfamiliar with the complex column's serde and am still going through the implementation, however, I have a few high-level comments, based on my understanding of the StringFirst/Last aggregators:

In the deserializeColumn, we should allocate the first byte for a version number. This would help if we decide to update the strategy down the line and want backward compatibility.

The current StringFirst/Last comparators use delta encoding and compression on the numeric column to reduce the size. Also, the PR (Improve String Last/First Storage Efficiency #12879) which introduced this change also added a few classes which help in this. I suppose that we would want it sometime. The versioning as mentioned above would help in making these changes if we decide that we don't want it now. However, the complexity of changing from one version to another is high and we require backward compatibility therefore we should question if that should be done in the original PR itself (check this comment:

druid/processing/src/main/java/org/apache/druid/query/aggregation/SerializablePairLongStringComplexMetricSerde.java

Lines 54 to 66 in d63eff3

    
  /**
   * This is a configuration parameter to allow for turning on compression.  It is a hack, it would be significantly
   * better if this could be delivered via properties.  The number one reason this is a hack is because it reads
   * the System.getProperty which doesn't actually have runtime.properties files put into it, so this setting
   * could be set in runtime.properties and this code wouldn't see it, because that's not how it is wired up.
   *
   * The intent of this parameter is so that Druid 25 can be released using the legacy serialization format. This
   * will allow us to get code released that can *read* both the legacy and the new format.  Then, in Druid 26,
   * we can completely eliminate this boolean and start to only *write* the new format, in which case this
   * hack of a configuration property disappears.
   */
  private static final boolean COMPRESSION_ENABLED =
      Boolean.parseBoolean(System.getProperty("druid.columns.pairLongString.compressed", "false"));

)

Can we refactor the preexisting LongString serde classes to be subclasses of the newly introduced AbstractSerializableLongObjectPairSerde? I see that there are some similarities in the methods. In any case, it would be helpful if all of them are under the same umbrella, and we can further categorize the newly added classes as children of AbstractSerializableLongNumbericPairSerde
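
For illustration, the version-byte suggestion from this review could look roughly like the sketch below. The class name, the `VERSION_1` constant and the payload layout are hypothetical and are not taken from the PR; the point is only that the reader checks a leading byte before interpreting the rest of the column.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a version-prefixed column payload; not the Druid implementation. */
final class VersionedPairColumnSketch
{
  // Hypothetical version marker written as the first byte of the column.
  static final byte VERSION_1 = 0x1;

  /** Writes the version byte followed by the already-serialized payload. */
  static ByteBuffer write(byte[] payload)
  {
    ByteBuffer buf = ByteBuffer.allocate(Byte.BYTES + payload.length);
    buf.put(VERSION_1);
    buf.put(payload);
    buf.flip();
    return buf;
  }

  /** Reads the version byte first, then dispatches on it. */
  static byte[] read(ByteBuffer buf)
  {
    ByteBuffer view = buf.duplicate(); // leave the caller's position untouched
    byte version = view.get();
    if (version == VERSION_1) {
      byte[] payload = new byte[view.remaining()];
      view.get(payload);
      return payload;
    }
    // A future format would add another branch here instead of breaking old segments.
    throw new IllegalArgumentException("Unknown version: " + version);
  }
}
```

With a layout like this, a later format change only needs a new version value and a new read branch, while segments written with the old version stay readable.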

somu-imply

Thanks for this PR. I am still going over this and will add my comments as I go over the rest.

...in/java/org/apache/druid/query/aggregation/SerializablePairLongDoubleComplexMetricSerde.java

processing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstAggregator.java

somu-imply · 2023-07-13T16:23:41Z

processing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstAggregator.java

 {
  double firstValue;

-  public DoubleFirstAggregator(BaseLongColumnValueSelector timeSelector, BaseDoubleColumnValueSelector valueSelector)
+  public DoubleFirstAggregator(BaseLongColumnValueSelector timeSelector, ColumnValueSelector valueSelector, boolean needsFoldCheck)


Does double first need a fold check ?

I believe so. Let me know if I am mistaken

@somu-imply I am not completely aware of the foldCheck, which implementation of the selectors produces results that cannot be optimized? Also, is there any particular reason that double won't require the check, since it seems analogous to what StringFirst et al. are doing?

Still confused as to why doubles wouldn't require a folds check.

It should require a folds check in case of rollups I think.
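
As a rough illustration of the point about rollups: the sketch below assumes that a "fold" means the input value may already be an aggregated (time, value) pair, in which case the pair's own timestamp must be used instead of the row time. `DoubleFirstFoldSketch` and `TimedDouble` are hypothetical names; `TimedDouble` stands in for a `SerializablePair<Long, Double>`.

```java
/** Hypothetical sketch of a fold check; not the Druid implementation. */
final class DoubleFirstFoldSketch
{
  /** Hypothetical stand-in for a (time, value) pair produced by an earlier rollup. */
  static final class TimedDouble
  {
    final long time;
    final Double value;

    TimedDouble(long time, Double value)
    {
      this.time = time;
      this.value = value;
    }
  }

  long firstTime = Long.MAX_VALUE;
  Double firstValue = null;

  void aggregate(long rowTime, Object input)
  {
    final long time;
    final Double value;
    if (input instanceof TimedDouble) {
      // Fold: the input is already aggregated, so trust the timestamp inside the pair.
      TimedDouble pair = (TimedDouble) input;
      time = pair.time;
      value = pair.value;
    } else {
      // Raw numeric input: the row time is the only timestamp available.
      time = rowTime;
      value = input == null ? null : Double.valueOf(((Number) input).doubleValue());
    }
    if (time < firstTime) {
      firstTime = time;
      firstValue = value;
    }
  }
}
```

Without the fold branch, a rolled-up row whose pair carries an earlier timestamp than the row time would be compared on the wrong time and could lose to a later raw value.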

...ing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstAggregatorFactory.java

ankit0811 · 2023-07-18T04:56:44Z

Can we refactor the preexisting LongString serde classes to be subclasses of the newly introduced AbstractSerializableLongObjectPairSerde? I see that there are some similarities in the methods. In any case, it would be helpful if all of them are under the same umbrella, and we can further categorize the newly added classes as children of AbstractSerializableLongNumbericPairSerde

Best if we take this in a separate PR? else it will be too big a PR to be reviewed. It already touches quite a few classes

LakshSingla · 2023-07-19T03:17:21Z

Best if we take this in a separate PR? else it will be too big a PR to be reviewed. It already touches quite a few classes

Seems reasonable to me, we can mark this as an improvement once we get the functionality in.

...sing/src/main/java/org/apache/druid/query/aggregation/first/FloatFirstAggregatorFactory.java

...ssing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastBufferAggregator.java

LakshSingla · 2023-08-03T06:31:04Z

Thanks for resolving the merge conflicts and the comments. I'll test it from MSQ's perspective and check if it works fine with MSQ queries in a couple of days.

LakshSingla

I pulled in the patch and tested the changes with MSQ. The queries that previously failed due to the lack of serde now work, confirming that this indeed unblocks EARLIEST and LATEST on MSQ 🚀

Leaving a final set of review comments from my side. Thanks for being patient and receptive during the review process :)

...in/java/org/apache/druid/query/aggregation/SerializablePairLongDoubleComplexMetricSerde.java

...rc/main/java/org/apache/druid/query/aggregation/AbstractSerializableLongObjectPairSerde.java

LakshSingla · 2023-08-08T18:43:05Z

...ain/java/org/apache/druid/query/aggregation/SerializablePairLongFloatComplexMetricSerde.java

+          return new byte[]{};
+        }
+
+        ByteBuffer bbuf = ByteBuffer.allocate(Long.BYTES + Byte.BYTES + Float.BYTES);


Similar comment as above, in case it is null, we should avoid allocating the space for Float.BYTES. (Should apply for the remaining factories as well)
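
A minimal sketch of what the null-aware allocation might look like for a (long, Float) pair. The layout (timestamp, then a one-byte null flag, then the value only when present) is inferred from the diff above; the class name and the flag values are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of null-aware pair serialization; not the Druid implementation. */
final class PairLongFloatBytesSketch
{
  // Hypothetical flag values for the one-byte null marker.
  static final byte IS_NULL = 1;
  static final byte NOT_NULL = 0;

  static byte[] toBytes(long time, Float value)
  {
    // Only reserve Float.BYTES when there is a value to write.
    int size = Long.BYTES + Byte.BYTES + (value == null ? 0 : Float.BYTES);
    ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putLong(time);
    if (value == null) {
      buf.put(IS_NULL);
    } else {
      buf.put(NOT_NULL);
      buf.putFloat(value);
    }
    return buf.array();
  }

  static Float valueFromBytes(byte[] bytes)
  {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    buf.getLong(); // skip the timestamp
    if (buf.get() == IS_NULL) {
      return null;
    }
    return buf.getFloat();
  }
}
```

A null pair then takes 9 bytes instead of 13, and the reader uses the flag byte to decide whether a value follows.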

processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java

imply-cheddar · 2023-08-08T20:28:07Z

The ComplexMetricSerde objects in this PR are all using the GenericIndexed method of doing things. That creates really large columns.

There is a better way that the stringLast/First aggregators already use, can we please adjust the ComplexMetricSerde implementations to align with the pattern followed in SerializablePairLongStringComplexMetricSerde instead? Specifically in how that implements getSerializer() and deserializeColumn(). There are code paths for compatibility that use the GenericIndexed stuff, the pattern that we want to follow is the ones that use SerializableLongPairStringColumnSerializer and SerializablePairLongStringComplexColumn.Builder respectively.

imply-cheddar · 2023-08-08T20:55:00Z

Generally speaking, I don't believe that we should merge the code with the GenericIndexed method of generating the columns. That's going to create the need to always support that. We have a better way to store things (the way that the String versions do it) and we should follow that instead. If we really want to do these in two PRs, I'd suggest leaving this PR as is and creating another PR on top of it to switch the serde around, then merging them in quick succession.

clintropolis · 2023-08-08T21:23:57Z

There is a better way that the stringLast/First aggregators already use, can we please adjust the ComplexMetricSerde implementations to align with the pattern followed in SerializablePairLongStringComplexMetricSerde instead? Specifically in how that implements getSerializer() and deserializeColumn(). There are code paths for compatibility that use the GenericIndexed stuff, the pattern that we want to follow is the ones that use SerializableLongPairStringColumnSerializer and SerializablePairLongStringComplexColumn.Builder respectively.

Random drive-by comment, but I don't really think that is the best pattern to use here either though, since the pairs here are fixed width and so a much more appropriate way of compressing them that doesn't involve storing offsets separately could be used...

ankit0811 · 2023-08-22T21:11:28Z

@imply-cheddar a quick question for clarification, since this part of the code is new to me.

The current SerializablePairLongStringComplexColumn.Builder has a version check. I am assuming this is not required for the first/last numeric type implementation. Is that assumption correct?

...ing/src/main/java/org/apache/druid/query/aggregation/first/NumericFirstVectorAggregator.java

+      if (foldNeeded) {
+        final SerializablePair<Long, Number> inPair = (SerializablePair<Long, Number>) objectsWhichMightBeNumeric[row];
+        if (useDefault || inPair != null) {
+          if (inPair.lhs != null && inPair.lhs < firstTime) {


...ssing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java

+      if (foldNeeded) {
+        final SerializablePair<Long, Number> inPair = (SerializablePair<Long, Number>) objectsWhichMightBeNumeric[row];
+        if (useDefault || inPair != null) {
+          if (inPair.lhs != null && inPair.lhs >= lastTime) {


daniel-imgarena · 2023-10-24T18:04:58Z

Will we see this in Druid 28? I have needed a doubleLast or longFirst for far too long.

LakshSingla · 2023-10-25T17:16:53Z

@daniel-imgarena The branch for Druid 28 has been cut, and only release blockers & regressions are allowed at this point. Unfortunately, we won't be seeing this in the 28 release, since it's a new feature.

LakshSingla

Leaving a partial review as of now, but there are two main comments, apart from the line items:

A lot of the abstract methods for serde & aggregators can be merged with the string's methods without any tweaking (I suppose), since they don't rely on the numeric properties of these aggregators. We should do that, otherwise, there will be a lot of code duplication. Also, one would need to keep track of both the classes (abstract ones & the string ones) while fixing bugs or making any improvements.
Thanks for taking the time to rewrite without using the GenericIndexed. I am curious if there's any benchmarking that you did or if there was a performance/size benefit that you observed from the change.

processing/src/main/java/org/apache/druid/query/aggregation/first/FirstLastUtils.java

LakshSingla · 2023-11-14T10:19:04Z

processing/src/main/java/org/apache/druid/query/aggregation/first/FirstLastUtils.java

+  }
+
+  /**
+   * Returns whether an object *might* contain SerializablePairLongString objects.


Outdated javadoc

It is still outdated

processing/src/main/java/org/apache/druid/query/aggregation/first/FirstLastUtils.java

LakshSingla · 2023-11-15T04:19:05Z

processing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstAggregator.java

 {
  double firstValue;

-  public DoubleFirstAggregator(BaseLongColumnValueSelector timeSelector, BaseDoubleColumnValueSelector valueSelector)
+  public DoubleFirstAggregator(BaseLongColumnValueSelector timeSelector, ColumnValueSelector valueSelector, boolean needsFoldCheck)


Still confused as to why doubles wouldn't require a folds check.

...ing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstAggregatorFactory.java

...main/java/org/apache/druid/query/aggregation/SerializablePairLongStringColumnSerializer.java

LakshSingla · 2023-11-15T05:02:51Z

...n/java/org/apache/druid/query/aggregation/AbstractSerializablePairLongObjectBufferStore.java

The String methods could also inherit from this class I suppose.

...in/java/org/apache/druid/query/aggregation/SerializablePairLongDoubleComplexMetricSerde.java

LakshSingla · 2023-11-15T05:40:18Z

...ing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstAggregatorFactory.java

+    if (capabilities != null) {
+      return new DoubleFirstVectorAggregator(timeSelector, vSelector);


Why is this check required? I don't think we are using the capabilities inside the aggregator

LakshSingla · 2023-11-15T05:43:09Z

...ing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstAggregatorFactory.java

+      VectorObjectSelector objectSelector = ExpressionVectorSelectors.castValueSelectorToObject(
+          columnSelectorFactory.getReadableVectorInspector(),
+          fieldName,
+          valueSelector,
+          capabilities.toColumnType(),
+          ColumnType.DOUBLE
+      );
+      return new DoubleFirstVectorAggregator(timeSelector, objectSelector);
+    }


I am not sure if this is correct. If the type is numeric, why do we need an explicit cast to Double. I could be wrong, but this seems kinda suspicious.

This is the same code flow as StringFirstAggregatorFactory.

With the string factory, we'd need to check if it is of a numeric type and cast if so. But if it is a numeric, then it shouldn't be the case, at least for DOUBLE objects. Also, if it is a String aggregator, now we return a DoubleFirstVectorAggregator (if capabilities correspond to String capabilities).

LakshSingla · 2023-11-15T05:47:23Z

...ing/src/main/java/org/apache/druid/query/aggregation/first/NumericFirstVectorAggregator.java

@@ -164,7 +215,7 @@ void updateTimeWithNull(ByteBuffer buf, int position, long time)
   * Abstract function which needs to be overridden by subclasses to set the
   * latest value in the buffer depending on the datatype
   */
-  abstract void putValue(ByteBuffer buf, int position, int index);
+  abstract void putValue(ByteBuffer buf, int position, Number number);


Let's revert this change if we can. It will lead to some duplication, however, this will lead to the boxing of the primitives, which is what the column selectors aim to reduce (if they can). Therefore there are separate methods like isNull + getLong which are preferred instead of getObject because the former doesn't auto-box the primitive variable to an object.

LakshSingla · 2023-11-15T05:48:17Z

...ing/src/main/java/org/apache/druid/query/aggregation/first/NumericFirstVectorAggregator.java

   */
-  void updateTimeWithValue(ByteBuffer buf, int position, long time, int index)
+  void updateTimeWithValue(ByteBuffer buf, int position, long time, Number number)


Autoboxing alert, as mentioned in the other comment. Let's see if we can get away without this change.

It might not be as clean as the current code, however, there shouldn't be a lot of duplication.
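
To make the boxing concern concrete, here is a small sketch of the index-based shape being asked for. All class names are hypothetical and the real aggregators carry much more state; it only shows that an `int index` signature lets each subclass read its own primitive vector, whereas a `Number` parameter forces every value through a boxed object.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch; not the Druid implementation. */
abstract class FirstVectorAggSketch
{
  // Index-based signature: the subclass reads the primitive directly, no Number allocation.
  abstract void putValue(ByteBuffer buf, int position, int index);
}

/** Hypothetical double specialization that reads from a primitive vector. */
final class DoubleFirstVectorAggSketch extends FirstVectorAggSketch
{
  private final double[] vector;

  DoubleFirstVectorAggSketch(double[] vector)
  {
    this.vector = vector;
  }

  @Override
  void putValue(ByteBuffer buf, int position, int index)
  {
    // vector[index] stays a primitive double all the way into the buffer.
    buf.putDouble(position, vector[index]);
  }
}
```

The cost is one small subclass per numeric type, which is the duplication the comment accepts in exchange for avoiding per-row allocations.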

LakshSingla · 2023-12-07T11:41:25Z

...sing/src/main/java/org/apache/druid/query/aggregation/last/GenericLastAggregateCombiner.java

-    if (StringFirstAggregatorFactory.TIME_COMPARATOR.compare(lastValue, newValue) < 0) {
-      lastValue = (SerializablePairLongString) selector.getObject();
+    T newValue = (T) selector.getObject();
+    if (Longs.compare(lastValue.lhs, newValue.lhs) <= 0) {


Why have we converted it from '<' to '<='? If unintentional, let's keep it the same way. Although the logic is still sound, it can lead to changes in the results of the queries of the users who have been using the LATEST function.
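
The difference only shows up when two values share the same timestamp. The sketch below is a hypothetical, simplified combiner (plain arrays instead of selectors) that runs both comparisons over the same input.

```java
/** Hypothetical sketch of tie-breaking in a "last" combiner; not the Druid implementation. */
final class LastTieBreakSketch
{
  /**
   * Returns the value kept after combining all inputs in order.
   * With inclusive == false the current value is replaced only by a strictly later time ('<');
   * with inclusive == true it is also replaced on an equal time ('<=').
   */
  static String last(long[] times, String[] values, boolean inclusive)
  {
    long lastTime = times[0];
    String lastValue = values[0];
    for (int i = 1; i < times.length; i++) {
      int cmp = Long.compare(lastTime, times[i]);
      boolean replace = inclusive ? cmp <= 0 : cmp < 0;
      if (replace) {
        lastTime = times[i];
        lastValue = values[i];
      }
    }
    return lastValue;
  }
}
```

For times `{10, 10}` and values `{"a", "b"}`, the strict comparison keeps `"a"` while the inclusive one returns `"b"`, so existing LATEST results could change for tied timestamps.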

LakshSingla · 2023-12-07T21:52:08Z

.../java/org/apache/druid/query/aggregation/AbstractSerializablePairLongObjectColumnHeader.java

+
+public abstract class AbstractSerializablePairLongObjectColumnHeader<T extends SerializablePair<Long, ?>>
+{
+  private static final int HEADER_SIZE_BYTES = 4;


There's a missing comment here from the original code.

LakshSingla · 2023-12-07T21:55:06Z

...a/org/apache/druid/query/aggregation/AbstractSerializablePairLongObjectColumnSerializer.java

+import java.io.IOException;
+import java.nio.channels.WritableByteChannel;
+
+public abstract class AbstractSerializablePairLongObjectColumnSerializer<T extends SerializablePair<Long, ?>> implements


There are missing Javadocs from the original code. I do think that they should be migrated here.

LakshSingla · 2023-12-07T21:59:03Z

...pache/druid/query/aggregation/AbstractSerializablePairLongObjectDeltaEncodedStagedSerde.java

+import java.nio.ByteBuffer;
+import java.util.Locale;
+
+public abstract class AbstractSerializablePairLongObjectDeltaEncodedStagedSerde<T extends SerializablePair<Long, ?>> implements


nit: Javadocs missing

.../org/apache/druid/query/aggregation/AbstractSerializablePairLongObjectSimpleStagedSerde.java

processing/src/main/java/org/apache/druid/query/aggregation/first/FirstLastUtils.java

LakshSingla · 2023-12-07T22:26:59Z

processing/src/main/java/org/apache/druid/query/aggregation/first/FirstLastUtils.java

+  }
+
+  /**
+   * Returns whether an object *might* contain SerializablePairLongString objects.


It is still outdated

LakshSingla · 2023-12-07T22:33:33Z

processing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstAggregator.java

 {
  double firstValue;

-  public DoubleFirstAggregator(BaseLongColumnValueSelector timeSelector, BaseDoubleColumnValueSelector valueSelector)
+  public DoubleFirstAggregator(BaseLongColumnValueSelector timeSelector, ColumnValueSelector valueSelector, boolean needsFoldCheck)


It should require a folds check in case of rollups I think.

...main/java/org/apache/druid/query/aggregation/SerializablePairLongLongComplexMetricSerde.java

...ain/java/org/apache/druid/query/aggregation/SerializablePairLongFloatComplexMetricSerde.java

.../org/apache/druid/query/aggregation/AbstractSerializablePairLongObjectSimpleStagedSerde.java

LakshSingla · 2023-12-08T05:48:52Z

.../org/apache/druid/query/aggregation/AbstractSerializablePairLongObjectSimpleStagedSerde.java

+            rhsBytes = Float.BYTES;
+          }
+        }
+        return Long.BYTES + Byte.BYTES + rhsBytes;


This computation is incorrect. For case when rhsObject is null, we'd only be storing Long.BYTES + Byte.BYTES, so this should be conditional.

Suggested change

return Long.BYTES + Byte.BYTES + rhsBytes;

return Long.BYTES + Byte.BYTES + (rhsObject == null ? 0 : rhsBytes);

rhsBytes is initialized to 0 and is assigned a value only if rhsObject is non-null. So isn't it already conditional?

LakshSingla · 2023-12-08T05:49:25Z

...pache/druid/query/aggregation/AbstractSerializablePairLongObjectDeltaEncodedStagedSerde.java

+          }
+        }
+
+        return (useIntegerDelta ? Integer.BYTES : Long.BYTES) + Byte.BYTES + rhsBytes;


This should be conditional on whether the rhsObject is null or not.

Suggested change

return (useIntegerDelta ? Integer.BYTES : Long.BYTES) + Byte.BYTES + rhsBytes;

return (useIntegerDelta ? Integer.BYTES : Long.BYTES) + Byte.BYTES + (rhsObject == null ? 0 : rhsBytes);
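
A small sketch of the size computation under discussion, assuming a double right-hand side. How `useIntegerDelta` is chosen is an assumption here (the delta from the minimum timestamp fits in an int); the class and method names are hypothetical.

```java
/** Hypothetical sketch of delta-encoded pair sizing; not the Druid implementation. */
final class DeltaEncodedSizeSketch
{
  /** Assumed rule: use a 4-byte delta when the largest offset from minTime fits in an int. */
  static boolean useIntegerDelta(long minTime, long maxTime)
  {
    long maxDelta = maxTime - minTime;
    // A negative result means the subtraction overflowed or the range is invalid.
    return maxDelta >= 0 && maxDelta <= Integer.MAX_VALUE;
  }

  /** Size of one serialized pair: time delta + null flag + value bytes only when present. */
  static int serializedSize(boolean useIntegerDelta, Double rhs)
  {
    int rhsBytes = rhs == null ? 0 : Double.BYTES;
    return (useIntegerDelta ? Integer.BYTES : Long.BYTES) + Byte.BYTES + rhsBytes;
  }
}
```

Whether the null check sits in the return expression or in the initialization of `rhsBytes`, the result should be the same: a null value contributes no bytes beyond the flag.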

ankit0811 · 2023-12-08T06:14:41Z

@LakshSingla thank you so much for reviewing this PR and being patient throughout the process. I have tried to address your comments; let me know if there's anything left.

LakshSingla · 2023-12-11T09:56:57Z

...src/test/java/org/apache/druid/query/aggregation/first/DoubleFirstVectorAggregationTest.java

      }
    };
-    selector = new BaseDoubleVectorValueSelector(new NoFilterVectorOffset(VALUES.length, 0, VALUES.length)
+    selector = new VectorObjectSelector()


Curious if this change is required?

Yes. It will be required as the selector can now be of either VectorObject (pair) or VectorValue (numeric).

Thanks for the explanation

LakshSingla · 2023-12-11T10:25:19Z

Thanks for the patch @ankit0811.

I think there's some follow-up that can be done in this area:

Make sure string pairs work as expected with null & empty strings - This has been a regression before the patch, and I have raised a PR for the same
Make the EARLIEST and the LATEST work with SQL
Add tests for MSQ
Look for some cleaner abstractions

LakshSingla · 2023-12-11T10:26:13Z

Can you please update the description with the release note

LakshSingla · 2023-12-12T05:59:08Z

I still believe that we can clean up the code a bit further if we get rid of the SerializablePairLongString abstractions and keep them as generics; however, I don't know what such a change entails. That would allow the reuse of multiple classes. However, that will be taken up in a follow-up PR, so it's a go-ahead from my side.

cryptoe · 2023-12-12T06:05:29Z

Since we have identified the caveats, we can merge this and work on it in a follow-up PR.
Thanks @ankit0811 and @LakshSingla for the PR and reviews.

github-advanced-security bot found potential problems Jun 21, 2023

Add support for first/last for double/float/long apache#10702

32bca40

ankit0811 force-pushed the feature_FirstLast_DoubleFLoatLong branch from 7a39e3e to 32bca40 Compare June 22, 2023 07:47

LakshSingla reviewed Jul 7, 2023

somu-imply reviewed Jul 13, 2023

LakshSingla reviewed Aug 1, 2023

...sing/src/main/java/org/apache/druid/query/aggregation/first/FloatFirstAggregatorFactory.java

somu-imply reviewed Aug 2, 2023

...ssing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastBufferAggregator.java

Addressing comments

e35a906

ankit0811 force-pushed the feature_FirstLast_DoubleFLoatLong branch from daabc4e to e35a906 Compare August 2, 2023 19:35

Ankit Kothari and others added 2 commits August 2, 2023 12:39

Merge remote-tracking branch 'upstream/master' into feature_FirstLast_DoubleFLoatLong

b309b2d

static check jdk-17 fix

5389438

update index/query for tests-ex/IT cases

90fd6e7

LakshSingla added Feature Release Notes Area - Segment Format and Ser/De Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Aug 4, 2023

fixing batch IT failure

8fc6276

LakshSingla reviewed Aug 8, 2023

clintropolis mentioned this pull request Aug 18, 2023

remove group-by v1 #14866

Merged

github-advanced-security bot found potential problems Oct 12, 2023

ankit0811 requested a review from LakshSingla October 17, 2023 20:14

ankit0811 requested a review from somu-imply October 25, 2023 05:40

cryptoe added the Design Review label Nov 2, 2023

LakshSingla reviewed Nov 15, 2023

...in/java/org/apache/druid/query/aggregation/SerializablePairLongDoubleComplexMetricSerde.java

LakshSingla reviewed Nov 15, 2023

ankit0811 and others added 4 commits November 30, 2023 23:50

address comments

c4fa03e

address comments + fix failure checks

e3cc206

intellij-inspections failure fix

edd7a88

Merge remote-tracking branch 'upstream/master' into feature_FirstLast_DoubleFLoatLong

17a74dc

LakshSingla reviewed Dec 7, 2023

LakshSingla reviewed Dec 8, 2023

.../org/apache/druid/query/aggregation/AbstractSerializablePairLongObjectSimpleStagedSerde.java

LakshSingla reviewed Dec 8, 2023

addressing comments + extending LongString to the abstract classes

3397717

LakshSingla approved these changes Dec 11, 2023

cryptoe approved these changes Dec 12, 2023

cryptoe merged commit 8735d02 into apache:master Dec 12, 2023
86 checks passed

cryptoe changed the title ~~Add support for first/last for double/float/long #10702~~ Add experimental support for first/last for double/float/long #10702 Dec 12, 2023

ankit0811 mentioned this pull request Dec 29, 2023

Add sql + ingestion compatibility for first/last on numeric values #15607

Merged

LakshSingla added this to the 29.0.0 milestone Jan 29, 2024

LakshSingla mentioned this pull request Feb 13, 2024

[DRAFT] 29.0.0 release notes #15896

Closed
