Sql Single Value Aggregator for scalar queries #15700

sreemanamala · 2024-01-17T06:17:13Z

Description

Added the Single Value Aggregator functionality for scalar queries in group by queries

Fixed the bug ...

Executing single-value correlated queries currently throws an exception because the single_value function is not available in Druid.
The classes added here give Druid the capability to plan and run such queries.

Renamed the class ...

Added a forbidden-apis entry ...

Release note

Supporting Single Value aggregated group by queries for scalars

Key changed/added classes in this PR

SingleValueSqlAggregator
SingleValueAggregatorFactory, SingleValueBufferAggregator, SingleValueAggregator
AggregatorsModule, AggregatorUtil
DruidOperatorTable
CalciteSingleValueAggregatorTest

soumyava · 2024-01-17T06:34:51Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidOperatorTable.java

-import org.apache.druid.sql.calcite.aggregation.builtin.StringSqlAggregator;
-import org.apache.druid.sql.calcite.aggregation.builtin.SumSqlAggregator;
-import org.apache.druid.sql.calcite.aggregation.builtin.SumZeroSqlAggregator;
+import org.apache.druid.sql.calcite.aggregation.builtin.*;


I'm a bit against the wildcard import (*); let's import only the classes we need. If we later add another aggregator to that package that this class does not use, the wildcard import would no longer be correct.

...src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/SingleValueSqlAggregator.java

...essing/src/main/java/org/apache/druid/query/aggregation/SingleValueLongBufferAggregator.java

LakshSingla · 2024-01-18T11:55:26Z

Thanks for the first PR to Druid! 🚀

A few high level comments with regards to this

1. What's the behaviour if the underlying relation has 0 rows? Should it throw an exception, replace it with null, or replace it with some dummy value (0.0 for floats, 0 for longs, etc.)? We'd have to see what the SQL standard says or what other DBs commonly do, and model the aggregation on that.
2. String and complex types need to be modeled as well. For string types, it can be invoked when the subquery result is passed to a function like STRLEN:

SELECT
  COUNT(*)
FROM "wiki"
WHERE
  col1 >= STRLEN((SELECT dim1 FROM single_row_relation))

   With complex types, it can be invoked with functions performing finalization on the complex types, or something like PAIR_LEFT (not yet added into Druid though). However, the complex use case seems far-fetched enough that we should be fine with saying we don't support it. Calcite (or the user) can probably optimize and push down the aggregation as well.
3. Can we implement vector versions of the aggregator as well?
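
To make the first question concrete, here is a minimal sketch of one possible contract, not the PR's code. It assumes an empty relation maps to null (PostgreSQL evaluates an empty scalar subquery to NULL), exactly one row yields that row's value, and more than one row is an error. `SingleValueSemantics` and `singleValue` are hypothetical names.

```java
import java.util.List;

// Hypothetical helper, not part of Druid: models what a scalar subquery could return.
final class SingleValueSemantics
{
  private SingleValueSemantics() {}

  // Assumption: an empty relation maps to SQL NULL; more than one row is an error.
  static <T> T singleValue(List<T> rows)
  {
    if (rows.size() > 1) {
      throw new IllegalStateException("Subquery expression returned more than one row");
    }
    return rows.isEmpty() ? null : rows.get(0);
  }
}
```

Whichever answer is chosen for the empty case, stating it as a small contract like this makes it straightforward to test.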

somu-imply

Thanks for the contribution Sree. Might be simple enough to add the vector aggs as this is on a single value. There's also a failure in static checks that needs to be addressed

somu-imply · 2024-01-18T12:10:25Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueLongAggregator.java

+{
+  final BaseLongColumnValueSelector valueSelector;
+
+  Long value;


We can make these final, similar for the other classes

somu-imply · 2024-01-18T12:12:04Z

sql/src/test/java/org/apache/druid/sql/calcite/CalciteSingleValueAggregatorTest.java

+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT count(*) FROM foo where m1 >= (select max(m1) - 4 from foo)",


Let's add more examples when these are string functions and also other aggs like count etc.

somu-imply · 2024-01-18T12:12:56Z

...src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/SingleValueSqlAggregator.java

+  )
+  {
+    if (aggregateCall.getArgList().size() > 1) {
+      throw DruidException.defensive(


If this is visible to user, defensive might not be the best idea

clintropolis · 2024-01-18T12:59:15Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueDoubleAggregator.java

+  public SingleValueDoubleAggregator(BaseDoubleColumnValueSelector valueSelector)
+  {
+    this.valueSelector = valueSelector;
+    this.value = valueSelector.getDouble();


i think numeric selectors probably need to check isNull() and set the value to null if true
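
A minimal sketch of that check, assuming a simplified selector interface. `NullableDoubleSelector` and `DoubleValueReader` are hypothetical stand-ins; Druid's real selector interfaces are richer.

```java
// Hypothetical stand-in for a primitive column selector.
interface NullableDoubleSelector
{
  boolean isNull();

  double getDouble();
}

final class DoubleValueReader
{
  private DoubleValueReader() {}

  // Returns null for a null row instead of whatever placeholder getDouble() would give.
  static Double read(NullableDoubleSelector selector)
  {
    return selector.isNull() ? null : Double.valueOf(selector.getDouble());
  }
}
```

Because a Java primitive cannot be null, the null information has to come from isNull() before the primitive getter is consulted.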

clintropolis · 2024-01-18T13:05:14Z

...ing/src/main/java/org/apache/druid/query/aggregation/SingleValueDoubleAggregatorFactory.java

+  @Override
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
+  {
+    final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(getFieldName());


instead of a separate aggregator factory for each type, why not call metricFactory.getColumnCapabilities(getFieldName()) and pick the right type of aggregator.

even further, do we even need aggs/buffer aggs for each type? Given that the typed aggs all appear to be storing values as Objects instead of java numeric primitives, why not just have a single Aggregator and BufferAggregator implementation. For the BufferAggregator, you could pass the ColumnType converted from getColumnCapabilities and use .getNullableStrategy() to write and read the value from the buffer.

This seems like a lot of classes for something that is basically a value holder...

Thanks for pointing out the NullableTypeStrategy. I always forget that such a contract exists (even in #15559 😞 ), and misguided the author into thinking that we don't have such a contract, and it needs to be inferred on per type basis.

clintropolis · 2024-01-18T13:07:31Z

sql/src/test/java/org/apache/druid/sql/calcite/CalciteSingleValueAggregatorTest.java

+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.junit.Test;
+
+public class CalciteSingleValueAggregatorTest extends CalciteQueryTest


this should not extend CalciteQueryTest since that will run all of those tests too... it doesn't seem like it needs to be its own test file in the first place, but if it does for some reason, it should extend BaseCalciteQueryTest

clintropolis · 2024-01-18T13:10:17Z

Can we implement vector versions of the aggregator as well?

I don't think this could possibly be vectorized, can it? since this can only call aggregate once, while aggregate method for vector aggs take multiple values, unless i'm misunderstanding something about how this is used

LakshSingla · 2024-01-18T16:09:42Z

I don't think this could possibly be vectorized, can it? since this can only call aggregate once, while aggregate method for vector aggs take multiple values, unless i'm misunderstanding something about how this is used

@clintropolis I don't have much insight into how the vector engines perform; however, I reasoned that having only a non-vector version would prevent the query from being vectorized. While it won't gain any benefit from the vectorization, having a vectorized implementation would allow the rest of the query to be vectorized. Wdyt?

clintropolis · 2024-01-18T20:27:59Z

@clintropolis I don't have much insight into how the vector engines perform; however, I reasoned that having only a non-vector version would prevent the query from being vectorized. While it won't gain any benefit from the vectorization, having a vectorized implementation would allow the rest of the query to be vectorized. Wdyt?

Ah yeah, i guess it doesn't hurt to implement, I was just making a drive by comment and not entirely sure of the context of this agg. But it seems like any query with this agg can only process a single row, so I wasn't sure it would be much benefit either since the main benefit of vectorization is processing batches of rows.
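
A rough sketch of what a vector-style variant might look like, assuming it exists mainly so the rest of the query can vectorize. The class and its `aggregate(Object[], int, int)` signature are hypothetical simplifications; Druid's vector aggregators operate on ByteBuffers and have a different interface.

```java
// Hypothetical, simplified vector-style aggregator.
final class SingleValueVectorSketch
{
  private Object value;
  private boolean rowSeen;

  // Accepts a batch [startRow, endRow) but tolerates at most one row over its lifetime.
  void aggregate(Object[] vector, int startRow, int endRow)
  {
    int rows = endRow - startRow;
    if (rows == 0) {
      return;
    }
    if (rowSeen || rows > 1) {
      throw new IllegalStateException("Single value aggregation saw more than one row");
    }
    value = vector[startRow];
    rowSeen = true;
  }

  Object get()
  {
    return value;
  }
}
```

The batch interface buys nothing for a single row, which matches the point above: the value of such an implementation would be in not blocking vectorization elsewhere.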

soumyava · 2024-01-24T05:00:16Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+  @Override
+  public void aggregate()
+  {
+    if (isAggregateInvoked) {


if selector.isNull() we can just return.

a null row is still a row in the context of this agg i think so it needs to set isAggregateInvoked
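
To illustrate why an early return on null would be a problem, here is a minimal sketch in which a null row still marks the aggregator as invoked. `OneRowAggregatorSketch` is hypothetical, and the `Supplier` stands in for a column value selector.

```java
import java.util.function.Supplier;

// Hypothetical sketch; the Supplier stands in for a column value selector.
final class OneRowAggregatorSketch
{
  private final Supplier<Object> selector;
  private Object value;
  private boolean isAggregateInvoked;

  OneRowAggregatorSketch(Supplier<Object> selector)
  {
    this.selector = selector;
  }

  void aggregate()
  {
    if (isAggregateInvoked) {
      throw new IllegalStateException("Subquery expression returned more than one row");
    }
    // The value may be null; the row still counts as a row.
    value = selector.get();
    isAggregateInvoked = true;
  }

  Object get()
  {
    return value;
  }
}
```

With an early return on null, a null row followed by a second row would slip through without an error.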

clintropolis · 2024-01-24T21:06:07Z

...src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/SingleValueSqlAggregator.java

+    switch (aggregationType.getType()) {
+      case LONG:
+      case FLOAT:
+      case DOUBLE:
+      case STRING:
+        return new SingleValueAggregatoractory(name, fieldName, aggregationType);
+      default:
+        // This error refers to the Druid type. But, we're in SQL validation.
+        // It should refer to the SQL type.
+        throw SimpleSqlAggregator.badTypeException(fieldName, "SINGLE_VALUE", aggregationType);
+    }


do we actually need this check at all? Thinking of things like ARRAY_AGG which can aggregate an array of values...

clintropolis · 2024-01-24T21:07:46Z

sql/src/test/java/org/apache/druid/sql/calcite/CalciteSingleValueAggregatorTest.java

+import java.util.Arrays;
+import java.util.HashSet;
+
+public class CalciteSingleValueAggregatorTest extends BaseCalciteQueryTest


any reason not to just put these test cases in CalciteSubqueryTest?

Not particularly, will move them

clintropolis · 2024-01-24T23:49:13Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueBufferAggregator.java

+    if (columnType.is(ValueType.STRING)) {
+      isNotNull = (selector.getObject() != null);
+    } else {
+      isNotNull = !selector.isNull();
+    }
+    if (isNotNull) {
+      if (buf.get(position) == NullHandling.IS_NULL_BYTE) {
+        buf.put(position, NullHandling.IS_NOT_NULL_BYTE);
+      }
+      updatevalue(buf, position + Byte.BYTES);
+    }


Part of the potentially attractive part about pushing the ColumnType in here is that we can also use the NullableTypeStrategy to implement aggregate and get methods.

I pulled your branch and tried this, and it runs into some problems with floats ending up as doubles. I think this is actually a problem with the underlying expression selector (expressions don't support floats, so this is likely the culprit), because testSingleValueFloatAgg fails. So we need a helper function to get the object from the underlying selector, to ensure that the thing we get from the selector matches the columnType.

@Override
public void aggregate(ByteBuffer buf, int position)
{
  if (isAggregateInvoked) {
    throw InvalidInput.exception("Single Value Aggregator would not be applied to more than one row");
  }
  int written = typeStrategy.write(
      buf,
      position,
      getSelectorObject(),
      SingleValueAggregatoractory.DEFAULT_MAX_STRING_SIZE
  );
  if (written < 0) {
    throw InvalidInput.exception("Single Value Aggregator value too big for buffer");
  }
  isAggregateInvoked = true;
}

(invalid input might not be the actual right exception here for exceeding size limit, since its only a single row we might want to figure out how to pass this in, or make a much larger limit since the user isn't calling this function directly..)

@Nullable
@Override
public Object get(ByteBuffer buf, int position)
{
  return typeStrategy.read(buf, position);
}

@Nullable
private Object getSelectorObject()
{
  if (selector.isNull()) {
    return null;
  }
  switch (columnType.getType()) {
    case LONG:
      return selector.getLong();
    case FLOAT:
      return selector.getFloat();
    case DOUBLE:
      return selector.getDouble();
    default:
      return selector.getObject();
  }
}

The getLong/getFloat/getDouble methods could also be updated, there are some static methods in TypeStrategies which can help with this isNullableNull, readNotNullNullableLong, etc.
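
For readers unfamiliar with the idea, the following is a self-contained sketch of a nullable write/read pair for longs: a null marker byte followed by the value. It is not Druid's NullableTypeStrategy; `NullableLongCodec`, its marker constants, and the `-1` return are assumptions made for illustration, loosely mirroring the `written < 0` check in the snippet above.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a nullable long codec; not Druid's NullableTypeStrategy.
final class NullableLongCodec
{
  // Marker values are assumptions chosen for this sketch.
  static final byte IS_NULL = 1;
  static final byte NOT_NULL = 0;

  private NullableLongCodec() {}

  // Writes a null marker byte, then the value. Returns bytes written, or -1 if it does not fit.
  static int write(ByteBuffer buf, int position, Long value, int maxSizeBytes)
  {
    int needed = (value == null) ? Byte.BYTES : Byte.BYTES + Long.BYTES;
    if (needed > maxSizeBytes) {
      return -1;
    }
    if (value == null) {
      buf.put(position, IS_NULL);
    } else {
      buf.put(position, NOT_NULL);
      buf.putLong(position + Byte.BYTES, value);
    }
    return needed;
  }

  // Reads the marker first and only touches the value bytes when the marker says non-null.
  static Long read(ByteBuffer buf, int position)
  {
    return buf.get(position) == IS_NULL ? null : Long.valueOf(buf.getLong(position + Byte.BYTES));
  }
}
```

Keeping the null marker and the value behind one write/read pair is what lets a single buffer aggregator serve every column type.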

That would look cleaner the way you pointed out. Will update it.

clintropolis · 2024-01-24T23:53:57Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatoractory.java

+import java.util.Objects;
+
+@JsonTypeName("singleValue")
+public class SingleValueAggregatoractory extends AggregatorFactory


typo, should be SingleValueAggregatorFactory instead of SingleValueAggregatoractory

Oh! Thanks for pointing that out.

clintropolis · 2024-01-24T23:54:55Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+  @Override
+  public void aggregate()
+  {
+    if (isAggregateInvoked) {


a null row is still a row in the context of this agg i think so it needs to set isAggregateInvoked

clintropolis

overall lgtm 👍

clintropolis · 2024-01-30T10:11:08Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+/**
+ *
+ */


nit: can just delete if not filling out javadocs

clintropolis · 2024-01-30T10:11:41Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java

+@JsonTypeName("singleValue")
+public class SingleValueAggregatorFactory extends AggregatorFactory


this might be nice to add javadocs for since its primarily to support SQL planner and probably doesn't have many direct use cases

clintropolis · 2024-01-30T10:12:03Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+    if (isNullResult) {
+      return null;
+    }
+    return value;


this can just return value directly?

clintropolis · 2024-01-30T10:13:22Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+    if (isAggregateInvoked) {
+      throw InvalidInput.exception("Single Value Aggregator would not be applied to more than one row");
+    }
+    boolean isNotNull = !selector.isNull();


like in the buffer agg, this is probably only accurate for numeric primitive selectors, since typically things that call getObject do not check isNull. isNull is mainly used when you plan to call getLong or the like (since java primitives cannot be null, so they use this method instead).

You could potentially just always call getObject, or could have a method similar to the method for the buffer agg though not sure its as useful here since getObject usually always works (i think)

clintropolis · 2024-01-30T10:16:29Z

...src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/SingleValueSqlAggregator.java

+    switch (aggregationType.getType()) {
+      case LONG:
+      case FLOAT:
+      case DOUBLE:
+      case STRING:
+        return new SingleValueAggregatorFactory(name, fieldName, aggregationType);
+      default:
+        // This error refers to the Druid type. But, we're in SQL validation.
+        // It should refer to the SQL type.
+        throw SimpleSqlAggregator.badTypeException(fieldName, "SINGLE_VALUE", aggregationType);
+    }


i think i already asked this, but is this needed or can it just always make a SingleValueAggregatorFactory with any aggregationType?

clintropolis · 2024-01-30T10:19:49Z

sql/src/test/java/org/apache/druid/sql/calcite/CalciteSubqueryTest.java

+                      new QueryDataSource(GroupByQuery.builder()
+                                                      .setDataSource(new QueryDataSource(
+                                                          Druids.newTimeseriesQueryBuilder()
+                                                                .dataSource(CalciteTests.DATASOURCE1)
+                                                                .intervals(querySegmentSpec(Filtration.eternity()))
+                                                                .granularity(Granularities.ALL)
+                                                                .aggregators(new FloatMaxAggregatorFactory("a0", "m1"))
+                                                                .build()
+                                                      ))
+                                                      .setInterval(querySegmentSpec(Filtration.eternity()))
+                                                      .setGranularity(Granularities.ALL)
+                                                      .setVirtualColumns(expressionVirtualColumn(
+                                                                             "v0",
+                                                                             "(\"a0\" - 3.5)",
+                                                                             ColumnType.DOUBLE
+                                                                         )
+                                                      )
+                                                      .setAggregatorSpecs(
+                                                          aggregators(
+                                                              new SingleValueAggregatorFactory(
+                                                                  "_a0",
+                                                                  "v0",
+                                                                  ColumnType.DOUBLE
+                                                              )
+                                                          )
+                                                      )
+                                                      .setLimitSpec(NoopLimitSpec.instance())
+                                                      .setContext(QUERY_CONTEXT_DEFAULT)
+                                                      .build()


hmm, this doesn't need to be part of this PR, but it seems like there is room for improvement in the planner here... Like in this case i think the planner should just collapse this subquery into a single timeseries query with an expression post-aggregator, instead of a group by on a timeseries

clintropolis · 2024-01-30T10:26:06Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+  public void aggregate()
+  {
+    if (isAggregateInvoked) {
+      throw InvalidInput.exception("Single Value Aggregator would not be applied to more than one row");


Since this is unlikely to be used directly, I wonder if this error message should mention sub-queries? I was playing around with postgres and it returns something like:
error: more than one row returned by a subquery used as an expression
So I wonder if we should do something similar (probably don't copy it exactly :p). It might be nice to include the name of the input column. Be sure to update the buffer agg error if we change this too.

clintropolis · 2024-01-30T10:29:06Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueBufferAggregator.java

+  public float getFloat(ByteBuffer buf, int position)
+  {
+    if (TypeStrategies.isNullableNull(buf, position)) {
+      throw new IllegalStateException("Cannot return float for Null Value");


nit: can use DruidException.defensive since this shouldn't happen in practice because callers should know to call isNull if values can be null (if it happens it is a coding error probably)

LakshSingla · 2024-02-02T06:52:28Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+  Object value;
+
+  private boolean isNullResult = true;
+


nit: Spacing not needed

LakshSingla · 2024-02-02T06:56:39Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+    if (isNullResult) {
+      throw DruidException.defensive("Cannot return double for Null Value");
+    }


Probably for SQL-incompatible behavior, let's just return the default value. It's the onus of the caller to call .isNull and then the .getDouble. I am wondering if any SQL-compatible path that just calls .getDouble without calling .isNull() can cause it to throw. Anyways, it will also prevent an additional check in this path. Same goes for other selectors.

LakshSingla · 2024-02-02T07:08:37Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+    boolean isNotNull = (selector.getObject() != null);
+    if (isNotNull) {
+      isNullResult = false;
+      value = selector.getObject();


Should probably reuse the .getObject() called above, because for primitives that would cause autoboxing.

I was going to suggest using .isNull() check at the top instead, however, I don't think they might behave correctly for object selectors, therefore refraining from the suggestion.

LakshSingla · 2024-02-02T07:13:54Z

sql/src/test/java/org/apache/druid/sql/calcite/CalciteSubqueryTest.java

+  @Test
+  public void testSingleValueEmptyInnerAgg()
+  {
+    msqIncompatible();


Why msqIncompatible()? I think this test isn't subclassed by MSQ anyways, so we can get rid of these calls, however, I do expect it to be MSQ-compatible.

LakshSingla · 2024-02-02T07:14:47Z

...src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/SingleValueSqlAggregator.java

+
+import javax.annotation.Nullable;
+
+public class SingleValueSqlAggregator extends SimpleSqlAggregator


nit: Can add javadoc as to how this will be called by the SQL query, since its not something that the user will supply.

LakshSingla · 2024-02-02T07:30:24Z

processing/src/test/java/org/apache/druid/query/aggregation/SingleValueAggregationTest.java

+  public SingleValueAggregationTest() throws Exception
+  {
+    String longAggSpecJson = "{\"type\": \"singleValue\", \"name\": \"lng\", \"fieldName\": \"lngFld\", \"columnType\": \"LONG\"}";
+    longAggFatory = TestHelper.makeJsonMapper().readValue(longAggSpecJson, SingleValueAggregatorFactory.class);


Suggested change

-    longAggFatory = TestHelper.makeJsonMapper().readValue(longAggSpecJson, SingleValueAggregatorFactory.class);
+    longAggFactory = TestHelper.makeJsonMapper().readValue(longAggSpecJson, SingleValueAggregatorFactory.class);

LakshSingla · 2024-02-02T07:30:40Z

processing/src/test/java/org/apache/druid/query/aggregation/SingleValueAggregationTest.java

+    doubleAggFatory = TestHelper.makeJsonMapper().readValue(doubleAggSpecJson, SingleValueAggregatorFactory.class);
+
+    String strAggSpecJson = "{\"type\": \"singleValue\", \"name\": \"str\", \"fieldName\": \"strFld\", \"columnType\": \"STRING\"}";
+    stringAggFatory = TestHelper.makeJsonMapper().readValue(strAggSpecJson, SingleValueAggregatorFactory.class);
+  }


nit: spelling

LakshSingla · 2024-02-02T07:31:42Z

processing/src/test/java/org/apache/druid/query/aggregation/SingleValueAggregationTest.java

+  @Before
+  public void setup()
+  {
+    NullHandling.initializeForTests();


nit: Instead of calling it in the setup(), we can subclass the test class with extends InitializedNullHandlingTest

LakshSingla · 2024-02-02T07:32:03Z

processing/src/test/java/org/apache/druid/query/aggregation/SingleValueAggregationTest.java

+/**
+  */


nit: cleanup

LakshSingla · 2024-02-02T07:38:41Z

processing/src/test/java/org/apache/druid/query/aggregation/SingleValueAggregationTest.java

+    EasyMock.expect(colSelectorFactoryLong.makeColumnValueSelector("lngFld")).andReturn(selectorLong);
+    EasyMock.expect(colSelectorFactoryLong.getColumnCapabilities("lngFld")).andReturn(columnCapabilitiesLong);
+
+    EasyMock.replay(columnCapabilitiesLong);
+    EasyMock.replay(colSelectorFactoryLong);
+
+    selectorDouble = new TestDoubleColumnSelectorImpl(doubleValues);
+    columnCapabilitiesDouble = EasyMock.createMock(ColumnCapabilities.class);
+    EasyMock.expect(columnCapabilitiesDouble.getType()).andReturn(ValueType.DOUBLE);
+
+    colSelectorFactoryDouble = EasyMock.createMock(ColumnSelectorFactory.class);
+    EasyMock.expect(colSelectorFactoryDouble.makeColumnValueSelector("dblFld")).andReturn(selectorDouble);
+    EasyMock.expect(colSelectorFactoryDouble.getColumnCapabilities("dblFld")).andReturn(columnCapabilitiesDouble);
+
+    EasyMock.replay(columnCapabilitiesDouble);
+    EasyMock.replay(colSelectorFactoryDouble);


There should be a cleaner way than mocking the selector factory and the column capabilities. For column capabilities, you can use the ColumnCapabilitiesImpl#createSimpleNumericColumnCapabilities et al. I was going through the code to find any reusable class for the column selector factory and I found TestColumnSelectorFactory. Perhaps others can achieve this more cleanly.

LakshSingla · 2024-02-05T04:31:30Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+    if (isNullResult) {
+      throw DruidException.defensive("Cannot return float for Null Value");
+    }
+    return (float) value;


Suggested change

-    return (float) value;
+    return ((Number) value).floatValue();
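
A small self-contained demonstration of why the suggestion matters when `value` is an `Object`: a direct `(float)` cast only succeeds if the boxed type is exactly `Float`, while going through `Number` works for any boxed numeric. `FloatCastDemo` is a hypothetical class written for this illustration.

```java
// Hypothetical demo class comparing the two ways of unboxing an Object to float.
final class FloatCastDemo
{
  private FloatCastDemo() {}

  // Works for any boxed numeric type (Long, Float, Double, ...).
  static float viaNumber(Object value)
  {
    return ((Number) value).floatValue();
  }

  // Only works when the runtime type is exactly Float; otherwise throws ClassCastException.
  static float viaDirectCast(Object value)
  {
    return (float) value;
  }
}
```

This is relevant to the earlier observation in this review about floats ending up as doubles: a `Double` stored in `value` would make the direct cast fail at runtime.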

LakshSingla · 2024-02-06T07:12:05Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+  final ColumnValueSelector selector;
+  @Nullable
+  Object value;


nit: private for consistency.

LakshSingla · 2024-02-06T07:18:36Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+    boolean isNotNull = (selectorObject != null);
+    if (isNotNull) {
+      isNullResult = false;
+      value = selectorObject;
+    }


nit: Coding style:

Now that it's written this way, it seems redundant to have isNullResult and isNotNull.
Following seems a much cleaner way

Suggested change

-    boolean isNotNull = (selectorObject != null);
-    if (isNotNull) {
-      isNullResult = false;
-      value = selectorObject;
-    }
+    value = selector.getObject();
+  }

We don't use any null check here.

The method isNull will do something like:

public boolean isNull() { return value == null; }

Then the methods relying on the isNullResult can use isNull() instead.

public float getFloat() { return isNull() ? NullHandling.ZERO_FLOAT : ((Number) value).floatValue(); }

This will prevent the code from maintaining two variables denoting same information.

LakshSingla · 2024-02-06T07:19:05Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+    return "SingleValueAggregator{" +
+           "selector=" + selector +
+           '}';


Shouldn't this also print the value and aggregateInvoked?

LakshSingla · 2024-02-06T07:20:03Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java

+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final ColumnType columnType;
+  public static final int DEFAULT_MAX_BUFFER_SIZE = 1025;


Why 1025, and not 1024? (Latter seems more "correct")

LakshSingla · 2024-02-06T07:22:19Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java

+    if (columnType.isNumeric()) {
+      return Byte.BYTES + Double.BYTES;
+    }
+    return DEFAULT_MAX_BUFFER_SIZE;


nit: Is this correct, for long values?
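
As a quick check of the arithmetic behind this question, the sketch below computes the space needed for a null marker byte plus the widest numeric primitive. `NumericBufferSize` is a hypothetical helper; only the `BYTES` constants come from the JDK.

```java
// Hypothetical helper: bytes needed for a null marker plus the widest numeric primitive.
final class NumericBufferSize
{
  private NumericBufferSize() {}

  static int nullableNumericSize()
  {
    // Long.BYTES == Double.BYTES == 8 and Float.BYTES == 4, so 1 + 8 covers all three.
    return Byte.BYTES + Math.max(Long.BYTES, Math.max(Double.BYTES, Float.BYTES));
  }
}
```

Since a long and a double are both 8 bytes wide, `Byte.BYTES + Double.BYTES` is large enough for long values too.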

LakshSingla · 2024-02-06T07:33:10Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueBufferAggregator.java

+  final ColumnValueSelector selector;
+  final ColumnType columnType;
+  final NullableTypeStrategy typeStrategy;
+  private boolean isAggregateInvoked = false;


nit: mark private for consistency

LakshSingla · 2024-02-06T07:33:54Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueBufferAggregator.java

+  final NullableTypeStrategy typeStrategy;
+  private boolean isAggregateInvoked = false;
+
+  SingleValueBufferAggregator(ColumnValueSelector selector, ColumnType columnType)


public, since the SingleValueAggregator is also public scope

LakshSingla · 2024-02-06T07:34:15Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueBufferAggregator.java

+  {
+    this.selector = selector;
+    this.columnType = columnType;
+    this.typeStrategy = columnType.getNullableStrategy();


This would NPE if columnType is null, hence we should add the null check in the aggregator factory.

LakshSingla · 2024-02-06T07:39:08Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueBufferAggregator.java

+  @Override
+  public float getFloat(ByteBuffer buf, int position)
+  {
+    return TypeStrategies.isNullableNull(buf, position)


I wonder if two calls are required, since we already have the nullable type strategy, so perhaps we can read it once, and check if that is null. However, not important, since its called for a single row only.

LakshSingla · 2024-02-06T07:42:41Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+  @Override
+  public long getLong()
+  {
+    return isNullResult ? NullHandling.ZERO_LONG : ((Number) value).longValue();


Let's add an assertion in the primitive selectors that if the mode is sql compatible, then the selector cannot be null.

Suggested change

-    return isNullResult ? NullHandling.ZERO_LONG : ((Number) value).longValue();
+    assert NullHandling.replaceWithDefault() || !isNull();
+    return isNull() ? NullHandling.ZERO_LONG : ((Number) value).longValue();

LakshSingla · 2024-02-07T04:27:43Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java

+public class SingleValueAggregatorFactory extends AggregatorFactory
+{
+  @JsonProperty
+  @JsonInclude


We don't need JsonInclude annotation

Suggested change

-  @JsonInclude

LakshSingla

Thanks for being accommodating with the reviews. Final thoughts on the PR.

LakshSingla · 2024-02-07T04:35:39Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java

+  {
+    ColumnValueSelector selector = metricFactory.makeColumnValueSelector(fieldName);
+    ColumnCapabilities columnCapabilities = metricFactory.getColumnCapabilities(fieldName);
+    Preconditions.checkNotNull(columnCapabilities, "Unable to get the capabilities of [%s]", fieldName);


Null checks are better done using an if clause. Preconditions is a shorthand that we mostly use in constructors. It doesn't play nicely with the DruidException system, because we then don't know the target persona for the error message. Therefore, let's throw a DruidException here, because it forces us to think about the persona of the error message and word it accordingly.
I think it should be aimed at devs only, because users/admins/operators won't know what to do if such an error occurs, and it won't make sense to them.

Also, check out https://github.com/apache/druid/blob/master/dev/style-conventions.md#message-formatting-for-logs-and-exceptions. The message should make sense if all the interpolation ([%s]) is removed. Therefore, the message should be

Suggested change

-    Preconditions.checkNotNull(columnCapabilities, "Unable to get the capabilities of [%s]", fieldName);
+    Preconditions.checkNotNull(columnCapabilities, "Unable to get the capabilities of field [%s]", fieldName);
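
A minimal sketch of the if-clause form described above. `DefensiveCheckSketch` and `requireCapabilities` are hypothetical, and `IllegalStateException` is only a stand-in for a developer-facing exception such as one built with `DruidException.defensive(...)`.

```java
// Hypothetical sketch of an explicit null check with a developer-facing message.
final class DefensiveCheckSketch
{
  private DefensiveCheckSketch() {}

  static <T> T requireCapabilities(T capabilities, String fieldName)
  {
    if (capabilities == null) {
      // The message still reads correctly if the bracketed value is dropped.
      throw new IllegalStateException(
          String.format("Unable to get the capabilities of field [%s]", fieldName)
      );
    }
    return capabilities;
  }
}
```

Writing the check out forces a decision about who the message is for, which is the point made above about Preconditions.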

LakshSingla · 2024-02-07T04:36:08Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java

+  /**
+   * Combine method would never be invoked as the broker sends the subquery to multiple segments
+   * and gather the results to a single value on which the single value aggregator is applied.
+   * Though getCombiningFactory would be invoked for understanding the fieldname.
+   */


sreemanamala · 2024-02-08T02:23:13Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+  public long getLong()
+  {
+    assert validObjectValue();
+    return (value == null) ? NullHandling.ZERO_LONG : ((Number) value).longValue();


@LakshSingla had to modify the check to value rather than relying on isNull(). Else it was causing NPE.
we are checking the validity in the assertion above.
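
A sketch of how that NPE can arise, assuming isNull() only reports null in SQL-compatible mode. `NullModeSketch` is hypothetical; its `sqlCompatible` flag stands in for NullHandling.sqlCompatible() and `0L` for NullHandling.ZERO_LONG.

```java
// Hypothetical sketch; sqlCompatible stands in for NullHandling.sqlCompatible().
final class NullModeSketch
{
  private final boolean sqlCompatible;
  private final Object value;

  NullModeSketch(boolean sqlCompatible, Object value)
  {
    this.sqlCompatible = sqlCompatible;
    this.value = value;
  }

  boolean isNull()
  {
    return sqlCompatible && value == null;
  }

  // Unsafe: in default-value mode isNull() is false even when value is null.
  long getLongViaIsNull()
  {
    return isNull() ? 0L : ((Number) value).longValue();
  }

  // Safe in both modes: checks the stored value directly.
  long getLongViaValueCheck()
  {
    return (value == null) ? 0L : ((Number) value).longValue();
  }
}
```

In default-value mode with a null value, `getLongViaIsNull()` dereferences null, while `getLongViaValueCheck()` returns the default.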

LakshSingla · 2024-02-08T05:39:43Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregator.java

+    return NullHandling.sqlCompatible() && value == null;
+  }
+
+  private boolean validObjectValue()


nit:

Suggested change

-  private boolean validObjectValue()
+  private boolean validPrimitiveValue()

LakshSingla · 2024-02-08T05:42:25Z

processing/src/main/java/org/apache/druid/query/aggregation/SingleValueBufferAggregator.java

+  @Nullable
+  private Object getSelectorObject()
+  {
+    if (columnType.isNumeric() && selector.isNull()) {
+      return null;
+    }
+    switch (columnType.getType()) {
+      case LONG:
+        return selector.getLong();
+      case FLOAT:
+        return selector.getFloat();
+      case DOUBLE:
+        return selector.getDouble();
+      default:
+        return selector.getObject();
+    }
+  }


Since we are boxing it to Object, I guess we can remove this method and call selector.getObject() wherever this method is used.
The benefit of using .getLong() is that we don't need to auto box-unbox whenever we are working with primitive selectors; however, that benefit is lost here because the result is boxed anyway. Therefore, we can remove this method, since it is doing what selector.getObject() will be doing for primitive types anyways.

i was a bit less confident that all of the numeric column value selectors (or things that present themselves as numbers) are implementing getObject equivalently to isNull/get primitive methods, which is why I advised doing this as a defensive measure, that said, it is probably safe to just call getObject...

If that's the case, I am cool with the current code.
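
The trade-off discussed in this thread can be illustrated with a small sketch. The `Selector` interface below is a hypothetical, simplified stand-in and not Druid's actual selector API; the `inconsistent()` selector is an invented example of the kind of implementation the defensive read guards against, not something observed in Druid.

```java
public class SelectorReadSketch
{
  // Hypothetical, simplified stand-in for a numeric column value selector.
  public interface Selector
  {
    boolean isNull();

    long getLong();

    Object getObject();
  }

  // Defensive read: trust isNull() and the primitive getter, then box the result.
  public static Object readDefensively(Selector selector)
  {
    if (selector.isNull()) {
      return null;
    }
    return selector.getLong(); // autoboxed to Long, so the primitive benefit is lost
  }

  // Direct read: rely on getObject() agreeing with isNull()/getLong().
  public static Object readDirectly(Selector selector)
  {
    return selector.getObject();
  }

  // A selector whose getObject() agrees with its primitive methods.
  public static Selector consistent(final Long stored)
  {
    return new Selector()
    {
      public boolean isNull() { return stored == null; }

      public long getLong() { return stored == null ? 0L : stored; }

      public Object getObject() { return stored; }
    };
  }

  // Invented example: reports null via isNull() but returns a default from getObject().
  public static Selector inconsistent()
  {
    return new Selector()
    {
      public boolean isNull() { return true; }

      public long getLong() { return 0L; }

      public Object getObject() { return 0L; }
    };
  }

  public static void main(String[] args)
  {
    System.out.println(readDefensively(consistent(5L)) + " " + readDirectly(consistent(5L)));
    System.out.println(readDefensively(inconsistent()) + " " + readDirectly(inconsistent()));
  }
}
```

For a consistent selector the two reads return the same boxed value, which is the argument for simply calling getObject(). They diverge only for a selector like the invented `inconsistent()` one, which is the case the defensive version covers.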

abhishekagarwal87 · 2024-02-08T13:50:55Z

Thank you, @sreemanamala for your first contribution

sreemanamala and others added 4 commits January 15, 2024 15:18

Initial files

d942f89

Merge branch 'apache:master' into single-value-agg

05b07d2

sigle value aggregator classes for different types

07fe3c5

Custom Exceptions and Unit Tests

6ec056b

github-actions bot added the Area - Querying label Jan 17, 2024

soumyava reviewed Jan 17, 2024

github-advanced-security bot found potential problems Jan 17, 2024

...src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/SingleValueSqlAggregator.java (fixed)

...essing/src/main/java/org/apache/druid/query/aggregation/SingleValueLongBufferAggregator.java (fixed)

sreemanamala added 2 commits January 17, 2024 15:30

code format change

978b993

code format change

414a359

somu-imply reviewed Jan 18, 2024

clintropolis reviewed Jan 18, 2024

sreemanamala added 3 commits January 23, 2024 09:53

Single aggregator to handle multiple column types

8c9e619

Single aggregator to handle multiple column types

959fe31

fixed for subquery empty results & new unit tests

a314a19

soumyava reviewed Jan 24, 2024

clintropolis reviewed Jan 24, 2024

made use of Nullable type strategy to read/write from/to buffer

e3aca34

clintropolis reviewed Jan 30, 2024

sreemanamala added 3 commits January 31, 2024 14:39

added unit tests & refactored exceptions

892d1b4

refactored exceptions

f8fc536

fix unit tests for sql compat false

347418f

LakshSingla reviewed Feb 2, 2024

LakshSingla reviewed Feb 5, 2024

cleaner code

ffc3953

LakshSingla reviewed Feb 6, 2024

code restructure

be657cd

LakshSingla reviewed Feb 7, 2024

bug fix for sql compat false

2106813

sreemanamala commented Feb 8, 2024

LakshSingla reviewed Feb 8, 2024

LakshSingla approved these changes Feb 8, 2024

sreemanamala added 2 commits February 8, 2024 14:58

unit tests for branch coverage

c93268b

bug fix

fac4ffa

abhishekagarwal87 merged commit 57e12df into apache:master Feb 8, 2024
83 checks passed

sreemanamala deleted the single-value-agg branch March 13, 2024 01:38

adarshsanjeev added this to the 30.0.0 milestone May 6, 2024

adarshsanjeev mentioned this pull request May 28, 2024

[DRAFT] 30.0.0 release notes #16505

Closed

		@JsonTypeName("singleValue")
		public class SingleValueAggregatorFactory extends AggregatorFactory


		import javax.annotation.Nullable;

		public class SingleValueSqlAggregator extends SimpleSqlAggregator

	longAggFatory = TestHelper.makeJsonMapper().readValue(longAggSpecJson, SingleValueAggregatorFactory.class);
	longAggFactory = TestHelper.makeJsonMapper().readValue(longAggSpecJson, SingleValueAggregatorFactory.class);

	return isNullResult ? NullHandling.ZERO_LONG : ((Number) value).longValue();
	assert NullHandling.replaceWithDefault() || !isNull();
	return isNull() ? NullHandling.ZERO_LONG : ((Number) value).longValue();

	Preconditions.checkNotNull(columnCapabilities, "Unable to get the capabilities of [%s]", fieldName);
	Preconditions.checkNotNull(columnCapabilities, "Unable to get the capabilities of field [%s]", fieldName);

