Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sql Single Value Aggregator for scalar queries #15700

Merged
merged 18 commits into from
Feb 8, 2024

Conversation

sreemanamala
Copy link
Contributor

@sreemanamala sreemanamala commented Jan 17, 2024

Description

Added the Single Value Aggregator functionality for scalar queries in group by queries

Fixed the bug ...

Executing single value correlated queries will throw an exception today since single_value function is not available in druid.
With these added classes, this provides druid, the capability to plan and run such queries.

Renamed the class ...

Added a forbidden-apis entry ...

Release note

Supporting Single Value aggregated group by queries for scalars


Key changed/added classes in this PR
  • SingleValueSqlAggregator
  • SingleValueAggregatorFactory, SingleValueBufferAggregator, SingleValueAggregator
  • AggregatorsModule, AggregatorUtil
  • DruidOperatorTable
  • CalciteSingleValueAggregatorTest

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

import org.apache.druid.sql.calcite.aggregation.builtin.StringSqlAggregator;
import org.apache.druid.sql.calcite.aggregation.builtin.SumSqlAggregator;
import org.apache.druid.sql.calcite.aggregation.builtin.SumZeroSqlAggregator;
import org.apache.druid.sql.calcite.aggregation.builtin.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit against including it as * and include only those libraries that we need, in future if we come up with another Agg that we do not use this wont be correct

@LakshSingla
Copy link
Contributor

Thanks for the first PR to Druid! 🚀

A few high level comments with regards to this

  1. What's the behaviour if the underlying relation has 0 rows. Shoud it throw an exception, replace it with null, or replace it with some dummy value (0.0 for floats, 0 for longs etc). We'd have to see what the SQL standard is or what other DBs commonly do and model the aggregation on top of it.
  2. String and Complex types need to be modeled as well. For string types, it can be invoked if used with aggregations like STRLEN
SELECT  
count(*)
 FROM "wiki"
 where 
  col1  >= STRLEN(SELECT dim1 FROM single_row_relation)

and with complex types, it can be invoked with functions performing finalization on the complex types, or something like PAIR_LEFT (not yet added into Druid though). However the complex use case seems far-fetched enough, that we should be fine with saying we don't support such a use case. The Calcite (or the user) can probably optimize and pushdown the aggregation as well.
3. Can we implement vector versions of the aggregator as well?

Copy link
Contributor

@somu-imply somu-imply left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution Sree. Might be simple enough to add the vector aggs as this is on a single value. There's also a failure in static checks that needs to be addressed

{
final BaseLongColumnValueSelector valueSelector;

Long value;
Copy link
Contributor

@somu-imply somu-imply Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make these final, similar for the other classes

skipVectorize();
cannotVectorize();
testQuery(
"SELECT count(*) FROM foo where m1 >= (select max(m1) - 4 from foo)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add more examples when these are string functions and also other aggs like count etc.

)
{
if (aggregateCall.getArgList().size() > 1) {
throw DruidException.defensive(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is visible to user, defensive might not be the best idea

public SingleValueDoubleAggregator(BaseDoubleColumnValueSelector valueSelector)
{
this.valueSelector = valueSelector;
this.value = valueSelector.getDouble();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think numeric selectors probably need to check isNull() and set the value to null if true

@Override
public Aggregator factorize(ColumnSelectorFactory metricFactory)
{
final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(getFieldName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of a separate aggregator factory for each type, why not call metricFactory.getColumnCapabilities(getFieldName()) and pick the right type of aggregator.

even further, do we even need aggs/buffer aggs for each type? Given that the typed aggs all appear to be storing values as Objects instead of java numeric primitives, why not just have a single Aggregator and BufferAggregator implementation. For the BufferAggregator, you could pass the ColumnType converted from getColumnCapabilities and use .getNullableStrategy() to write and read the value from the buffer.

This seems like a lot of classes for something that is basically a value holder...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out the NullableTypeStrategy. I always forget that such a contract exists (even in #15559 😞 ), and misguided the author into thinking that we don't have such a contract, and it needs to be inferred on per type basis.

import org.apache.druid.sql.calcite.util.CalciteTests;
import org.junit.Test;

public class CalciteSingleValueAggregatorTest extends CalciteQueryTest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not extend CalciteQueryTest since that will run all of those tests too... it doesn't seem like it needs to be its own test file in the first place, but if it does for some reason, it should extend BaseCalciteQueryTest

@clintropolis
Copy link
Member

Can we implement vector versions of the aggregator as well?

I don't think this could possibly be vectorized, can it? since this can only call aggregate once, while aggregate method for vector aggs take multiple values, unless i'm misunderstanding something about how this is used

@LakshSingla
Copy link
Contributor

I don't think this could possibly be vectorized, can it? since this can only call aggregate once, while aggregate method for vector aggs take multiple values, unless i'm misunderstanding something about how this is used

@clintropolis I don't have much insight into how the vector engines perform, however, I reasoned that it being a non-vector version only would prevent the query from being vectorized. While it won't gain any benefit from the vectorization, having a vectorized implementation would allow the rest of the query from being vectorized. Wdyt?

@clintropolis
Copy link
Member

@clintropolis I don't have much insight into how the vector engines perform, however, I reasoned that it being a non-vector version only would prevent the query from being vectorized. While it won't gain any benefit from the vectorization, having a vectorized implementation would allow the rest of the query from being vectorized. Wdyt?

Ah yeah, i guess it doesn't hurt to implement, I was just making a drive by comment and not entirely sure of the context of this agg. But it seems like any query with this agg can only process a single row, so I wasn't sure it would be much benefit either since the main benefit of vectorization is processing batches of rows.

@Override
public void aggregate()
{
if (isAggregateInvoked) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if selector.isNull() we can just return.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a null row is still a row in the context of this agg i think so it needs to set isAggregateInvoked

Comment on lines 69 to 79
switch (aggregationType.getType()) {
case LONG:
case FLOAT:
case DOUBLE:
case STRING:
return new SingleValueAggregatoractory(name, fieldName, aggregationType);
default:
// This error refers to the Druid type. But, we're in SQL validation.
// It should refer to the SQL type.
throw SimpleSqlAggregator.badTypeException(fieldName, "SINGLE_VALUE", aggregationType);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we actually need this check at all? Thinking of things like ARRAY_AGG which can aggregate an array of values...

import java.util.Arrays;
import java.util.HashSet;

public class CalciteSingleValueAggregatorTest extends BaseCalciteQueryTest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to just put these test cases in CalciteSubqueryTest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not particularly, will move them

Comment on lines 65 to 75
if (columnType.is(ValueType.STRING)) {
isNotNull = (selector.getObject() != null);
} else {
isNotNull = !selector.isNull();
}
if (isNotNull) {
if (buf.get(position) == NullHandling.IS_NULL_BYTE) {
buf.put(position, NullHandling.IS_NOT_NULL_BYTE);
}
updatevalue(buf, position + Byte.BYTES);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the potentially attractive part about pushing the ColumnType in here is that we can also use the NullableTypeStrategy to implement aggregate and get methods.

I pulled your branch and tried this and it runs into some problems with floats ending up as doubles, though I think this is actually a problem with either the underlying expression selector (expressions don't support floats so I think this is likely the culprit) because testSingleValueFloatAgg fails, so we need a helper function to get the object from the underlying selector to ensure that the thing we get from the selector matches the columnType.

  @Override
  public void aggregate(ByteBuffer buf, int position)
  {
    if (isAggregateInvoked) {
      throw InvalidInput.exception("Single Value Aggregator would not be applied to more than one row");
    }

    int written = typeStrategy.write(
        buf,
        position,
        getSelectorObject(),
        SingleValueAggregatoractory.DEFAULT_MAX_STRING_SIZE
    );
    if (written < 0) {
      throw InvalidInput.exception("Single Value Aggregator value too big for buffer");
    }

    isAggregateInvoked = true;
  }

(invalid input might not be the actual right exception here for exceeding size limit, since its only a single row we might want to figure out how to pass this in, or make a much larger limit since the user isn't calling this function directly..)

  @Nullable
  @Override
  public Object get(ByteBuffer buf, int position)
  {
    return typeStrategy.read(buf, position);
  }
  @Nullable
  private Object getSelectorObject()
  {
    if (selector.isNull()) {
      return null;
    }
    switch (columnType.getType()) {
      case LONG:
        return selector.getLong();
      case FLOAT:
        return selector.getFloat();
      case DOUBLE:
        return selector.getDouble();
      default:
        return selector.getObject();
    }
  }

The getLong/getFloat/getDouble methods could also be updated, there are some static methods in TypeStrategies which can help with this isNullableNull, readNotNullNullableLong, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would look more cleaner the way you pointed. will update it.

import java.util.Objects;

@JsonTypeName("singleValue")
public class SingleValueAggregatoractory extends AggregatorFactory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo, should be SingleValueAggregatorFactory instead of SingleValueAggregatoractory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! thanks for pointing.

@Override
public void aggregate()
{
if (isAggregateInvoked) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a null row is still a row in the context of this agg i think so it needs to set isAggregateInvoked

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm 👍

Comment on lines 27 to 29
/**
*
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can just delete if not filling out javadocs

Comment on lines +40 to +41
@JsonTypeName("singleValue")
public class SingleValueAggregatorFactory extends AggregatorFactory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be nice to add javadocs for since its primarily to support SQL planner and probably doesn't have many direct use cases

Comment on lines 63 to 66
if (isNullResult) {
return null;
}
return value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can just return value directly?

if (isAggregateInvoked) {
throw InvalidInput.exception("Single Value Aggregator would not be applied to more than one row");
}
boolean isNotNull = !selector.isNull();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like in the buffer agg, this is probably only accurate for numeric primitive selectors, since typically things that call getObject do not check isNull. isNull is mainly used when you plan to call getLong or the like (since java primitives cannot be null, so they use this method instead).

You could potentially just always call getObject, or could have a method similar to the method for the buffer agg though not sure its as useful here since getObject usually always works (i think)

Comment on lines 69 to 79
switch (aggregationType.getType()) {
case LONG:
case FLOAT:
case DOUBLE:
case STRING:
return new SingleValueAggregatorFactory(name, fieldName, aggregationType);
default:
// This error refers to the Druid type. But, we're in SQL validation.
// It should refer to the SQL type.
throw SimpleSqlAggregator.badTypeException(fieldName, "SINGLE_VALUE", aggregationType);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i already asked this, but is this needed or can it just always make a SingleValueAggregatorFactory with any aggregationType?

Comment on lines +1124 to +1152
new QueryDataSource(GroupByQuery.builder()
.setDataSource(new QueryDataSource(
Druids.newTimeseriesQueryBuilder()
.dataSource(CalciteTests.DATASOURCE1)
.intervals(querySegmentSpec(Filtration.eternity()))
.granularity(Granularities.ALL)
.aggregators(new FloatMaxAggregatorFactory("a0", "m1"))
.build()
))
.setInterval(querySegmentSpec(Filtration.eternity()))
.setGranularity(Granularities.ALL)
.setVirtualColumns(expressionVirtualColumn(
"v0",
"(\"a0\" - 3.5)",
ColumnType.DOUBLE
)
)
.setAggregatorSpecs(
aggregators(
new SingleValueAggregatorFactory(
"_a0",
"v0",
ColumnType.DOUBLE
)
)
)
.setLimitSpec(NoopLimitSpec.instance())
.setContext(QUERY_CONTEXT_DEFAULT)
.build()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this doesn't need to be part of this PR, but it seems like there is room for improvement in the planner here... Like in this case i think the planner should just collapse this subquery into a single timeseries query with an expression post-aggregator, instead of a group by on a timeseries

public void aggregate()
{
if (isAggregateInvoked) {
throw InvalidInput.exception("Single Value Aggregator would not be applied to more than one row");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is unlikely to be used directly, I wonder if this error message should mention sub-queries? I was playing around with postgres and it returns something like
error: more than one row returned by a subquery used as an expression so i wonder if we should do something similar (probably don't copy it exactly :p). It might be nice to include the name of the input column. Be sure to update the buffer agg error if we change this too

public float getFloat(ByteBuffer buf, int position)
{
if (TypeStrategies.isNullableNull(buf, position)) {
throw new IllegalStateException("Cannot return float for Null Value");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can use DruidException.defensive since this shouldn't happen in practice because callers should know to call isNull if values can be null (if it happens it is a coding error probably)

Object value;

private boolean isNullResult = true;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Spacing not needed

Comment on lines 86 to 88
if (isNullResult) {
throw DruidException.defensive("Cannot return double for Null Value");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably for SQL-incompatible behavior, let's just return the default value. It's the onus of the caller to call .isNull and then the .getDouble. I am wondering if any SQL-compatible path that just calls .getDouble without calling .isNull() can cause it to throw. Anyways, it will also prevent an additional check in this path. Same goes for other selectors.

boolean isNotNull = (selector.getObject() != null);
if (isNotNull) {
isNullResult = false;
value = selector.getObject();
Copy link
Contributor

@LakshSingla LakshSingla Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably reuse the .getObject() called above, because for primitives that would cause autoboxing.

I was going to suggest using .isNull() check at the top instead, however, I don't think they might behave correctly for object selectors, therefore refraining from the suggestion.

@Test
public void testSingleValueEmptyInnerAgg()
{
msqIncompatible();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why msqIncompatible()? I think this test isn't subclassed by MSQ anyways, so we can get rid of these calls, however, I do expect it to be MSQ-compatible.


import javax.annotation.Nullable;

public class SingleValueSqlAggregator extends SimpleSqlAggregator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can add javadoc as to how this will be called by the SQL query, since its not something that the user will supply.

public SingleValueAggregationTest() throws Exception
{
String longAggSpecJson = "{\"type\": \"singleValue\", \"name\": \"lng\", \"fieldName\": \"lngFld\", \"columnType\": \"LONG\"}";
longAggFatory = TestHelper.makeJsonMapper().readValue(longAggSpecJson, SingleValueAggregatorFactory.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
longAggFatory = TestHelper.makeJsonMapper().readValue(longAggSpecJson, SingleValueAggregatorFactory.class);
longAggFactory = TestHelper.makeJsonMapper().readValue(longAggSpecJson, SingleValueAggregatorFactory.class);

Comment on lines 65 to 69
doubleAggFatory = TestHelper.makeJsonMapper().readValue(doubleAggSpecJson, SingleValueAggregatorFactory.class);

String strAggSpecJson = "{\"type\": \"singleValue\", \"name\": \"str\", \"fieldName\": \"strFld\", \"columnType\": \"STRING\"}";
stringAggFatory = TestHelper.makeJsonMapper().readValue(strAggSpecJson, SingleValueAggregatorFactory.class);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling

@Before
public void setup()
{
NullHandling.initializeForTests();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Instead of calling it in the setup(), we can subclass the test class with extends InitializedNullHandlingTest

Comment on lines 36 to 37
/**
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: cleanup

Comment on lines 81 to 96
EasyMock.expect(colSelectorFactoryLong.makeColumnValueSelector("lngFld")).andReturn(selectorLong);
EasyMock.expect(colSelectorFactoryLong.getColumnCapabilities("lngFld")).andReturn(columnCapabilitiesLong);

EasyMock.replay(columnCapabilitiesLong);
EasyMock.replay(colSelectorFactoryLong);

selectorDouble = new TestDoubleColumnSelectorImpl(doubleValues);
columnCapabilitiesDouble = EasyMock.createMock(ColumnCapabilities.class);
EasyMock.expect(columnCapabilitiesDouble.getType()).andReturn(ValueType.DOUBLE);

colSelectorFactoryDouble = EasyMock.createMock(ColumnSelectorFactory.class);
EasyMock.expect(colSelectorFactoryDouble.makeColumnValueSelector("dblFld")).andReturn(selectorDouble);
EasyMock.expect(colSelectorFactoryDouble.getColumnCapabilities("dblFld")).andReturn(columnCapabilitiesDouble);

EasyMock.replay(columnCapabilitiesDouble);
EasyMock.replay(colSelectorFactoryDouble);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a cleaner way than mocking the selector factory and the column capabilities. For column capabilities, you can use the ColumnCapabilitiesImpl#createSimpleNumericColumnCapabilities et al
. I was going through the code to find any reusable class for the column selector factory and I found TestColumnSelectorFactory. Perhaps others can achieve this more cleanly.

if (isNullResult) {
throw DruidException.defensive("Cannot return float for Null Value");
}
return (float) value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (float) value;
return ((Number) value).floatValue();

Comment on lines 31 to 33
final ColumnValueSelector selector;
@Nullable
Object value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: private for consistency.

Comment on lines 49 to 53
boolean isNotNull = (selectorObject != null);
if (isNotNull) {
isNullResult = false;
value = selectorObject;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Coding style:

Now that it's written this way, it seems redundant to have isNullResult and isNotNull.
Following seems a much cleaner way

Suggested change
boolean isNotNull = (selectorObject != null);
if (isNotNull) {
isNullResult = false;
value = selectorObject;
}
value = selector.getObject();
}

We don't use any null check here.

The method isNull will do something like:

  public boolean isNull()
  {
    return value == null;
  }

Then the methods relying on the isNullResult can use isNull() instead.

  public float getFloat()
  {
    return isNull() ? NullHandling.ZERO_FLOAT : ((Number) value).floatValue();
  }

This will prevent the code from maintaining two variables denoting same information.

Comment on lines 96 to 98
return "SingleValueAggregator{" +
"selector=" + selector +
'}';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also print the value and aggregateInvoked?

@JsonProperty
@JsonInclude(JsonInclude.Include.NON_NULL)
private final ColumnType columnType;
public static final int DEFAULT_MAX_BUFFER_SIZE = 1025;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 1025, and not 1024? (Latter seems more "correct")

Comment on lines 158 to 161
if (columnType.isNumeric()) {
return Byte.BYTES + Double.BYTES;
}
return DEFAULT_MAX_BUFFER_SIZE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is this correct, for long values?

Comment on lines 36 to 39
final ColumnValueSelector selector;
final ColumnType columnType;
final NullableTypeStrategy typeStrategy;
private boolean isAggregateInvoked = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mark private for consistency

final NullableTypeStrategy typeStrategy;
private boolean isAggregateInvoked = false;

SingleValueBufferAggregator(ColumnValueSelector selector, ColumnType columnType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public, since the SingleValueAggregator is also public scope

{
this.selector = selector;
this.columnType = columnType;
this.typeStrategy = columnType.getNullableStrategy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would NPE if columnType is null, hence we should add the null check in the aggregator factory.

@Override
public float getFloat(ByteBuffer buf, int position)
{
return TypeStrategies.isNullableNull(buf, position)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if two calls are required, since we already have the nullable type strategy, so perhaps we can read it once, and check if that is null. However, not important, since its called for a single row only.

@Override
public long getLong()
{
return isNullResult ? NullHandling.ZERO_LONG : ((Number) value).longValue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an assertion in the primitive selectors that if the mode is sql compatible, then the selector cannot be null.

Suggested change
return isNullResult ? NullHandling.ZERO_LONG : ((Number) value).longValue();
assert NullHandling.replaceWithDefault() || !isNull();
return isNull() ? NullHandling.ZERO_LONG : ((Number) value).longValue();

public class SingleValueAggregatorFactory extends AggregatorFactory
{
@JsonProperty
@JsonInclude
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need JsonInclude annotation

Suggested change
@JsonInclude

Copy link
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being accommodating with the reviews. Final thoughts on the PR.

{
ColumnValueSelector selector = metricFactory.makeColumnValueSelector(fieldName);
ColumnCapabilities columnCapabilities = metricFactory.getColumnCapabilities(fieldName);
Preconditions.checkNotNull(columnCapabilities, "Unable to get the capabilities of [%s]", fieldName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null checks are better done using an if clause. Preconditions is a shorthand, we mostly use in the constructor. It doesn't play nice with the DruidExcption system, because now we don't know the target persona for the error message. Therefore, let's throw a DruidException here, because, now it will force us to think of the persona of the error message, and word it accordingly.
I think it should be aimed at devs only, because users/admins/operators won't know what to do if such an error occurs, and won't make sense for them.

Also, check out https://github.com/apache/druid/blob/master/dev/style-conventions.md#message-formatting-for-logs-and-exceptions. The message should make sense if all the extrapolation ([%s]) is removed. Therefore, the message should be

Suggested change
Preconditions.checkNotNull(columnCapabilities, "Unable to get the capabilities of [%s]", fieldName);
Preconditions.checkNotNull(columnCapabilities, "Unable to get the capabilities of field [%s]", fieldName);

Comment on lines +96 to +100
/**
* Combine method would never be invoked as the broker sends the subquery to multiple segments
* and gather the results to a single value on which the single value aggregator is applied.
* Though getCombiningFactory would be invoked for understanding the fieldname.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

public long getLong()
{
assert validObjectValue();
return (value == null) ? NullHandling.ZERO_LONG : ((Number) value).longValue();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LakshSingla had to modify the check to value rather than relying on isNull(). Else it was causing NPE.
we are checking the validity in the assertion above.

return NullHandling.sqlCompatible() && value == null;
}

private boolean validObjectValue()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
private boolean validObjectValue()
private boolean validPrimitiveValue()

Comment on lines +77 to +93
@Nullable
private Object getSelectorObject()
{
if (columnType.isNumeric() && selector.isNull()) {
return null;
}
switch (columnType.getType()) {
case LONG:
return selector.getLong();
case FLOAT:
return selector.getFloat();
case DOUBLE:
return selector.getDouble();
default:
return selector.getObject();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are boxing it to Object, I guess we can remove this method and call selector.getObject() wherever this method is used.
The benefit of using .getLong() is that we don't need to auto box-unbox, whenever we are working with primitive selectors, however, that is getting lost in the translation here. Therefore, we can remove this method, since it is don't what selector.getObject() will be doing for primitive types anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was a bit less confident that all of the numeric column value selectors (or things that present themselves as numbers) are implementing getObject equivalently to isNull/get primitive methods, which is why I advised doing this as a defensive measure, that said, it is probably safe to just call getObject...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, I am cool with the current code.

@abhishekagarwal87 abhishekagarwal87 merged commit 57e12df into apache:master Feb 8, 2024
83 checks passed
@abhishekagarwal87
Copy link
Contributor

Thank you, @sreemanamala for your first contribution

@sreemanamala sreemanamala deleted the single-value-agg branch March 13, 2024 01:38
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants