-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tighten up array handling, fix bug with array_slice output type inference #12914
Conversation
This pull request introduces 4 alerts when merging 8763c66 into f70f7b4 - view on LGTM.com new alerts:
|
Do these matter? |
public static boolean isAllConstants(Expr... exprs) | ||
{ | ||
return isAllConstants(Arrays.asList(exprs)); | ||
for (Expr expr : exprs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why duplicate the code? To avoid wrapping the array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this change wasn't really related to anything, but it seemed a bit pointless to wrap it... though i guess probably not that expensive in the scheme of things since this is only used when flattening expressions in the Parser
, so on second thought can just change it back
@@ -429,18 +443,15 @@ public static ExprEval ofType(@Nullable ExpressionType type, @Nullable Object va | |||
case STRING: | |||
// not all who claim to be "STRING" are always a String, prepare ourselves... | |||
if (value instanceof String[]) { | |||
return new ArrayExprEval(ExpressionType.STRING_ARRAY, (String[]) value); | |||
return new ArrayExprEval(ExpressionType.STRING_ARRAY, Arrays.stream((String[]) value).toArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stream construction in hot paths can be expensive. May be better to avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, looking around there are quite a few usages of steam in array stuffs, should I try to fix them all up in this PR or do as a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you, I'm OK either way.
if (val instanceof String[]) { | ||
return new ArrayExprEval(ExpressionType.STRING_ARRAY, (String[]) val); | ||
return new ArrayExprEval(ExpressionType.STRING_ARRAY, Arrays.stream((String[]) val).toArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stream construction in hot paths can be expensive. May be better to avoid it.
/** | ||
* Druid string type. Values will be represented as {@link String} or {@link java.util.List<String>} in the case | ||
* of multi-value string columns. {@link ColumnType} has insufficient information to distinguish between single | ||
* and multi-value strings, this requires a type 'ColumnCapabilities' which is available at a higher layer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to the specific hasMultipleValues function? Or mention, if it's not available from this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not in this package unfortunately but will do. i do still someday wish to merge druid-core into druid-processing since i find the split a bit frustrating, but that is so disruptive i should probably give up hope that it will ever happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so disruptive i should probably give up hope that it will ever happen
There's always the old "do it during a quiet week" strategy 🙂
public static final ColumnType FLOAT = new ColumnType(ValueType.FLOAT, null, null); | ||
// currently, arrays only come from expressions or aggregators | ||
// and there are no native float expressions (or aggs which produce float arrays) | ||
/** | ||
* An array of Strings. Values will reprsented as Object[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
represented (spelling)
* this complex type. | ||
* in the form of a type name which must be registered in the complex type registry. This type is not currently | ||
* supported as a grouping key for aggregations, and might only be supported by the specific aggregators crafted to | ||
* handle this complex type. Filtering on this type with standard filters will most likely have limited support, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"most likely"? Are there cases where they are treated as anything other than null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, there is isFilterable
on column capabilities which is used exclusively to allow complex types to provide indexes to use for filtering, https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/ColumnSelectorColumnIndexSelector.java#L86. This could allow complex columns to directly provide indexes that filters understand, which I suppose could be used for filtering.
However, there are no examples of this in current complex types, and importantly, I don't consider this support to be fully built-out yet though because without value matcher integration its not really complete and likely not correct except in certain situations. We did start to orient some things to make this someday be possible, like adding the object predicate https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java#L46, but some more plumbing needs to be done to make use of it I think.
The big win here on finishing this stuff out someday would be to allow matching constant forms of complex values with selector filters, and making is null/is not null actually work correctly (like is null matches everything since the column is treated as a null) which is like a special case of the same thing but especially jarring.
Beyond equality checks I'm not sure most of the standard filters make sense with complex types in general, but tangentially, it would probably be nice in the future to have a mode where filtering on them with filters that don't make sense is an error condition instead of silently treating it as a column of null values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seemed too complicated to explain so now just mention that filters will treat as null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can distill this to: "filters generally treat complex values as null, although this is neither guaranteed nor required."
* input to expression virtual columns, and might only be supported by the specific aggregators crafted to handle | ||
* this complex type. | ||
* in the form of a type name which must be registered in the complex type registry. This type is not currently | ||
* supported as a grouping key for aggregations, and might only be supported by the specific aggregators crafted to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"might"? How about:
Complex types are not currently supported as a grouping key for aggregations. Complex types can be used as inputs to aggregators, in cases where the specific aggregator supports the specific complex type.
*/ | ||
COMPLEX, | ||
|
||
/** | ||
* Placeholder for arbitrary arrays of other {@link ValueType}. This type is not currently supported as a grouping | ||
* Placeholder for arbitrary arrays of other {@link ValueType}. This type has limited support as a grouping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "limited support" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can group on specific array types, such as ARRAY<STRING>
, ARRAY<LONG>
, ARRAY<DOUBLE>
, ARRAY<FLOAT>
, but not fully generically, like on ARRAY<ARRAY<LONG>>
, or ARRAY<COMPLEX<json>>
(assuming grouping on COMPLEX
types was also supported, which it isn't yet... but soon.). I'll clarify that some specific array grouping is supported, but array grouping in general is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that makes sense. Would be good for that to come across in the comment.
* There are currently no native ARRAY typed columns, but they may be produced by expression virtual columns, | ||
* aggregators, and post-aggregators. | ||
* | ||
* Arrays are typically represented as Object[], but may also be Java primitive arrays such as long[], double[], or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the idea of this paragraph. How do you feel about this adjustment? The idea is to firm up the language and give people clearer expectations.
Arrays are represented as Object[], long[], double[], or float[]. The preferred type is Object[], since the expression system is the main consumer of arrays, and the expression system uses Object[] internally. Some code represents arrays in other ways; in particular the groupBy engine and SQL result layer. Over time we expect these usages to migrate to Object[], long[], double[], and float[].
if (lResult == null) { | ||
return false; | ||
} | ||
|
||
return Arrays.stream(lResult).filter(Objects::nonNull).anyMatch(Evals::asBoolean); | ||
return Arrays.stream(lResult).filter(Objects::nonNull).anyMatch(o -> Evals.asBoolean((long) o)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this code was pre-existing, but, may be a good idea to move it away from streams since it's a hot path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest patch LGTM.
…ff intended for terminals into editor...
Description
This PR adjusts array handling in the direction of standardizing the representations on
Object[]
as well aslong[]
,double[]
, andfloat[]
. This is follow-up to #12498 (comment) and a follow-up discussion I had with @gianm.The main changes are removing the remaining leaks of typed object arrays such as
Long[]
,Double[]
, andFloat[]
from the expression system, as well as adding support for handling arrays of java primitives such aslong[]
,float[]
, anddouble[]
. Expressions do not produce these primitive arrays, but aggregators and post-aggs can.I've added javadocs to try to help provide additional details on native type matters.
I've adjusted the behavior of the expression column value selector when the virtual column output type is
STRING
to behave a bit more like a normal multi-value string column, and return the value directly when there is only a single value contained in the list. This does slightly change scan query results, which for multi-value string expressions would always produce array typed results, but now will produce a mix of array and scalar values. I think this behavior is more consistent with how the multi-value string columns themselves behave, but I can put it behind a flag if anyone is worried about this change.I encountered a bug with
array_slice
output type inference, which was not indicating its output as array output, instead reporting the output type of its argument. This issue became clear due to the stronger separation between multi-value string handling and array handling that now occurs in the expression selectors.Several redundant uses of
NullHandling.emptyToNullIfNeeded
have been removed from expression selectors, since it is baked intoStringExprEval
itself:In a follow-up PR, I'm going to try to see if I can remove the rest of the representation of ARRAY types as List, which is primarily the grouping engine. I decided to split it out from this PR to keep change set small.
This PR has: