-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor SemanticCreator #16700
Refactor SemanticCreator #16700
Conversation
import org.apache.druid.query.rowsandcols.RowsAndColumns; | ||
import org.apache.druid.segment.column.RowSignature; | ||
|
||
public interface FrameMaker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its not necessary needed to introduce an interface if there will be only 1 implementation
I wonder why not somehow make RowsAndColumns.as(Frame.class)
and
RowsAndColumns.as(RowSignature.class)
work for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this currently only has toColumnBasedFrame()
, it's equally plausible to have a toRowBasedFrame()
and allow implementations to ask for the one that they expect to be working with. The notion of which kind of Frame that you are getting is not something that can be relayed with RowsAndColumns.as(Frame.class)
.
That said, one thing that does annoy me about the .as()
and the semantic interfaces is that you have these factory-style interfaces for when you are building a thing. I don't know of another way to deal with it and maintain the extensibility, so I've just been thinking that it's the price we pay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently the DefaultFrameMaker
class has no internal state - and with these methods I think there won't be any - unless it tries to cache the frame or something...
I think one of the key problems with Frame
is that its decoupled from RowSignature
- however most of the time those two should be together....why not introduce a class which could hold a Frame
along with the RowSignature
or extend frame somehow to optionally add a RowSignature
that way a single as
method could be implemented; and there won't be a need for the FrameMaker
my recent experiences sugest that the production of a frame from a rac may not need to be aligned as 1-1 with it; so I think a class to which rac
-s would be written to produce a frame would be more like what we will need in the future - other way to get to to more-or-less the same is to use ConcatRowAndColumns
would it be possible to use the approach you end up with at the places where we have this frame creation logic copied-to already?
LazilyDecoratedRowsAndColumns
(2 places)StorageAdapterRowsAndColumns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea, aligning a RowSignature with a Frame is definitely a really annoying thing when dealing with Frames, having the Frame carry it forward is nice. That said, one of the reasons for this interface is, if we assume that a Frame is the serialization format that we want for the wire, then we will want some "standard" means of converting from RowsAndColumns to the concrete wire-serializable form.
processing/src/main/java/org/apache/druid/common/semantic/SemanticUtils.java
Show resolved
Hide resolved
|
||
public class SemanticUtils | ||
{ | ||
private static final Map<Class<?>, Map<Class<?>, Function<?, ?>>> OVERRIDES = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to put everything into a centralized map?
there is no point in caching this as all filed which are calling makeAsMap
are static
fields for which only once that method will be invoked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exists to allow extensions to register new interfaces and implementations without needing to impact the core code. This is probably worthy of javadoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a javadoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean to override default behaviour?
could you give an example? how this supposed to work if 2-3 classes want to override the same?
what's the problem you are trying to solve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially, we could have a class A that has a SemanticCreator toB(). This allows A.as(B.class) to work.
However, there could be extensions that want a different implementation to be bound. The extension would want to use that implementation only if the extension is loaded and also, it would not be able to change the first binding. Overrides will allow the extension to bind something like:
SemanticUtils.registerAsOverride(
A.class,
B.class,
(a) -> new C(a) // C extends B
);
to modify
If there are multiple bindings for A and B from different places, SemanticUtils would throw an exception.
@Nullable | ||
private final Metadata metadata; | ||
@NotNull | ||
private final Supplier<Metadata> metadataSupplier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks odd - an index with named columns ; but the Metadata
is softened with a supplier
why soften Metadata
with a Supplier
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% certain that this is why it was done, but one benefit of softening it could be to avoid holding the object in memory which potentially softens memory pressure. There's probably a legitimate question to be asked if this should really be a Supplier<>
or if SimpleQueryableIndex
should be made an abstract class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converted it to an abstract class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM
|
||
public class SemanticUtils | ||
{ | ||
private static final Map<Class<?>, Map<Class<?>, Function<?, ?>>> OVERRIDES = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exists to allow extensions to register new interfaces and implementations without needing to impact the core code. This is probably worthy of javadoc.
processing/src/main/java/org/apache/druid/common/semantic/SemanticUtils.java
Show resolved
Hide resolved
import org.apache.druid.query.rowsandcols.RowsAndColumns; | ||
import org.apache.druid.segment.column.RowSignature; | ||
|
||
public interface FrameMaker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this currently only has toColumnBasedFrame()
, it's equally plausible to have a toRowBasedFrame()
and allow implementations to ask for the one that they expect to be working with. The notion of which kind of Frame that you are getting is not something that can be relayed with RowsAndColumns.as(Frame.class)
.
That said, one thing that does annoy me about the .as()
and the semantic interfaces is that you have these factory-style interfaces for when you are building a thing. I don't know of another way to deal with it and maintain the extensibility, so I've just been thinking that it's the price we pay.
@Nullable | ||
private final Metadata metadata; | ||
@NotNull | ||
private final Supplier<Metadata> metadataSupplier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% certain that this is why it was done, but one benefit of softening it could be to avoid holding the object in memory which potentially softens memory pressure. There's probably a legitimate question to be asked if this should really be a Supplier<>
or if SimpleQueryableIndex
should be made an abstract class.
processing/src/main/java/org/apache/druid/segment/nested/CompressedNestedDataComplexColumn.java
Outdated
Show resolved
Hide resolved
Looking at the failures, this needs improved coverage. For |
processing/src/test/java/org/apache/druid/common/semantic/SemanticUtilsTest.java
Fixed
Show fixed
Hide fixed
processing/src/test/java/org/apache/druid/common/semantic/SemanticUtilsTest.java
Dismissed
Show dismissed
Hide dismissed
@@ -28,4 +28,10 @@ | |||
*/ | |||
public interface ColumnarInts extends IndexedInts, Closeable | |||
{ | |||
default void get(int[] out, int offset, int start, int length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any benefit of implementing this method here - and not in IndexedInts
which already has a get
method?
@@ -169,7 +175,7 @@ public void get(final long[] out, final int start, final int length) | |||
} | |||
|
|||
final int limit = Math.min(length - p, sizePer - bufferIndex); | |||
reader.read(out, p, bufferIndex, limit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any tests which are passing not 0
as offset?
import org.apache.druid.query.rowsandcols.RowsAndColumns; | ||
import org.apache.druid.segment.column.RowSignature; | ||
|
||
public interface FrameMaker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently the DefaultFrameMaker
class has no internal state - and with these methods I think there won't be any - unless it tries to cache the frame or something...
I think one of the key problems with Frame
is that its decoupled from RowSignature
- however most of the time those two should be together....why not introduce a class which could hold a Frame
along with the RowSignature
or extend frame somehow to optionally add a RowSignature
that way a single as
method could be implemented; and there won't be a need for the FrameMaker
my recent experiences sugest that the production of a frame from a rac may not need to be aligned as 1-1 with it; so I think a class to which rac
-s would be written to produce a frame would be more like what we will need in the future - other way to get to to more-or-less the same is to use ConcatRowAndColumns
would it be possible to use the approach you end up with at the places where we have this frame creation logic copied-to already?
LazilyDecoratedRowsAndColumns
(2 places)StorageAdapterRowsAndColumns
@Override | ||
public Metadata getMetadata() | ||
{ | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if its valid to have null
metadata; it could be the default
implementation in the interface
@Nullable | ||
default <T> T as(Class<? extends T> clazz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this PR is about to make the SemanticCreator
more mainstream: could you please introduce an interface for it and use it everywhere ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand this, do you mean adding a new interface like "SemanticClass" that all other classes would implement if it has a SemanticCreator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - exactly
import java.util.Map; | ||
import java.util.function.Function; | ||
|
||
public class SemanticUtils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this Utils
could be renamed to be the interface containing the as
for these things; the static
method could remain as those are providing services...
please also add apidoc how this as
stuff works/etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea about creating the interface for .as()
and having these utils be static helpers on it. As a refactor, that should probably be done for all of the places that are using an .as()
so I think I'd like to let this current PR go through (as it's following the current pattern in the code) and one of us can take on the refactor as you defined it as I do think that's an improvement.
@Nullable | ||
public <T> T as(Class<? extends T> clazz) | ||
{ | ||
return column.as(clazz); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a non-overridable as method....or not?
how will I be able to override this with SemanticUtils.registerOverride
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class does not use SemanticUtils.makeAsMap so it would not be directly overridable. I think the decision of using overrides or not currently belongs to the class itself, in this way. The extension can still override it by adding an override for ColumnarLongs which this class is pointing to.
@Nullable | ||
default <T> T as(Class<? extends T> clazz) | ||
{ | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a non-overridable as
method....or not?
I think registerOverride
should also work when it was not provided before....
how will I be able to override this with SemanticUtils.registerOverride
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is to have implementations of base column be classes that have semantic creators and have possible conversions. Overrides can be registered against them.
If a class does not have any conversion to another class, and does not override this function, it defaults to null.
Adding a map and allowing functions to register overrides to it may cause it to use the conversions from the base class, if it is not overridden. I am not sure if that behavior is expected.
get(out, 0, start, length); | ||
} | ||
|
||
default void get(long[] out, int offset, int start, int length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's driving this change?
why the need for this here and not in other classes like ColumnarDoubles
?
public Metadata getMetadata() | ||
{ | ||
try { | ||
ByteBuffer metadataBB = smooshedFiles.mapFile("metadata.drd"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doesn't really know these parts; but this will mean that the Metadata
will be read from disk every time its asked for; is that desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. The smooshedFiles is memory mapped and this is returning a memoized memory-mapped buffer. It will do some less than fast operations, but not necessarily brand new disk IO. The getMetadata() call is used seldom (only a few specific introspection-oriented calls like the segmentMetadata query) so it's better to have it be lazy and not consume on-heap memory.
I had some off-line conversations with Zoltan about his comments. He's got some great comments for cleaning things up. I like eliminating SemanticUtils and just putting it on the interface that we use for the |
Refactors the SemanticCreator annotation.
This PR has: