Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor SemanticCreator #16700

Merged
merged 8 commits into from
Aug 6, 2024
Merged

Conversation

adarshsanjeev
Copy link
Contributor

@adarshsanjeev adarshsanjeev commented Jul 8, 2024

Refactors the SemanticCreator annotation.

  • Moves the interface to the semantic package.
  • Create a SemanticUtils to hold logic for storing semantic maps.
  • Add FrameMaker interface.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

import org.apache.druid.query.rowsandcols.RowsAndColumns;
import org.apache.druid.segment.column.RowSignature;

public interface FrameMaker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its not necessary needed to introduce an interface if there will be only 1 implementation

I wonder why not somehow make RowsAndColumns.as(Frame.class) and
RowsAndColumns.as(RowSignature.class) work for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this currently only has toColumnBasedFrame(), it's equally plausible to have a toRowBasedFrame() and allow implementations to ask for the one that they expect to be working with. The notion of which kind of Frame that you are getting is not something that can be relayed with RowsAndColumns.as(Frame.class).

That said, one thing that does annoy me about the .as() and the semantic interfaces is that you have these factory-style interfaces for when you are building a thing. I don't know of another way to deal with it and maintain the extensibility, so I've just been thinking that it's the price we pay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently the DefaultFrameMaker class has no internal state - and with these methods I think there won't be any - unless it tries to cache the frame or something...

I think one of the key problems with Frame is that its decoupled from RowSignature - however most of the time those two should be together....why not introduce a class which could hold a Frame along with the RowSignature or extend frame somehow to optionally add a RowSignature
that way a single as method could be implemented; and there won't be a need for the FrameMaker

my recent experiences sugest that the production of a frame from a rac may not need to be aligned as 1-1 with it; so I think a class to which rac-s would be written to produce a frame would be more like what we will need in the future - other way to get to to more-or-less the same is to use ConcatRowAndColumns

would it be possible to use the approach you end up with at the places where we have this frame creation logic copied-to already?

  • LazilyDecoratedRowsAndColumns (2 places)
  • StorageAdapterRowsAndColumns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea, aligning a RowSignature with a Frame is definitely a really annoying thing when dealing with Frames, having the Frame carry it forward is nice. That said, one of the reasons for this interface is, if we assume that a Frame is the serialization format that we want for the wire, then we will want some "standard" means of converting from RowsAndColumns to the concrete wire-serializable form.


public class SemanticUtils
{
private static final Map<Class<?>, Map<Class<?>, Function<?, ?>>> OVERRIDES = new LinkedHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to put everything into a centralized map?
there is no point in caching this as all filed which are calling makeAsMap are static fields for which only once that method will be invoked

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exists to allow extensions to register new interfaces and implementations without needing to impact the core code. This is probably worthy of javadoc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a javadoc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean to override default behaviour?
could you give an example? how this supposed to work if 2-3 classes want to override the same?

what's the problem you are trying to solve?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially, we could have a class A that has a SemanticCreator toB(). This allows A.as(B.class) to work.

However, there could be extensions that want a different implementation to be bound. The extension would want to use that implementation only if the extension is loaded and also, it would not be able to change the first binding. Overrides will allow the extension to bind something like:

SemanticUtils.registerAsOverride(
        A.class,
        B.class,
        (a) -> new C(a) // C extends B
    );

to modify
If there are multiple bindings for A and B from different places, SemanticUtils would throw an exception.

@Nullable
private final Metadata metadata;
@NotNull
private final Supplier<Metadata> metadataSupplier;
Copy link
Member

@kgyrtkirk kgyrtkirk Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks odd - an index with named columns ; but the Metadata is softened with a supplier
why soften Metadata with a Supplier ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% certain that this is why it was done, but one benefit of softening it could be to avoid holding the object in memory which potentially softens memory pressure. There's probably a legitimate question to be asked if this should really be a Supplier<> or if SimpleQueryableIndex should be made an abstract class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted it to an abstract class

Copy link
Contributor

@imply-cheddar imply-cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM


public class SemanticUtils
{
private static final Map<Class<?>, Map<Class<?>, Function<?, ?>>> OVERRIDES = new LinkedHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exists to allow extensions to register new interfaces and implementations without needing to impact the core code. This is probably worthy of javadoc.

import org.apache.druid.query.rowsandcols.RowsAndColumns;
import org.apache.druid.segment.column.RowSignature;

public interface FrameMaker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this currently only has toColumnBasedFrame(), it's equally plausible to have a toRowBasedFrame() and allow implementations to ask for the one that they expect to be working with. The notion of which kind of Frame that you are getting is not something that can be relayed with RowsAndColumns.as(Frame.class).

That said, one thing that does annoy me about the .as() and the semantic interfaces is that you have these factory-style interfaces for when you are building a thing. I don't know of another way to deal with it and maintain the extensibility, so I've just been thinking that it's the price we pay.

@Nullable
private final Metadata metadata;
@NotNull
private final Supplier<Metadata> metadataSupplier;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% certain that this is why it was done, but one benefit of softening it could be to avoid holding the object in memory which potentially softens memory pressure. There's probably a legitimate question to be asked if this should really be a Supplier<> or if SimpleQueryableIndex should be made an abstract class.

@imply-cheddar
Copy link
Contributor

Looking at the failures, this needs improved coverage. For SemanticUtils, it should be fine to create a class like inside of SemanticUtilsTest.ClassForSemanticUtilsCoverage that you then register implementations against and run things. This should allow you to validate capabilities without the static nature of the internal object becoming an issue.

@@ -28,4 +28,10 @@
*/
public interface ColumnarInts extends IndexedInts, Closeable
{
default void get(int[] out, int offset, int start, int length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any benefit of implementing this method here - and not in IndexedInts which already has a get method?

@@ -169,7 +175,7 @@ public void get(final long[] out, final int start, final int length)
}

final int limit = Math.min(length - p, sizePer - bufferIndex);
reader.read(out, p, bufferIndex, limit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any tests which are passing not 0 as offset?

import org.apache.druid.query.rowsandcols.RowsAndColumns;
import org.apache.druid.segment.column.RowSignature;

public interface FrameMaker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently the DefaultFrameMaker class has no internal state - and with these methods I think there won't be any - unless it tries to cache the frame or something...

I think one of the key problems with Frame is that its decoupled from RowSignature - however most of the time those two should be together....why not introduce a class which could hold a Frame along with the RowSignature or extend frame somehow to optionally add a RowSignature
that way a single as method could be implemented; and there won't be a need for the FrameMaker

my recent experiences sugest that the production of a frame from a rac may not need to be aligned as 1-1 with it; so I think a class to which rac-s would be written to produce a frame would be more like what we will need in the future - other way to get to to more-or-less the same is to use ConcatRowAndColumns

would it be possible to use the approach you end up with at the places where we have this frame creation logic copied-to already?

  • LazilyDecoratedRowsAndColumns (2 places)
  • StorageAdapterRowsAndColumns

@Override
public Metadata getMetadata()
{
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its valid to have null metadata; it could be the default implementation in the interface

Comment on lines +70 to +71
@Nullable
default <T> T as(Class<? extends T> clazz)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this PR is about to make the SemanticCreator more mainstream: could you please introduce an interface for it and use it everywhere ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand this, do you mean adding a new interface like "SemanticClass" that all other classes would implement if it has a SemanticCreator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - exactly

import java.util.Map;
import java.util.function.Function;

public class SemanticUtils
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this Utils could be renamed to be the interface containing the as for these things; the static method could remain as those are providing services...

please also add apidoc how this as stuff works/etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea about creating the interface for .as() and having these utils be static helpers on it. As a refactor, that should probably be done for all of the places that are using an .as() so I think I'd like to let this current PR go through (as it's following the current pattern in the code) and one of us can take on the refactor as you defined it as I do think that's an improvement.

@Nullable
public <T> T as(Class<? extends T> clazz)
{
return column.as(clazz);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a non-overridable as method....or not?

how will I be able to override this with SemanticUtils.registerOverride?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class does not use SemanticUtils.makeAsMap so it would not be directly overridable. I think the decision of using overrides or not currently belongs to the class itself, in this way. The extension can still override it by adding an override for ColumnarLongs which this class is pointing to.

@Nullable
default <T> T as(Class<? extends T> clazz)
{
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a non-overridable as method....or not?
I think registerOverride should also work when it was not provided before....

how will I be able to override this with SemanticUtils.registerOverride?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to have implementations of base column be classes that have semantic creators and have possible conversions. Overrides can be registered against them.
If a class does not have any conversion to another class, and does not override this function, it defaults to null.

Adding a map and allowing functions to register overrides to it may cause it to use the conversions from the base class, if it is not overridden. I am not sure if that behavior is expected.

get(out, 0, start, length);
}

default void get(long[] out, int offset, int start, int length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's driving this change?
why the need for this here and not in other classes like ColumnarDoubles ?

public Metadata getMetadata()
{
try {
ByteBuffer metadataBB = smooshedFiles.mapFile("metadata.drd");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doesn't really know these parts; but this will mean that the Metadata will be read from disk every time its asked for; is that desired?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. The smooshedFiles is memory mapped and this is returning a memoized memory-mapped buffer. It will do some less than fast operations, but not necessarily brand new disk IO. The getMetadata() call is used seldom (only a few specific introspection-oriented calls like the segmentMetadata query) so it's better to have it be lazy and not consume on-heap memory.

@cheddar
Copy link
Contributor

cheddar commented Aug 6, 2024

I had some off-line conversations with Zoltan about his comments. He's got some great comments for cleaning things up. I like eliminating SemanticUtils and just putting it on the interface that we use for the .as() method. And I believe that having the RowSignature alongside the Frame is a good thing. We will allow them to be done in future PRs, so I'm going to merge this as is.

@cheddar cheddar merged commit 2b81c18 into apache:master Aug 6, 2024
88 checks passed
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants