-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support filtering data in Auto Compaction #11922
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the PR going to add more changes? One thing I found interesting is that we are re-compacting the data if the filter changes. But the data we are compacting has already passed through a filter. Those records are gone and are not going to come back. Say if I first do a filter on dim = a
and then change the filter to dim = b
. It might be nice to add a warning in the UI for this when we eventually add those changes.
} | ||
|
||
@Nullable | ||
public DimFilter getFilter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it missing JsonProperty
annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Fixed
ClientCompactionTaskTransformSpec transformSpec; | ||
if (config.getTransformSpec() != null) { | ||
transformSpec = new ClientCompactionTaskTransformSpec( | ||
config.getTransformSpec().getFilter() | ||
); | ||
} else { | ||
transformSpec = null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClientCompactionTaskTransformSpec transformSpec; | |
if (config.getTransformSpec() != null) { | |
transformSpec = new ClientCompactionTaskTransformSpec( | |
config.getTransformSpec().getFilter() | |
); | |
} else { | |
transformSpec = null; | |
} | |
ClientCompactionTaskTransformSpec transformSpec = null; | |
if (config.getTransformSpec() != null) { | |
transformSpec = new ClientCompactionTaskTransformSpec( | |
config.getTransformSpec().getFilter() | |
); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - could be simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I have finish adding all changes to this PR (implementation, unit tests, docs, integration tests). Your understanding is correct. That is now true for other auto compaction config that can remove data. For example, if you remove one of your dimension using auto compaction then add that dimension back later, the removed dimension will not come back (although that dimension in future data will not be removed). I'll work on clarifying this in the docs to make sure that user are well aware of this new behavior in a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
2 high level comments since you are making a lot of changes in this area:
- It would be good to have some sort of integration test with segments in different compaction states so that we can simulate an upgrade where users have segments with different compaction states and validate that we don't re-compact segments each time we add a new field here.
- It would be good to add a warning in the docs that filtering data that matches an entire segment may not work as expected since Druid doesn't handle "empty segments" too well today.
All my comments do not need to block this PR, and can be addressed in future changes.
@Test | ||
public void testIteratorReturnsSegmentsAsSegmentsWasCompactedAndHaveDifferentFilter() throws Exception | ||
{ | ||
NullHandling.initializeForTests(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this should be done at the test class level.
Support filtering data in Auto Compaction
Description
This PR adds
filter
field via thetransformSpec
to manual compaction and auto compaction.By setting
filter
, the user will be able to drop data matching thefilter
from the datasource.This PR has: