Fix compaction tasks reports getting overwritten #15981
Conversation
ed286b1 to f9a736a Compare
This looks good to me. IMO we don't need the task id in the keys of the reports, though;
I think ingestionStatsAndErrors_0, ingestionStatsAndErrors_1 (similar to what is done by query_controller tasks), or maybe the interval that is being compacted instead, would be good.
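As a rough sketch of that naming suggestion (a minimal illustration assuming the sub-task reports have already been collected into a list; the class and method names are hypothetical, not from this PR):

```java
// Hypothetical sketch: key each sub-task's report with a running index
// ("ingestionStatsAndErrors_0", "ingestionStatsAndErrors_1", ...) instead of the sub-task id.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReportKeySketch
{
  static Map<String, Object> keyByIndex(List<Object> subTaskReports)
  {
    Map<String, Object> merged = new LinkedHashMap<>();
    for (int i = 0; i < subTaskReports.size(); i++) {
      merged.put("ingestionStatsAndErrors_" + i, subTaskReports.get(i));
    }
    return merged;
  }
}
```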
@georgew5656 Modified to
Is the current format documented? If so, we should consider options that preserve compatibility with previous documentation. If not, changing the format is fair game. Although, if you are depending on the specific output format, you might want to add documentation, as otherwise it might be changed later in a way you don't expect.
I tested this change a little bit and found that if you run a compact task with no additional subtasks, the report is an empty JSON object. Before this change the report would have some details. I think we should fix it so that the task report details are preserved for a compact task with no sub-tasks, even if we change the format of the report. +1 for Gian's suggestion of documenting the report so that other users don't break it once we've decided on a particular format. I think it would be a good idea to also add a test of some sort that validates the format of the report. There are a few integration tests for compaction; perhaps one of those would be a good place to add validation for the format of the task report.
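A minimal sketch of what such a format check might look like, assuming Jackson is available and the report is read from a local file (the class name, file argument, and expected key prefix are illustrative assumptions, not the actual integration-test code):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.util.Iterator;

public class CompactionReportFormatCheck
{
  public static void main(String[] args) throws Exception
  {
    // Parse the task report file produced by the compaction task.
    JsonNode report = new ObjectMapper().readTree(new File(args[0]));

    // Even a compact task with no sub-tasks should produce a non-empty report.
    if (!report.fieldNames().hasNext()) {
      throw new AssertionError("Expected a non-empty task report");
    }

    // Each entry is expected to carry ingestion stats and errors for one sub-task or interval.
    for (Iterator<String> it = report.fieldNames(); it.hasNext(); ) {
      String key = it.next();
      if (!key.startsWith("ingestionStatsAndErrors")) {
        throw new AssertionError("Unexpected report key: " + key);
      }
    }
  }
}
```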
+1 from me on the approach. Thanks for the integration test and docs @adithyachakilam!
Sorry, @adithyachakilam, I couldn't get to reviewing this PR sooner.
I have left some comments for better code readability. There is also a minor concern regarding holding too many reports and causing an OOM.
Since this PR has already been merged, the comments can be addressed in a follow-up PR.
@@ -499,6 +502,7 @@ public TaskStatus runTask(TaskToolbox toolbox) throws Exception
      log.info("Generated [%d] compaction task specs", totalNumSpecs);

      int failCnt = 0;
      Map<String, TaskReport> completionReports = new HashMap<>();
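For context, a rough sketch of how these two fields might be used in the loop over the generated specs (the loop body, helper, and accessor names below are assumptions for illustration, not the exact merged code):

```java
// Hypothetical continuation: run each generated spec, count failures, and collect the
// sub-task reports under distinct keys so later specs do not overwrite earlier entries.
int specIndex = 0;
for (ParallelIndexSupervisorTask spec : indexTaskSpecs) {
  TaskStatus status = runSpec(spec, toolbox);  // assumed helper that runs one spec and waits for it
  if (!status.isSuccess()) {
    failCnt++;
  }
  Map<String, TaskReport> specReports = spec.getCompletionReports();  // assumed accessor
  if (specReports != null) {
    for (Map.Entry<String, TaskReport> entry : specReports.entrySet()) {
      // Suffix with the spec index so each spec's report keeps its own entry.
      completionReports.put(entry.getKey() + "_" + specIndex, entry.getValue());
    }
  }
  specIndex++;
}
```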
If the compaction is being run on several intervals (not very likely, but still a possibility), can holding all the task reports in memory potentially cause an OOM exception? Currently, most of the task reports contain only ingestionStatsAndErrors, but they may contain other stuff in the future.
In the future, we should consider writing out the sub-reports in a streaming fashion, along with the required changes to the TaskReportFileWriter API.
For now, we should add a guardrail here so that we don't try to hold too many reports in memory and fail with an OOM.
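One possible shape for such a guardrail, as a minimal sketch (the limit constant and the choice to simply drop further reports are assumptions, not what the PR implements; Object stands in for the TaskReport type from the diff above to keep the sketch self-contained):

```java
import java.util.Map;

class ReportGuardrailSketch
{
  // Assumed limit; a real value would need to be tuned or made configurable.
  private static final int MAX_HELD_SUB_TASK_REPORTS = 100;

  static void addSubTaskReports(Map<String, Object> completionReports, Map<String, Object> subTaskReports)
  {
    if (completionReports.size() + subTaskReports.size() > MAX_HELD_SUB_TASK_REPORTS) {
      // Skip further reports rather than risking an OOM; the caller could log or
      // record how many reports were dropped instead.
      return;
    }
    completionReports.putAll(subTaskReports);
  }
}
```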
What do you think is a good size to hold, then?
Description
A single compaction task can be split into multiple index tasks based on the intervals given in the spec. In such cases, all the index tasks run with the same ID, and the task completion report gets overwritten. This PR skips writing the report for each parallel index task and instead writes the reports on compaction task completion.
With this change, instead of overwriting the reports file, we make multiple entries, and it looks something like:
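(The original sample output is not reproduced here; the snippet below is a purely hypothetical illustration of the multi-entry idea, with made-up keys and values.)

```java
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.LinkedHashMap;
import java.util.Map;

public class ReportShapeIllustration
{
  public static void main(String[] args) throws Exception
  {
    // One keyed entry per sub-task report instead of a single entry that each sub-task overwrites.
    Map<String, Object> completionReports = new LinkedHashMap<>();
    completionReports.put("ingestionStatsAndErrors_0", Map.of("taskId", "sub_task_0", "payload", Map.of()));
    completionReports.put("ingestionStatsAndErrors_1", Map.of("taskId", "sub_task_1", "payload", Map.of()));

    // Prints a JSON object with one entry per sub-task, e.g.
    // {"ingestionStatsAndErrors_0":{...},"ingestionStatsAndErrors_1":{...}}
    System.out.println(new ObjectMapper().writeValueAsString(completionReports));
  }
}
```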
Key changed/added classes in this PR
CompactionTask
ParallelIndexSupervisorTask
This PR has: