
Add handoff wait time to IngestionStatsAndErrorsTaskReportData #11090

Merged

Conversation

@capistrant commented Apr 9, 2021

Description

Follow-on to #10676.

The main change is adding segmentAvailabilityWaitTimeMs to IngestionStatsAndErrorsTaskReportData. This is the number of milliseconds the task waited for its segments to be handed off to Historical nodes. If there is no wait for handoff, the value is simply 0; otherwise it falls between 0 and the configured timeout.
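As a rough illustration only (not the exact patch), the new report field might look something like the sketch below, assuming the report class keeps using Jackson annotations like the other task report classes; the class name and getter here are illustrative:

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

// Illustrative sketch of the new field on the report data class; not the actual Druid source.
public class IngestionStatsAndErrorsTaskReportDataSketch
{
  // Milliseconds the task spent waiting for segment handoff; 0 when no wait occurred.
  private final long segmentAvailabilityWaitTimeMs;

  @JsonCreator
  public IngestionStatsAndErrorsTaskReportDataSketch(
      @JsonProperty("segmentAvailabilityWaitTimeMs") long segmentAvailabilityWaitTimeMs
  )
  {
    this.segmentAvailabilityWaitTimeMs = segmentAvailabilityWaitTimeMs;
  }

  @JsonProperty
  public long getSegmentAvailabilityWaitTimeMs()
  {
    return segmentAvailabilityWaitTimeMs;
  }
}
```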

Secondly, I made a small refactor of AbstractBatchIndexTask#waitForSegmentAvailability. The method now sets the value of segmentAvailabilityConfirmationCompleted as well as returning the boolean assigned to that variable. Previously the code only returned the boolean and relied on the caller to set the value, which seemed like poor practice.
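Roughly, the shape of the refactor is as follows (a sketch only; the Coordinator-polling helper is a hypothetical stand-in for the real logic):

```java
// Sketch of the refactored method: it records the handoff outcome on the task
// and returns it, instead of relying on every caller to set the flag.
abstract class AbstractBatchIndexTaskSketch
{
  // Set to true once segment availability was confirmed before the timeout.
  protected boolean segmentAvailabilityConfirmationCompleted = false;

  // Milliseconds spent waiting for handoff; stays 0 when no wait is configured.
  protected long segmentAvailabilityWaitTimeMs = 0;

  protected boolean waitForSegmentAvailability(long timeoutMillis)
  {
    final long start = System.currentTimeMillis();
    final boolean confirmed = timeoutMillis > 0 && pollCoordinatorForAvailability(timeoutMillis);
    segmentAvailabilityWaitTimeMs = System.currentTimeMillis() - start;
    segmentAvailabilityConfirmationCompleted = confirmed;
    return confirmed;
  }

  // Hypothetical stand-in for the real Coordinator polling logic.
  protected abstract boolean pollCoordinatorForAvailability(long timeoutMillis);
}
```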


Key changed/added classes in this PR
  • IngestionStatsAndErrorsTaskReportData
  • AbstractBatchIndexTask

This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@capistrant (Author)

@jihoonson you mentioned in #10676 (discussion) that it would be nice to have this added to the report. If you get a chance, could you take a peek at this in the next few weeks? It would be great to get it into the code before the 0.22.0 release cycle starts so that it lands at the same time as the PR that adds the handoff wait. Thanks!

@jihoonson

Thank you for the follow-up! Code changes LGTM, but can you please add some integration tests? I guess you can modify those integration tests you added in the previous PR to verify that the waitTime is always larger than 0.

@capistrant (Author)

> Thank you for the follow-up! Code changes LGTM, but can you please add some integration tests? I guess you can modify those integration tests you added in the previous PR to verify that the waitTime is always larger than 0.

Good point. I went with a simple assertion in the existing code to verify that the report has a non-zero wait time for the ITs that perform a handoff wait. I believe that should suffice to make sure the report stays valid.
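Roughly, the added check looks like this (a sketch only; the way the IT fetches the parsed report payload and the helper names here are hypothetical):

```java
import java.util.Map;

import org.junit.Assert;

// Sketch of the assertion added to the existing IT; names are illustrative.
public class SegmentAvailabilityReportCheck
{
  static void verifyHandoffWaitTimeReported(Map<String, Object> ingestionStatsReportData)
  {
    final long waitTimeMs =
        ((Number) ingestionStatsReportData.get("segmentAvailabilityWaitTimeMs")).longValue();
    // Tasks that were configured to wait for handoff should report a positive wait time.
    Assert.assertTrue("expected a non-zero segment availability wait time", waitTimeMs > 0);
  }
}
```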

@suneet-s left a comment


LGTM! Do we need any accompanying docs for this change?

Nice addition to get the IT for free :)

@@ -198,7 +198,6 @@ A sample task is shown below:
 |id|The task ID. If this is not explicitly specified, Druid generates the task ID using task type, data source name, interval, and date-time stamp. |no|
 |spec|The ingestion spec including the data schema, IOConfig, and TuningConfig. See below for more details. |yes|
 |context|Context containing various task configuration parameters. See below for more details.|no|
-|awaitSegmentAvailabilityTimeoutMillis|Long|Milliseconds to wait for the newly indexed segments to become available for query after ingestion completes. If `<= 0`, no wait will occur. If `> 0`, the task will wait for the Coordinator to indicate that the new segments are available for querying. If the timeout expires, the task will exit as successful, but the segments were not confirmed to have become available for query. Note for compaction tasks: you should not set this to a non-zero value because it is not supported by the compaction task type at this time.|no (default = 0)|
@capistrant (Author)

Realized that this doc entry was misplaced and duplicated in the native-batch file. It is found below in the tuningConfig section where it should be. This is just cleanup.

@capistrant (Author)

> LGTM! Do we need any accompanying docs for this change?
>
> Nice addition to get the IT for free :)

Good point. Just added this to the tasks.md docs to help inform users.

@capistrant (Author)

@jihoonson do we need to do anything more than adding this to the release notes for a future release to notify clients of the JSON payload change? I suppose it is possible that clients rely on a static schema for this report JSON, so the new field could break code that doesn't ignore newly added fields.
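For what it's worth, clients that bind the report JSON to their own static types can opt in to tolerating the new field with standard Jackson settings; a minimal sketch (the ClientReportView type and its field are made up for illustration):

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

// Option 1: per type, ignore fields the client-side view doesn't know about.
@JsonIgnoreProperties(ignoreUnknown = true)
class ClientReportView
{
  public String taskId; // hypothetical client-side field
}

class LenientReportMapper
{
  // Option 2: globally, tell the mapper not to fail on unknown properties.
  static ObjectMapper create()
  {
    return new ObjectMapper()
        .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
  }
}
```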

@jihoonson

LGTM, thanks @capistrant. I'm not particularly worried about it, but we can include a warning about it in the release notes if you want.

@jihoonson merged commit 5c3f3da into apache:master on Sep 21, 2021
@maytasm commented Sep 21, 2021

This change is causing the build to fail. It seems like change #11688, which went in a few days ago, uses the IngestionStatsAndErrorsTaskReportData constructor that this PR changes. The Travis green check on this PR was from 25 days ago. In the future, I recommend rerunning Travis before merging when the checks passed a long time ago.

@capistrant (Author)

> This change is causing the build to fail. It seems like change #11688, which went in a few days ago, uses the IngestionStatsAndErrorsTaskReportData constructor that this PR changes. The Travis green check on this PR was from 25 days ago. In the future, I recommend rerunning Travis before merging when the checks passed a long time ago.

Agree 100%. I've been trying to stay on top of merging with master on a regular basis to keep CI fresh, but I let this one slip. Thanks for the quick fix for master.
