Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support task resource tracking in OpenSearch #2639

Merged

Conversation

tushar-kharbanda72
Copy link
Contributor

@tushar-kharbanda72 tushar-kharbanda72 commented Mar 29, 2022

Description

  • Added framework support for tracking resource utilisation by tasks.
  • Currently, SearchTask and SearchShardTask supports resource tracking. Can be easily extended to any other task if required in future.
  • Dynamic setting support is added to enable and disable the feature at runtime.
  • List Tasks API now makes sure it refreshes the running tasks resource info before returning response.

Issues Resolved

#1179

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@tushar-kharbanda72 tushar-kharbanda72 requested a review from a team as a code owner March 29, 2022 07:45
@tushar-kharbanda72 tushar-kharbanda72 force-pushed the resource-tracker-integration branch from d023569 to d4a085e Compare March 29, 2022 07:46
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure d023569a4cd8ec1f4e820d56a141fcfd7dc47489
Log 3842

Reports 3842

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure d4a085e20f3c5a9a9aa91ed64f4a39446d651bec
Log 3843

Reports 3843

@tushar-kharbanda72
Copy link
Contributor Author

@dblock @andrross @Bukhtawar Could you please review this PR. Thanks

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 8ac708a5d59beea160f13dd77ca26d0398b84341
Log 3851

Reports 3851

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tushar. The changes look good overall. I'll continue with the full review in sometime but below are some early comments I have.

listener.onFailure(validationException);
return;
}
ThreadContext.StoredContext storedContext = taskManager.addTaskIdInThreadContext(task);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add task in thread context centrally as a part of task creation, instead of doing it at multiple places after task creation, it could lead to maintenance overhead overtime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done as per the ThreadContext's usage documentation which is something like :

  1. Clone/Stash existing thread context
  2. Add new fields to the new thread context.
  3. Execute the operation (or take task to a different thread)
  4. Restore the original thread context.

If we want to do it in register method then we would have to face challenges of making ThreadContext mutable which today throws exception and will also need to find a common place to remove the header from the thread context so that some other request doesn't use this just because we missed to remove it from thread context.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets just call out this works for tasks that forks off on a separate thread(non-network thread)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tushar-kharbanda72 I would like to back @Bukhtawar concerns, manipulating thread context in a such away is not desirable (cause essentially someone down the line could override it again). The suggestion I have in mind is to introduce the distinction between Runnable (no task context) and TaskRunnable extends Runnable (with Task ID as final property). In this case, the taskId could be extracted by thread pool and baked into the thread local context from the start (by checking if Runnable is instance of TaskRunnable). What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta Agree with the problem you mentioned of Task Id being overridden. The solution for that I think could be to always preserve Task Id when a thread context stash is called and always make sure it makes it to the new context. Thoughts on this?

Yes having taskId available in TaskRunnable without ThreadContext is desirable but I don't seem to currently have a way to get that information for all use cases in OpenSearch without having changes at multiple places in codebase. I believe this essentially translates to having a task id available where ever we are submitting any runnable to an executor. Right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tushar-kharbanda72 yes, it is possible to manage task id transfer using stash(), but fundamental problem here (in my opinion) - the thread context should not be manipulated through TaskManager from the outside. Yes, it will introduce larger scope of changes, the benefit - less dependency on thread context (black box), explicit link between task and its execution model, just my 5 cents, @Bukhtawar do you have an option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fundamental problem here (in my opinion) - the thread context should not be manipulated through TaskManager from the outside

I agree. This is essentially using the thread context to stash a global variable and is likely to lead to pain in the long term.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta @andrross Currently coupling the thread context update with when a task gets registered in resource tracking service as both together made more sense.

Having task id set as a member in the runnable is an ideal case scenario but comes with it's own sets of challenges:

  • Submitting a runnable will not be very flexible. We would need Task Id available as a function param at every point throughout the code base wherever we need to submit a runnable.
  • If a blocking sync child action gets executed on the same thread - this would essentially mean updating the runnable's task id in some way and reverting it back as soon as it finishes.
  • For new features folks would need to handle the overlapping case of making task id available and I see this would increase the number of params they handle in the function.

I see the benefits of the suggested approach but it also comes with the above mentioned downsides and these could be even more when we deep dive further.

Also, similar approach is used for request tracing X-Opaque-Id in OpenSearch which I see a more critical feature for some users.

@Bukhtawar Would like to hear your thoughts on this as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tushar-kharbanda72 sorry for the delay, I certainly agree it is challenging to have robust and reliable context propagation support but this is also very important to avoid any surprises by opening the gate for (arbitrary) context manipulations. The approach could be generalized by eliminating the focus on task to abstract runnable with contextual setup: in this case the thread context could be established by runnable (fe, associated with task or anything else) before hand.

There are quite a few examples of contextual runnables like ContextPreservingAbstractRunnable and ContextPreservingRunnable.

X-Opaque-Id is also a good example of context propagation but it is passed through requestHeaders, may be we could extend ThreadContextStruct with non-headers key/value pairs, task id being one of them. I will try to prototype the possible implementation(s) shortly and share with you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta Created an issue to discuss further on this #2819 (comment). We can use this to close the final change required for making task id available to runnable.

I believe this can be added in a follow up PR as it's not changing the core logic and just the way runnable knows about the Task. Do let me know your thoughts.

server/src/main/java/org/opensearch/node/Node.java Outdated Show resolved Hide resolved
server/src/main/java/org/opensearch/tasks/TaskManager.java Outdated Show resolved Hide resolved
server/src/main/java/org/opensearch/tasks/TaskManager.java Outdated Show resolved Hide resolved
@dblock
Copy link
Member

dblock commented Mar 29, 2022

@dblock @andrross @Bukhtawar Could you please review this PR. Thanks

Thanks for jumping in @Bukhtawar! Will leave this to you for now until you're happy with the PR and we can have more 👀 on it later.

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure ba1116110b2dad43043138c9f199dd2a17b1f693
Log 3972

Reports 3972

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 61a6e76d079d0eed9e9ca669d4268a6427dd5805
Log 3973

Reports 3973

@tushar-kharbanda72 tushar-kharbanda72 force-pushed the resource-tracker-integration branch from 61a6e76 to 8a18752 Compare March 31, 2022 12:24
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 8a187522ac9a20b0f90831048e095649fa5adda8
Log 3974

Reports 3974

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure a0e291e588a3e4c6a76bc1551f97924d1ee230e1
Log 3977

Reports 3977

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 36d2a5238399791e3e1bae339cecaea400a3c20b
Log 4195

Reports 4195

@tushar-kharbanda72 tushar-kharbanda72 force-pushed the resource-tracker-integration branch from 36d2a52 to b92f5e3 Compare April 5, 2022 09:41
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure b92f5e3aea8b947cc35049b386167244e4c5e246
Log 4196

Reports 4196

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure da2220dc2b957a1ec3420a400612adc108d476fe
Log 4197

Reports 4197

@tushar-kharbanda72 tushar-kharbanda72 force-pushed the resource-tracker-integration branch from da2220d to f6d43f9 Compare April 5, 2022 11:58
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure f6d43f905da4194e190fcbd35d6349adce96b9e4
Log 4198

Reports 4198

@tushar-kharbanda72 tushar-kharbanda72 force-pushed the resource-tracker-integration branch from f6d43f9 to 4b71cba Compare April 5, 2022 12:37
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 4b71cbaef54f81c9bf9298d5ecb1525a552210b4
Log 4200

Reports 4200

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 8c59e526942cfc413733d3b6c5a6db7a2927417c
Log 4271

Reports 4271

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure cbcf3c525bf516fb7164f0221491a7b25c1f96ec
Log 4275

Reports 4275

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 4eb49c56c1c155b30f374e05bbd24efea5645283
Log 4276

Reports 4276

nknize pushed a commit that referenced this pull request Apr 25, 2022
This reverts commit 6517eec.

Signed-off-by: dblock <[email protected]>
Signed-off-by: Nicholas Walter Knize <[email protected]>
@dblock
Copy link
Member

dblock commented Apr 25, 2022

This was reverted in #3046

@Bukhtawar
Copy link
Collaborator

@dblock can you point me to the build failure links?

@Bukhtawar
Copy link
Collaborator

Looks like this is the only assertion that is tripping

 java.lang.AssertionError: No thread should be marked active when task finishes

@dblock
Copy link
Member

dblock commented Apr 25, 2022

@Bukhtawar There are a few recent mentions of this in issues linked off #3046

@tushar-kharbanda72
Copy link
Contributor Author

@dblock Will take a look at assertion failures and resolve the issue this week.

tushar-kharbanda72 added a commit to tushar-kharbanda72/OpenSearch that referenced this pull request May 23, 2022
* Add Task id in Thread Context

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add resource tracking update support for tasks

Signed-off-by: Tushar Kharbanda <[email protected]>

* List tasks action support for task resource refresh

Signed-off-by: Tushar Kharbanda <[email protected]>

* Handle task unregistration case on same thread

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add lazy initialisation for RunnableTaskExecutionListener

Signed-off-by: Tushar Kharbanda <[email protected]>

* Segregate resource tracking logic to a separate service.

Signed-off-by: Tushar Kharbanda <[email protected]>

* Check for running threads during task unregister

Signed-off-by: Tushar Kharbanda <[email protected]>

* Moved thread context logic to resource tracking service

Signed-off-by: Tushar Kharbanda <[email protected]>

* preserve task id in thread context even after stash

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add null check for resource tracking service

Signed-off-by: Tushar Kharbanda <[email protected]>

* Tracking service tests and minor refactoring

Signed-off-by: Tushar Kharbanda <[email protected]>

* Preserve task id fix with test

Signed-off-by: Tushar Kharbanda <[email protected]>

* Minor test changes and Task tracking call update

Signed-off-by: Tushar Kharbanda <[email protected]>

* Fix Auto Queue executor method's signature

Signed-off-by: Tushar Kharbanda <[email protected]>

* Make task runnable task listener factory implement consumer

Signed-off-by: Tushar Kharbanda <[email protected]>

* Use reflection for ThreadMXBean

Signed-off-by: Tushar Kharbanda <[email protected]>

* Formatting

Signed-off-by: Tushar Kharbanda <[email protected]>

* Replace RunnableTaskExecutionListenerFactory with AtomicReference

Signed-off-by: Tushar Kharbanda <[email protected]>

* Revert "Use reflection for ThreadMXBean"

This reverts commit cbcf3c525bf516fb7164f0221491a7b25c1f96ec.

Signed-off-by: Tushar Kharbanda <[email protected]>

* Suppress Warning related to ThreadMXBean

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add separate method for task resource tracking supported check

Signed-off-by: Tushar Kharbanda <[email protected]>

* Enabled setting by default

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add debug logs for stale context id

Signed-off-by: Tushar Kharbanda <[email protected]>

* Remove hardcoded task overhead in tests

Signed-off-by: Tushar Kharbanda <[email protected]>

* Bump stale task id in thread context log level to warn

Signed-off-by: Tushar Kharbanda <[email protected]>

* Improve assertions and logging

Signed-off-by: Tushar Kharbanda <[email protected]>

Co-authored-by: Tushar Kharbanda <[email protected]>
ketanv3 added a commit to ketanv3/OpenSearch that referenced this pull request Jul 31, 2022
Reopens changes from opensearch-project#2639 (reverted in opensearch-project#3046) to add a framework for task resource tracking.
Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task.

Changes since opensearch-project#2639:
* Replaced the usage of AutoQueueAdjustingExecutorBuilder with ResizableExecutorBuilder
* Resolved merge conflicts
* Fixed broken tests

Signed-off-by: Ketan Verma <[email protected]>
Bukhtawar pushed a commit that referenced this pull request Aug 2, 2022
* Support task resource tracking in OpenSearch

* Reopens changes from #2639 (reverted in #3046) to add a framework for task resource tracking. Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task.

* Fixed a race-condition when Task is unregistered before its threads are stopped

* Improved error handling and simplified task resource tracking completion listener

* Avoid registering listeners on already completed tasks

Signed-off-by: Ketan Verma <[email protected]>
ketanv3 added a commit to ketanv3/OpenSearch that referenced this pull request Aug 2, 2022
* Support task resource tracking in OpenSearch

* Reopens changes from opensearch-project#2639 (reverted in opensearch-project#3046) to add a framework for task resource tracking. Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task.

* Fixed a race-condition when Task is unregistered before its threads are stopped

* Improved error handling and simplified task resource tracking completion listener

* Avoid registering listeners on already completed tasks

Signed-off-by: Ketan Verma <[email protected]>
Bukhtawar pushed a commit that referenced this pull request Aug 2, 2022
* [Backport 2.x] Support task resource tracking in OpenSearch

* Reopens changes from #2639 (reverted in #3046) to add a framework for task resource tracking. Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task.

* Fixed a race-condition when Task is unregistered before its threads are stopped

* Improved error handling and simplified task resource tracking completion listener

* Avoid registering listeners on already completed tasks

Signed-off-by: Ketan Verma <[email protected]>
nssuresh2007 pushed a commit to nssuresh2007/OpenSearch that referenced this pull request Aug 31, 2022
* Add Task id in Thread Context

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add resource tracking update support for tasks

Signed-off-by: Tushar Kharbanda <[email protected]>

* List tasks action support for task resource refresh

Signed-off-by: Tushar Kharbanda <[email protected]>

* Handle task unregistration case on same thread

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add lazy initialisation for RunnableTaskExecutionListener

Signed-off-by: Tushar Kharbanda <[email protected]>

* Segregate resource tracking logic to a separate service.

Signed-off-by: Tushar Kharbanda <[email protected]>

* Check for running threads during task unregister

Signed-off-by: Tushar Kharbanda <[email protected]>

* Moved thread context logic to resource tracking service

Signed-off-by: Tushar Kharbanda <[email protected]>

* preserve task id in thread context even after stash

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add null check for resource tracking service

Signed-off-by: Tushar Kharbanda <[email protected]>

* Tracking service tests and minor refactoring

Signed-off-by: Tushar Kharbanda <[email protected]>

* Preserve task id fix with test

Signed-off-by: Tushar Kharbanda <[email protected]>

* Minor test changes and Task tracking call update

Signed-off-by: Tushar Kharbanda <[email protected]>

* Fix Auto Queue executor method's signature

Signed-off-by: Tushar Kharbanda <[email protected]>

* Make task runnable task listener factory implement consumer

Signed-off-by: Tushar Kharbanda <[email protected]>

* Use reflection for ThreadMXBean

Signed-off-by: Tushar Kharbanda <[email protected]>

* Formatting

Signed-off-by: Tushar Kharbanda <[email protected]>

* Replace RunnableTaskExecutionListenerFactory with AtomicReference

Signed-off-by: Tushar Kharbanda <[email protected]>

* Revert "Use reflection for ThreadMXBean"

This reverts commit cbcf3c525bf516fb7164f0221491a7b25c1f96ec.

Signed-off-by: Tushar Kharbanda <[email protected]>

* Suppress Warning related to ThreadMXBean

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add separate method for task resource tracking supported check

Signed-off-by: Tushar Kharbanda <[email protected]>

* Enabled setting by default

Signed-off-by: Tushar Kharbanda <[email protected]>

* Add debug logs for stale context id

Signed-off-by: Tushar Kharbanda <[email protected]>

* Remove hardcoded task overhead in tests

Signed-off-by: Tushar Kharbanda <[email protected]>

* Bump stale task id in thread context log level to warn

Signed-off-by: Tushar Kharbanda <[email protected]>

* Improve assertions and logging

Signed-off-by: Tushar Kharbanda <[email protected]>

Co-authored-by: Tushar Kharbanda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants