Add in a layer of indirection for task completion callbacks [databricks] #9009

revans2 · 2023-08-11T16:00:36Z

This fixes #8482

It touches a lot of files because I wanted to move as much code as possible from the old API to the new one so that I could test it properly. There may be other places where removing a callback would be useful too, but I did not pursue that here.

Signed-off-by: Robert (Bobby) Evans <[email protected]>

revans2 · 2023-08-11T16:00:56Z

build

revans2 · 2023-08-11T18:50:47Z

build

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ScalableTaskCompletion.scala

tests/src/test/scala/com/nvidia/spark/rapids/GpuSemaphoreSuite.scala

revans2 · 2023-08-14T13:07:16Z

build

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ScalableTaskCompletion.scala

abellina · 2023-08-14T14:23:45Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ScalableTaskCompletion.scala

+    private var wasCalled = false
+
+    override def removeCallback(): Unit = {
+      val topLevel = ScalableTaskCompletion.getTopLevel(tc.taskAttemptId())


Why not call remove on the hashmap blindly, rather than get and then remove?

This API removes a callback from a task. We get the collection object that holds the task's callbacks and check whether it (the collection object) is null. If it is not, we remove ourselves, the callback, from that collection. I guess each handle could hold a reference to the collection it was registered with instead of asking the singleton for it. Is that what you are asking?
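The distinction drawn in this reply (a map from task to a per-task collection, and a handle that removes itself from that collection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `CallbackSketch`, `Registry`, `TaskCallbacks`, and `Handle` are hypothetical names, and an `Option` stands in for the null check described above.

```scala
import scala.collection.mutable

// Hypothetical sketch; none of these names come from the plugin itself.
object CallbackSketch {
  // A handle knows only its task id and asks the registry for the
  // per-task collection when it wants to remove itself.
  final class Handle(taskId: Long, f: () => Unit) {
    def call(): Unit = f()
    def removeCallback(): Unit =
      // None means the task already completed (or never registered anything),
      // so there is nothing to remove.
      Registry.getTopLevel(taskId).foreach(_.remove(this))
  }

  // The per-task collection of callbacks, kept in registration order.
  final class TaskCallbacks {
    private val callbacks = mutable.LinkedHashSet.empty[Handle]
    def add(h: Handle): Unit = synchronized { callbacks += h }
    def remove(h: Handle): Unit = synchronized { callbacks -= h }
    def runAll(): Unit = {
      // Copy under the lock, then run outside it.
      val snapshot = synchronized { callbacks.toList }
      snapshot.foreach(_.call())
    }
  }

  // The singleton: one TaskCallbacks entry per task id.
  object Registry {
    private val perTask = mutable.HashMap.empty[Long, TaskCallbacks]

    def getTopLevel(taskId: Long): Option[TaskCallbacks] =
      synchronized { perTask.get(taskId) }

    def register(taskId: Long)(f: => Unit): Handle = synchronized {
      val h = new Handle(taskId, () => f)
      perTask.getOrElseUpdate(taskId, new TaskCallbacks).add(h)
      h
    }

    // Called once when the task finishes: drop the entry, then run what is left.
    def complete(taskId: Long): Unit = {
      val removed = synchronized { perTask.remove(taskId) }
      removed.foreach(_.runAll())
    }
  }
}
```

In this sketch, removing a callback never touches the task-level map; only `complete` removes the per-task entry. The alternative mentioned above, where each handle keeps a direct reference to its collection, would drop the lookup in `removeCallback`.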

abellina · 2023-08-14T14:33:55Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ScalableTaskCompletion.scala

+   * When the current task completes, if there is a current task call the given function.
+   * @return a handle that can be used to remove the callback.
+   */
+  def onTaskCompletionIfNotTest(f: => Unit): TaskCompletionCallbackHandle = {


I am a little unclear on when I should use onTaskCompletionIfNotTest vs onTaskCompletion, and having multiple choices feels like an area where we could introduce bugs. Is there any way we could change the logic when we know we are in the Scala tests, so that the calling code is unaware and has only one way to register?

onTaskCompletionIfNotTest inserts a callback if a TaskContext is available. If none is available, the callback is not inserted. A lot of code did this to accommodate unit tests. The problem is that I don't want to encourage this behavior. The API itself is somewhat dangerous because we might or might not get a callback. I am fine with removing it and forcing every place that uses it to put in its own workaround.
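The concern about a callback that "might or might not" run can be illustrated with a small sketch. It assumes nothing about the real implementation: `FakeTaskContext`, `registerStrict`, and `registerIfInTask` are hypothetical stand-ins, a thread-local models the fact that Spark's `TaskContext.get()` returns null outside a running task, and the strict variant throwing is an assumption made only for contrast.

```scala
// Hypothetical sketch; it does not use Spark and is not the plugin's API.
object IfNotTestSketch {
  // Stand-in for a task context that can hold completion listeners.
  final class FakeTaskContext(val taskAttemptId: Long) {
    private var listeners = List.empty[() => Unit]
    def addListener(f: () => Unit): Unit = listeners ::= f
    // Run listeners in registration order.
    def complete(): Unit = listeners.reverse.foreach(_.apply())
  }

  // Models "the current task context"; null when no task is running.
  private val current = new ThreadLocal[FakeTaskContext]()
  def setContext(tc: FakeTaskContext): Unit = current.set(tc)
  def clearContext(): Unit = current.remove()

  // Strict variant (assumed behavior): fails loudly when there is no task.
  def registerStrict(f: => Unit): Unit = {
    val tc = current.get()
    require(tc != null, "no task context available")
    tc.addListener(() => f)
  }

  // Lenient variant: registers only when a task context exists.
  // Returns false when the callback was silently dropped.
  def registerIfInTask(f: => Unit): Boolean = {
    val tc = current.get()
    if (tc != null) {
      tc.addListener(() => f)
      true
    } else {
      false
    }
  }
}
```

With the lenient variant, a caller that ignores the return value cannot tell whether cleanup will ever happen, which is the risk described above.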

revans2 · 2023-08-14T17:15:30Z

build

revans2 · 2023-08-14T17:16:44Z

@abellina and @jlowe I think I have addressed all feedback. If you could take another look I'd appreciate it.

Add in a layer of indirection for task completion callbacks

aa6e171

Signed-off-by: Robert (Bobby) Evans <[email protected]>

revans2 self-assigned this Aug 11, 2023

sameerz added the reliability Features to improve reliability or bugs that severely impact the reliability of the plugin label Aug 11, 2023

Fix databicks typo

d4925e5

jlowe reviewed Aug 11, 2023

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ScalableTaskCompletion.scala

tests/src/test/scala/com/nvidia/spark/rapids/GpuSemaphoreSuite.scala

revans2 added 2 commits August 11, 2023 16:49

Review comments

ad23db0

More review comments

aa0649b

jlowe reviewed Aug 14, 2023

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ScalableTaskCompletion.scala

abellina reviewed Aug 14, 2023

Review comments

64c8e41

jlowe approved these changes Aug 14, 2023

abellina approved these changes Aug 14, 2023

revans2 merged commit efe3310 into NVIDIA:branch-23.10 Aug 14, 2023
27 checks passed

revans2 deleted the task_callback_scaling branch August 14, 2023 21:18
