Fix collect time metric in CoalesceBatches #729
Merged
fixes #658
GpuCoalesceBatches has a collectTime metric that measures the time a task spends waiting to collect all of its input batches. This metric can be wrong when a task blocks on the GPU semaphore while trying to access the GPU. GpuCoalesceBatches only starts the collectTime timer in next(), but an upstream node may try to acquire the GPU semaphore as soon as GpuCoalesceBatches calls hasNext() on its input iterator. Any time spent in that first hasNext() call on the input iterator is therefore not tracked, which leads to the large time discrepancies reported in #622.
Moved the start of the collectTime timing into hasNext(), which was the most straightforward fix.
Note that the totalTime metric also needed to be moved.
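To illustrate why the timer has to cover hasNext(), here is a minimal sketch. It is not the actual spark-rapids change: `NanoMetric` and `CollectTimedIterator` are hypothetical names, the metric is a plain counter standing in for a Spark SQL metric, and the clock is injectable (defaulting to `System.nanoTime`) so that a blocking upstream hasNext() can be simulated deterministically. The wrapper charges time spent in both hasNext() and next() of the input iterator to the metric.

```scala
// Hypothetical stand-in for an accumulating nanosecond metric such as collectTime.
final class NanoMetric {
  private var total = 0L
  def add(ns: Long): Unit = total += ns
  def value: Long = total
}

// Hypothetical wrapper: charges time spent in the upstream hasNext() AND next()
// to the metric. Timing only next() would miss any blocking that happens inside
// the upstream hasNext(), e.g. an upstream node waiting on the GPU semaphore.
final class CollectTimedIterator[T](
    upstream: Iterator[T],
    metric: NanoMetric,
    clock: () => Long = () => System.nanoTime()) extends Iterator[T] {

  override def hasNext: Boolean = {
    val start = clock()
    try upstream.hasNext
    finally metric.add(clock() - start)
  }

  override def next(): T = {
    val start = clock()
    try upstream.next()
    finally metric.add(clock() - start)
  }
}
```

With a fake clock where the upstream's first hasNext() "blocks" for 500 ns and each next() costs 10 ns, draining three elements records 530 ns; a version that timed only next() would record just 30 ns, which mirrors the discrepancy described above.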
Attached are screenshots of the fix. With the fix, GpuCoalesceBatches reports a collect batch time and total time very close to the GpuShuffledHashJoin build time. Without the fix, the times are very different.
After Fix:
Before Fix: