Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] WholestageCodegen shows as unsupported when children are removed #860

Closed
amahussein opened this issue Mar 20, 2024 · 1 comment · Fixed by #1142
Closed

[BUG] WholestageCodegen shows as unsupported when children are removed #860

amahussein opened this issue Mar 20, 2024 · 1 comment · Fixed by #1142
Assignees
Labels
bug Something isn't working core_tools Scope the core module (scala)

Comments

@amahussein
Copy link
Collaborator

Describe the bug

Currently this is how we decide whther wholStage exec is supported:

    // if any of the execs in WholeStageCodegen supported mark this entire thing
    // as supported
    val anySupported = childNodes.exists(_.isSupported == true)
    val unSupportedExprsArray = childNodes.filter(_.unsupportedExprs.nonEmpty ).map(
      x => x.unsupportedExprs).flatten.toArray
    // average speedup across the execs in the WholeStageCodegen for now

There is a corner case when WholestageCodegen contains an exec like ColumnarToRow.
As a result the WholestageCodegen is set to unsupported but the problem that the ColumnarToRow is supposed to be removed.
In that case, the WholestageCodegen should either be set to shouldRemove or it should be marked as supported because it is theoretically empty.

@amahussein amahussein added bug Something isn't working ? - Needs Triage core_tools Scope the core module (scala) and removed ? - Needs Triage labels Mar 20, 2024
@amahussein amahussein self-assigned this Apr 1, 2024
@tgravescs
Copy link
Collaborator

Agree, we should just mark it as should remove like we do when its not in a wholestagecodegen. If that is only exec in it we could just remove the entire wholestagecode gen. That does bring up an interesting question though, that is if cpu has this transition, it might completely go away on GPU if everything is staying columnar. That could make the speedup better. For the qual tool speedup calculation if we remove it we will likely spread that time across the other execs. I don't think we have any logic complex enough to deal with that very well right now.

amahussein added a commit to amahussein/spark-rapids-tools that referenced this issue Jun 26, 2024
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

Fixes NVIDIA#860

This commit adjusts the accuracy of the Qual tool by targeting the
following issues:
- child nodes of wholeStageCodeGen would not be assigned to stages if
  they have no metrics.
- there is a corner case when all the childs of wholeStageCodeGen are
  marked as shouldRemove. In that case, the node would still be
considered unsupported and contribute to the speedup.

The changes are:
- propagate the stageIDs of wholeStageCodeGen to the child nodes
- a wholeStageCodeGen node is marked as shouldRemove when all the childs
  are marked as shouldRemove.
- fix unit-test which has 4 different wholeStageCodeGen nodes that
  contain only `ColumnarToRow` execs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core_tools Scope the core module (scala)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants