Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Nightly build failed: ExecutionPlanCaptureCallback$.class is not bitwise-identical across shims #8962

Closed
tgravescs opened this issue Aug 9, 2023 · 6 comments · Fixed by #8977 or #9006
Assignees
Labels
bug Something isn't working

Comments

@tgravescs
Copy link
Collaborator

Describe the bug
exec binary-dedupe.sh failed, exit code is 255, error msg is org/apache/spark/sql/rapids/ExecutionPlanCaptureCallback$.class is not bitwise-identical across shims

There was a change yesterday to this file, so perhaps its pulling in something that isn't shim agnostic:
[[https://github.com/NVIDIA/spark-rapids/commit/1fb5fc4cbea646588fababca5c50a8829e90348e#diff-c2b3f7332a76954f1e54afe7[…]707057b51d20faddd33db7b0045fd](https://github.com/NVIDIA/spark-rapids/commit/1fb5fc4cbea646588fababca5c50a8829e90348e#diff-c2b3f7332a76954f1e54afe7a088136071e707057b51d20faddd33db7b0045fd)](1fb5fc4#diff-c2b3f7332a76954f1e54afe7a088136071e707057b51d20faddd33db7b0045fd)

@tgravescs tgravescs added bug Something isn't working ? - Needs Triage Need team to review and classify labels Aug 9, 2023
@mythrocks
Copy link
Collaborator

mythrocks commented Aug 9, 2023

perhaps its pulling in something that isn't shim agnostic...

At first glance, I don't see what it might be.

This test pulls in the following:

import org.apache.spark.sql.execution.adaptive.{..., AdaptiveSparkPlanHelper, ...}
...
import org.apache.spark.sql.execution.FileSourceScanExec
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

Judging from how these classes are used elsewhere, it appears that they don't need shimming. This needs a closer look.

Edit: I wonder if it's because GpuFileSourceScanExec is being used here. It has dependencies on some shimmed classes.

import com.nvidia.spark.rapids.shims.{GpuDataSourceRDD, PartitionedFileUtilsShim, SparkShimImpl}

@gerashegalov
Copy link
Collaborator

gerashegalov commented Aug 10, 2023

Looking at it with @razajafri, it appears to be caused by extends AdaptiveSparkPlanHelper that pulls more methods on the DBR side. This could be even a compatible difference because methods are added and some method signatures are changed to receive more params but they have a default value. However, our current logic requires bitwise identity for a fool-proof compatibility. So this trait should be inherited via a shim such as PlanShims or a new dedicated one.

@jlowe
Copy link
Member

jlowe commented Aug 10, 2023

Does this need to target branch-23.08 or is it only an issue in 23.10?

@gerashegalov
Copy link
Collaborator

Does this need to target branch-23.08 or is it only an issue in 23.10?

We should target 23.08 because that what the offending PR's target is https://github.com/NVIDIA/spark-rapids/pull/8744/files#diff-c2b3f7332a76954f1e54afe7a088136071e707057b51d20faddd33db7b0045fdR32

It passed CI because pre-merge did not include [databricks]

@jlowe jlowe self-assigned this Aug 11, 2023
gerashegalov added a commit that referenced this issue Aug 11, 2023
Fixes #9008

Testing:

- [x] reproduces #8962 
- [x] build.sh and test.sh on top of #9006 works

Signed-off-by: Gera Shegalov <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
Co-authored-by: Thomas Graves <[email protected]>
@sameerz
Copy link
Collaborator

sameerz commented Aug 13, 2023

Retargeting this issue to 23.10, as the original PR is a test case and is reverted in 23.08.

@jlowe
Copy link
Member

jlowe commented Aug 14, 2023

This should not have been retargeted but rather simply closed. The bug is fixed, it just happened to be fixed by reverting rather than a new code change.

@jlowe jlowe closed this as completed Aug 14, 2023
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment