Describe the bug
I found some CPU/GPU mismatch cases for the json_tuple function while changing the StringGen in the integration tests (IT). In these cases, the results from the plugin look correct, but the results from Spark show null. I guess that's a bug in Spark, but either way it leads to a mismatch.
Steps/Code to reproduce bug
Simply making the length longer will fail the test:
23/06/25 19:44:34 WARN GpuOverrides:
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
@Partitioning <SinglePartition$> could run on GPU
*Exec <GenerateExec> will run on GPU
*Expression <JsonTuple> json_tuple(c1#4, a, email, owner, b, b$, b$$) will run on GPU
! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
@Expression <AttributeReference> c1#4 could run on GPU
+----+--------------------+--------+----+----+----+
|c0  |c1                  |c2      |c3  |c4  |c5  |
+----+--------------------+--------+----+----+----+
|null|[email protected]   |fzgbbtbm|null|null|null|
|null|[email protected]   |zrpixdyb|null|null|null|
|null|null                |null    |null|null|null|
|null|[email protected]   |crdswdlu|null|null|null|
|null|[email protected]   |qskumfra|null|null|null|
|null|[email protected]   |ccezwsja|null|null|null|
|null|[email protected]   |prmhpbkd|null|null|null|
|null|[email protected]   |gcvgoqzu|null|null|null|
|null|[email protected]   |ocdrlqus|null|null|null|
|null|[email protected]   |ieuiziyq|null|null|null|
+----+--------------------+--------+----+----+----+
Note: the strings generated by StringGen in this test_json_tuple may not cover all cases because of #8593
Expected behavior
The GPU should produce the same results as the CPU. If we don't plan to fix it, at least related IT cases shouldn't fail because of it.
Environment details (please complete the following information)
Latest code (23.08) and Spark 3.3.0
It seems to be caused by a possible leading zero in the price of the bike. Changing the price in the missing line to 14.02 makes the line appear in the CPU results. I will avoid generating leading zeros in test cases as a workaround.
Also, the issue will affect test_get_json_object too.
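The workaround of not generating leading zeros can be sketched with a pair of regex patterns. This is only an illustration: the pattern names and the `has_leading_zero` and `matches` helpers are hypothetical, and the patterns actually passed to StringGen in the integration tests may look different.

```python
import re

# Hypothetical patterns for a generated price field (not taken from test_json_tuple).
# This one can produce values such as "04.02", which carry a leading zero.
PRICE_MAY_HAVE_LEADING_ZERO = r"[0-9]{1,2}\.[0-9]{2}"
# This one restricts the integer part to "0" or a number starting with 1-9.
PRICE_NO_LEADING_ZERO = r"(0|[1-9][0-9]?)\.[0-9]{2}"


def has_leading_zero(number_text: str) -> bool:
    """Return True if the integer part starts with '0' followed by another digit."""
    return re.match(r"-?0[0-9]", number_text) is not None


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None
```

With these definitions, `matches(PRICE_MAY_HAVE_LEADING_ZERO, "04.02")` is true while `matches(PRICE_NO_LEADING_ZERO, "04.02")` is false, so a generator driven by the second pattern would not emit the values that trigger the mismatch.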
thirtiseven changed the title from "[BUG] CPU/GPU mismatch cases in json_tuple function" to "[BUG] Mismatch cases in json_tuple function when json items have leading zeroes" on Jun 26, 2023
So the RAPIDS plugin strips leading zeros from all numbers, whereas Spark sets allowNumericLeadingZeros to false and therefore does not accept them. Simply avoiding generating leading zeros should fix the test.
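The difference between the two behaviours can be illustrated without Spark. The sketch below is an analogy, not the plugin's or Spark's implementation: Python's `json` module stands in for a strict parser that rejects leading zeros, and a naive regex stands in for a reader that strips them. The function names and the sample record are hypothetical.

```python
import json
import re


def strict_get(doc: str, field: str):
    """Stand-in for a strict parser: a document with a leading-zero number
    is malformed, so every field lookup yields None."""
    try:
        return json.loads(doc).get(field)
    except json.JSONDecodeError:
        return None


def strip_leading_zeros(doc: str) -> str:
    """Naive stand-in for a reader that drops leading zeros from numbers.
    It only handles unquoted numbers that directly follow a colon."""
    return re.sub(r"(:\s*-?)0+(?=[0-9])", r"\1", doc)


def lenient_get(doc: str, field: str):
    """Look up a field after stripping leading zeros from numbers."""
    return strict_get(strip_leading_zeros(doc), field)


# Hypothetical record with a leading zero in the price.
record = '{"owner": "amy", "price": 014.02}'
```

Here `strict_get(record, "owner")` returns `None` because the whole document fails to parse, while `lenient_get(record, "owner")` returns `"amy"`. That mirrors the symptom reported above: one side returns null for the whole row, and the other side returns values.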
Here is a case that can be reproduced in spark-shell:
cpu result:
gpu result: