[VL] Result mismatch found in FlushableAgg #6630
Comments
Could you post your Gluten version/commit? FlushableAgg has non-empty metrics of
Thanks for the reply. My Gluten version is v1.2.0-rc1.
@jiangjiangtian could you try with
Yes, I have tried. The result is correct.
@Yohahaha I added two screenshots above; you can see that when I set
@zhztheplayer Can you help take a look at this problem?
I see the new description. I have submitted a similar issue, #4421.
#4421 is closed. Is the issue fixed?
@jiangjiangtian Please check it.
OK, I will check it.
@kecookier We will take a look.
SQL explain.
@PHILO-HE Can you take a look?
I managed to construct a closer case and still could not reproduce the issue.

```shell
# Generate partitioned data:
tools/gluten-it/sbin/gluten-it.sh data-gen-only --local-cluster --auto-cluster-resource -s=100.0 --gen-partitioned-data
# Open a Spark shell:
tools/gluten-it/sbin/gluten-it.sh spark-shell --local-cluster --auto-cluster-resource -s=100.0 --data-gen=skip
```

In the opened Spark shell, run:

```scala
spark.sql("set spark.sql.adaptive.coalescePartitions.minPartitionSize=500m").show // force AQEShuffleReadExec
spark.sql("set spark.sql.autoBroadcastJoinThreshold=-1").show // disable bhj
val df = spark.sql("select * from (select distinct l_orderkey,l_partkey from lineitem) a inner join (select l_orderkey from lineitem limit 10) b on a.l_orderkey = b.l_orderkey limit 10") // run query
df.collect // execute
df.explain // explain
```

And the explained plan is fine: in the debugger, AQEShuffleReadExec has the correct outputPartitioning.
To reproduce this issue, ensure that the outputPartitioning of AQEShuffleReadExec is UnknownPartitioning, i.e., the child's (AQEShuffleReadExec's) output is NOT partitioned by the aggregation keys. Under these conditions, the final aggregation is transformed into FlushableHashAggregate.
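The condition above can be sketched as follows. This is a minimal illustration only — `chooses_flushable_agg` is a hypothetical helper, not Gluten's actual planner code: the final aggregation stays regular only when the child's output partitioning already clusters rows by the grouping keys.

```python
# Illustrative sketch, not Gluten's real planner logic.
# child_keys is None to model UnknownPartitioning, or the list of
# partitioning keys for a HashPartitioning-like child output.
def chooses_flushable_agg(child_keys, grouping_keys):
    if child_keys is not None and set(child_keys) <= set(grouping_keys):
        # Rows with equal grouping keys land in the same partition,
        # so a regular final aggregate is used.
        return False
    # UnknownPartitioning: per the comment above, the final
    # aggregation is transformed into FlushableHashAggregate.
    return True
```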
Backend
VL (Velox)
Bug description
I have a SQL query that runs in both Gluten and vanilla Spark; its format is as follows:
The two runs return different numbers of rows. Looking at the Spark UI, I found that the row counts of the second subquery don't match.
vanilla spark:
gluten:
Actually, I found that some rows are duplicated.
But when I run just the second subquery on its own, I get the right result.
We can see the plan is different: the second hash aggregation is a regular one.
Besides, when I set
spark.gluten.sql.columnar.backend.velox.flushablePartialAggregation
to false, I get the right result. So I think there might be a bug in flushable hash aggregation or in the plan conversion, but I can't find a small SQL query that demonstrates it.
I'm sorry for not having a small example.
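For intuition about the duplicated rows: a flushable aggregate may flush its hash table under memory pressure and emit partial results, and if those partial rows are treated as final output without a merge step, the same key can appear more than once. A toy Python simulation — illustrative only, not Velox's implementation; `flush_every` stands in for memory pressure:

```python
def flushable_distinct(rows, flush_every):
    """Toy model of a flushable 'select distinct': the seen-set is
    cleared (flushed) whenever it reaches flush_every entries."""
    seen, out = set(), []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
        if len(seen) >= flush_every:  # simulated memory pressure
            seen.clear()
    return out

partial = flushable_distinct([1, 2, 1, 2, 3], flush_every=2)
# partial == [1, 2, 1, 2, 3] -> duplicates, matching the symptom
merged = list(dict.fromkeys(partial))
# merged == [1, 2, 3] -> a final merge aggregation would remove them
```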
Spark version
3.0
Spark configurations
No response
System information
Velox System Info v0.0.2
Commit: 96712646c63bf4305cca4eaa7dfd26c2179547b1
CMake Version: 3.17.5
System: Linux-3.10.0-862.mt20190308.130.el7.x86_64
Arch: x86_64
CPU Name: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
C++ Compiler: /opt/rh/devtoolset-10/root/usr/bin/c++
C++ Compiler Version: 10.2.1
C Compiler: /opt/rh/devtoolset-10/root/usr/bin/cc
C Compiler Version: 10.2.1
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs
No response