
TPC-DS Q51 Failed in SF100 with "UnsupportedOperationException: Join Type FullOuter is not supported yet". #367

Closed
weiting-chen opened this issue Jun 16, 2021 · 2 comments

weiting-chen commented Jun 16, 2021

Test Configuration:
Spark version = 3.1.1
Data Scale = SF100
Non-partitioned Table
THRIFTSERVER_CONFIG="--name tpcds_power__sf100_native --num-executors 4 --driver-memory 10g --executor-memory 20g --executor-cores 18 --master yarn --deploy-mode client --conf spark.executorEnv.CC=/usr/local/bin/gcc --conf spark.sql.extensions=com.intel.oap.ColumnarPlugin --conf spark.driver.extraClassPath=${nativesql_jars} --conf spark.executor.extraClassPath=${nativesql_jars} --conf spark.executorEnv.LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.shuffleTracking.enable=false --conf spark.shuffle.service.enabled=true --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager --conf spark.sql.join.preferSortMergeJoin=false --conf spark.sql.inMemoryColumnarStorage.batchSize=${batchsize} --conf spark.sql.parquet.columnarReaderBatchSize=${batchsize} --conf spark.sql.execution.arrow.maxRecordsPerBatch=${batchsize} --conf spark.executor.memoryOverhead=2g --conf spark.sql.autoBroadcastJoinThreshold=31457280 --conf spark.sql.broadcastTimeout=3600 --conf spark.driver.maxResultSize=20g --hiveconf hive.server2.thrift.port=10001 --hiveconf hive.server2.thrift.bind.host=sr124 --conf spark.sql.shuffle.partitions=72 --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=40g --conf spark.oap.commitid=6090971f8d63961d6a29b850510b5e779821939b --conf spark.oap.sql.columnar.preferColumnar=true --conf spark.locality.wait=0 --conf spark.executorEnv.LD_PRELOAD=/usr/local/lib/libjemalloc.so --conf spark.oap.sql.columnar.numaBinding=true --conf spark.oap.sql.columnar.coreRange=0-17,36-53|18-35,54-71 --conf spark.sql.files.maxPartitionBytes=1073741824 --conf spark.kryoserializer.buffer.max=1024 --conf spark.sql.columnar.sort.broadcast.cache.timeout=300 --conf spark.oap.sql.columnar.shuffle.customizedCompression=true --conf spark.sql.columnar.nanCheck=true --conf spark.oap.sql.columnar.joinOptimizationLevel=12 --conf spark.oap.sql.columnar.sortmergejoin=true --conf spark.sql.crossJoin.enabled=true --conf spark.oap.sql.columnar.removecoalescebatch=true --conf spark.sql.columnar.nanCheck=true --conf spark.yarn.shuffle.stopOnFailure=true --conf spark.dynamicAllocation.initialExecutors=2 --conf spark.dynamicAllocation.minExecutors=2 --conf spark.dynamicAllocation.maxExecutors=10"

Error:
UnsupportedOperationException: Join Type FullOuter is not supported yet

DAG:
It looks like we are missing a condition check when using ColumnarSHJ with FullOuter: in this case the plan should fall back to row-based processing, such as SMJ.
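
As a rough sketch (the object and method names here are hypothetical, not the plugin's actual API), the missing condition check could gate the columnar substitution on the join type:

```scala
import org.apache.spark.sql.catalyst.plans._

// Hypothetical guard: join types the columnar SHJ kernel can execute.
// Anything else (notably FullOuter) should keep Spark's row-based operator.
object ColumnarJoinSupport {
  def supported(joinType: JoinType): Boolean = joinType match {
    case Inner | LeftOuter | RightOuter | LeftSemi | LeftAnti => true
    case _ => false // FullOuter, ExistenceJoin, ... => fall back
  }
}
```

The replacement rule would consult this check and leave the row-based ShuffledHashJoinExec in place when it returns false, instead of throwing UnsupportedOperationException at execution time.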

Workaround:
Setting preferSMJ = true avoids this issue.
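
Assuming preferSMJ refers to Spark's spark.sql.join.preferSortMergeJoin conf (set to false in the configuration above), the workaround amounts to flipping that flag so the planner chooses SortMergeJoin over ShuffledHashJoin:

```
--conf spark.sql.join.preferSortMergeJoin=true
```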

weiting-chen added the bug label on Jun 16, 2021

rui-mo commented Jun 16, 2021

I have a fix that falls back Full Outer Join for SHJ and BHJ in this commit: aaf3743, and I can also add it for SMJ.
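
For illustration only (the real change lives in commit aaf3743; the helper below is hypothetical), the fallback pattern looks roughly like this:

```scala
import org.apache.spark.sql.catalyst.plans.FullOuter
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.joins.ShuffledHashJoinExec

// Hypothetical sketch: attempt the columnar replacement, but keep the
// row-based operator when the native kernel lacks the join type.
def replaceWithColumnar(shj: ShuffledHashJoinExec,
                        toColumnar: ShuffledHashJoinExec => SparkPlan): SparkPlan =
  shj.joinType match {
    case FullOuter => shj // fall back to Spark's row-based SHJ
    case _         => toColumnar(shj)
  }
```

The same guard would apply to BroadcastHashJoinExec (BHJ) and, as noted, to SortMergeJoinExec (SMJ).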

@zhouyuan
Collaborator

Should be fixed by #356.
