TPC-DS Q51 Failed in SF100 with "UnsupportedOperationException: Join Type FullOuter is not supported yet". #367
Labels: bug
Test Configuration:
- Spark version = 3.1.1
- Data Scale = SF100
- Non-partitioned Table
```
THRIFTSERVER_CONFIG="--name tpcds_power__sf100_native
  --num-executors 4 --driver-memory 10g --executor-memory 20g --executor-cores 18
  --master yarn --deploy-mode client
  --conf spark.executorEnv.CC=/usr/local/bin/gcc
  --conf spark.sql.extensions=com.intel.oap.ColumnarPlugin
  --conf spark.driver.extraClassPath=${nativesql_jars}
  --conf spark.executor.extraClassPath=${nativesql_jars}
  --conf spark.executorEnv.LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib
  --conf spark.dynamicAllocation.enabled=true
  --conf spark.dynamicAllocation.shuffleTracking.enable=false
  --conf spark.shuffle.service.enabled=true
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
  --conf spark.sql.join.preferSortMergeJoin=false
  --conf spark.sql.inMemoryColumnarStorage.batchSize=${batchsize}
  --conf spark.sql.parquet.columnarReaderBatchSize=${batchsize}
  --conf spark.sql.execution.arrow.maxRecordsPerBatch=${batchsize}
  --conf spark.executor.memoryOverhead=2g
  --conf spark.sql.autoBroadcastJoinThreshold=31457280
  --conf spark.sql.broadcastTimeout=3600
  --conf spark.driver.maxResultSize=20g
  --hiveconf hive.server2.thrift.port=10001
  --hiveconf hive.server2.thrift.bind.host=sr124
  --conf spark.sql.shuffle.partitions=72
  --conf spark.memory.offHeap.enabled=true
  --conf spark.memory.offHeap.size=40g
  --conf spark.oap.commitid=6090971f8d63961d6a29b850510b5e779821939b
  --conf spark.oap.sql.columnar.preferColumnar=true
  --conf spark.locality.wait=0
  --conf spark.executorEnv.LD_PRELOAD=/usr/local/lib/libjemalloc.so
  --conf spark.oap.sql.columnar.numaBinding=true
  --conf spark.oap.sql.columnar.coreRange=0-17,36-53|18-35,54-71
  --conf spark.sql.files.maxPartitionBytes=1073741824
  --conf spark.kryoserializer.buffer.max=1024
  --conf spark.sql.columnar.sort.broadcast.cache.timeout=300
  --conf spark.oap.sql.columnar.shuffle.customizedCompression=true
  --conf spark.sql.columnar.nanCheck=true
  --conf spark.oap.sql.columnar.joinOptimizationLevel=12
  --conf spark.oap.sql.columnar.sortmergejoin=true
  --conf spark.sql.crossJoin.enabled=true
  --conf spark.oap.sql.columnar.removecoalescebatch=true
  --conf spark.yarn.shuffle.stopOnFailure=true
  --conf spark.dynamicAllocation.initialExecutors=2
  --conf spark.dynamicAllocation.minExecutors=2
  --conf spark.dynamicAllocation.maxExecutors=10"
```
Error:
```
UnsupportedOperationException: Join Type FullOuter is not supported yet
```
DAG: (screenshot of the failing query plan; image not reproduced here)
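For context, here is a minimal sketch of the query shape involved. Q51 contains a FULL OUTER JOIN, and with `spark.sql.join.preferSortMergeJoin=false` Spark 3.1 may plan a shuffled hash join for it, which the plugin then replaces with ColumnarSHJ. This is a hypothetical standalone repro, not the actual Q51 text:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("fullouter-repro").getOrCreate()
import spark.implicits._

val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
val right = Seq((2, "x"), (3, "y")).toDF("id", "r")

// With preferSortMergeJoin=false, the planner may choose a shuffled hash
// join for this FULL OUTER JOIN; the columnar replacement then fails.
left.join(right, Seq("id"), "full_outer").show()
```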
It looks like we are missing a condition check when using ColumnarSHJ with the FullOuter join type. In this case the plan should fall back to row-based processing such as SMJ, as sketched below.
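A minimal sketch of the kind of guard that could be added. The function name and the supported-type set here are illustrative assumptions, not the plugin's actual API; the real replacement rule lives in com.intel.oap.ColumnarPlugin:

```scala
import org.apache.spark.sql.catalyst.plans.{FullOuter, Inner, JoinType, LeftAnti, LeftOuter, LeftSemi}

// Illustrative only: join types the native shuffled-hash-join kernel is
// assumed to handle. Anything else should keep the row-based plan.
def canUseColumnarShj(joinType: JoinType): Boolean = joinType match {
  case Inner | LeftOuter | LeftSemi | LeftAnti => true
  // FullOuter is not implemented by the native kernel, so fall back to the
  // row-based plan (e.g. SortMergeJoinExec) instead of failing at runtime.
  case FullOuter => false
  case _ => false
}
```

Checking the join type at planning time, rather than throwing at execution time, lets the query run to completion on the unsupported path.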
Workaround:
Setting preferSMJ = true (i.e. `spark.sql.join.preferSortMergeJoin=true`, which the configuration above sets to false) avoids this issue.
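Concretely, flip the flag already present in the THRIFTSERVER_CONFIG above:

```
--conf spark.sql.join.preferSortMergeJoin=true
```

With this set, the planner prefers sort-merge join over shuffled hash join, so the unsupported ColumnarSHJ + FullOuter combination is never planned.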