
[BUG] Check that keys are not null when creating a map #8984

Closed
abellina opened this issue Aug 10, 2023 · 1 comment · Fixed by #9237
Assignees
Labels
bug Something isn't working; reliability Features to improve reliability or bugs that severely impact the reliability of the plugin

Comments

@abellina
Collaborator

I was able to create a map on the GPU with a null key:

scala> spark.sql("select map(x, -1) from (select explode(array(1,null)) as x)").show()
23/08/10 15:00:25 WARN GpuOverrides: 
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
  @Partitioning <SinglePartition$> could run on GPU
  *Exec <ProjectExec> will run on GPU
    *Expression <Alias> cast(map(x#1, -1) as string) AS map(x, -1)#5 will run on GPU
      *Expression <Cast> cast(map(x#1, -1) as string) will run on GPU
        *Expression <CreateMap> map(x#1, -1) will run on GPU
    *Exec <GenerateExec> will run on GPU
      *Expression <Explode> explode([1,null]) will run on GPU
      ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec

+------------+                                                                  
|  map(x, -1)|
+------------+
|   {1 -> -1}|
|{null -> -1}|
+------------+

That said, this is not allowed on the CPU, so we should prevent it from happening on the GPU as well. If we allowed maps with null keys, other parts of Spark could break in really odd ways.

CPU output example:

scala> spark.sql("select map(x, -1) from (select explode(array(1,null)) as x)").show()
23/08/10 15:00:41 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1) (executor 0): java.lang.RuntimeException: Cannot use null as map key!
	at org.apache.spark.sql.errors.QueryExecutionErrors$.nullAsMapKeyNotAllowedError(QueryExecutionErrors.scala:260)
	at org.apache.spark.sql.catalyst.util.ArrayBasedMapBuilder.put(ArrayBasedMapBuilder.scala:56)
	at org.apache.spark.sql.catalyst.util.ArrayBasedMapBuilder.putAll(ArrayBasedMapBuilder.scala:94)
	at org.apache.spark.sql.catalyst.util.ArrayBasedMapBuilder.from(ArrayBasedMapBuilder.scala:122)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
@abellina abellina added bug Something isn't working ? - Needs Triage Need team to review and classify labels Aug 10, 2023
@revans2
Collaborator

revans2 commented Aug 10, 2023

Yup, it looks like GpuCreateMap.createMapFromKeysValuesAsStructs is not checking for null values in the keys. We should add the check there unconditionally.
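The semantics the GPU path needs to match are the ones shown in the CPU stack trace above: map construction must fail fast as soon as a null key is encountered. Here is a minimal plain-Scala sketch of that check; NullKeyCheck and buildMap are illustrative names, not the plugin's actual API, and the real fix would operate on cudf columns rather than Scala collections.

```scala
// Hypothetical sketch of the null-key validation Spark performs on the CPU
// (see ArrayBasedMapBuilder.put in the stack trace above). The check runs
// unconditionally on every key before the map is materialized.
object NullKeyCheck {
  def buildMap[V](keys: Seq[Any], values: Seq[V]): Map[Any, V] = {
    require(keys.length == values.length, "keys and values must align")
    keys.zip(values).map { case (k, v) =>
      // Mirror the CPU error message from QueryExecutionErrors
      if (k == null) throw new RuntimeException("Cannot use null as map key!")
      k -> v
    }.toMap
  }
}
```

With this in place, buildMap(Seq(1, 2), Seq(-1, -1)) succeeds, while buildMap(Seq(1, null), Seq(-1, -1)) throws, matching the CPU repro above instead of silently producing {null -> -1}.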

@abellina abellina added test Only impacts tests, reliability Features to improve reliability or bugs that severely impact the reliability of the plugin, and removed test Only impacts tests labels Aug 10, 2023
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Aug 16, 2023
@revans2 revans2 self-assigned this Sep 13, 2023