Unshim cache serializer and other 311+-all code [databricks] #5076

Merged: 8 commits, Mar 29, 2022
dist/README.md (4 changes: 1 addition & 3 deletions)
@@ -30,10 +30,8 @@ for each version of Spark supported in the jar, i.e., spark311/, spark312/, spar

If you have to change the contents of the uber jar the following files control what goes into the base jar as classes that are not shaded.

1. `unshimmed-common-from-spark311.txt` - this has classes and files that should go into the base jar with their normal
1. `unshimmed-common-from-spark311.txt` - This has classes and files that should go into the base jar with their normal
package name (not shaded). This includes user visible classes (i.e., com/nvidia/spark/SQLPlugin), python files,
and other files that aren't version specific. Uses Spark 3.1.1 built jar for these base classes as explained above.
2. `unshimmed-from-each-spark3xx.txt` - This is applied to all the individual Spark specific version jars to pull
any files that need to go into the base of the jar and not into the Spark specific directory.
3. `unshimmed-spark311.txt` - This is applied to all the Spark 3.1.1 specific version jars to pull any files that need to go
into the base of the jar and not into the Spark specific directory.
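
The lists above exist because unshimmed classes are referenced by users under their plain (unshaded) names, for example via Spark configuration. A minimal sketch of that usage, assuming the RAPIDS uber jar is on the driver and executor classpath (the app name shown is illustrative, not part of this change):

```scala
import org.apache.spark.sql.SparkSession

// Unshimmed classes live at the root of the uber jar under their normal package
// names, so they can be named in configuration regardless of the Spark version.
val spark = SparkSession.builder()
  .appName("rapids-unshimmed-classes-example") // hypothetical app name
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  // After this PR the cache serializer classes are listed in
  // unshimmed-common-from-spark311.txt rather than in a Spark 3.1.1-only list.
  .config("spark.sql.cache.serializer",
    "com.nvidia.spark.ParquetCachedBatchSerializer")
  .getOrCreate()
```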
dist/maven-antrun/build-parallel-worlds.xml (1 change: 0 additions & 1 deletion)
@@ -109,7 +109,6 @@
<patternset id="shared-world-includes">
<includesfile name="${project.basedir}/unshimmed-common-from-spark311.txt"/>
<includesfile name="${project.basedir}/unshimmed-from-each-spark3xx.txt"/>
<includesfile name="${project.basedir}/unshimmed-spark311.txt"/>
</patternset>
</unzip>
<unzip src="${aggregatorPrefix}-spark@{bv}.jar"
dist/unshimmed-common-from-spark311.txt (2 changes: 2 additions & 0 deletions)
@@ -3,6 +3,8 @@ META-INF/LICENSE
META-INF/NOTICE
META-INF/maven/**
com/nvidia/spark/ExclusiveModeGpuDiscoveryPlugin*
com/nvidia/spark/GpuCachedBatchSerializer*
com/nvidia/spark/ParquetCachedBatchSerializer*
com/nvidia/spark/RapidsUDF*
com/nvidia/spark/SQLPlugin*
com/nvidia/spark/rapids/ColumnarRdd*
dist/unshimmed-spark311.txt (2 changes: 0 additions & 2 deletions)
@@ -1,2 +0,0 @@
com/nvidia/spark/ParquetCachedBatchSerializer*
com/nvidia/spark/GpuCachedBatchSerializer*
docs/configs.md (2 changes: 1 addition & 1 deletion)
@@ -365,7 +365,7 @@ Name | Description | Default Value | Notes
<a name="sql.exec.HashAggregateExec"></a>spark.rapids.sql.exec.HashAggregateExec|The backend for hash based aggregations|true|None|
<a name="sql.exec.ObjectHashAggregateExec"></a>spark.rapids.sql.exec.ObjectHashAggregateExec|The backend for hash based aggregations supporting TypedImperativeAggregate functions|true|None|
<a name="sql.exec.SortAggregateExec"></a>spark.rapids.sql.exec.SortAggregateExec|The backend for sort based aggregations|true|None|
<a name="sql.exec.InMemoryTableScanExec"></a>spark.rapids.sql.exec.InMemoryTableScanExec|Implementation of InMemoryTableScanExec to use GPU accelerated Caching|true|None|
<a name="sql.exec.InMemoryTableScanExec"></a>spark.rapids.sql.exec.InMemoryTableScanExec|Implementation of InMemoryTableScanExec to use GPU accelerated caching|true|None|
<a name="sql.exec.DataWritingCommandExec"></a>spark.rapids.sql.exec.DataWritingCommandExec|Writing data|true|None|
<a name="sql.exec.BatchScanExec"></a>spark.rapids.sql.exec.BatchScanExec|The backend for most file input|true|None|
<a name="sql.exec.BroadcastExchangeExec"></a>spark.rapids.sql.exec.BroadcastExchangeExec|The backend for broadcast exchange of data|true|None|
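
Each row in this table corresponds to an ordinary runtime conf, so the GPU-backed caching path can be toggled per session. A brief sketch, assuming a `spark` session built as in the earlier example (the sample DataFrame is illustrative):

```scala
// InMemoryTableScanExec replacement is on by default; set explicitly for clarity.
spark.conf.set("spark.rapids.sql.exec.InMemoryTableScanExec", "true")

val df = spark.range(0, 1000).selectExpr("id", "id * 2 AS doubled")
df.cache()   // cached batches go through the configured cache serializer
df.count()   // first action materializes the cache
df.count()   // later actions read back via InMemoryTableScanExec
```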
docs/supported_ops.md (2 changes: 1 addition & 1 deletion)
@@ -611,7 +611,7 @@ Accelerator supports are described below.
</tr>
<tr>
<td rowspan="1">InMemoryTableScanExec</td>
<td rowspan="1">Implementation of InMemoryTableScanExec to use GPU accelerated Caching</td>
<td rowspan="1">Implementation of InMemoryTableScanExec to use GPU accelerated caching</td>
<td rowspan="1">None</td>
<td>Input/Output</td>
<td>S</td>
pom.xml (20 changes: 0 additions & 20 deletions)
@@ -114,7 +114,6 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/311-nondb/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until320-all/scala</source>
<source>${project.basedir}/src/main/311until320-noncdh/scala</source>
@@ -140,7 +139,6 @@
<module>api_validation</module>
<module>tools</module>
<module>aggregator</module>
<module>tests-spark310+</module>
</modules>
</profile>
<profile>
@@ -181,7 +179,6 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/312db/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-db/scala</source>
<source>${project.basedir}/src/main/311until320-all/scala</source>
<source>${project.basedir}/src/main/311until320-noncdh/scala</source>
@@ -232,7 +229,6 @@
<sources>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/312-nondb/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until320-all/scala</source>
<source>${project.basedir}/src/main/311until320-noncdh/scala</source>
@@ -258,7 +254,6 @@
<module>tools</module>
<module>aggregator</module>
<module>api_validation</module>
<module>tests-spark310+</module>
</modules>
</profile>
<profile>
@@ -286,7 +281,6 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/313/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until320-all/scala</source>
<source>${project.basedir}/src/main/311until320-noncdh/scala</source>
@@ -312,7 +306,6 @@
<module>tools</module>
<module>aggregator</module>
<module>api_validation</module>
<module>tests-spark310+</module>
</modules>
</profile>
<profile>
@@ -340,7 +333,6 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/314/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until320-all/scala</source>
<source>${project.basedir}/src/main/311until320-noncdh/scala</source>
@@ -376,7 +368,6 @@
<module>tools</module>
<module>aggregator</module>
<module>api_validation</module>
<module>tests-spark310+</module>
</modules>
</profile>
<profile>
@@ -404,7 +395,6 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/320/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until330-all/scala</source>
<source>${project.basedir}/src/main/311until330-nondb/scala</source>
@@ -440,7 +430,6 @@
<module>udf-compiler</module>
<module>tools</module>
<module>aggregator</module>
<module>tests-spark310+</module>
</modules>
</profile>
<profile>
@@ -468,7 +457,6 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/321/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until330-all/scala</source>
<source>${project.basedir}/src/main/311until330-nondb/scala</source>
@@ -504,7 +492,6 @@
<module>udf-compiler</module>
<module>tools</module>
<module>aggregator</module>
<module>tests-spark310+</module>
</modules>
</profile>
<profile>
@@ -532,7 +519,6 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/322/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until330-all/scala</source>
<source>${project.basedir}/src/main/311until330-nondb/scala</source>
@@ -568,7 +554,6 @@
<module>udf-compiler</module>
<module>tools</module>
<module>aggregator</module>
<module>tests-spark310+</module>
</modules>
</profile>
<profile>
@@ -610,7 +595,6 @@
<sources>
<source>${project.basedir}/src/main/321db/scala</source>
<source>${project.basedir}/src/main/311until330-all/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-db/scala</source>
<source>${project.basedir}/src/main/320+/scala</source>
<source>${project.basedir}/src/main/321+/scala</source>
@@ -659,7 +643,6 @@
<configuration>
<sources>
<source>${project.basedir}/src/main/330/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/320+/scala</source>
<source>${project.basedir}/src/main/320+-nondb/scala</source>
@@ -693,7 +676,6 @@
<module>udf-compiler</module>
<module>tools</module>
<module>aggregator</module>
<module>tests-spark310+</module>
</modules>
</profile>
<profile>
@@ -722,7 +704,6 @@
<sources>
<source>${project.basedir}/src/main/311-nondb/scala</source>
<source>${project.basedir}/src/main/311cdh/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311cdh/scala</source>
<source>${project.basedir}/src/main/311until320-all/scala</source>
@@ -753,7 +734,6 @@
<module>udf-compiler</module>
<module>tools</module>
<module>aggregator</module>
<module>tests-spark310+</module>
</modules>
</profile>
<profile>

This file was deleted.

@@ -18,7 +18,6 @@ package com.nvidia.spark.rapids.shims

import scala.collection.mutable.ListBuffer

import com.nvidia.spark.InMemoryTableScanMeta
import com.nvidia.spark.rapids._
import org.apache.hadoop.fs.FileStatus

@@ -35,7 +34,6 @@ import org.apache.spark.sql.catalyst.util.{DateFormatter, DateTimeUtils}
import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.execution._
import org.apache.spark.sql.execution.adaptive._
import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
import org.apache.spark.sql.execution.command.{AlterTableRecoverPartitionsCommand, RunnableCommand}
import org.apache.spark.sql.execution.datasources._
import org.apache.spark.sql.execution.datasources.v2.orc.OrcScan
@@ -394,12 +392,7 @@ abstract class Spark31XShims extends SparkShims with Spark31Xuntil33XShims with
wrapped.tableIdentifier,
wrapped.disableBucketedScan)(conf)
}
}),
GpuOverrides.exec[InMemoryTableScanExec](
"Implementation of InMemoryTableScanExec to use GPU accelerated Caching",
ExecChecks((TypeSig.commonCudfTypes + TypeSig.DECIMAL_128 + TypeSig.STRUCT
+ TypeSig.ARRAY + TypeSig.MAP).nested(), TypeSig.all),
(scan, conf, p, r) => new InMemoryTableScanMeta(scan, conf, p, r))
})
).map(r => (r.getClassFor.asSubclass(classOf[SparkPlan]), r)).toMap
}

@@ -17,7 +17,6 @@
package com.nvidia.spark.rapids.shims

import com.databricks.sql.execution.window.RunningWindowFunctionExec
import com.nvidia.spark.InMemoryTableScanMeta
import com.nvidia.spark.rapids._
import org.apache.hadoop.fs.FileStatus

@@ -34,7 +33,6 @@ import org.apache.spark.sql.catalyst.trees.TreeNode
import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.execution._
import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanExec, BroadcastQueryStageExec, ShuffleQueryStageExec}
import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
import org.apache.spark.sql.execution.command.{AlterTableRecoverPartitionsCommand, RunnableCommand}
import org.apache.spark.sql.execution.datasources._
import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
@@ -284,12 +282,7 @@ abstract class Spark31XdbShims extends Spark31XdbShimsBase with Logging {
wrapped.tableIdentifier,
wrapped.disableBucketedScan)(conf)
}
}),
GpuOverrides.exec[InMemoryTableScanExec](
"Implementation of InMemoryTableScanExec to use GPU accelerated Caching",
ExecChecks((TypeSig.commonCudfTypes + TypeSig.DECIMAL_128 + TypeSig.STRUCT
+ TypeSig.ARRAY + TypeSig.MAP).nested(), TypeSig.all),
(scan, conf, p, r) => new InMemoryTableScanMeta(scan, conf, p, r))
})
).map(r => (r.getClassFor.asSubclass(classOf[SparkPlan]), r)).toMap
}

@@ -18,7 +18,6 @@ package com.nvidia.spark.rapids.shims

import scala.collection.mutable.ListBuffer

import com.nvidia.spark.InMemoryTableScanMeta
import com.nvidia.spark.rapids._
import com.nvidia.spark.rapids.GpuOverrides.exec

@@ -33,7 +32,6 @@ import org.apache.spark.sql.catalyst.util.DateFormatter
import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.execution._
import org.apache.spark.sql.execution.adaptive._
import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
import org.apache.spark.sql.execution.command._
import org.apache.spark.sql.execution.datasources._
import org.apache.spark.sql.execution.datasources.v2.BatchScanExec
@@ -407,11 +405,6 @@ trait Spark320PlusShims extends SparkShims with RebaseShims with Logging {
wrapped.disableBucketedScan)(conf)
}
}),
GpuOverrides.exec[InMemoryTableScanExec](
"Implementation of InMemoryTableScanExec to use GPU accelerated Caching",
ExecChecks((TypeSig.commonCudfTypes + TypeSig.DECIMAL_128 + TypeSig.STRUCT
+ TypeSig.ARRAY + TypeSig.MAP).nested(), TypeSig.all),
(scan, conf, p, r) => new InMemoryTableScanMeta(scan, conf, p, r)),
GpuOverrides.exec[BatchScanExec](
"The backend for most file input",
ExecChecks(
@@ -17,7 +17,6 @@
package com.nvidia.spark.rapids.shims

import ai.rapids.cudf.DType
import com.nvidia.spark.InMemoryTableScanMeta
import com.nvidia.spark.rapids._
import org.apache.parquet.schema.MessageType

Expand Down
Loading