rebrand to com.nvidia instead of ai.rapids (NVIDIA#188)
revans2 authored Jun 17, 2020
1 parent 0f50ef9 commit 98fe1b9
Showing 198 changed files with 455 additions and 444 deletions.
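
The user-visible effect of the rebrand is that classes and config values under the old `ai.rapids` package move to the new `com.nvidia` namespace — most code to `com.nvidia.spark.rapids`, the user-facing `SQLPlugin` to `com.nvidia.spark` — while the separate cudf jar keeps its `ai.rapids` coordinates. A minimal before/after sketch using the `spark.plugins` setting that recurs throughout the docs changed below:

```
# before this commit
--conf spark.plugins=ai.rapids.spark.SQLPlugin
# after this commit
--conf spark.plugins=com.nvidia.spark.SQLPlugin
```
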
28 changes: 14 additions & 14 deletions api_validation/README.md
@@ -27,73 +27,73 @@ Sample Output
************************************************************************************************
Types differ for below parameters in this Exec
SparkExec - [org.apache.spark.sql.execution.aggregate.HashAggregateExec]
-GpuExec - [ai.rapids.spark.GpuHashAggregateExec]
+GpuExec - [com.nvidia.spark.rapids.GpuHashAggregateExec]
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
-Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[ai.rapids.spark.GpuAttributeReference]
+Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[com.nvidia.spark.rapids.GpuAttributeReference]
************************************************************************************************
Types differ for below parameters in this Exec
SparkExec - [org.apache.spark.sql.execution.GenerateExec]
-GpuExec - [ai.rapids.spark.GpuGenerateExec]
+GpuExec - [com.nvidia.spark.rapids.GpuGenerateExec]
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
org.apache.spark.sql.catalyst.expressions.Generator | Boolean
-Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[ai.rapids.spark.GpuExpression]
+Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[com.nvidia.spark.rapids.GpuExpression]
Boolean | Seq[org.apache.spark.sql.catalyst.expressions.Attribute]
************************************************************************************************
Types differ for below parameters in this Exec
SparkExec - [org.apache.spark.sql.execution.ExpandExec]
-GpuExec - [ai.rapids.spark.GpuExpandExec]
+GpuExec - [com.nvidia.spark.rapids.GpuExpandExec]
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[org.apache.spark.sql.catalyst.expressions.NamedExpression]
************************************************************************************************
Types differ for below parameters in this Exec
SparkExec - [org.apache.spark.sql.execution.joins.SortMergeJoinExec]
-GpuExec - [ai.rapids.spark.GpuShuffledHashJoinExec]
+GpuExec - [com.nvidia.spark.rapids.GpuShuffledHashJoinExec]
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
Option[org.apache.spark.sql.catalyst.expressions.Expression] | org.apache.spark.sql.execution.joins.BuildSide
-org.apache.spark.sql.execution.SparkPlan | Option[ai.rapids.spark.GpuExpression]
+org.apache.spark.sql.execution.SparkPlan | Option[com.nvidia.spark.rapids.GpuExpression]
Boolean | org.apache.spark.sql.execution.SparkPlan
************************************************************************************************
Parameter lengths don't match between Execs
SparkExec - [org.apache.spark.sql.execution.CollectLimitExec]
-GpuExec - [ai.rapids.spark.GpuCollectLimitExec]
+GpuExec - [com.nvidia.spark.rapids.GpuCollectLimitExec]
Spark code has 2 parameters where as plugin code has 3 parameters
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
Int | Int
-org.apache.spark.sql.execution.SparkPlan | ai.rapids.spark.GpuPartitioning
+org.apache.spark.sql.execution.SparkPlan | com.nvidia.spark.rapids.GpuPartitioning
| org.apache.spark.sql.execution.SparkPlan
************************************************************************************************
Parameter lengths don't match between Execs
SparkExec - [org.apache.spark.sql.execution.SortExec]
-GpuExec - [ai.rapids.spark.GpuSortExec]
+GpuExec - [com.nvidia.spark.rapids.GpuSortExec]
Spark code has 4 parameters where as plugin code has 5 parameters
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
-Seq[org.apache.spark.sql.catalyst.expressions.SortOrder] | Seq[ai.rapids.spark.GpuSortOrder]
+Seq[org.apache.spark.sql.catalyst.expressions.SortOrder] | Seq[com.nvidia.spark.rapids.GpuSortOrder]
Boolean | Boolean
org.apache.spark.sql.execution.SparkPlan | org.apache.spark.sql.execution.SparkPlan
-Int | ai.rapids.spark.CoalesceGoal
+Int | com.nvidia.spark.rapids.CoalesceGoal
| Int
************************************************************************************************
Parameter lengths don't match between Execs
SparkExec - [org.apache.spark.sql.execution.window.WindowExec]
-GpuExec - [ai.rapids.spark.GpuWindowExec]
+GpuExec - [com.nvidia.spark.rapids.GpuWindowExec]
Spark code has 4 parameters where as plugin code has 2 parameters
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
-Seq[org.apache.spark.sql.catalyst.expressions.NamedExpression] | Seq[ai.rapids.spark.GpuExpression]
+Seq[org.apache.spark.sql.catalyst.expressions.NamedExpression] | Seq[com.nvidia.spark.rapids.GpuExpression]
Seq[org.apache.spark.sql.catalyst.expressions.Expression] | org.apache.spark.sql.execution.SparkPlan
Seq[org.apache.spark.sql.catalyst.expressions.SortOrder] |
org.apache.spark.sql.execution.SparkPlan
6 changes: 3 additions & 3 deletions api_validation/pom.xml
@@ -20,7 +20,7 @@
<modelVersion>4.0.0</modelVersion>

<parent>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-parent</artifactId>
<version>0.1-SNAPSHOT</version>
</parent>
@@ -48,7 +48,7 @@
<scope>provided</scope>
</dependency>
<dependency>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
@@ -61,7 +61,7 @@
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<configuration>
-<mainClass>ai.rapids.spark.api.ApiValidation</mainClass>
+<mainClass>com.nvidia.spark.rapids.api.ApiValidation</mainClass>
</configuration>
</plugin>
</plugins>
@@ -14,12 +14,12 @@
* limitations under the License.
*/

-package ai.rapids.spark.api
+package com.nvidia.spark.rapids.api

import scala.reflect.api
import scala.reflect.runtime.universe._

-import ai.rapids.spark._
+import com.nvidia.spark.rapids._

import org.apache.spark.internal.Logging

@@ -86,25 +86,25 @@ object ApiValidation extends Logging {
case "BroadcastHashJoinExec" | "BroadcastExchangeExec" =>
s"org.apache.spark.sql.execution.Gpu" + execType
case "FileSourceScanExec" => s"org.apache.spark.sql.rapids.Gpu" + execType
case "SortMergeJoinExec" => s"ai.rapids.spark.GpuShuffledHashJoinExec"
case "SortAggregateExec" => s"ai.rapids.spark.GpuHashAggregateExec"
case _ => s"ai.rapids.spark.Gpu" + execType
case "SortMergeJoinExec" => s"com.nvidia.spark.rapids.GpuShuffledHashJoinExec"
case "SortAggregateExec" => s"com.nvidia.spark.rapids.GpuHashAggregateExec"
case _ => s"com.nvidia.spark.rapids.Gpu" + execType
}

// TODO: Add error handling if Type is not present
val gpuTypes = stringToTypeTag(gpu)

val sparkToGpuExecMap = Map(
"org.apache.spark.sql.catalyst.expressions.Expression" ->
"ai.rapids.spark.GpuExpression",
"com.nvidia.spark.rapids.GpuExpression",
"org.apache.spark.sql.catalyst.expressions.NamedExpression" ->
"ai.rapids.spark.GpuExpression",
"com.nvidia.spark.rapids.GpuExpression",
"org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression" ->
"org.apache.spark.sql.rapids.GpuAggregateExpression",
"org.apache.spark.sql.catalyst.expressions.AttributeReference" ->
"ai.rapids.spark.GpuAttributeReference",
"com.nvidia.spark.rapids.GpuAttributeReference",
"org.apache.spark.sql.execution.command.DataWritingCommand" ->
"ai.rapids.spark.GpuDataWritingCommand",
"com.nvidia.spark.rapids.GpuDataWritingCommand",
"org.apache.spark.sql.execution.joins.BuildSide" ->
"org.apache.spark.sql.execution.joins.BuildSide")
.withDefaultValue("sparkKeyNotPresent")
10 changes: 5 additions & 5 deletions dist/pom.xml
@@ -20,24 +20,24 @@
<modelVersion>4.0.0</modelVersion>

<parent>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-parent</artifactId>
<version>0.1-SNAPSHOT</version>
</parent>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark_2.12</artifactId>
-<name>RAPIDS Plugin for Apache Spark Distribution</name>
+<name>RAPIDS Accelerator for Apache Spark Distribution</name>
<description>Creates the distribution package of the RAPIDS plugin for Apache Spark</description>
<version>0.1-SNAPSHOT</version>

<dependencies>
<dependency>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shuffle_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
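
With the new groupId in place, a downstream build would consume the plugin under the rebranded coordinates. A hypothetical consumer-pom sketch — the coordinates come from the dist/pom.xml above, but the consumer project itself is assumed:

```
<!-- hypothetical downstream pom.xml dependency, coordinates taken from this commit -->
<dependency>
  <groupId>com.nvidia</groupId>
  <artifactId>rapids-4-spark_2.12</artifactId>
  <version>0.1-SNAPSHOT</version>
</dependency>
```
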
2 changes: 1 addition & 1 deletion docs/configs.md
@@ -6,7 +6,7 @@ On startup use: `--conf [conf key]=[conf value]`. For example:

```
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.1-SNAPSHOT.jar,cudf-0.14-SNAPSHOT-cuda10.jar' \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
```

12 changes: 6 additions & 6 deletions docs/dev/README.md
@@ -93,15 +93,15 @@ being processed by the RAPIDS plugin:
+- SortMergeJoin [o_orderkey#0L], [l_orderkey#18L], LeftSemi
:- *(1) GpuColumnarToRow false
: +- !GpuSort [o_orderkey#0L ASC NULLS FIRST], false, 0
-: +- GpuCoalesceBatches ai.rapids.spark.PreferSingleBatch$@40dcd875
+: +- GpuCoalesceBatches com.nvidia.spark.rapids.PreferSingleBatch$@40dcd875
: +- !GpuColumnarExchange gpuhashpartitioning(o_orderkey#0L, 200), true, [id=#543]
: +- !GpuProject [o_orderkey#0L, o_orderpriority#5]
: +- GpuCoalesceBatches TargetSize(1000000)
: +- !GpuFilter ((gpuisnotnull(o_orderdate#4) AND (o_orderdate#4 >= 8582)) AND (o_orderdate#4 < 8674))
: +- GpuBatchScan[o_orderkey#0L, o_orderdate#4, o_orderpriority#5] GpuParquetScan Location: InMemoryFileIndex[file:/home/example/parquet/orders.tbl], ReadSchema: struct<o_orderkey:bigint,o_orderdate:date,o_orderpriority:string>
+- *(2) GpuColumnarToRow false
+- !GpuSort [l_orderkey#18L ASC NULLS FIRST], false, 0
-+- GpuCoalesceBatches ai.rapids.spark.PreferSingleBatch$@40dcd875
++- GpuCoalesceBatches com.nvidia.spark.rapids.PreferSingleBatch$@40dcd875
+- !GpuColumnarExchange gpuhashpartitioning(l_orderkey#18L, 200), true, [id=#551]
+- !GpuProject [l_orderkey#18L]
+- GpuCoalesceBatches TargetSize(1000000)
@@ -153,8 +153,8 @@ The plugin configuration documentation can be generated by executing the
`RapidsConf.help` method. An easy way to do this is to use the Spark shell
REPL then copy-n-paste the resulting output. For example:
```
-scala> import ai.rapids.spark.RapidsConf
-import ai.rapids.spark.RapidsConf
+scala> import com.nvidia.spark.rapids.RapidsConf
+import com.nvidia.spark.rapids.RapidsConf
scala> RapidsConf.help(true)
# Rapids Plugin 4 Spark Configuration
@@ -203,11 +203,11 @@ leaking references to the semaphore and possibly causing deadlocks.
#### Disabling the Semaphore
If there is ever a need to execute without the semaphore semantics, the
semaphore code can be disabled at runtime by setting the Java system property
-`ai.rapids.spark.semaphore.enabled` to `false` before the `GpuSemaphore` class
+`com.nvidia.spark.rapids.semaphore.enabled` to `false` before the `GpuSemaphore` class
is loaded. Typically this would be set as one of the Spark executor Java
options, e.g.:
```
---conf spark.executor.extraJavaOptions=-Dai.rapids.spark.semaphore.enabled=false
+--conf spark.executor.extraJavaOptions=-Dcom.nvidia.spark.rapids.semaphore.enabled=false
```

## Debugging Tips
24 changes: 12 additions & 12 deletions docs/getting-started.md
@@ -26,7 +26,7 @@ To enable GPU processing acceleration you will need:
- Add the following jars:
- A cudf jar that corresponds to the version of CUDA available on your cluster.
- RAPIDS Spark accelerator plugin jar.
-- Set the config `spark.plugins` to `ai.rapids.spark.SQLPlugin`
+- Set the config `spark.plugins` to `com.nvidia.spark.SQLPlugin`

## Prerequisites
Each node where you are running Spark needs to have the following installed. If you are running
@@ -88,7 +88,7 @@ To install Apache Spark please follow the official
scala version 2.12 is currently supported by the accelerator.

## Download the RAPIDS jars
-The [accelerator](https://mvnrepository.com/artifact/ai.rapids/rapids-4-spark_2.12) and
+The [accelerator](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark_2.12) and
[cudf](https://mvnrepository.com/artifact/ai.rapids/cudf) jars are available in
[maven central](https://mvnrepository.com/search?q=ai.rapids)

@@ -129,7 +129,7 @@ everything in a single process on a single node.
- Launch your Spark shell session

Default configs usually work fine in local mode. The required changes are setting the config
-`spark.plugins` to `ai.rapids.spark.SQLPlugin` and including the jars as a dependency. All of the
+`spark.plugins` to `com.nvidia.spark.SQLPlugin` and including the jars as a dependency. All of the
other config settings and command line parameters are to try and better configure spark for GPU
execution.

@@ -144,7 +144,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
```
You can run one of the examples below such as the [Example Join Operation](#example-join-operation)
@@ -214,7 +214,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin
+--conf spark.plugins=com.nvidia.spark.SQLPlugin
```

## Running on YARN
Expand Down Expand Up @@ -261,7 +261,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
@@ -288,7 +288,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
@@ -325,8 +325,8 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
---conf spark.resourceDiscovery.plugin=ai.rapids.spark.ExclusiveModeGpuDiscoveryPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
+--conf spark.resourceDiscovery.plugin=com.nvidia.spark.ExclusiveModeGpuDiscoveryPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
@@ -368,7 +368,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh \
--conf spark.executor.resource.gpu.vendor=nvidia.com \
--conf spark.kubernetes.container.image=$IMAGE_NAME
@@ -383,7 +383,7 @@ and application.
1. If you are using the KryoSerializer with Spark, e.g.:
`--conf spark.serializer=org.apache.spark.serializer.KryoSerializer`, you will have to register
the GpuKryoRegistrator class, e.g.:
-`--conf spark.kryo.registrator=ai.rapids.spark.GpuKryoRegistrator`.
+`--conf spark.kryo.registrator=com.nvidia.spark.rapids.GpuKryoRegistrator`.
1. Configure the amount of executor memory like you would for a normal Spark application. If most
of the job will run on the GPU then often you can run with less executor heap memory than would
be needed for the corresponding Spark job on the CPU.
Expand All @@ -392,7 +392,7 @@ and application.
```shell script
$SPARK_HOME/bin/spark-shell --master yarn \
--num-executors 1 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.cores=6 \
--conf spark.rapids.sql.concurrentGpuTasks=2 \
--executor-memory 20g \
2 changes: 1 addition & 1 deletion docs/ml-integration.md
@@ -2,7 +2,7 @@

There are cases where you may want to get access to the raw data on the GPU, preferably without
copying it. One use case for this is exporting the data to an ML framework after doing feature
-extraction. To do this we provide a simple Scala utility `ai.rapids.spark.ColumnarRdd` that can
+extraction. To do this we provide a simple Scala utility `com.nvidia.spark.rapids.ColumnarRdd` that can
be used to convert a `DataFrame` to an `RDD[ai.rapids.cudf.Table]`. Each `Table` will have the same
schema as the `DataFrame` passed in.
