rebrand to com.nvidia instead of ai.rapids (#188)
revans2 authored Jun 17, 2020
1 parent 10a3ecb commit 765344e
Showing 198 changed files with 455 additions and 444 deletions.
28 changes: 14 additions & 14 deletions api_validation/README.md
@@ -27,73 +27,73 @@ Sample Output
************************************************************************************************
Types differ for below parameters in this Exec
SparkExec - [org.apache.spark.sql.execution.aggregate.HashAggregateExec]
-GpuExec - [ai.rapids.spark.GpuHashAggregateExec]
+GpuExec - [com.nvidia.spark.rapids.GpuHashAggregateExec]
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
-Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[ai.rapids.spark.GpuAttributeReference]
+Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[com.nvidia.spark.rapids.GpuAttributeReference]
************************************************************************************************
Types differ for below parameters in this Exec
SparkExec - [org.apache.spark.sql.execution.GenerateExec]
-GpuExec - [ai.rapids.spark.GpuGenerateExec]
+GpuExec - [com.nvidia.spark.rapids.GpuGenerateExec]
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
org.apache.spark.sql.catalyst.expressions.Generator | Boolean
-Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[ai.rapids.spark.GpuExpression]
+Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[com.nvidia.spark.rapids.GpuExpression]
Boolean | Seq[org.apache.spark.sql.catalyst.expressions.Attribute]
************************************************************************************************
Types differ for below parameters in this Exec
SparkExec - [org.apache.spark.sql.execution.ExpandExec]
-GpuExec - [ai.rapids.spark.GpuExpandExec]
+GpuExec - [com.nvidia.spark.rapids.GpuExpandExec]
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
Seq[org.apache.spark.sql.catalyst.expressions.Attribute] | Seq[org.apache.spark.sql.catalyst.expressions.NamedExpression]
************************************************************************************************
Types differ for below parameters in this Exec
SparkExec - [org.apache.spark.sql.execution.joins.SortMergeJoinExec]
-GpuExec - [ai.rapids.spark.GpuShuffledHashJoinExec]
+GpuExec - [com.nvidia.spark.rapids.GpuShuffledHashJoinExec]
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
Option[org.apache.spark.sql.catalyst.expressions.Expression] | org.apache.spark.sql.execution.joins.BuildSide
-org.apache.spark.sql.execution.SparkPlan | Option[ai.rapids.spark.GpuExpression]
+org.apache.spark.sql.execution.SparkPlan | Option[com.nvidia.spark.rapids.GpuExpression]
Boolean | org.apache.spark.sql.execution.SparkPlan
************************************************************************************************
Parameter lengths don't match between Execs
SparkExec - [org.apache.spark.sql.execution.CollectLimitExec]
-GpuExec - [ai.rapids.spark.GpuCollectLimitExec]
+GpuExec - [com.nvidia.spark.rapids.GpuCollectLimitExec]
Spark code has 2 parameters where as plugin code has 3 parameters
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
Int | Int
-org.apache.spark.sql.execution.SparkPlan | ai.rapids.spark.GpuPartitioning
+org.apache.spark.sql.execution.SparkPlan | com.nvidia.spark.rapids.GpuPartitioning
| org.apache.spark.sql.execution.SparkPlan
************************************************************************************************
Parameter lengths don't match between Execs
SparkExec - [org.apache.spark.sql.execution.SortExec]
-GpuExec - [ai.rapids.spark.GpuSortExec]
+GpuExec - [com.nvidia.spark.rapids.GpuSortExec]
Spark code has 4 parameters where as plugin code has 5 parameters
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
-Seq[org.apache.spark.sql.catalyst.expressions.SortOrder] | Seq[ai.rapids.spark.GpuSortOrder]
+Seq[org.apache.spark.sql.catalyst.expressions.SortOrder] | Seq[com.nvidia.spark.rapids.GpuSortOrder]
Boolean | Boolean
org.apache.spark.sql.execution.SparkPlan | org.apache.spark.sql.execution.SparkPlan
-Int | ai.rapids.spark.CoalesceGoal
+Int | com.nvidia.spark.rapids.CoalesceGoal
| Int
************************************************************************************************
Parameter lengths don't match between Execs
SparkExec - [org.apache.spark.sql.execution.window.WindowExec]
-GpuExec - [ai.rapids.spark.GpuWindowExec]
+GpuExec - [com.nvidia.spark.rapids.GpuWindowExec]
Spark code has 4 parameters where as plugin code has 2 parameters
Spark parameters Plugin parameters
------------------------------------------------------------------------------------------------
-Seq[org.apache.spark.sql.catalyst.expressions.NamedExpression] | Seq[ai.rapids.spark.GpuExpression]
+Seq[org.apache.spark.sql.catalyst.expressions.NamedExpression] | Seq[com.nvidia.spark.rapids.GpuExpression]
Seq[org.apache.spark.sql.catalyst.expressions.Expression] | org.apache.spark.sql.execution.SparkPlan
Seq[org.apache.spark.sql.catalyst.expressions.SortOrder] |
org.apache.spark.sql.execution.SparkPlan
6 changes: 3 additions & 3 deletions api_validation/pom.xml
@@ -20,7 +20,7 @@
<modelVersion>4.0.0</modelVersion>

<parent>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-parent</artifactId>
<version>0.1-SNAPSHOT</version>
</parent>
@@ -48,7 +48,7 @@
<scope>provided</scope>
</dependency>
<dependency>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
@@ -61,7 +61,7 @@
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<configuration>
-<mainClass>ai.rapids.spark.api.ApiValidation</mainClass>
+<mainClass>com.nvidia.spark.api.ApiValidation</mainClass>
</configuration>
</plugin>
</plugins>
@@ -14,12 +14,12 @@
* limitations under the License.
*/

-package ai.rapids.spark.api
+package com.nvidia.spark.rapids.api

import scala.reflect.api
import scala.reflect.runtime.universe._

-import ai.rapids.spark._
+import com.nvidia.spark.rapids._

import org.apache.spark.internal.Logging

@@ -86,25 +86,25 @@ object ApiValidation extends Logging {
case "BroadcastHashJoinExec" | "BroadcastExchangeExec" =>
s"org.apache.spark.sql.execution.Gpu" + execType
case "FileSourceScanExec" => s"org.apache.spark.sql.rapids.Gpu" + execType
case "SortMergeJoinExec" => s"ai.rapids.spark.GpuShuffledHashJoinExec"
case "SortAggregateExec" => s"ai.rapids.spark.GpuHashAggregateExec"
case _ => s"ai.rapids.spark.Gpu" + execType
case "SortMergeJoinExec" => s"com.nvidia.spark.rapids.GpuShuffledHashJoinExec"
case "SortAggregateExec" => s"com.nvidia.spark.rapids.GpuHashAggregateExec"
case _ => s"com.nvidia.spark.rapids.Gpu" + execType
}

// TODO: Add error handling if Type is not present
val gpuTypes = stringToTypeTag(gpu)

val sparkToGpuExecMap = Map(
"org.apache.spark.sql.catalyst.expressions.Expression" ->
"ai.rapids.spark.GpuExpression",
"com.nvidia.spark.rapids.GpuExpression",
"org.apache.spark.sql.catalyst.expressions.NamedExpression" ->
"ai.rapids.spark.GpuExpression",
"com.nvidia.spark.rapids.GpuExpression",
"org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression" ->
"org.apache.spark.sql.rapids.GpuAggregateExpression",
"org.apache.spark.sql.catalyst.expressions.AttributeReference" ->
"ai.rapids.spark.GpuAttributeReference",
"com.nvidia.spark.rapids.GpuAttributeReference",
"org.apache.spark.sql.execution.command.DataWritingCommand" ->
"ai.rapids.spark.GpuDataWritingCommand",
"com.nvidia.spark.rapids.GpuDataWritingCommand",
"org.apache.spark.sql.execution.joins.BuildSide" ->
"org.apache.spark.sql.execution.joins.BuildSide")
.withDefaultValue("sparkKeyNotPresent")
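
For readers skimming the diff, the `withDefaultValue` pattern above is what keeps the validator total: any Spark type with no registered GPU counterpart falls through to the `"sparkKeyNotPresent"` sentinel instead of throwing. A minimal, self-contained Scala sketch (not part of this commit) of that lookup behavior:

```scala
object TypeMapDemo {
  def main(args: Array[String]): Unit = {
    // Mirrors the sentinel default used by ApiValidation's sparkToGpuExecMap.
    val sparkToGpu = Map(
      "org.apache.spark.sql.catalyst.expressions.Expression" ->
        "com.nvidia.spark.rapids.GpuExpression"
    ).withDefaultValue("sparkKeyNotPresent")

    // A mapped key returns its GPU counterpart.
    println(sparkToGpu("org.apache.spark.sql.catalyst.expressions.Expression"))
    // An unmapped key returns the sentinel instead of throwing NoSuchElementException.
    println(sparkToGpu("org.apache.spark.sql.types.DataType")) // sparkKeyNotPresent
  }
}
```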
10 changes: 5 additions & 5 deletions dist/pom.xml
@@ -20,24 +20,24 @@
<modelVersion>4.0.0</modelVersion>

<parent>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-parent</artifactId>
<version>0.1-SNAPSHOT</version>
</parent>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark_2.12</artifactId>
-<name>RAPIDS Plugin for Apache Spark Distribution</name>
+<name>RAPIDS Accelerator for Apache Spark Distribution</name>
<description>Creates the distribution package of the RAPIDS plugin for Apache Spark</description>
<version>0.1-SNAPSHOT</version>

<dependencies>
<dependency>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
-<groupId>ai.rapids</groupId>
+<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shuffle_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
2 changes: 1 addition & 1 deletion docs/configs.md
@@ -6,7 +6,7 @@ On startup use: `--conf [conf key]=[conf value]`. For example:

```
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.1-SNAPSHOT.jar,cudf-0.14-SNAPSHOT-cuda10.jar' \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
```
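
The same settings can also be applied programmatically when building the session; a minimal sketch (not part of this diff — the plugin and cudf jars must still be on the classpath):

```scala
import org.apache.spark.sql.SparkSession

// Equivalent of the command-line flags above, set at session construction.
val spark = SparkSession.builder()
  .appName("rapids-example")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.rapids.sql.incompatibleOps.enabled", "true")
  .getOrCreate()
```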

12 changes: 6 additions & 6 deletions docs/dev/README.md
@@ -93,15 +93,15 @@ being processed by the RAPIDS plugin:
+- SortMergeJoin [o_orderkey#0L], [l_orderkey#18L], LeftSemi
:- *(1) GpuColumnarToRow false
: +- !GpuSort [o_orderkey#0L ASC NULLS FIRST], false, 0
-: +- GpuCoalesceBatches ai.rapids.spark.PreferSingleBatch$@40dcd875
+: +- GpuCoalesceBatches com.nvidia.spark.rapids.PreferSingleBatch$@40dcd875
: +- !GpuColumnarExchange gpuhashpartitioning(o_orderkey#0L, 200), true, [id=#543]
: +- !GpuProject [o_orderkey#0L, o_orderpriority#5]
: +- GpuCoalesceBatches TargetSize(1000000)
: +- !GpuFilter ((gpuisnotnull(o_orderdate#4) AND (o_orderdate#4 >= 8582)) AND (o_orderdate#4 < 8674))
: +- GpuBatchScan[o_orderkey#0L, o_orderdate#4, o_orderpriority#5] GpuParquetScan Location: InMemoryFileIndex[file:/home/example/parquet/orders.tbl], ReadSchema: struct<o_orderkey:bigint,o_orderdate:date,o_orderpriority:string>
+- *(2) GpuColumnarToRow false
+- !GpuSort [l_orderkey#18L ASC NULLS FIRST], false, 0
-+- GpuCoalesceBatches ai.rapids.spark.PreferSingleBatch$@40dcd875
++- GpuCoalesceBatches com.nvidia.spark.rapids.PreferSingleBatch$@40dcd875
+- !GpuColumnarExchange gpuhashpartitioning(l_orderkey#18L, 200), true, [id=#551]
+- !GpuProject [l_orderkey#18L]
+- GpuCoalesceBatches TargetSize(1000000)
@@ -153,8 +153,8 @@ The plugin configuration documentation can be generated by executing the
`RapidsConf.help` method. An easy way to do this is to use the Spark shell
REPL then copy-n-paste the resulting output. For example:
```
-scala> import ai.rapids.spark.RapidsConf
-import ai.rapids.spark.RapidsConf
+scala> import com.nvidia.spark.rapids.RapidsConf
+import com.nvidia.spark.rapids.RapidsConf
scala> RapidsConf.help(true)
# Rapids Plugin 4 Spark Configuration
@@ -203,11 +203,11 @@ leaking references to the semaphore and possibly causing deadlocks.
#### Disabling the Semaphore
If there is ever a need to execute without the semaphore semantics, the
semaphore code can be disabled at runtime by setting the Java system property
-`ai.rapids.spark.semaphore.enabled` to `false` before the `GpuSemaphore` class
+`com.nvidia.spark.rapids.semaphore.enabled` to `false` before the `GpuSemaphore` class
is loaded. Typically this would be set as one of the Spark executor Java
options, e.g.:
```
---conf spark.executor.extraJavaOptions=-Dai.rapids.spark.semaphore.enabled=false
+--conf spark.executor.extraJavaOptions=-Dcom.nvidia.spark.rapids.semaphore.enabled=false
```
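
A kill switch like this is typically read once at class-load time; a hedged sketch of that pattern (the property name matches the docs above, but the field name and default shown here are assumptions, not the plugin's actual code):

```scala
object SemaphoreFlagSketch {
  // Read once when the class is initialized, defaulting to enabled;
  // flipping the property after this point would have no effect.
  private val enabled: Boolean = java.lang.Boolean.parseBoolean(
    System.getProperty("com.nvidia.spark.rapids.semaphore.enabled", "true"))

  def main(args: Array[String]): Unit =
    println(s"semaphore enabled: $enabled")
}
```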

## Debugging Tips
24 changes: 12 additions & 12 deletions docs/getting-started.md
@@ -26,7 +26,7 @@ To enable GPU processing acceleration you will need:
- Add the following jars:
- A cudf jar that corresponds to the version of CUDA available on your cluster.
- RAPIDS Spark accelerator plugin jar.
-- Set the config `spark.plugins` to `ai.rapids.spark.SQLPlugin`
+- Set the config `spark.plugins` to `com.nvidia.spark.SQLPlugin`

## Prerequisites
Each node where you are running Spark needs to have the following installed. If you are running
@@ -88,7 +88,7 @@ To install Apache Spark please follow the official
scala version 2.12 is currently supported by the accelerator.

## Download the RAPIDS jars
-The [accelerator](https://mvnrepository.com/artifact/ai.rapids/rapids-4-spark_2.12) and
+The [accelerator](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark_2.12) and
[cudf](https://mvnrepository.com/artifact/ai.rapids/cudf) jars are available in
[maven central](https://mvnrepository.com/search?q=ai.rapids)

@@ -129,7 +129,7 @@ everything in a single process on a single node.
- Launch your Spark shell session

Default configs usually work fine in local mode. The required changes are setting the config
-`spark.plugins` to `ai.rapids.spark.SQLPlugin` and including the jars as a dependency. All of the
+`spark.plugins` to `com.nvidia.spark.SQLPlugin` and including the jars as a dependency. All of the
other config settings and command line parameters are to try and better configure spark for GPU
execution.

@@ -144,7 +144,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
```
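Once the shell is up, a quick way to confirm the plugin took effect is to explain a trivial query and look for `Gpu*` operators in the plan (illustrative; exact plan text varies by version):

```scala
// Run inside the spark-shell started above.
val df = spark.range(0, 1000000L).selectExpr("id", "id % 10 AS key")
df.groupBy("key").count().explain()
// With the plugin active, expect nodes such as GpuHashAggregateExec and
// GpuColumnarExchange in place of their CPU counterparts.
```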
You can run one of the examples below such as the [Example Join Operation](#example-join-operation)
@@ -214,7 +214,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin
+--conf spark.plugins=com.nvidia.spark.SQLPlugin
```

## Running on YARN
@@ -261,7 +261,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
@@ -288,7 +288,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
@@ -325,8 +325,8 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
---conf spark.resourceDiscovery.plugin=ai.rapids.spark.ExclusiveModeGpuDiscoveryPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
+--conf spark.resourceDiscovery.plugin=com.nvidia.spark.ExclusiveModeGpuDiscoveryPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
@@ -368,7 +368,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.locality.wait=0s \
--conf spark.sql.files.maxPartitionBytes=512m \
--conf spark.sql.shuffle.partitions=10 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh \
--conf spark.executor.resource.gpu.vendor=nvidia.com \
--conf spark.kubernetes.container.image=$IMAGE_NAME
@@ -383,7 +383,7 @@ and application.
1. If you are using the KryoSerializer with Spark, e.g.:
`--conf spark.serializer=org.apache.spark.serializer.KryoSerializer`, you will have to register
the GpuKryoRegistrator class, e.g.:
-`--conf spark.kryo.registrator=ai.rapids.spark.GpuKryoRegistrator`.
+`--conf spark.kryo.registrator=com.nvidia.spark.rapids.GpuKryoRegistrator`.
1. Configure the amount of executor memory like you would for a normal Spark application. If most
of the job will run on the GPU then often you can run with less executor heap memory than would
be needed for the corresponding Spark job on the CPU.
@@ -392,7 +392,7 @@ and application.
```shell script
$SPARK_HOME/bin/spark-shell --master yarn \
--num-executors 1 \
---conf spark.plugins=ai.rapids.spark.SQLPlugin \
+--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.cores=6 \
--conf spark.rapids.sql.concurrentGpuTasks=2 \
--executor-memory 20g \
2 changes: 1 addition & 1 deletion docs/ml-integration.md
@@ -2,7 +2,7 @@

There are cases where you may want to get access to the raw data on the GPU, preferably without
copying it. One use case for this is exporting the data to an ML framework after doing feature
-extraction. To do this we provide a simple Scala utility `ai.rapids.spark.ColumnarRdd` that can
+extraction. To do this we provide a simple Scala utility `com.nvidia.spark.rapids.ColumnarRdd` that can
be used to convert a `DataFrame` to an `RDD[ai.rapids.cudf.Table]`. Each `Table` will have the same
schema as the `DataFrame` passed in.
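
A minimal sketch of that export path (not part of this diff; it assumes an active `SparkSession` with the plugin enabled, and treats the `ColumnarRdd(df)` call shape as an assumption based on the description above):

```scala
import ai.rapids.cudf.Table
import com.nvidia.spark.rapids.ColumnarRdd
import org.apache.spark.rdd.RDD

val df = spark.range(0, 1000000L).selectExpr("id", "id * 2 AS feature")
// Hand GPU-resident batches to an ML framework without a device-to-host copy.
val tables: RDD[Table] = ColumnarRdd(df)
// In real code each Table should be closed to release its GPU memory.
println(tables.map(_.getRowCount).sum())
```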
