[Feat][Spark] Add examples to show how to load/dump data from/to GraphAr for Nebula (#244)
apache · Oct 9, 2023 · 78f9226
1 parent de5f2bb
commit 78f9226
Showing 9 changed files with 554 additions and 0 deletions.
diff --git a/.github/workflows/spark.yaml b/.github/workflows/spark.yaml
@@ -76,3 +76,32 @@ jobs:
 
         # stop and clean
         popd
+      
+    - name: Run Nebula2GraphAr example
+      run: |
+        export JAVA_HOME=${JAVA_HOME_11_X64}
+        pushd spark
+        scripts/get-spark-to-home.sh
+        export SPARK_HOME="${HOME}/spark-3.2.2-bin-hadoop3.2"
+        export PATH="${SPARK_HOME}/bin":"${PATH}"
+
+        scripts/get-nebula-to-home.sh
+
+        scripts/deploy-nebula-default-data.sh
+
+        scripts/build.sh
+
+        scripts/run-nebula2graphar.sh
+
+        # clean the data
+        docker run \
+            --rm \
+            --name nebula-console-loader \
+            --network nebula-docker-env_nebula-net \
+            vesoft/nebula-console:nightly -addr 172.28.3.1 -port 9669 -u root -p nebula -e "use basketballplayer; clear space basketballplayer;"
+        
+        # import from GraphAr
+        scripts/run-graphar2nebula.sh
+
+        # stop and clean
+        popd
diff --git a/spark/README.md b/spark/README.md
@@ -135,12 +135,88 @@ echo "match (a) -[r] -> () delete a, r;match (a) delete a;" | cypher-shell -u ${
 ```
 
 Then run the example:
+
 ```bash
 scripts/run-graphar2neo4j.sh
 ```
 
 The example will import the movie graph from GraphAr to Neo4j and you can check the result in the Neo4j browser.
 
+## Running NebulaGraph to GraphAr example
+
+Running this example requires Docker. If it is not installed, follow the [Docker installation guide](https://docs.docker.com/engine/install/), then run `docker version` to verify the installation.
+
+The GraphAr Spark library provides a simple example that converts NebulaGraph data to GraphAr data.
+The example is located in the directory ``spark/src/main/scala/com/alibaba/graphar/example/``.
+
+To run the example, download Spark and NebulaGraph first.
+
+### Spark 3.2.x
+
+Spark 3.2.x is the recommended runtime. The rest of the instructions assume Spark 3.2.x.
+
+To place Spark under `${HOME}`:
+
+```bash
+scripts/get-spark-to-home.sh
+export SPARK_HOME="${HOME}/spark-3.2.2-bin-hadoop3.2"
+export PATH="${SPARK_HOME}/bin":"${PATH}"
+```
+
+### NebulaGraph
+
+To place the NebulaGraph docker-compose.yaml under `${HOME}/nebula-docker-env`:
+
+```bash
+scripts/get-nebula-to-home.sh
+```
+
+Start the NebulaGraph server with Docker and load the `basketballplayer` data:
+
+```bash
+scripts/deploy-nebula-default-data.sh
+```
+
+Use [NebulaGraph Studio](https://docs.nebula-graph.com.cn/master/nebula-studio/deploy-connect/st-ug-deploy/#docker_studio) to check the graph data. The username is ``root`` and the password is ``nebula``.
+
+### Building the project
+
+Run:
+
+```bash
+scripts/build.sh
+```
+
+### Running the Nebula2GraphAr example
+
+```bash
+scripts/run-nebula2graphar.sh
+```
+
+The example will convert the basketballplayer data in NebulaGraph to GraphAr data and save it to the directory ``/tmp/graphar/nebula2graphar``.
+
+### Running the GraphAr2Nebula example
+
+We can also import the basketballplayer graph from GraphAr to NebulaGraph.
+
+First, clear NebulaGraph's basketballplayer graph space so that the import result is clearly visible:
+
+```bash
+docker run \
+    --rm \
+    --name nebula-console-loader \
+    --network nebula-docker-env_nebula-net \
+    vesoft/nebula-console:nightly -addr 172.28.3.1 -port 9669 -u root -p nebula -e "use basketballplayer; clear space basketballplayer;"
+```
+
+Then run the example:
+
+```bash
+scripts/run-graphar2nebula.sh
+```
+
+The example will import the basketballplayer graph from GraphAr to NebulaGraph, and you can check the result in NebulaGraph Studio.
+
 ## How to use
 
 Please refer to our [GraphAr Spark Library Documentation](https://alibaba.github.io/GraphAr/user-guide/spark-lib.html).
diff --git a/spark/pom.xml b/spark/pom.xml
@@ -101,6 +101,16 @@
             <artifactId>neo4j-connector-apache-spark_2.12</artifactId>
             <version>5.0.0_for_spark_3</version>
         </dependency>
+        <dependency>
+            <groupId>com.vesoft</groupId>
+            <artifactId>nebula-spark-connector_3.0</artifactId>
+            <version>3.6.0</version>
+        </dependency>
+        <dependency>
+            <groupId>org.scala-lang.modules</groupId>
+            <artifactId>scala-collection-compat_2.12</artifactId>
+            <version>2.1.1</version>
+        </dependency>
     </dependencies>
     <build>
         <plugins>

diff --git a/spark/scripts/deploy-nebula-default-data.sh b/spark/scripts/deploy-nebula-default-data.sh
@@ -0,0 +1,31 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with this
+# work for additional information regarding copyright ownership. The ASF
+# licenses this file to You under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations
+# under the License.
+#
+
+set -eu
+
+nebula_env_dir="${HOME}/nebula-docker-env"
+cd "${nebula_env_dir}"
+
+docker compose up -d
+sleep 30
+
+docker run \
+    --rm \
+    --name nebula-console-loader \
+    --network nebula-docker-env_nebula-net \
+    vesoft/nebula-console:nightly -addr 172.28.3.1 -port 9669 -u root -p nebula -e ":play basketballplayer"
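The deploy script above waits a fixed 30 seconds before loading data, which can be too short on a slow machine and wasteful on a fast one. A minimal sketch of a polling alternative follows. `wait_until` is a hypothetical helper that is not part of this commit, and the probe command shown in the comment is an assumption about what a suitable readiness check could look like.

```bash
#!/bin/bash
set -eu

# wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of this commit): reruns COMMAND until it
# exits with status 0, or returns 1 once MAX_ATTEMPTS attempts have failed.
wait_until() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while ! "$@" >/dev/null 2>&1; do
    if (( attempt >= max_attempts )); then
      echo "gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "${delay}"
  done
}

# Possible use in place of `sleep 30` (assumed probe, adjust as needed):
#   wait_until 30 2 docker run --rm \
#       --network nebula-docker-env_nebula-net \
#       vesoft/nebula-console:nightly -addr 172.28.3.1 -port 9669 \
#       -u root -p nebula -e "SHOW HOSTS;"
```

The helper only checks the exit status of the probe, so any command that fails while the server is unavailable can serve as the readiness check.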
diff --git a/spark/scripts/get-nebula-to-home.sh b/spark/scripts/get-nebula-to-home.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with this
+# work for additional information regarding copyright ownership. The ASF
+# licenses this file to You under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations
+# under the License.
+#
+
+set -eu
+cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+
+nebula_env_dir="${HOME}/nebula-docker-env"
+if [[ ! -d "${nebula_env_dir}" ]]; then
+  mkdir -p "${nebula_env_dir}"
+else
+  echo "${nebula_env_dir} already exists."
+fi
+cd "${nebula_env_dir}"
+
+curl -fsSL \
+     -o docker-compose.yaml \
+     https://raw.githubusercontent.com/vesoft-inc/nebula-spark-connector/master/nebula-spark-connector_3.0/src/test/resources/docker-compose.yaml
diff --git a/spark/scripts/run-graphar2nebula.sh b/spark/scripts/run-graphar2nebula.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with this
+# work for additional information regarding copyright ownership. The ASF
+# licenses this file to You under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations
+# under the License.
+#
+
+set -eu
+
+cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+jar_file="${cur_dir}/../target/graphar-0.1.0-SNAPSHOT-shaded.jar"
+
+graph_info_path="${GRAPH_INFO_PATH:-/tmp/graphar/nebula2graphar/basketballplayergraph.graph.yml}"
+spark-submit --class com.alibaba.graphar.example.GraphAr2Nebula "${jar_file}" \
+    "${graph_info_path}"
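The script above takes the graph info path from the `GRAPH_INFO_PATH` environment variable and otherwise falls back to the file written by the Nebula2GraphAr example. The sketch below isolates that `${VAR:-default}` expansion. `resolve_graph_info_path` is a hypothetical name used only for illustration.

```bash
#!/bin/bash
set -eu

# Hypothetical helper mirroring the default used in run-graphar2nebula.sh:
# prints GRAPH_INFO_PATH if it is set and non-empty, else the default path.
resolve_graph_info_path() {
  echo "${GRAPH_INFO_PATH:-/tmp/graphar/nebula2graphar/basketballplayergraph.graph.yml}"
}
```

In practice this means a different graph can be imported without editing the script, for example by prefixing the call with `GRAPH_INFO_PATH=/path/to/other.graph.yml`, where the path is a placeholder.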
diff --git a/spark/scripts/run-nebula2graphar.sh b/spark/scripts/run-nebula2graphar.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with this
+# work for additional information regarding copyright ownership. The ASF
+# licenses this file to You under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations
+# under the License.
+#
+
+set -eu
+
+cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+jar_file="${cur_dir}/../target/graphar-0.1.0-SNAPSHOT-shaded.jar"
+
+vertex_chunk_size=100
+edge_chunk_size=1024
+file_type="parquet"
+spark-submit --class com.alibaba.graphar.example.Nebula2GraphAr "${jar_file}" \
+    "/tmp/graphar/nebula2graphar" "${vertex_chunk_size}" "${edge_chunk_size}" "${file_type}"
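The script above passes `vertex_chunk_size=100` to the example. Assuming GraphAr splits the vertices of one type into fixed-size chunks, the number of vertex chunks is the vertex count divided by the chunk size, rounded up. The sketch below shows that arithmetic. `chunk_count` is a hypothetical helper, and the vertex counts used with it are made-up inputs, not figures from the basketballplayer dataset.

```bash
#!/bin/bash
set -eu

# chunk_count TOTAL CHUNK_SIZE
# Hypothetical helper: ceiling division using integer arithmetic only.
chunk_count() {
  local total="$1" size="$2"
  echo $(( (total + size - 1) / size ))
}

# Made-up vertex count, chunk size taken from the script above.
chunk_count 250 100
```

With these inputs, 250 vertices at a chunk size of 100 need three chunks: two full ones and one partial.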