[Feat][Spark] Add examples to show how to load/dump data from/to GraphAr for Nebula (#244)
liuxiaocs7 authored Oct 9, 2023
1 parent de5f2bb commit 78f9226
Showing 9 changed files with 554 additions and 0 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/spark.yaml
@@ -76,3 +76,32 @@ jobs:
          # stop and clean
          popd
      - name: Run Nebula2GraphAr example
        run: |
          export JAVA_HOME=${JAVA_HOME_11_X64}
          pushd spark
          scripts/get-nebula-to-home.sh
          export SPARK_HOME="${HOME}/spark-3.2.2-bin-hadoop3.2"
          export PATH="${SPARK_HOME}/bin":"${PATH}"
          scripts/get-nebula-to-home.sh
          scripts/deploy-nebula-default-data.sh
          scripts/build.sh
          scripts/run-nebula2graphar.sh
          # clean the data
          docker run \
              --rm \
              --name nebula-console-loader \
              --network nebula-docker-env_nebula-net \
              vesoft/nebula-console:nightly -addr 172.28.3.1 -port 9669 -u root -p nebula -e "use basketballplayer; clear space basketballplayer;"
          # import from GraphAr
          scripts/run-graphar2nebula.sh
          # stop and clean
          popd
76 changes: 76 additions & 0 deletions spark/README.md
@@ -135,12 +135,88 @@ echo "match (a) -[r] -> () delete a, r;match (a) delete a;" | cypher-shell -u ${
```

Then run the example:

```bash
scripts/run-graphar2neo4j.sh
```

The example will import the movie graph from GraphAr to Neo4j and you can check the result in the Neo4j browser.

## Running NebulaGraph to GraphAr example

Running this example requires `Docker`. If it is not installed, follow [this link](https://docs.docker.com/engine/install/). Run `docker version` to check the installation.
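
The scripts used in this section call `docker`, `docker compose`, and `curl`. As a minimal sketch (the `require_cmds` helper is hypothetical and not part of the repository), a pre-flight check could look like this:

```bash
# Hypothetical helper: succeed only if every named command is on PATH.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the tools needed before starting NebulaGraph.
# require_cmds docker curl
```

Running `docker compose version` additionally confirms that the Compose plugin invoked by `scripts/deploy-nebula-default-data.sh` is available.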

The GraphAr Spark library provides a simple example that converts NebulaGraph data to GraphAr data.
The example is located in the directory `spark/src/main/scala/com/alibaba/graphar/example/`.

To run the example, download Spark and NebulaGraph first.

### Spark 3.2.x

Spark 3.2.x is the recommended runtime to use. The rest of the instructions are provided assuming Spark 3.2.x.

To place Spark under `${HOME}`:

```bash
scripts/get-spark-to-home.sh
export SPARK_HOME="${HOME}/spark-3.2.2-bin-hadoop3.2"
export PATH="${SPARK_HOME}/bin":"${PATH}"
```

### NebulaGraph

To place the NebulaGraph `docker-compose.yaml` under `${HOME}/nebula-docker-env`:

```bash
scripts/get-nebula-to-home.sh
```

Start the NebulaGraph server with Docker and load the `basketballplayer` dataset:

```bash
scripts/deploy-nebula-default-data.sh
```
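
The deploy script waits a fixed 30 seconds (`sleep 30`) after `docker compose up -d` before loading data. On slower machines a retry loop can be more robust. The following is only a sketch: `retry` is a hypothetical helper, and it assumes that the command being retried exits non-zero while the service is not yet ready.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (assumes the console exits non-zero when it cannot connect):
# retry 30 2 docker run --rm --network nebula-docker-env_nebula-net \
#   vesoft/nebula-console:nightly -addr 172.28.3.1 -port 9669 \
#   -u root -p nebula -e "show hosts;"
```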

Use [NebulaGraph Studio](https://docs.nebula-graph.com.cn/master/nebula-studio/deploy-connect/st-ug-deploy/#docker_studio) to check the graph data. The username is `root` and the password is `nebula`.

### Building the project

Run:

```bash
scripts/build.sh
```

### Running the Nebula2GraphAr example

```bash
scripts/run-nebula2graphar.sh
```

The example converts the `basketballplayer` data in NebulaGraph to GraphAr data and saves it to the directory `/tmp/graphar/nebula2graphar`.
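
To inspect the result, you can list the YAML metadata files that were written. This is a small sketch: `list_graphar_yaml` is a hypothetical helper, and the only file name taken from this repository is the graph info file `basketballplayergraph.graph.yml`, which `scripts/run-graphar2nebula.sh` reads by default.

```bash
# Hypothetical helper: print the YAML metadata files under a GraphAr
# output directory (defaults to the directory used by this example).
list_graphar_yaml() {
  dir=${1:-/tmp/graphar/nebula2graphar}
  if [ ! -d "$dir" ]; then
    echo "output directory not found: $dir" >&2
    return 1
  fi
  find "$dir" -type f -name '*.yml' | sort
}

# Example:
# list_graphar_yaml /tmp/graphar/nebula2graphar
```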

### Running the GraphAr2Nebula example

We can also import the basketballplayer graph from GraphAr to NebulaGraph.

First, clear the `basketballplayer` graph space in NebulaGraph so that the import result is easy to see:

```bash
docker run \
--rm \
--name nebula-console-loader \
--network nebula-docker-env_nebula-net \
vesoft/nebula-console:nightly -addr 172.28.3.1 -port 9669 -u root -p nebula -e "use basketballplayer; clear space basketballplayer;"
```
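
The same console invocation is used both to load and to clear data, with only the statement after `-e` changing. A sketch of a reusable wrapper follows. `nebula_console` and the `NEBULA_DRY_RUN` variable are hypothetical names introduced here, while the image, network, address, and credentials are copied from the command above.

```bash
# Hypothetical wrapper around the console invocation used in this README.
# Set NEBULA_DRY_RUN=1 to print the command instead of running it.
nebula_console() {
  statement=$1
  set -- docker run --rm \
    --name nebula-console-loader \
    --network nebula-docker-env_nebula-net \
    vesoft/nebula-console:nightly \
    -addr 172.28.3.1 -port 9669 -u root -p nebula -e "$statement"
  if [ "${NEBULA_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example:
# nebula_console "use basketballplayer; clear space basketballplayer;"
```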

Then run the example:

```bash
scripts/run-graphar2nebula.sh
```

The example imports the `basketballplayer` graph from GraphAr into NebulaGraph. You can check the result in NebulaGraph Studio.
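
`scripts/run-graphar2nebula.sh` reads the graph info file from the `GRAPH_INFO_PATH` environment variable and falls back to the output of the Nebula2GraphAr example. The sketch below reproduces that fallback; `resolve_graph_info_path` is a hypothetical helper for illustration only.

```bash
# Default copied from scripts/run-graphar2nebula.sh.
default_graph_info="/tmp/graphar/nebula2graphar/basketballplayergraph.graph.yml"

# Hypothetical helper: print the graph info path the script would use.
resolve_graph_info_path() {
  echo "${GRAPH_INFO_PATH:-$default_graph_info}"
}
```

To import a different graph, set the variable when invoking the script, for example `GRAPH_INFO_PATH=/path/to/other.graph.yml scripts/run-graphar2nebula.sh`.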

## How to use

Please refer to our [GraphAr Spark Library Documentation](https://alibaba.github.io/GraphAr/user-guide/spark-lib.html).
10 changes: 10 additions & 0 deletions spark/pom.xml
@@ -101,6 +101,16 @@
      <artifactId>neo4j-connector-apache-spark_2.12</artifactId>
      <version>5.0.0_for_spark_3</version>
    </dependency>
    <dependency>
      <groupId>com.vesoft</groupId>
      <artifactId>nebula-spark-connector_3.0</artifactId>
      <version>3.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang.modules</groupId>
      <artifactId>scala-collection-compat_2.12</artifactId>
      <version>2.1.1</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
31 changes: 31 additions & 0 deletions spark/scripts/deploy-nebula-default-data.sh
@@ -0,0 +1,31 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with this
# work for additional information regarding copyright ownership. The ASF
# licenses this file to You under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

set -eu

nebula_env_dir="${HOME}/nebula-docker-env"
cd "${nebula_env_dir}"

docker compose up -d
# give the NebulaGraph services time to start before loading data
sleep 30

docker run \
    --rm \
    --name nebula-console-loader \
    --network nebula-docker-env_nebula-net \
    vesoft/nebula-console:nightly -addr 172.28.3.1 -port 9669 -u root -p nebula -e ":play basketballplayer"
32 changes: 32 additions & 0 deletions spark/scripts/get-nebula-to-home.sh
@@ -0,0 +1,32 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with this
# work for additional information regarding copyright ownership. The ASF
# licenses this file to You under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

set -eu
cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

nebula_env_dir="${HOME}/nebula-docker-env"
if [[ ! -d "${nebula_env_dir}" ]]; then
  mkdir -p "${nebula_env_dir}"
else
  echo "${nebula_env_dir} already exists."
fi
cd "${nebula_env_dir}"

# -f makes curl fail on HTTP errors instead of saving an error page
curl -fsS \
    -o docker-compose.yaml \
    https://raw.githubusercontent.com/vesoft-inc/nebula-spark-connector/master/nebula-spark-connector_3.0/src/test/resources/docker-compose.yaml
26 changes: 26 additions & 0 deletions spark/scripts/run-graphar2nebula.sh
@@ -0,0 +1,26 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with this
# work for additional information regarding copyright ownership. The ASF
# licenses this file to You under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

set -eu

cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
jar_file="${cur_dir}/../target/graphar-0.1.0-SNAPSHOT-shaded.jar"

graph_info_path="${GRAPH_INFO_PATH:-/tmp/graphar/nebula2graphar/basketballplayergraph.graph.yml}"
# quote the arguments so paths containing spaces are passed intact
spark-submit --class com.alibaba.graphar.example.GraphAr2Nebula "${jar_file}" \
    "${graph_info_path}"
28 changes: 28 additions & 0 deletions spark/scripts/run-nebula2graphar.sh
@@ -0,0 +1,28 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with this
# work for additional information regarding copyright ownership. The ASF
# licenses this file to You under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

set -eu

cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
jar_file="${cur_dir}/../target/graphar-0.1.0-SNAPSHOT-shaded.jar"

vertex_chunk_size=100
edge_chunk_size=1024
file_type="parquet"
# arguments: output directory, vertex chunk size, edge chunk size, file type
spark-submit --class com.alibaba.graphar.example.Nebula2GraphAr "${jar_file}" \
    "/tmp/graphar/nebula2graphar" "${vertex_chunk_size}" "${edge_chunk_size}" "${file_type}"