[merge on 720]add-dashboard-ent&explorer-updates-to-3.1.0 #1514

Merged
merged 2 commits into from
Jul 20, 2022
196 changes: 196 additions & 0 deletions docs-2.0/graph-computing/0.deploy-controller-analytics.md
@@ -0,0 +1,196 @@
# Dag Controller

Dag Controller is a task scheduling tool that schedules jobs of the DAG (directed acyclic graph) type. A job consists of multiple tasks that form a directed acyclic graph, with dependencies between the tasks.

The Dag Controller can perform complex graph computing together with Nebula Analytics. For example, the Dag Controller sends an algorithm request to Nebula Analytics, which saves the result to Nebula Graph or HDFS. The Dag Controller then creates a new task that uses this result as the input of the next algorithm.
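To make the dependency idea concrete, here is a minimal sketch that orders the tasks of a small job with `tsort` from GNU coreutils. The task names (`graph_query`, `bfs`, `similarity`) and the `job_edges` helper are hypothetical, and this is only an illustration of ordering tasks in a DAG, not how the Dag Controller is implemented.

```bash
# Hypothetical job: each line is "<upstream task> <downstream task>".
# The task names are illustrative only.
job_edges() {
  printf '%s\n' \
    'graph_query bfs' \
    'bfs similarity'
}

# tsort prints the tasks in an order that respects every dependency,
# so an upstream task always appears before the tasks that consume its result.
job_edges | tsort
```

If the edges contained a cycle, `tsort` would report it, which is the reason a job has to be acyclic before it can be scheduled.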

This topic describes how to use the Dag Controller.

!!! enterpriseonly

Only available for the Nebula Graph Enterprise Edition.

## Prerequisites

- [HDFS](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html) 2.2.x or later has been deployed.

- JDK 1.8 has been deployed.

## Preparations

Installation packages and commands differ between environments. The preparations are as follows:

- The operating system is CentOS 7.
- If the Nebula Analytics and the Dag Controller are deployed on multiple machines, ensure network connectivity between the machines.
- If the Nebula Analytics is a cluster with distributed architecture, ensure the paths and ports are configured identically for each machine.

## Precautions

- The BFS and SSSP algorithms need to verify the parameter `root`. They support only one upstream component and must specify rows and columns. If multiple files exist, a random file is selected. If a row, column, or file is not found, an error will be reported.

- The similarity algorithm does not restrict the format of the upstream component, but it must specify columns. If multiple files exist, the files are stacked in random order, and the first N rows of data are processed. If rows and columns are specified, or the specified column does not exist, an error will be reported.

## Deploy Nebula Analytics

1. Install libatomic and psmisc.

```
sudo yum -y install libatomic psmisc
```

2. Install the Nebula Analytics.

```
sudo rpm -ivh <analytics_package_name> --prefix <install_path>
sudo chown <user>:<user> -R <install_path>
```

For example:

```
sudo rpm -ivh nebula-analytics-{{plato.release}}-centos.x86_64.rpm --prefix=/home/vesoft/nebula-analytics
sudo chown vesoft:vesoft -R /home/vesoft/nebula-analytics
```

3. Configure the Hadoop path and the JDK path in the file `nebula-analytics/scripts/set_env.sh`. If there are multiple machines, ensure that the paths are the same on each machine.

```
export HADOOP_HOME=<hadoop_path>
export JAVA_HOME=<java_path>
```
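A wrong path in `set_env.sh` usually only shows up when a job runs, so it can help to check both directories right after editing the file. The sketch below is one way to do that. The function name `check_env_paths` is hypothetical and is not part of the Nebula Analytics package.

```bash
# Return 0 only if both directories exist; otherwise report the missing one.
# check_env_paths is a hypothetical helper, not shipped with Nebula Analytics.
check_env_paths() {
  hadoop_home=$1
  java_home=$2
  status=0
  [ -d "$hadoop_home" ] || { echo "HADOOP_HOME not found: $hadoop_home" >&2; status=1; }
  [ -d "$java_home" ] || { echo "JAVA_HOME not found: $java_home" >&2; status=1; }
  return "$status"
}

# Example call after loading the variables exported by set_env.sh:
# . nebula-analytics/scripts/set_env.sh
# check_env_paths "$HADOOP_HOME" "$JAVA_HOME"
```

Running the same check on every machine of the cluster confirms that the paths are identical everywhere.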

## Deploy Dag Controller

1. Configure passwordless SSH so that the Dag Controller machine can log in to the Nebula Analytics machines directly, and so that all machines in the Nebula Analytics cluster can connect to each other without passwords.

For example, to let the user on machine A (Dag Controller) log in to machine B-1 in the Nebula Analytics cluster over SSH without a password, run the following commands on machine A:

```
# Press Enter to accept the default options and generate the key.
ssh-keygen -t rsa

# After the public key of machine A is installed for the user on machine B-1, you can log in to machine B-1 from machine A without a password.
ssh-copy-id -i ~/.ssh/id_rsa.pub <B_user>@<B_IP>
```

In the same way, configure passwordless SSH so that the user on machine A can log in to machines B-2, B-3, and so on, and so that all machines in the Nebula Analytics cluster can connect to each other without passwords.

2. Add the following to the file `~/.bash_profile` and run the command `source ~/.bash_profile` to make it effective.

```
eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa
```

3. Install the Dag Controller.

```
sudo rpm -ivh <dag_ctrl_package_name> --prefix <install_path>
sudo chown <user>:<user> -R <install_path>
```

For example:

```
sudo rpm -ivh dag-ctrl-{{dag.release}}-centos.x86_64.rpm --prefix=/home/vesoft/dag-ctrl
sudo chown vesoft:vesoft -R /home/vesoft/dag-ctrl
```

4. Configure the username and SSH port of the Nebula Analytics machines in the file `dag-ctrl/etc/dag-ctrl-api.yaml`. If there are multiple machines, ensure that the usernames are the same.

```
# The user name and SSH port of the Nebula Analytics machine.
SSH:
  UserName: vesoft
  Port: 22

# The parallel thread pool sizes of the tasks and jobs.
JobPool:
  Sleep: 3 # Check every 3 seconds for any outstanding jobs.
  Size: 3 # Up to 3 jobs can be executed in parallel.
TaskPool:
  CheckStatusSleep: 1 # Check the task status every second.
  Size: 10 # Up to 10 tasks can be executed in parallel.

Dag:
  VarDataListMaxSize: 100 # If HDFS columns are read, the number is limited to 100 at a time.
```

5. Configure the algorithm file path (`exec_file`) in the file `dag-ctrl/etc/tasks.yaml`. If there are multiple machines, ensure that the paths are the same.

6. Start the Dag Controller.

```
cd <dag_ctrl_install_path>
./scripts/start.sh
```

7. Check whether the startup succeeded. The default port is `9002`, which is set in the file `dag-ctrl-api.yaml`.

```
netstat -aon | grep 9002
```
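A plain `grep 9002` also matches other ports or values that merely contain those digits, such as `19002`. As a stricter variant, the sketch below matches only the local address column of `netstat -ltn` style output. The function name `port_in_listing` is hypothetical.

```bash
# Read `netstat -ltn` style lines from stdin and succeed only if some
# local address (4th column) ends in ":<port>".
# port_in_listing is a hypothetical helper name.
port_in_listing() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Example call on the Dag Controller machine:
# netstat -ltn | port_in_listing 9002 && echo "port 9002 is listening"
```

The function only inspects text, so it can also be applied to output captured from a remote machine.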

## What to do next

After the Nebula Analytics and the Dag Controller are configured and started, you need to configure resources on the Nebula Explorer to perform complex algorithm computing. For details, see [Prepare resources](../nebula-explorer/workflow/1.prepare-resources.md).

## FAQ

### Will the Dag Controller service crash if the Graph service returns too much result data?

The Dag Controller service only provides scheduling capabilities and will not crash, but the Nebula Analytics service may crash due to insufficient memory when writing too much data to HDFS or Nebula Graph, or reading too much data from HDFS or Nebula Graph.

### Can I continue a job from a failed task?

Not supported. You can only re-execute the entire job.

### How can I speed it up if a task result is saved slowly or data is transferred slowly between tasks?

The Dag Controller contains graph query components and graph computing components. Graph queries send requests to a graph service for queries, so the graph queries can only be accelerated by increasing the memory of the graph service. Graph computing is performed on distributed nodes provided by Nebula Analytics, so graph computing can be accelerated by increasing the size of the Nebula Analytics cluster.

### The HDFS server cannot be connected and the task status is running.

Set the connection timeout and the maximum number of retries for HDFS connections as follows:

```xml
<configuration>
  <property>
    <name>ipc.client.connect.timeout</name>
    <value>3000</value>
  </property>

  <property>
    <name>ipc.client.connect.max.retries.on.timeouts</name>
    <value>3</value>
  </property>
</configuration>
```

### How to resolve the error `Err:dial unix: missing address`?

Modify the configuration file `dag-ctrl/etc/dag-ctrl-api.yaml` to configure the `UserName` of the SSH.

### How to resolve the error `bash: /home/xxx/nebula-analytics/scripts/run_algo.sh: No such file or directory`?

Modify the configuration file `dag-ctrl/etc/tasks.yaml` to configure the algorithm execution path parameter `exec_file`.

### How to resolve the error `/lib64/libm.so.6: version 'GLIBC_2.29' not found (required by /home/vesoft/jdk-18.0.1/jre/lib/amd64/server/libjvm.so)`?

The operating system version does not support JDK 18, and `yum` cannot install `GLIBC_2.29`. Install JDK 1.8 instead, and do not forget to update the JDK path in `nebula-analytics/scripts/set_env.sh`.

### How to resolve the error `handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain`?

Reconfigure the permissions to `744` on the folder `.ssh` and `600` on the file `.ssh/authorized_keys`.
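The permission change can be sketched as a small helper, run as the SSH user on the machine that rejects the login. The function name `fix_ssh_permissions` is hypothetical, and the optional directory argument is an addition for illustration only.

```bash
# Apply the permissions described above: 744 on the .ssh folder and
# 600 on authorized_keys. fix_ssh_permissions is a hypothetical helper;
# the directory defaults to ~/.ssh.
fix_ssh_permissions() {
  ssh_dir=${1:-"$HOME/.ssh"}
  chmod 744 "$ssh_dir" && chmod 600 "$ssh_dir/authorized_keys"
}

# Example call for the current user:
# fix_ssh_permissions
```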

### How to resolve the error `There are 0 Nebula Analytics available. clusterSize should be less than or equal to it`?

The possible causes are as follows:

- The Nebula Analytics has not been deployed. Configure the Nebula Analytics as described in this document.

- The Nebula Analytics has been deployed, but cannot connect to the Dag Controller. For example, the IP address is incorrect, SSH is not configured, or the startup users of the two services are inconsistent (causing SSH login failures).
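To narrow down the SSH-related causes, one option is to test passwordless login from the Dag Controller machine to each Nebula Analytics machine. The sketch below uses the standard OpenSSH options `BatchMode` (fail instead of prompting for a password) and `ConnectTimeout`. The function name `check_ssh_hosts` and the host addresses in the example call are hypothetical.

```bash
# Try a non-interactive SSH login to every host and report the result.
# Returns non-zero if at least one host cannot be reached without a password.
# check_ssh_hosts is a hypothetical helper name.
check_ssh_hosts() {
  user=$1
  shift
  failed=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# Example call with placeholder addresses, run as the user that starts the Dag Controller:
# check_ssh_hosts vesoft 192.168.8.101 192.168.8.102
```

A `FAIL` line for a host points to the IP address, the SSH key setup, or the user on that machine as the thing to check first.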

### How to resolve the error `broadcast.hpp:193] Check failed: (size_t)recv_bytes >= sizeof(chunk_tail_t) recv message too small: 0`?

The amount of data to be processed is too small for the number of compute nodes and processes. Set smaller values for `clusterSize` and `processes` when submitting jobs.