Refine getting started documentation with information of examples (oa…
jerrychenhf authored Nov 10, 2022
1 parent cc4e833 commit 4a14a49
Showing 7 changed files with 96 additions and 4 deletions.
42 changes: 40 additions & 2 deletions README.md
@@ -237,7 +237,45 @@ auth:

Refer to the `example/cluster` directory for more cluster configuration examples.

### 6. Managing clusters
### 6. Running Analytics and AI workloads

Once the cluster is started, you can run Spark analytics and AI workloads,
which are designed to be distributed and large-scale in nature.

#### Running Spark Pi example

Running a Spark job is straightforward. For example, to submit the Spark Pi job:

```
cloudtik exec ./your-cluster-config.yaml "spark-submit --master yarn --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.yarn.submit.waitAppCompletion=false \$SPARK_HOME/examples/jars/spark-examples_2.12-3.2.1.jar 12345" --job-waiter=spark
```

Refer to [Run Spark PI Example](example/spark) for more details.

#### Running analytics benchmarks

CloudTik provides ready-to-use tools for running the TPC-DS benchmark
on a CloudTik Spark runtime cluster.

Refer to [Run TPC-DS performance benchmark for Spark](tools/benchmarks/spark)
for a detailed step-by-step guide.

#### Running machine learning examples

CloudTik provides ready-to-run examples demonstrating
how distributed machine learning and deep learning jobs can be implemented
on a CloudTik Spark and ML runtime cluster.

Refer to [Distributed Machine Learning and Deep Learning Examples](example/ml)
for a detailed step-by-step guide.

#### Workflow examples

Users can integrate CloudTik with external workflows using bash scripts or Python
for running on-demand clusters and jobs.

Refer to [Workflow Integration Examples](example/workflows) for example scripts.
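Such an on-demand workflow can be sketched as a short bash script. This is a dry-run sketch: the `run` helper only records and prints each command instead of executing it, and the exact `cloudtik start`/`cloudtik stop` flags (such as `-y` for non-interactive confirmation) are assumptions to check against `cloudtik --help`:

```shell
#!/usr/bin/env bash
# Dry-run sketch of an on-demand workflow: start a cluster, submit a Spark
# job, then tear the cluster down. The `run` helper records and echoes each
# command instead of executing it; remove it to run the commands for real.
set -euo pipefail

CONFIG=./your-cluster-config.yaml   # path to your cluster config (assumed)

CMDS=()
run() { CMDS+=("$*"); echo "+ $*"; }

run cloudtik start "$CONFIG" -y     # create the cluster non-interactively
run cloudtik exec "$CONFIG" \
    "spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi \$SPARK_HOME/examples/jars/spark-examples_2.12-3.2.1.jar 100" \
    --job-waiter=spark              # wait for the Spark job to finish
run cloudtik stop "$CONFIG" -y     # tear the cluster down when done
```

With the `run` helper dropped, each step blocks until it completes, so the script can be scheduled from any external workflow engine that can invoke a shell command.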

### 7. Managing clusters

CloudTik provides powerful capabilities to monitor and manage the cluster.

@@ -284,7 +322,7 @@ Download files or directories from cluster.
cloudtik rsync-down /path/to/your-cluster-config.yaml [source] [target]
```

### 7. Tearing Down
### 8. Tearing Down

#### Terminate a Cluster

42 changes: 40 additions & 2 deletions docs/source/GettingStarted/quick-start.md
@@ -176,7 +176,45 @@ auth:

Refer to the `example/cluster` directory for more cluster configuration examples.

### 6. Managing clusters
### 6. Running Analytics and AI workloads

Once the cluster is started, you can run Spark analytics and AI workloads,
which are designed to be distributed and large-scale in nature.

#### Running Spark Pi example

Running a Spark job is straightforward. For example, to submit the Spark Pi job:

```
cloudtik exec ./your-cluster-config.yaml "spark-submit --master yarn --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.yarn.submit.waitAppCompletion=false \$SPARK_HOME/examples/jars/spark-examples_2.12-3.2.1.jar 12345" --job-waiter=spark
```

Refer to [Run Spark PI Example](https://github.com/oap-project/cloudtik/tree/main/example/spark) for more details.

#### Running analytics benchmarks

CloudTik provides ready-to-use tools for running the TPC-DS benchmark
on a CloudTik Spark runtime cluster.

Refer to [Run TPC-DS performance benchmark for Spark](https://github.com/oap-project/cloudtik/tree/main/tools/benchmarks/spark)
for a detailed step-by-step guide.

#### Running machine learning examples

CloudTik provides ready-to-run examples demonstrating
how distributed machine learning and deep learning jobs can be implemented
on a CloudTik Spark and ML runtime cluster.

Refer to [Distributed Machine Learning and Deep Learning Examples](https://github.com/oap-project/cloudtik/tree/main/example/ml)
for a detailed step-by-step guide.

#### Workflow examples

Users can integrate CloudTik with external workflows using bash scripts or Python
for running on-demand clusters and jobs.

Refer to [Workflow Integration Examples](https://github.com/oap-project/cloudtik/tree/main/example/workflows) for example scripts.

### 7. Managing clusters

CloudTik provides powerful capabilities to monitor and manage the cluster.

@@ -222,7 +260,7 @@ Download files or directories from cluster.
cloudtik rsync-down /path/to/your-cluster-config.yaml [source] [target]
```

### 7. Tearing Down
### 8. Tearing Down

#### Terminate a Cluster

16 changes: 16 additions & 0 deletions example/spark/README.md
@@ -0,0 +1,16 @@
# Running Spark Pi Example

Running the built-in Spark Pi example is straightforward
once you have a CloudTik cluster with Spark runtime started.

Simply run the following command on your client machine:
```
cloudtik exec ./your-cluster-config.yaml "spark-submit --master yarn --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.yarn.submit.waitAppCompletion=false \$SPARK_HOME/examples/jars/spark-examples_2.12-3.2.1.jar 12345" --job-waiter=spark
```
This submits a Spark Pi job to the cluster to run in the background.
The command waits for the job to finish because the Spark job waiter is specified with `--job-waiter=spark`.

If you want to run the job fully in the foreground, execute:
```
cloudtik exec ./your-cluster-config.yaml "spark-submit --master yarn --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi \$SPARK_HOME/examples/jars/spark-examples_2.12-3.2.1.jar 12345"
```
