
Commit: Add Ray arguments in init_orca_context to Doc page (intel-analytics#5686)

* add ray docs

* fix

* fix

* fix style

* minor
hkvision authored and ForJadeForest committed Sep 20, 2022
1 parent b49d029 commit 71ee532
Showing 3 changed files with 19 additions and 5 deletions.
14 changes: 14 additions & 0 deletions docs/readthedocs/source/doc/Ray/Overview/ray.md
@@ -36,6 +36,20 @@ from bigdl.orca import init_orca_context
sc = init_orca_context(cluster_mode="yarn-client", cores=4, memory="10g", num_nodes=2, init_ray_on_spark=True)
```

You can specify the following RayOnSpark related arguments when you call `init_orca_context` to configure Ray:
- `redis_port`: The Redis port for the Ray head node. A random port is picked if not specified.
- `redis_password`: The password for Redis. Ray's default password is used if not specified.
- `object_store_memory`: The memory size for the Ray object store, given as a string. It can be specified in bytes (b), kilobytes (k), megabytes (m) or gigabytes (g), for example "50b", "100k", "250m" or "30g".
- `verbose`: True for more logs when starting Ray. Default is False.
- `env`: The environment variable dict for running Ray processes. Default is None.
- `extra_params`: The key-value dict of extra options for launching Ray, for example `extra_params={"dashboard-port": "11281", "temp-dir": "/tmp/ray/"}`.
- `include_webui`: Whether to include the web UI when starting Ray. Default is True.
- `system_config`: The key-value dict for overriding RayConfig defaults, mainly for testing purposes. An example of `system_config` is: `{"object_spilling_config":"{\"type\":\"filesystem\", \"params\":{\"directory_path\":\"/tmp/spill\"}}"}`.
- `num_ray_nodes`: The number of Ray processes to start across the cluster. For Spark local mode, you don't need to specify this value. For Spark cluster mode, it defaults to the number of Spark executors. If `spark.executor.instances` can't be detected in your SparkContext, you need to specify this value explicitly. It is recommended that `num_ray_nodes` be no larger than the number of Spark executors to make sure there are enough resources in your cluster.
- `ray_node_cpu_cores`: The number of available cores for each Ray process. For Spark local mode, it defaults to the number of Spark local cores. For Spark cluster mode, it defaults to the number of cores for each Spark executor. If `spark.executor.cores` or `spark.cores.max` can't be detected in your SparkContext, you need to specify this value explicitly. It is recommended that `ray_node_cpu_cores` be no larger than the number of cores for each Spark executor to make sure there are enough resources in your cluster.
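As an illustration of the `object_store_memory` string format described above, here is a minimal sketch of how such size strings can be interpreted (the helper `parse_memory_size` is hypothetical, for illustration only, and not part of the BigDL API):

```python
# Hypothetical helper, for illustration only: converts object_store_memory
# style size strings ("50b", "100k", "250m", "30g") into a byte count.
def parse_memory_size(size: str) -> int:
    units = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    unit = size[-1].lower()
    if unit not in units:
        raise ValueError(f"Unknown memory unit in {size!r}; expected b, k, m or g")
    return int(size[:-1]) * units[unit]

print(parse_memory_size("250m"))  # 262144000
```

Under these assumptions, "30g" corresponds to 30 * 1024 ** 3 bytes.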

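Similarly, `extra_params` maps key-value pairs onto `ray start`-style command-line options. A minimal sketch of that flattening, assuming simple `--key=value` flags (the helper `to_ray_start_flags` is hypothetical, not the actual RayOnSpark implementation):

```python
# Hypothetical sketch: flatten an extra_params dict into command-line flags
# of the form --key=value, as one might pass them to "ray start".
def to_ray_start_flags(extra_params: dict) -> list:
    return [f"--{key}={value}" for key, value in extra_params.items()]

print(to_ray_start_flags({"dashboard-port": "11281", "temp-dir": "/tmp/ray/"}))
# ['--dashboard-port=11281', '--temp-dir=/tmp/ray/']
```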
By default, the Ray cluster is launched using Spark barrier execution mode; you can turn this off via the configurations of `OrcaContext`.

@@ -33,7 +33,7 @@ elif cluster_mode == "yarn": # For Hadoop/YARN cluster
sc = init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1, init_ray_on_spark=True)
```

This is the only place where you need to specify local or distributed mode.
This is the only place where you need to specify local or distributed mode. See [here](./../../Ray/Overview/ray.md#initialize) for more RayOnSpark-related arguments that you can specify in `init_orca_context`.

By default, the Ray cluster is launched using Spark barrier execution mode; you can turn this off via the configurations of `OrcaContext`.

8 changes: 4 additions & 4 deletions python/orca/src/bigdl/orca/ray/ray_on_spark_context.py
@@ -353,15 +353,15 @@ def __init__(self, sc, redis_port=None, password="123456", object_store_memory=N
:param verbose: True for more logs when starting ray. Default is False.
:param env: The environment variable dict for running ray processes. Default is None.
:param extra_params: The key value dict for extra options to launch ray.
For example, extra_params={"temp-dir": "/tmp/ray/"}
:param include_webui: True for including web ui when starting ray. Default is False.
:param num_ray_nodes: The number of raylets to start across the cluster.
For example, extra_params={"dashboard-port": "11281", "temp-dir": "/tmp/ray/"}.
:param include_webui: Whether to include the web UI when starting ray. Default is True.
:param num_ray_nodes: The number of ray processes to start across the cluster.
For Spark local mode, you don't need to specify this value.
For Spark cluster mode, it defaults to the number of Spark executors. If
spark.executor.instances can't be detected in your SparkContext, you need to explicitly
specify this. It is recommended that num_ray_nodes be no larger than the number of
Spark executors to make sure there are enough resources in your cluster.
:param ray_node_cpu_cores: The number of available cores for each raylet.
:param ray_node_cpu_cores: The number of available cores for each ray process.
For Spark local mode, it defaults to the number of Spark local cores.
For Spark cluster mode, it defaults to the number of cores for each Spark executor. If
spark.executor.cores or spark.cores.max can't be detected in your SparkContext, you need to
