diff --git a/docs/docs/APIGuide/Ray/rayonspark.md b/docs/docs/APIGuide/Ray/rayonspark.md index df468652dcb..38f4163ea31 100644 --- a/docs/docs/APIGuide/Ray/rayonspark.md +++ b/docs/docs/APIGuide/Ray/rayonspark.md @@ -4,9 +4,10 @@ AnalyticsZoo has already provided a mechanism to deploy Python dependencies and across yarn cluster,meaning python user would be able to run `Analytics-Zoo` or `Ray` in a pythonic way on yarn without `spark-submit` or installing Analytics-Zoo or Ray across all cluster nodes. + ## Here are the steps to run RayOnSpark: -1) You should install Conda first and create a conda-env named "ray36" +1) You should install [Conda](https://docs.conda.io/projects/conda/en/latest/commands/install.html) and create a conda-env named "ray36" 2) Install some essential dependencies inthe conda env @@ -91,3 +92,5 @@ print([ray.get(actor.ip.remote()) for actor in actors]) ray_ctx.stop() ``` + + diff --git a/docs/docs/PythonUserGuide/install.md b/docs/docs/PythonUserGuide/install.md index f36119d2ea5..e4809a909fb 100644 --- a/docs/docs/PythonUserGuide/install.md +++ b/docs/docs/PythonUserGuide/install.md @@ -3,7 +3,7 @@ For Python users, Analytics Zoo can be installed either [from pip](#install-from **NOTE**: Only __Python 2.7__, __Python 3.5__ and __Python 3.6__ are supported for now. --- -## **Install from pip** +## **Install from pip for local usage** You can use the following command to install the latest release version of __analytics-zoo__ via pip easily: ```bash @@ -24,13 +24,46 @@ sc = init_nncontext() ``` **Remarks:** - 1. We've tested this package with pip 9.0.1. `pip install --upgrade pip` if necessary. 2. Pip install supports __Mac__ and __Linux__ platforms. -3. Pip install only supports __local__ mode. Cluster mode might be supported in the future. For those who want to use Analytics Zoo in cluster mode, please try to [install without pip](#install-without-pip). -4. You need to install Java __>= JDK8__ before running Analytics Zoo, which is required by `pyspark`. -5. `pyspark==2.4.3`, `bigdl==0.8.0` and their dependencies will automatically be installed if they haven't been detected in the current Python environment. +3. You need to install Java __>= JDK8__ before running Analytics Zoo, which is required by `pyspark`. +4. `pyspark==2.4.3`, `bigdl==0.8.0` and their dependencies will automatically be installed if they haven't been detected in the current Python environment. + +## **Install from pip for yarn cluster** + +You only need to following these steps on your driver node and we only support yarn-client mode for now. + +1) Install [Conda](https://docs.conda.io/projects/conda/en/latest/commands/install.html) and create a conda-env(i.e in the name of "zoo") + +2) Install Analytics-Zoo into the created conda-env + +``` +source activate zoo +pip install analytics-zoo +``` +3) Download JDK8 and set the environment variable: JAVA_HOME (recommended). + - You can also install JDK via conda without setting the JAVA_HOME manually: + `conda install -c anaconda openjdk=8.0.152` + +4) Start python and then execute the following code for verification. + +- Create a SparkContext on Yarn + +``` python + +from zoo import init_spark_on_yarn + +sc = init_spark_on_yarn( + hadoop_conf="path to the yarn configuration folder", + conda_name="zoo", # The name of the created conda-env + num_executor=2, + executor_cores=4, + executor_memory="8g", + driver_memory="2g", + driver_cores=4, + extra_executor_memory_for_ray="10g") +``` --- ## **Install without pip** diff --git a/docs/docs/PythonUserGuide/run.md b/docs/docs/PythonUserGuide/run.md index 7a75c23fb92..3f44fe0a17a 100644 --- a/docs/docs/PythonUserGuide/run.md +++ b/docs/docs/PythonUserGuide/run.md @@ -3,6 +3,7 @@ You need to first [install](install.md) analytics-zoo, either [from pip](install **NOTE**: Only __Python 2.7__, __Python 3.5__ and __Python 3.6__ are supported for now. --- + ## **Run after pip install** **Important:** @@ -48,6 +49,28 @@ export BIGDL_JARS=... export BIGDL_PACKAGES=... ``` +## **Run on yarn after pip install + +Start python and then execute the following code: +Caveat: You should use `init_spark_on_yarn` rather than `init_nncontext()` here. +- Create a SparkContext on Yarn + +``` python + +from zoo import init_spark_on_yarn + +sc = init_spark_on_yarn( + hadoop_conf="path to the yarn configuration folder", + conda_name="zoo", # The name of the created conda-env + num_executor=2, + executor_cores=4, + executor_memory="8g", + driver_memory="2g", + driver_cores=4, + extra_executor_memory_for_ray="10g") + +``` + --- ## **Run without pip install** - Note that __Python 3.6__ is only compatible with Spark 1.6.4, 2.0.3, 2.1.1 and >=2.2.0. See [this issue](https://issues.apache.org/jira/browse/SPARK-19019) for more discussion. diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index b69200ec01c..a4ab13002f4 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -67,6 +67,8 @@ pages: - Relation: APIGuide/FeatureEngineering/relation.md - Image: APIGuide/FeatureEngineering/image.md - Text: APIGuide/FeatureEngineering/text.md + - Ray: + - RayOnSpark: APIGuide/Ray/rayonspark.md - Models: - Object Detection: APIGuide/Models/object-detection.md - Image Classification: APIGuide/Models/image-classification.md diff --git a/pyzoo/dev/release/release.sh b/pyzoo/dev/release/release.sh index b60e0cee1db..24f66500be8 100755 --- a/pyzoo/dev/release/release.sh +++ b/pyzoo/dev/release/release.sh @@ -24,16 +24,23 @@ echo $ANALYTICS_ZOO_HOME ANALYTICS_ZOO_PYTHON_DIR="$(cd ${RUN_SCRIPT_DIR}/../../../pyzoo; pwd)" echo $ANALYTICS_ZOO_PYTHON_DIR -if (( $# < 1)); then - echo "Bad parameters. Usage example: bash release.sh linux" +if (( $# < 2)); then + echo "Bad parameters" + echo "Usage example: bash release.sh linux default" + echo "Usage example: bash release.sh linux 0.6.0.dev0" echo "If needed, you can also add other profiles such as: -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x" exit -1 fi platform=$1 -profiles=${*:2} -version=`cat $ANALYTICS_ZOO_PYTHON_DIR/zoo/__init__.py | grep "__version__" | awk '{print $NF}' | tr -d '"'` +version=$2 +profiles=${*:3} +if [ "${version}" == "default" ]; then + version=`cat $ANALYTICS_ZOO_PYTHON_DIR/zoo/__init__.py | grep "__version__" | awk '{print $NF}' | tr -d '"'` +fi + +echo "Using version: ${version}" cd ${ANALYTICS_ZOO_HOME} if [ "$platform" == "mac" ]; then diff --git a/pyzoo/setup.py b/pyzoo/setup.py index 5272cf8b127..464012d1229 100755 --- a/pyzoo/setup.py +++ b/pyzoo/setup.py @@ -94,7 +94,7 @@ def setup_package(): license='Apache License, Version 2.0', url='https://github.com/intel-analytics/analytics-zoo', packages=packages, - install_requires=['pyspark==2.4.3', 'bigdl==0.8.0'], + install_requires=['pyspark==2.4.3', 'bigdl==0.8.0', 'conda-pack==0.3.1'], dependency_links=['https://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz'], include_package_data=True, package_data={"zoo.share": ['lib/analytics-zoo*with-dependencies.jar', 'conf/*', 'bin/*', diff --git a/pyzoo/test/zoo/ray/integration/ray_on_yarn.py b/pyzoo/test/zoo/ray/integration/ray_on_yarn.py index cc9207bf087..8c60c965381 100644 --- a/pyzoo/test/zoo/ray/integration/ray_on_yarn.py +++ b/pyzoo/test/zoo/ray/integration/ray_on_yarn.py @@ -30,10 +30,12 @@ executor_memory="10g", driver_memory="2g", driver_cores=4, - extra_executor_memory_for_ray="30g") + extra_executor_memory_for_ray="30g", + spark_conf={"hello": "world"}) ray_ctx = RayContext(sc=sc, object_store_memory="25g", + extra_params={"temp-dir": "/tmp/hello/"}, env={"http_proxy": "http://child-prc.intel.com:913", "http_proxys": "http://child-prc.intel.com:913"}) ray_ctx.init() diff --git a/pyzoo/test/zoo/ray/test_ray_on_local.py b/pyzoo/test/zoo/ray/test_ray_on_local.py index 689ac0ba8c2..e381c028080 100644 --- a/pyzoo/test/zoo/ray/test_ray_on_local.py +++ b/pyzoo/test/zoo/ray/test_ray_on_local.py @@ -47,8 +47,6 @@ def test_local(self): for process_info in ray_ctx.ray_processesMonitor.process_infos: for pid in process_info.pids: assert not psutil.pid_exists(pid) - sc.stop() - if __name__ == "__main__": pytest.main([__file__]) diff --git a/pyzoo/zoo/common/nncontext.py b/pyzoo/zoo/common/nncontext.py index d47120de30b..b7d544624e0 100644 --- a/pyzoo/zoo/common/nncontext.py +++ b/pyzoo/zoo/common/nncontext.py @@ -52,7 +52,8 @@ def init_spark_on_yarn(hadoop_conf, hadoop_user_name="root", spark_yarn_archive=None, spark_log_level="WARN", - redirect_spark_log=True): + redirect_spark_log=True, + spark_conf=None): """ Create a SparkContext with Zoo configuration on Yarn cluster on "Yarn-client" mode. You should create a conda env and install the python dependencies in that env. @@ -72,8 +73,10 @@ def init_spark_on_yarn(hadoop_conf, `conda_name`, but you can also pass the path to a packed file in "tar.gz" format here. :param hadoop_user_name: User name for running in yarn cluster. Default value is: root :param spark_log_level: Log level of Spark - :param redirect_spark_log: - :return: Direct the Spark log to local file or not. + :param redirect_spark_log: Direct the Spark log to local file or not. + :param spark_conf: You can append extra spark conf here in key value format. + i.e spark_conf={"spark.executor.extraJavaOptions": "-XX:+PrintGCDetails"} + :return: SparkContext """ from zoo.ray.util.spark import SparkRunner sparkrunner = SparkRunner(spark_log_level=spark_log_level, @@ -91,7 +94,8 @@ def init_spark_on_yarn(hadoop_conf, penv_archive=penv_archive, hadoop_user_name=hadoop_user_name, spark_yarn_archive=spark_yarn_archive, - jars=None) + jars=None, + spark_conf=spark_conf) return sc diff --git a/pyzoo/zoo/ray/util/raycontext.py b/pyzoo/zoo/ray/util/raycontext.py index 890deb72259..7ba75842da0 100755 --- a/pyzoo/zoo/ray/util/raycontext.py +++ b/pyzoo/zoo/ray/util/raycontext.py @@ -92,7 +92,8 @@ def _prepare_env(self, cores=None): return modified_env def __init__(self, python_loc, redis_port, ray_node_cpu_cores, mkl_cores, - password, object_store_memory, waitting_time_sec=6, verbose=False, env=None): + password, object_store_memory, waitting_time_sec=6, verbose=False, env=None, + extra_params=None): """object_store_memory: integer in bytes""" self.env = env self.python_loc = python_loc @@ -103,6 +104,7 @@ def __init__(self, python_loc, redis_port, ray_node_cpu_cores, mkl_cores, self.ray_exec = self._get_ray_exec() self.object_store_memory = object_store_memory self.waiting_time_sec = waitting_time_sec + self.extra_params = extra_params self.verbose = verbose self.labels = """--resources='{"trainer": %s, "ps": %s }' """ % (1, 1) @@ -115,22 +117,25 @@ def _stop(iter): return _stop - def _gen_master_command(self): - command = "{} start --head " \ - "--include-webui --redis-port {} \ - --redis-password {} --num-cpus {} ". \ - format(self.ray_exec, self.redis_port, self.password, self.ray_node_cpu_cores) + def _enrich_command(self, command): if self.object_store_memory: command = command + "--object-store-memory {} ".format(str(self.object_store_memory)) + if self.extra_params: + for pair in self.extra_params.items(): + command = command + " --{} {} ".format(pair[0], pair[1]) return command + def _gen_master_command(self): + command = "{} start --head " \ + "--include-webui --redis-port {} " \ + "--redis-password {} --num-cpus {} ". \ + format(self.ray_exec, self.redis_port, self.password, self.ray_node_cpu_cores) + return self._enrich_command(command) + def _get_raylet_command(self, redis_address): command = "{} start --redis-address {} --redis-password {} --num-cpus {} {} ".format( self.ray_exec, redis_address, self.password, self.ray_node_cpu_cores, self.labels) - - if self.object_store_memory: - command = command + "--object-store-memory {} ".format(str(self.object_store_memory)) - return command + return self._enrich_command(command) def _start_ray_node(self, command, tag, wait_before=5, wait_after=5): modified_env = self._prepare_env(self.mkl_cores) @@ -180,18 +185,24 @@ def _start_ray_services(iter): class RayContext(object): def __init__(self, sc, redis_port=None, password="123456", object_store_memory=None, - verbose=False, env=None, local_ray_node_num=2, waitting_time_sec=8): + verbose=False, env=None, local_ray_node_num=2, waiting_time_sec=8, + extra_params=None): """ - The RayContext would init a ray cluster on top of the configuration of the SparkContext. + The RayContext would init a ray cluster on top of the configuration of SparkContext. For spark cluster mode: The number of raylets is equal to number of executors. For Spark local mode: The number of raylets is controlled by local_ray_node_num. - CPU cores for each raylet equals to spark_cores/local_ray_node_num. + CPU cores for each is raylet equals to spark_cores/local_ray_node_num. :param sc: :param redis_port: redis port for the "head" node. - The value would be randomly picked if not specified - :param local_ray_node_num number of raylets to be created. + The value would be randomly picked if not specified. + :param password: [optional] password for the redis. :param object_store_memory: Memory size for the object_store. + :param verbose: True for more logs. :param env: The environment variable dict for running Ray. + :param local_ray_node_num number of raylets to be created. + :param waiting_time_sec: Waiting time for the raylets before connecting to redis. + :param extra_params: key value dictionary for extra options to launch Ray. + i.e extra_params={"temp-dir": "/tmp/ray2/"} """ self.sc = sc self.stopped = False @@ -212,7 +223,8 @@ def __init__(self, sc, redis_port=None, password="123456", object_store_memory=N object_store_memory=self._enrich_object_sotre_memory(sc, object_store_memory), verbose=verbose, env=env, - waitting_time_sec=waitting_time_sec) + waitting_time_sec=waiting_time_sec, + extra_params=extra_params) self._gather_cluster_ips() from bigdl.util.common import init_executor_gateway print("Start to launch the JVM guarding process") diff --git a/pyzoo/zoo/ray/util/spark.py b/pyzoo/zoo/ray/util/spark.py index b0c58c84aef..802a8bad8df 100644 --- a/pyzoo/zoo/ray/util/spark.py +++ b/pyzoo/zoo/ray/util/spark.py @@ -98,6 +98,18 @@ def _detect_python_location(self): "Cannot detect current python location. Please set it manually by python_location") return process_info.out + def _gather_essential_jars(self): + from bigdl.util.engine import get_bigdl_classpath + from zoo.util.engine import get_analytics_zoo_classpath + bigdl_classpath = get_bigdl_classpath() + zoo_classpath = get_analytics_zoo_classpath() + assert bigdl_classpath, "Cannot find bigdl classpath" + assert zoo_classpath, "Cannot find Analytics-Zoo classpath" + if bigdl_classpath == zoo_classpath: + return [zoo_classpath] + else: + return [zoo_classpath, bigdl_classpath] + def init_spark_on_local(self, cores, conf=None, python_location=None): print("Start to getOrCreate SparkContext") os.environ['PYSPARK_PYTHON'] =\ @@ -124,29 +136,24 @@ def init_spark_on_yarn(self, penv_archive=None, hadoop_user_name="root", spark_yarn_archive=None, + spark_conf=None, jars=None): os.environ["HADOOP_CONF_DIR"] = hadoop_conf os.environ['HADOOP_USER_NAME'] = hadoop_user_name os.environ['PYSPARK_PYTHON'] = "python_env/bin/python" def _yarn_opt(jars): - from zoo.util.engine import get_analytics_zoo_classpath command = " --archives {}#python_env --num-executors {} " \ " --executor-cores {} --executor-memory {}".\ format(penv_archive, num_executor, executor_cores, executor_memory) - path_to_zoo_jar = get_analytics_zoo_classpath() + jars_list = self._gather_essential_jars() + if jars: + jars_list.append(jars) if extra_python_lib: command = command + " --py-files {} ".format(extra_python_lib) - if jars: - command = command + " --jars {},{} ".format(jars, path_to_zoo_jar) - elif path_to_zoo_jar: - command = command + " --jars {} ".format(path_to_zoo_jar) - - if path_to_zoo_jar: - command = command + " --conf spark.driver.extraClassPath={} ".\ - format(get_analytics_zoo_classpath()) + command = command + " --jars {}".format(",".join(jars_list)) return command def _submit_opt(): @@ -158,7 +165,7 @@ def _submit_opt(): conf["spark.executor.memoryOverhead"] = extra_executor_memory_for_ray if spark_yarn_archive: conf.insert("spark.yarn.archive", spark_yarn_archive) - return " --master yarn " + _yarn_opt(jars) + 'pyspark-shell', conf + return " --master yarn --deploy-mode client" + _yarn_opt(jars) + ' pyspark-shell ', conf pack_env = False assert penv_archive or conda_name, \ @@ -169,6 +176,9 @@ def _submit_opt(): pack_env = True submit_args, conf = _submit_opt() + if spark_conf: + for item in spark_conf.items(): + conf[str(item[0])] = str(item[1]) sc = self._create_sc(submit_args, conf) finally: if conda_name and penv_archive and pack_env: