diff --git a/docs/docs/DeveloperGuide/python.md b/docs/docs/DeveloperGuide/python.md
new file mode 100644
index 00000000000..b74442bc271
--- /dev/null
+++ b/docs/docs/DeveloperGuide/python.md
@@ -0,0 +1,63 @@
+This page gives some general instructions and tips for Python developers who want to build and develop Analytics Zoo.
+
+You are very welcome to add customized functionalities to Analytics Zoo to meet your own demands.
+You are also highly encouraged to contribute extra features to Analytics Zoo so that other community users can benefit as well.
+
+---
+## **Download Analytics Zoo Source Code**
+Analytics Zoo source code is available at [GitHub](https://github.com/intel-analytics/analytics-zoo):
+
+```bash
+git clone https://github.com/intel-analytics/analytics-zoo.git
+```
+
+By default, `git clone` will download the development version of Analytics Zoo. If you want a release version, you can use `git checkout` to switch to the corresponding release tag or branch.
+
+---
+## **Build whl package for pip install**
+If you have modified some Python code and want to regenerate the [whl](https://pythonwheels.com/) package for pip install, you can run the following script:
+
+```bash
+bash analytics-zoo/pyzoo/dev/build.sh linux default
+```
+
+**Arguments:**
+
+- The first argument is the __platform__ to build for: either 'linux' or 'mac'.
+- The second argument is the analytics-zoo __version__ to build for. 'default' means the default version for the current branch. You can also specify a different version if you wish, e.g., '0.6.0.dev1'.
+- You can also add other profiles to build the package, especially for the Spark and BigDL versions.
+For example, if `pyspark==2.4.3` is a dependency, you need to add the profiles `-Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x` to build Analytics Zoo for Spark 2.4.3.
+
+After running the above command, you will find a `whl` file under the folder `analytics-zoo/pyzoo/dist/`.
+You can then directly pip install it to your local Python environment:
+```bash
+pip install analytics-zoo/pyzoo/dist/analytics_zoo-VERSION-py2.py3-none-PLATFORM_x86_64.whl   # for Python 2.7
+pip3 install analytics-zoo/pyzoo/dist/analytics_zoo-VERSION-py2.py3-none-PLATFORM_x86_64.whl  # for Python 3.5 and Python 3.6
+```
+
+See [here](../PythonUserGuide/install/#install-from-pip-for-local-usage) for more remarks related to pip install.
+
+See [here](../PythonUserGuide/run/#run-after-pip-install) for more instructions on running analytics-zoo after pip install.
+
+---
+## **Run in IDE**
+Before you can successfully run an Analytics Zoo Python program in an Integrated Development Environment (IDE), you need to do the following preparations:
+
+- Build Analytics Zoo. See [here](../ScalaUserGuide/install/#build-with-script-recommended) for more instructions.
+- Prepare the Spark environment by either setting `SPARK_HOME` as an environment variable or pip installing `pyspark`. Note that the Spark version should match the one you build Analytics Zoo on.
+- Set `BIGDL_CLASSPATH`:
+```bash
+export BIGDL_CLASSPATH=analytics-zoo/dist/lib/analytics-zoo-*-jar-with-dependencies.jar
+```
+
+- Prepare the BigDL Python environment by either downloading BigDL from [GitHub](https://github.com/intel-analytics/BigDL) or pip installing `bigdl`. Note that the BigDL version should match the one you build Analytics Zoo on.
+- Add `pyzoo` and `spark-analytics-zoo.conf` to `PYTHONPATH`:
+```bash
+export PYTHONPATH=analytics-zoo/pyzoo:analytics-zoo/dist/conf/spark-analytics-zoo.conf:$PYTHONPATH
+```
+If you download BigDL from [GitHub](https://github.com/intel-analytics/BigDL), you also need to add `BigDL/pyspark` to `PYTHONPATH`:
+```bash
+export PYTHONPATH=BigDL/pyspark:$PYTHONPATH
+```
diff --git a/docs/docs/PythonUserGuide/install.md b/docs/docs/PythonUserGuide/install.md
index e4809a909fb..b663e9243f5 100644
--- a/docs/docs/PythonUserGuide/install.md
+++ b/docs/docs/PythonUserGuide/install.md
@@ -24,34 +24,33 @@ sc = init_nncontext()
 ```
 **Remarks:**
+
 1. We've tested this package with pip 9.0.1. `pip install --upgrade pip` if necessary.
 2. Pip install supports __Mac__ and __Linux__ platforms.
 3. You need to install Java __>= JDK8__ before running Analytics Zoo, which is required by `pyspark`.
 4. `pyspark==2.4.3`, `bigdl==0.8.0` and their dependencies will automatically be installed if they haven't been detected in the current Python environment.
 
+---
 ## **Install from pip for yarn cluster**
 You only need to follow these steps on your driver node, and we only support yarn-client mode for now.
 
-1) Install [Conda](https://docs.conda.io/projects/conda/en/latest/commands/install.html) and create a conda-env(i.e in the name of "zoo")
+1) Install [Conda](https://docs.conda.io/projects/conda/en/latest/commands/install.html) and create a conda-env (e.g., in the name of "zoo").
 
-2) Install Analytics-Zoo into the created conda-env
+2) Install Analytics-Zoo into the created conda-env.
 ```
 source activate zoo
 pip install analytics-zoo
-
 ```
 3) Download JDK8 and set the environment variable: JAVA_HOME (recommended).
+
 - You can also install JDK via conda without setting the JAVA_HOME manually: `conda install -c anaconda openjdk=8.0.152`
 
-4) Start python and then execute the following code for verification.
-
-- Create a SparkContext on Yarn
+4) Start python and then execute the following code to create a SparkContext on Yarn for verification.
 ``` python
-
 from zoo import init_spark_on_yarn
 
 sc = init_spark_on_yarn(
diff --git a/docs/docs/PythonUserGuide/run.md b/docs/docs/PythonUserGuide/run.md
index 3f44fe0a17a..609d2152671 100644
--- a/docs/docs/PythonUserGuide/run.md
+++ b/docs/docs/PythonUserGuide/run.md
@@ -49,14 +49,14 @@ export BIGDL_JARS=...
 export BIGDL_PACKAGES=...
 ```
 
-## **Run on yarn after pip install
+---
+## **Run on yarn after pip install**
+
+You should use `init_spark_on_yarn` rather than `init_nncontext()` here to create a SparkContext on Yarn.
 
 Start python and then execute the following code:
 
-Caveat: You should use `init_spark_on_yarn` rather than `init_nncontext()` here.
-- Create a SparkContext on Yarn
 ``` python
-
 from zoo import init_spark_on_yarn
 
 sc = init_spark_on_yarn(
@@ -68,7 +68,6 @@ sc = init_spark_on_yarn(
   driver_memory="2g",
   driver_cores=4,
   extra_executor_memory_for_ray="10g")
-
 ```
 
 ---
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 5e37195ca56..ff3bc0b7618 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -25,6 +25,8 @@ pages:
   - Install: ScalaUserGuide/install.md
   - Run: ScalaUserGuide/run.md
   - Examples: ScalaUserGuide/examples.md
+- Developer Guide:
+  - For Python Developers: DeveloperGuide/python.md
 - Programming Guide:
   - Pipeline APIs:
     - DataFrame and ML Pipeline: ProgrammingGuide/nnframes.md
diff --git a/pyzoo/dev/build.sh b/pyzoo/dev/build.sh
new file mode 100644
index 00000000000..3c715ab42a0
--- /dev/null
+++ b/pyzoo/dev/build.sh
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+
+#
+# Copyright 2018 Analytics Zoo Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+set -e
+RUN_SCRIPT_DIR=$(cd $(dirname $0) ; pwd)
+echo $RUN_SCRIPT_DIR
+
+if (( $# < 2 )); then
+  echo "Usage: build.sh platform version mvn_parameters"
+  echo "Usage example: bash build.sh linux default"
+  echo "Usage example: bash build.sh linux 0.6.0.dev0"
+  echo "If needed, you can also add other profiles such as: -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x"
+  exit 1
+fi
+
+platform=$1
+version=$2
+profiles=${*:3}
+
+bash ${RUN_SCRIPT_DIR}/release.sh ${platform} ${version} false ${profiles}
diff --git a/pyzoo/dev/release/release.sh b/pyzoo/dev/release.sh
similarity index 91%
rename from pyzoo/dev/release/release.sh
rename to pyzoo/dev/release.sh
index 161a92b896e..8e6da23851c 100755
--- a/pyzoo/dev/release/release.sh
+++ b/pyzoo/dev/release.sh
@@ -19,15 +19,15 @@ set -e
 RUN_SCRIPT_DIR=$(cd $(dirname $0) ; pwd)
 echo $RUN_SCRIPT_DIR
-export ANALYTICS_ZOO_HOME="$(cd ${RUN_SCRIPT_DIR}/../../../; pwd)"
+export ANALYTICS_ZOO_HOME="$(cd ${RUN_SCRIPT_DIR}/../../; pwd)"
 echo $ANALYTICS_ZOO_HOME
-ANALYTICS_ZOO_PYTHON_DIR="$(cd ${RUN_SCRIPT_DIR}/../../../pyzoo; pwd)"
+ANALYTICS_ZOO_PYTHON_DIR="$(cd ${RUN_SCRIPT_DIR}/../../pyzoo; pwd)"
 echo $ANALYTICS_ZOO_PYTHON_DIR
 
 if (( $# < 3)); then
   echo "Usage: release.sh platform version upload mvn_parameters"
   echo "Usage example: bash release.sh linux default true"
-  echo "Usage example: bash release.sh linux 0.6.0.dev0 true -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x"
+  echo "Usage example: bash release.sh linux 0.6.0.dev0 true"
+  echo "If needed, you can also add other profiles such as: -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.x"
   exit -1
 fi
@@ -92,4 +92,3 @@ if [ ${upload} == true ]; then
   echo "Command for uploading to pypi: $upload_command"
   $upload_command
 fi
-
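As a side note for reviewers, the `PYTHONPATH` assembly described in the new developer guide (the two `export PYTHONPATH=...` commands in the "Run in IDE" section) can be sketched as a small helper. This is purely illustrative and not part of Analytics Zoo or of this PR; the function name `zoo_pythonpath` and all paths are hypothetical placeholders for your own clone locations.

```python
import os

# Illustrative sketch only -- this helper is NOT part of Analytics Zoo.
# It mirrors the export PYTHONPATH lines from the "Run in IDE" section:
# pyzoo first, then the Zoo Spark conf file, then (optionally)
# BigDL/pyspark when BigDL was downloaded from GitHub.
def zoo_pythonpath(zoo_home, bigdl_home=None, existing=""):
    entries = [
        os.path.join(zoo_home, "pyzoo"),
        os.path.join(zoo_home, "dist", "conf", "spark-analytics-zoo.conf"),
    ]
    if bigdl_home:  # only needed for a GitHub download of BigDL
        entries.append(os.path.join(bigdl_home, "pyspark"))
    if existing:  # preserve whatever PYTHONPATH was already set
        entries.append(existing)
    return os.pathsep.join(entries)

# Example: an IDE run configuration could use this value for PYTHONPATH.
print(zoo_pythonpath("/opt/analytics-zoo", bigdl_home="/opt/BigDL"))
```

On Linux this prints a colon-separated value equivalent to running the two `export PYTHONPATH` commands from the guide in sequence.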