diff --git a/README.md b/README.md
index bac29b4ddd..602a986674 100644
--- a/README.md
+++ b/README.md
@@ -20,10 +20,10 @@ DeePMD-kit is a package written in Python/C++, designed to minimize the effort r
 For more information, check the [documentation](https://deepmd.readthedocs.io/).
 
 # Highlights in DeePMD-kit v2.0
-* [Model compression](doc/getting-started.md#compress-a-model). Accelerate the efficiency of model inference for 4-15 times.
-* [New descriptors](doc/getting-started.md#write-the-input-script). Including [`se_e2_r`](doc/train-se-e2-r.md) and [`se_e3`](doc/train-se-e3.md).
-* [Hybridization of descriptors](doc/train-hybrid.md). Hybrid descriptor constructed from concatenation of several descriptors.
-* [Atom type embedding](doc/train-se-e2-a-tebd.md). Enable atom type embedding to decline training complexity and refine performance.
+* [Model compression](doc/freeze/compress.md). Accelerates model inference by a factor of 4-15.
+* [New descriptors](doc/model/overall.md). Including [`se_e2_r`](doc/model/train-se-e2-r.md) and [`se_e3`](doc/model/train-se-e3.md).
+* [Hybridization of descriptors](doc/model/train-hybrid.md). A hybrid descriptor constructed by concatenating several descriptors.
+* [Atom type embedding](doc/model/train-se-e2-a-tebd.md). Enables atom type embedding to reduce training complexity and improve performance.
 * Training and inference the dipole (vector) and polarizability (matrix).
 * Split of training and validation dataset.
 * Optimized training on GPUs.
@@ -51,28 +51,66 @@ In addition to building up potential energy models, DeePMD-kit can also be used
 
 # Download and install
 
-Please follow our [github](https://github.com/deepmodeling/deepmd-kit) webpage to download the [latest released version](https://github.com/deepmodeling/deepmd-kit/tree/master) and [development version](https://github.com/deepmodeling/deepmd-kit/tree/devel).
+Please follow our [GitHub](https://github.com/deepmodeling/deepmd-kit) webpage to download the [latest released version](https://github.com/deepmodeling/deepmd-kit/tree/master) and [development version](https://github.com/deepmodeling/deepmd-kit/tree/devel).
 
-DeePMD-kit offers multiple installation methods. It is recommend using easily methods like [offline packages](doc/install.md#offline-packages), [conda](doc/install.md#with-conda) and [docker](doc/install.md#with-docker).
+DeePMD-kit offers multiple installation methods. It is recommended to use easy methods like [offline packages](doc/install/easy-install.md#offline-packages), [conda](doc/install/easy-install.md#with-conda) and [docker](doc/install/easy-install.md#with-docker).
 
-One may manually install DeePMD-kit by following the instuctions on [installing the python interface](doc/install.md#install-the-python-interface) and [installing the C++ interface](doc/install.md#install-the-c-interface). The C++ interface is necessary when using DeePMD-kit with LAMMPS and i-PI.
+One may manually install DeePMD-kit by following the instructions on [installing the Python interface](doc/install/install-from-source.md#install-the-python-interface) and [installing the C++ interface](doc/install/install-from-source.md#install-the-c-interface). The C++ interface is necessary when using DeePMD-kit with LAMMPS and i-PI.
 
 # Use DeePMD-kit
 
-The typical procedure of using DeePMD-kit includes the following steps
-
-1. [Prepare data](doc/getting-started.md#prepare-data)
-2. [Train a model](doc/getting-started.md#train-a-model)
-3. 
[Analyze training with Tensorboard](doc/tensorboard.md) -4. [Freeze the model](doc/getting-started.md#freeze-a-model) -5. [Test the model](doc/getting-started.md#test-a-model) -6. [Compress the model](doc/getting-started.md#compress-a-model) -7. [Inference the model in python](doc/getting-started.md#model-inference) or using the model in other molecular simulation packages like [LAMMPS](doc/getting-started.md#run-md-with-lammps), [i-PI](doc/getting-started.md#run-path-integral-md-with-i-pi) or [ASE](doc/getting-started.md#use-deep-potential-with-ase). - -A quick-start on using DeePMD-kit can be found [here](doc/getting-started.md). - -A full [document](doc/train-input-auto.rst) on options in the training input script is available. +A quick-start on using DeePMD-kit can be found as follows: + +- [Prepare data with dpdata](doc/data/dpdata.md) +- [Training a model](doc/train/training.md) +- [Freeze a model](doc/freeze/freeze.md) +- [Test a model](doc/test/test.md) +- [Running MD with LAMMPS](doc/third-party/lammps.md) + +A full [document](doc/train/train-input-auto.rst) on options in the training input script is available. + +# Advanced + +- [Installation](doc/install/index.md) + - [Easy install](doc/install/easy-install.md) + - [Install from source code](doc/install/install-from-source.md) + - [Install LAMMPS](doc/install/install-lammps.md) + - [Install i-PI](doc/install/install-ipi.md) + - [Building conda packages](doc/install/build-conda.md) +- [Data](doc/data/index.md) + - [Data conversion](doc/data/data-conv.md) + - [Prepare data with dpdata](doc/data/dpdata.md) +- [Model](doc/model/index.md) + - [Overall](doc/model/overall.md) + - [Descriptor `"se_e2_a"`](doc/model/train-se-e2-a.md) + - [Descriptor `"se_e2_r"`](doc/model/train-se-e2-r.md) + - [Descriptor `"se_e3"`](doc/model/train-se-e3.md) + - [Descriptor `"hybrid"`](doc/model/train-hybrid.md) + - [Fit energy](doc/model/train-energy.md) + - [Fit `tensor` like `Dipole` and `Polarizability`](doc/model/train-fitting-tensor.md) + - [Train a Deep Potential model using `type embedding` approach](doc/model/train-se-e2-a-tebd.md) +- [Training](doc/train/index.md) + - [Training a model](doc/train/training.md) + - [Advanced options](doc/train/training-advanced.md) + - [Parallel training](doc/train/parallel-training.md) + - [TensorBoard Usage](doc/train/tensorboard.md) + - [Known limitations of using GPUs](doc/train/gpu-limitations.md) + - [Training Parameters](doc/train/train-input-auto.rst) +- [Freeze and Compress](doc/freeze/index.rst) + - [Freeze a model](doc/freeze/freeze.md) + - [Compress a model](doc/freeze/compress.md) +- [Test](doc/test/index.rst) + - [Test a model](doc/test/test.md) + - [Calculate Model Deviation](doc/test/model-deviation.md) +- [Inference](doc/inference/index.rst) + - [Python interface](doc/inference/python.md) + - [C++ interface](doc/inference/cxx.md) +- [Integrate with third-party packages](doc/third-party/index.rst) + - [Use deep potential with ASE](doc/third-party/ase.md) + - [Running MD with LAMMPS](doc/third-party/lammps.md) + - [LAMMPS commands](doc/third-party/lammps-command.md) + - [Run path-integral MD with i-PI](doc/third-party/ipi.md) # Code structure @@ -97,7 +135,14 @@ The code is organized as follows: # Troubleshooting -See the [troubleshooting page](doc/troubleshooting/index.md). 
+- [Model compatibility](doc/troubleshooting/model-compatability.md) +- [Installation](doc/troubleshooting/installation.md) +- [The temperature undulates violently during early stages of MD](doc/troubleshooting/md-energy-undulation.md) +- [MD: cannot run LAMMPS after installing a new version of DeePMD-kit](doc/troubleshooting/md-version-compatibility.md) +- [Do we need to set rcut < half boxsize?](doc/troubleshooting/howtoset-rcut.md) +- [How to set sel?](doc/troubleshooting/howtoset-sel.md) +- [How to control the number of nodes used by a job?](doc/troubleshooting/howtoset_num_nodes.md) +- [How to tune Fitting/embedding-net size?](doc/troubleshooting/howtoset_netsize.md) # Contributing diff --git a/doc/conf.py b/doc/conf.py index 71d583d6ae..9d84763d40 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -106,8 +106,8 @@ def classify_index_TS(): # -- Project information ----------------------------------------------------- project = 'DeePMD-kit' -copyright = '2020, Deep Potential' -author = 'Deep Potential' +copyright = '2017-2021, Deep Modeling' +author = 'Deep Modeling' def run_doxygen(folder): """Run the doxygen make command in the designated folder""" @@ -148,9 +148,9 @@ def setup(app): # 'sphinx.ext.autosummary' # ] -mkindex("troubleshooting") -mkindex("development") -classify_index_TS() +#mkindex("troubleshooting") +#mkindex("development") +#classify_index_TS() extensions = [ "sphinx_rtd_theme", diff --git a/doc/data/data-conv.md b/doc/data/data-conv.md new file mode 100644 index 0000000000..e185c1efd9 --- /dev/null +++ b/doc/data/data-conv.md @@ -0,0 +1,52 @@ +# Data conversion + +One needs to provide the following information to train a model: the atom type, the simulation box, the atom coordinate, the atom force, system energy and virial. A snapshot of a system that contains these information is called a **frame**. We use the following convention of units: + + +Property | Unit +---|--- +Time | ps +Length | Å +Energy | eV +Force | eV/Å +Virial | eV +Pressure | Bar + + +The frames of the system are stored in two formats. A raw file is a plain text file with each information item written in one file and one frame written on one line. The default files that provide box, coordinate, force, energy and virial are `box.raw`, `coord.raw`, `force.raw`, `energy.raw` and `virial.raw`, respectively. *We recommend you use these file names*. Here is an example of force.raw: +```bash +$ cat force.raw +-0.724 2.039 -0.951 0.841 -0.464 0.363 + 6.737 1.554 -5.587 -2.803 0.062 2.222 +-1.968 -0.163 1.020 -0.225 -0.789 0.343 +``` +This `force.raw` contains 3 frames with each frame having the forces of 2 atoms, thus it has 3 lines and 6 columns. Each line provides all the 3 force components of 2 atoms in 1 frame. The first three numbers are the 3 force components of the first atom, while the second three numbers are the 3 force components of the second atom. The coordinate file `coord.raw` is organized similarly. In `box.raw`, the 9 components of the box vectors should be provided on each line. In `virial.raw`, the 9 components of the virial tensor should be provided on each line in the order `XX XY XZ YX YY YZ ZX ZY ZZ`. The number of lines of all raw files should be identical. + +We assume that the atom types do not change in all frames. It is provided by `type.raw`, which has one line with the types of atoms written one by one. The atom types should be integers. 
For example, the `type.raw` of a system that has 2 atoms of types 0 and 1:
+```bash
+$ cat type.raw
+0 1
+```
+
+Sometimes one needs to map the integer types to atom names. The mapping can be given by the file `type_map.raw`. For example
+```bash
+$ cat type_map.raw
+O H
+```
+The type `0` is named `"O"` and the type `1` is named `"H"`.
+
+The second format is the data sets of `numpy` binary data that are directly used by the training program. One can use the script `$deepmd_source_dir/data/raw/raw_to_set.sh` to convert the prepared raw files to data sets. For example, if we have raw files that contain 6000 frames,
+```bash
+$ ls
+box.raw coord.raw energy.raw force.raw type.raw virial.raw
+$ $deepmd_source_dir/data/raw/raw_to_set.sh 2000
+nframe is 6000
+nline per set is 2000
+will make 3 sets
+making set 0 ...
+making set 1 ...
+making set 2 ...
+$ ls
+box.raw coord.raw energy.raw force.raw set.000 set.001 set.002 type.raw virial.raw
+```
+It generates three sets `set.000`, `set.001` and `set.002`, with each set containing 2000 frames. One does not need to worry about the binary data files in each of the `set.*` directories. The path containing `set.*` and `type.raw` is called a *system*.
diff --git a/doc/data-conv.md b/doc/data/dpdata.md
similarity index 51%
rename from doc/data-conv.md
rename to doc/data/dpdata.md
index 1dd1f5a2cb..2e540ea5c2 100644
--- a/doc/data-conv.md
+++ b/doc/data/dpdata.md
@@ -1,21 +1,6 @@
-# Data
+# Prepare data with dpdata
 
-
-In this example we will convert the DFT labeled data stored in VASP `OUTCAR` format into the data format used by DeePMD-kit. The example `OUTCAR` can be found in the directory.
-```bash
-$deepmd_source_dir/examples/data_conv
-```
-
-
-## Definition
-
-The DeePMD-kit organize data in **`systems`**. Each `system` is composed by a number of **`frames`**. One may roughly view a `frame` as a snap short on an MD trajectory, but it does not necessary come from an MD simulation. A `frame` records the coordinates and types of atoms, cell vectors if the periodic boundary condition is assumed, energy, atomic forces and virial. It is noted that the `frames` in one `system` share the same number of atoms with the same type.
-
-
-
-## Data conversion
-
-It is conveninent to use [dpdata](https://github.com/deepmodeling/dpdata) to convert data generated by DFT packages to the data format used by DeePMD-kit.
+One can use the convenient tool [`dpdata`](https://github.com/deepmodeling/dpdata) to convert data directly from the output of first-principles packages to the DeePMD-kit format.
 
 To install one can execute
 ```bash
diff --git a/doc/data/index.md b/doc/data/index.md
new file mode 100644
index 0000000000..d54f52cd8e
--- /dev/null
+++ b/doc/data/index.md
@@ -0,0 +1,8 @@
+# Data
+
+In this section, we will introduce how to convert the DFT labeled data into the data format used by DeePMD-kit.
+
+DeePMD-kit organizes data in `systems`. Each `system` is composed of a number of `frames`. One may roughly view a `frame` as a snapshot of an MD trajectory, but it does not necessarily come from an MD simulation. A `frame` records the coordinates and types of atoms, cell vectors if the periodic boundary condition is assumed, energy, atomic forces and virial. It is noted that the `frames` in one `system` share the same number of atoms with the same type.
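+
+As a quick illustration (a minimal sketch, not part of the formal reference; the `OUTCAR` file name and the `water_data` output directory are only examples), such a system can be produced from DFT output with [`dpdata`](https://github.com/deepmodeling/dpdata):
+```python
+import dpdata
+
+# load a labeled system (coordinates, energies, forces, virials) from a VASP OUTCAR;
+# "OUTCAR" is an example path, replace it with your own DFT output
+system = dpdata.LabeledSystem("OUTCAR", fmt="vasp/outcar")
+# write it out in the DeePMD-kit numpy format, splitting the frames into sets of 2000
+system.to("deepmd/npy", "water_data", set_size=2000)
+```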
+ +- [Data conversion](data-conv.md) +- [Prepare data with dpdata](dpdata.md) diff --git a/doc/data/index.rst b/doc/data/index.rst new file mode 100644 index 0000000000..0631727546 --- /dev/null +++ b/doc/data/index.rst @@ -0,0 +1,11 @@ +Data +==== +In this section, we will introduce how to convert the DFT labeled data into the data format used by DeePMD-kit. + +The DeePMD-kit organize data in :code:`systems`. Each :code:`system` is composed by a number of :code:`frames`. One may roughly view a :code:`frame` as a snap short on an MD trajectory, but it does not necessary come from an MD simulation. A :code:`frame` records the coordinates and types of atoms, cell vectors if the periodic boundary condition is assumed, energy, atomic forces and virial. It is noted that the :code:`frames` in one :code:`system` share the same number of atoms with the same type. + +.. toctree:: + :maxdepth: 1 + + data-conv + dpdata diff --git a/doc/api.rst b/doc/development/api.rst similarity index 100% rename from doc/api.rst rename to doc/development/api.rst diff --git a/doc/development/index.md b/doc/development/index.md deleted file mode 100644 index 79fab0d980..0000000000 --- a/doc/development/index.md +++ /dev/null @@ -1,6 +0,0 @@ -# Developer Guide - -- [Python API](../api.rst) -- [C++ API](../API_CC/api_cc.rst) -- [Coding Conventions](coding-conventions.rst) -- [Atom Type Embedding](type-embedding.md) diff --git a/doc/development/type-embedding.md b/doc/development/type-embedding.md index 17c8a63ba5..928d2e513d 100644 --- a/doc/development/type-embedding.md +++ b/doc/development/type-embedding.md @@ -35,12 +35,12 @@ The difference between two variants above is whether using the information of ce ## How to use A detailed introduction can be found at [`se_e2_a_tebd`](../train-se-e2-a-tebd.md). Looking for a fast start up, you can simply add a `type_embedding` section in the input json file as displayed in the following, and the algorithm will adopt atom type embedding algorithm automatically. An example of `type_embedding` is like -```json= +```json "type_embedding":{ - "neuron":Type[2, 4, 8], - "resnet_dt":Atomfalse, - "seed":Type1 - } + "neuron": [2, 4, 8], + "resnet_dt": false, + "seed": 1 + } ``` diff --git a/doc/freeze/compress.md b/doc/freeze/compress.md new file mode 100644 index 0000000000..6aeff8ab7f --- /dev/null +++ b/doc/freeze/compress.md @@ -0,0 +1,81 @@ +# Compress a model + +Once the frozen model is obtained from deepmd-kit, we can get the neural network structure and its parameters (weights, biases, etc.) from the trained model, and compress it in the following way: +```bash +dp compress -i graph.pb -o graph-compress.pb +``` +where `-i` gives the original frozen model, `-o` gives the compressed model. 
Several other command line options can be passed to `dp compress`, which can be checked with +```bash +$ dp compress --help +``` +An explanation will be provided +``` +usage: dp compress [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] [-l LOG_PATH] + [-m {master,collect,workers}] [-i INPUT] [-o OUTPUT] + [-s STEP] [-e EXTRAPOLATE] [-f FREQUENCY] + [-c CHECKPOINT_FOLDER] + +optional arguments: + -h, --help show this help message and exit + -v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}, --log-level {DEBUG,3,INFO,2,WARNING,1,ERROR,0} + set verbosity level by string or number, 0=ERROR, + 1=WARNING, 2=INFO and 3=DEBUG (default: INFO) + -l LOG_PATH, --log-path LOG_PATH + set log file to log messages to disk, if not + specified, the logs will only be output to console + (default: None) + -m {master,collect,workers}, --mpi-log {master,collect,workers} + Set the manner of logging when running with MPI. + 'master' logs only on main process, 'collect' + broadcasts logs from workers to master and 'workers' + means each process will output its own log (default: + master) + -i INPUT, --input INPUT + The original frozen model, which will be compressed by + the code (default: frozen_model.pb) + -o OUTPUT, --output OUTPUT + The compressed model (default: + frozen_model_compressed.pb) + -s STEP, --step STEP Model compression uses fifth-order polynomials to + interpolate the embedding-net. It introduces two + tables with different step size to store the + parameters of the polynomials. The first table covers + the range of the training data, while the second table + is an extrapolation of the training data. The domain + of each table is uniformly divided by a given step + size. And the step(parameter) denotes the step size of + the first table and the second table will use 10 * + step as it's step size to save the memory. Usually the + value ranges from 0.1 to 0.001. Smaller step means + higher accuracy and bigger model size (default: 0.01) + -e EXTRAPOLATE, --extrapolate EXTRAPOLATE + The domain range of the first table is automatically + detected by the code: [d_low, d_up]. While the second + table ranges from the first table's upper + boundary(d_up) to the extrapolate(parameter) * d_up: + [d_up, extrapolate * d_up] (default: 5) + -f FREQUENCY, --frequency FREQUENCY + The frequency of tabulation overflow check(Whether the + input environment matrix overflow the first or second + table range). By default do not check the overflow + (default: -1) + -c CHECKPOINT_FOLDER, --checkpoint-folder CHECKPOINT_FOLDER + path to checkpoint folder (default: .) + -t TRAINING_SCRIPT, --training-script TRAINING_SCRIPT + The training script of the input frozen model + (default: None) +``` +**Parameter explanation** + +Model compression, which including tabulating the embedding-net. +The table is composed of fifth-order polynomial coefficients and is assembled from two sub-tables. The first sub-table takes the stride(parameter) as it's uniform stride, while the second sub-table takes 10 * stride as it's uniform stride. +The range of the first table is automatically detected by deepmd-kit, while the second table ranges from the first table's upper boundary(upper) to the extrapolate(parameter) * upper. +Finally, we added a check frequency parameter. It indicates how often the program checks for overflow(if the input environment matrix overflow the first or second table range) during the MD inference. + +**Justification of model compression** + +Model compression, with little loss of accuracy, can greatly speed up MD inference time. 
According to different simulation systems and training parameters, the speedup can reach more than 10 times at both CPU and GPU devices. At the same time, model compression can greatly change the memory usage, reducing as much as 20 times under the same hardware conditions. + +**Acceptable original model version** + +The model compression method requires that the version of DeePMD-kit used in original model generation should be 1.3 or above. If one has a frozen 1.2 model, one can first use the convenient conversion interface of DeePMD-kit-v1.2.4 to get a 1.3 executable model.(eg: ```dp convert-to-1.3 -i frozen_1.2.pb -o frozen_1.3.pb```) \ No newline at end of file diff --git a/doc/freeze/freeze.md b/doc/freeze/freeze.md new file mode 100644 index 0000000000..fdb2f2cc97 --- /dev/null +++ b/doc/freeze/freeze.md @@ -0,0 +1,7 @@ +# Freeze a model + +The trained neural network is extracted from a checkpoint and dumped into a database. This process is called "freezing" a model. The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc). To freeze a model, typically one does +```bash +$ dp freeze -o graph.pb +``` +in the folder where the model is trained. The output database is called `graph.pb`. \ No newline at end of file diff --git a/doc/freeze/index.md b/doc/freeze/index.md new file mode 100644 index 0000000000..0bc3664144 --- /dev/null +++ b/doc/freeze/index.md @@ -0,0 +1,4 @@ +# Freeze and Compress + +- [Freeze a model](freeze.md) +- [Compress a model](compress.md) diff --git a/doc/freeze/index.rst b/doc/freeze/index.rst new file mode 100644 index 0000000000..0a2f3df0f1 --- /dev/null +++ b/doc/freeze/index.rst @@ -0,0 +1,8 @@ +Freeze and Compress +=================== + +.. toctree:: + :maxdepth: 1 + + freeze + compress \ No newline at end of file diff --git a/doc/getting-started.md b/doc/getting-started.md deleted file mode 100644 index 4b0e0a3442..0000000000 --- a/doc/getting-started.md +++ /dev/null @@ -1,582 +0,0 @@ -# Getting Started -In this text, we will call the deep neural network that is used to represent the interatomic interactions (Deep Potential) the **model**. The typical procedure of using DeePMD-kit is - -1. [Prepare data](#prepare-data) -2. [Train a model](#train-a-model) - - [Write the input script](#write-the-input-script) - - [Training](#training) - - [Parallel training](#parallel-training) - - [Training analysis with Tensorboard](#training-analysis-with-tensorboard) -3. [Freeze a model](#freeze-a-model) -4. [Test a model](#test-a-model) -5. [Compress a model](#compress-a-model) -6. [Model inference](#model-inference) - - [Python interface](#python-interface) - - [C++ interface](#c-interface) -7. [Run MD](#run-md) - - [Run MD with LAMMPS](#run-md-with-lammps) - - [Run path-integral MD with i-PI](#run-path-integral-md-with-i-pi) - - [Use deep potential with ASE](#use-deep-potential-with-ase) -8. [Known limitations](#known-limitations) - - -## Prepare data -One needs to provide the following information to train a model: the atom type, the simulation box, the atom coordinate, the atom force, system energy and virial. A snapshot of a system that contains these information is called a **frame**. We use the following convention of units: - - -Property | Unit ----|--- -Time | ps -Length | Å -Energy | eV -Force | eV/Å -Virial | eV -Pressure | Bar - - -The frames of the system are stored in two formats. 
A raw file is a plain text file with each information item written in one file and one frame written on one line. The default files that provide box, coordinate, force, energy and virial are `box.raw`, `coord.raw`, `force.raw`, `energy.raw` and `virial.raw`, respectively. *We recommend you use these file names*. Here is an example of force.raw: -```bash -$ cat force.raw --0.724 2.039 -0.951 0.841 -0.464 0.363 - 6.737 1.554 -5.587 -2.803 0.062 2.222 --1.968 -0.163 1.020 -0.225 -0.789 0.343 -``` -This `force.raw` contains 3 frames with each frame having the forces of 2 atoms, thus it has 3 lines and 6 columns. Each line provides all the 3 force components of 2 atoms in 1 frame. The first three numbers are the 3 force components of the first atom, while the second three numbers are the 3 force components of the second atom. The coordinate file `coord.raw` is organized similarly. In `box.raw`, the 9 components of the box vectors should be provided on each line. In `virial.raw`, the 9 components of the virial tensor should be provided on each line in the order `XX XY XZ YX YY YZ ZX ZY ZZ`. The number of lines of all raw files should be identical. - -We assume that the atom types do not change in all frames. It is provided by `type.raw`, which has one line with the types of atoms written one by one. The atom types should be integers. For example the `type.raw` of a system that has 2 atoms with 0 and 1: -```bash -$ cat type.raw -0 1 -``` - -Sometimes one needs to map the integer types to atom name. The mapping can be given by the file `type_map.raw`. For example -```bash -$ cat type_map.raw -O H -``` -The type `0` is named by `"O"` and the type `1` is named by `"H"`. - -The second format is the data sets of `numpy` binary data that are directly used by the training program. User can use the script `$deepmd_source_dir/data/raw/raw_to_set.sh` to convert the prepared raw files to data sets. For example, if we have a raw file that contains 6000 frames, -```bash -$ ls -box.raw coord.raw energy.raw force.raw type.raw virial.raw -$ $deepmd_source_dir/data/raw/raw_to_set.sh 2000 -nframe is 6000 -nline per set is 2000 -will make 3 sets -making set 0 ... -making set 1 ... -making set 2 ... -$ ls -box.raw coord.raw energy.raw force.raw set.000 set.001 set.002 type.raw virial.raw -``` -It generates three sets `set.000`, `set.001` and `set.002`, with each set contains 2000 frames. One do not need to take care of the binary data files in each of the `set.*` directories. The path containing `set.*` and `type.raw` is called a *system*. - -### Data preparation with dpdata - -One can use the a convenient tool `dpdata` to convert data directly from the output of first priciple packages to the DeePMD-kit format. One may follow the [example](data-conv.md) of using `dpdata` to find out how to use it. - -## Train a model - -### Write the input script - -A model has two parts, a descriptor that maps atomic configuration to a set of symmetry invariant features, and a fitting net that takes descriptor as input and predicts the atomic contribution to the target physical property. - -DeePMD-kit implements the following descriptors: -1. [`se_e2_a`](train-se-e2-a.md#descriptor): DeepPot-SE constructed from all information (both angular and radial) of atomic configurations. The embedding takes the distance between atoms as input. -2. [`se_e2_r`](train-se-e2-r.md): DeepPot-SE constructed from radial information of atomic configurations. The embedding takes the distance between atoms as input. -3. 
[`se_e3`](train-se-e3.md): DeepPot-SE constructed from all information (both angular and radial) of atomic configurations. The embedding takes angles between two neighboring atoms as input. -4. `loc_frame`: Defines a local frame at each atom, and the compute the descriptor as local coordinates under this frame. -5. [`hybrid`](train-hybrid.md): Concate a list of descriptors to form a new descriptor. - -The fitting of the following physical properties are supported -1. [`ener`](train-se-e2-a.md#fitting): Fitting the energy of the system. The force (derivative with atom positions) and the virial (derivative with the box tensor) can also be trained. See [the example](train-se-e2-a.md#loss). -2. `dipole`: The dipole moment. -3. `polar`: The polarizability. - - -### Training - -The training can be invoked by -```bash -$ dp train input.json -``` -where `input.json` is the name of the input script. See [the example](train-se-e2-a.md#train-a-deep-potential-model) for more details. - -During the training, checkpoints will be written to files with prefix `save_ckpt` every `save_freq` training steps. - -Several command line options can be passed to `dp train`, which can be checked with -```bash -$ dp train --help -``` -An explanation will be provided -``` -positional arguments: - INPUT the input json database - -optional arguments: - -h, --help show this help message and exit - --init-model INIT_MODEL - Initialize a model by the provided checkpoint - --restart RESTART Restart the training from the provided checkpoint -``` - -**`--init-model model.ckpt`**, initializes the model training with an existing model that is stored in the checkpoint `model.ckpt`, the network architectures should match. - -**`--restart model.ckpt`**, continues the training from the checkpoint `model.ckpt`. - -On some resources limited machines, one may want to control the number of threads used by DeePMD-kit. This is achieved by three environmental variables: `OMP_NUM_THREADS`, `TF_INTRA_OP_PARALLELISM_THREADS` and `TF_INTER_OP_PARALLELISM_THREADS`. `OMP_NUM_THREADS` controls the multithreading of DeePMD-kit implemented operations. `TF_INTRA_OP_PARALLELISM_THREADS` and `TF_INTER_OP_PARALLELISM_THREADS` controls `intra_op_parallelism_threads` and `inter_op_parallelism_threads`, which are Tensorflow configurations for multithreading. An explanation is found [here](https://stackoverflow.com/questions/41233635/meaning-of-inter-op-parallelism-threads-and-intra-op-parallelism-threads). - -For example if you wish to use 3 cores of 2 CPUs on one node, you may set the environmental variables and run DeePMD-kit as follows: -```bash -export OMP_NUM_THREADS=6 -export TF_INTRA_OP_PARALLELISM_THREADS=3 -export TF_INTER_OP_PARALLELISM_THREADS=2 -dp train input.json -``` - -One can set other environmental variables: - -| Environment variables | Allowed value | Default value | Usage | -| --------------------- | ---------------------- | ------------- | -------------------------- | -| DP_INTERFACE_PREC | `high`, `low` | `high` | Control high (double) or low (float) precision of training. | - - -### Parallel training - -Currently, parallel training is enabled in a sychoronized way with help of [Horovod](https://github.com/horovod/horovod). DeePMD-kit will decide parallel training or not according to MPI context. Thus, there is no difference in your json/yaml input file. - -Testing `examples/water/se_e2_a` on a 8-GPU host, linear acceleration can be observed with increasing number of cards. 
-| Num of GPU cards | Seconds every 100 samples | Samples per second | Speed up | -| -- | -- | -- | -- | -| 1 | 1.6116 | 62.05 | 1.00 | -| 2 | 1.6310 | 61.31 | 1.98 | -| 4 | 1.6168 | 61.85 | 3.99 | -| 8 | 1.6212 | 61.68 | 7.95 | - -To experience this powerful feature, please intall Horovod and [mpi4py](https://github.com/mpi4py/mpi4py) first. For better performance on GPU, please follow tuning steps in [Horovod on GPU](https://github.com/horovod/horovod/blob/master/docs/gpus.rst). -```bash -# By default, MPI is used as communicator. -HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_TENSORFLOW=1 pip install horovod mpi4py -``` - -Horovod works in the data-parallel mode resulting a larger global batch size. For example, the real batch size is 8 when `batch_size` is set to 2 in the input file and you lauch 4 workers. Thus, `learning_rate` is automatically scaled by the number of workers for better convergence. Technical details of such heuristic rule are discussed at [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677). - -With dependencies installed, have a quick try! -```bash -# Launch 4 processes on the same host -CUDA_VISIBLE_DEVICES=4,5,6,7 horovodrun -np 4 \ - dp train --mpi-log=workers input.json -``` - -Need to mention, environment variable `CUDA_VISIBLE_DEVICES` must be set to control parallelism on the occupied host where one process is bound to one GPU card. - -What's more, 2 command-line arguments are defined to control the logging behvaior. -``` -optional arguments: - -l LOG_PATH, --log-path LOG_PATH - set log file to log messages to disk, if not - specified, the logs will only be output to console - (default: None) - -m {master,collect,workers}, --mpi-log {master,collect,workers} - Set the manner of logging when running with MPI. - 'master' logs only on main process, 'collect' - broadcasts logs from workers to master and 'workers' - means each process will output its own log (default: - master) -``` - -### Training analysis with Tensorboard - -If enbled in json/yaml input file DeePMD-kit will create log files which can be -used to analyze training procedure with Tensorboard. For a short tutorial -please read this [document](tensorboard.md). - -## Freeze a model - -The trained neural network is extracted from a checkpoint and dumped into a database. This process is called "freezing" a model. The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc). To freeze a model, typically one does -```bash -$ dp freeze -o graph.pb -``` -in the folder where the model is trained. The output database is called `graph.pb`. - - -## Test a model - -The frozen model can be used in many ways. The most straightforward test can be performed using `dp test`. A typical usage of `dp test` is -```bash -dp test -m graph.pb -s /path/to/system -n 30 -``` -where `-m` gives the tested model, `-s` the path to the tested system and `-n` the number of tested frames. 
Several other command line options can be passed to `dp test`, which can be checked with -```bash -$ dp test --help -``` -An explanation will be provided -``` -usage: dp test [-h] [-m MODEL] [-s SYSTEM] [-S SET_PREFIX] [-n NUMB_TEST] - [-r RAND_SEED] [--shuffle-test] [-d DETAIL_FILE] - -optional arguments: - -h, --help show this help message and exit - -m MODEL, --model MODEL - Frozen model file to import - -s SYSTEM, --system SYSTEM - The system dir - -S SET_PREFIX, --set-prefix SET_PREFIX - The set prefix - -n NUMB_TEST, --numb-test NUMB_TEST - The number of data for test - -r RAND_SEED, --rand-seed RAND_SEED - The random seed - --shuffle-test Shuffle test data - -d DETAIL_FILE, --detail-file DETAIL_FILE - The file containing details of energy force and virial - accuracy -``` - -### Calculate Model Deviation - -One can also use a subcommand to calculate deviation of prediced forces or virials for a bunch of models in the following way: -```bash -dp model-devi -m graph.000.pb graph.001.pb graph.002.pb graph.003.pb -s ./data -o model_devi.out -``` -where `-m` specifies graph files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results is dumped. Here is more information on this sub-command: -```bash -usage: dp model-devi [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] - [-l LOG_PATH] [-m MODELS [MODELS ...]] [-s SYSTEM] - [-S SET_PREFIX] [-o OUTPUT] [-f FREQUENCY] [-i ITEMS] - -optional arguments: - -h, --help show this help message and exit - -v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}, --log-level {DEBUG,3,INFO,2,WARNING,1,ERROR,0} - set verbosity level by string or number, 0=ERROR, - 1=WARNING, 2=INFO and 3=DEBUG (default: INFO) - -l LOG_PATH, --log-path LOG_PATH - set log file to log messages to disk, if not - specified, the logs will only be output to console - (default: None) - -m MODELS [MODELS ...], --models MODELS [MODELS ...] - Frozen models file to import (default: - ['graph.000.pb', 'graph.001.pb', 'graph.002.pb', - 'graph.003.pb']) - -s SYSTEM, --system SYSTEM - The system directory, not support recursive detection. - (default: .) - -S SET_PREFIX, --set-prefix SET_PREFIX - The set prefix (default: set) - -o OUTPUT, --output OUTPUT - The output file for results of model deviation - (default: model_devi.out) - -f FREQUENCY, --frequency FREQUENCY - The trajectory frequency of the system (default: 1) -``` - -For more details with respect to definition of model deviation and its application, please refer to `Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 253, 107206.` - -## Compress a model - -Once the frozen model is obtained from deepmd-kit, we can get the neural network structure and its parameters (weights, biases, etc.) from the trained model, and compress it in the following way: -```bash -dp compress -i graph.pb -o graph-compress.pb -``` -where `-i` gives the original frozen model, `-o` gives the compressed model. 
Several other command line options can be passed to `dp compress`, which can be checked with -```bash -$ dp compress --help -``` -An explanation will be provided -``` -usage: dp compress [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] [-l LOG_PATH] - [-m {master,collect,workers}] [-i INPUT] [-o OUTPUT] - [-s STEP] [-e EXTRAPOLATE] [-f FREQUENCY] - [-c CHECKPOINT_FOLDER] - -optional arguments: - -h, --help show this help message and exit - -v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}, --log-level {DEBUG,3,INFO,2,WARNING,1,ERROR,0} - set verbosity level by string or number, 0=ERROR, - 1=WARNING, 2=INFO and 3=DEBUG (default: INFO) - -l LOG_PATH, --log-path LOG_PATH - set log file to log messages to disk, if not - specified, the logs will only be output to console - (default: None) - -m {master,collect,workers}, --mpi-log {master,collect,workers} - Set the manner of logging when running with MPI. - 'master' logs only on main process, 'collect' - broadcasts logs from workers to master and 'workers' - means each process will output its own log (default: - master) - -i INPUT, --input INPUT - The original frozen model, which will be compressed by - the code (default: frozen_model.pb) - -o OUTPUT, --output OUTPUT - The compressed model (default: - frozen_model_compressed.pb) - -s STEP, --step STEP Model compression uses fifth-order polynomials to - interpolate the embedding-net. It introduces two - tables with different step size to store the - parameters of the polynomials. The first table covers - the range of the training data, while the second table - is an extrapolation of the training data. The domain - of each table is uniformly divided by a given step - size. And the step(parameter) denotes the step size of - the first table and the second table will use 10 * - step as it's step size to save the memory. Usually the - value ranges from 0.1 to 0.001. Smaller step means - higher accuracy and bigger model size (default: 0.01) - -e EXTRAPOLATE, --extrapolate EXTRAPOLATE - The domain range of the first table is automatically - detected by the code: [d_low, d_up]. While the second - table ranges from the first table's upper - boundary(d_up) to the extrapolate(parameter) * d_up: - [d_up, extrapolate * d_up] (default: 5) - -f FREQUENCY, --frequency FREQUENCY - The frequency of tabulation overflow check(Whether the - input environment matrix overflow the first or second - table range). By default do not check the overflow - (default: -1) - -c CHECKPOINT_FOLDER, --checkpoint-folder CHECKPOINT_FOLDER - path to checkpoint folder (default: .) - -t TRAINING_SCRIPT, --training-script TRAINING_SCRIPT - The training script of the input frozen model - (default: None) -``` -**Parameter explanation** - -Model compression, which including tabulating the embedding-net. -The table is composed of fifth-order polynomial coefficients and is assembled from two sub-tables. The first sub-table takes the stride(parameter) as it's uniform stride, while the second sub-table takes 10 * stride as it's uniform stride. -The range of the first table is automatically detected by deepmd-kit, while the second table ranges from the first table's upper boundary(upper) to the extrapolate(parameter) * upper. -Finally, we added a check frequency parameter. It indicates how often the program checks for overflow(if the input environment matrix overflow the first or second table range) during the MD inference. - -**Justification of model compression** - -Model compression, with little loss of accuracy, can greatly speed up MD inference time. 
According to different simulation systems and training parameters, the speedup can reach more than 10 times at both CPU and GPU devices. At the same time, model compression can greatly change the memory usage, reducing as much as 20 times under the same hardware conditions. - -**Acceptable original model version** - -The model compression method requires that the version of DeePMD-kit used in original model generation should be 1.3 or above. If one has a frozen 1.2 model, one can first use the convenient conversion interface of DeePMD-kit-v1.2.4 to get a 1.3 executable model.(eg: ```dp convert-to-1.3 -i frozen_1.2.pb -o frozen_1.3.pb```) - -## Model inference - -Note that the model for inference is required to be compatible with the DeePMD-kit package. See [Model compatibility](troubleshooting/model-compatability.md) for details. - -### Python interface -One may use the python interface of DeePMD-kit for model inference, an example is given as follows -```python -from deepmd.infer import DeepPot -import numpy as np -dp = DeepPot('graph.pb') -coord = np.array([[1,0,0], [0,0,1.5], [1,0,3]]).reshape([1, -1]) -cell = np.diag(10 * np.ones(3)).reshape([1, -1]) -atype = [1,0,1] -e, f, v = dp.eval(coord, cell, atype) -``` -where `e`, `f` and `v` are predicted energy, force and virial of the system, respectively. - -Furthermore, one can use the python interface to calulate model deviation. -```python -from deepmd.infer import calc_model_devi -from deepmd.infer import DeepPot as DP -import numpy as np - -coord = np.array([[1,0,0], [0,0,1.5], [1,0,3]]).reshape([1, -1]) -cell = np.diag(10 * np.ones(3)).reshape([1, -1]) -atype = [1,0,1] -graphs = [DP("graph.000.pb"), DP("graph.001.pb")] -model_devi = calc_model_devi(coord, cell, atype, graphs) -``` - -### C++ interface -The C++ interface of DeePMD-kit is also avaiable for model interface, which is considered faster than Python interface. An example `infer_water.cpp` is given below: -```cpp -#include "deepmd/DeepPot.h" - -int main(){ - deepmd::DeepPot dp ("graph.pb"); - std::vector coord = {1., 0., 0., 0., 0., 1.5, 1. ,0. ,3.}; - std::vector cell = {10., 0., 0., 0., 10., 0., 0., 0., 10.}; - std::vector atype = {1, 0, 1}; - double e; - std::vector f, v; - dp.compute (e, f, v, coord, atype, cell); -} -``` -where `e`, `f` and `v` are predicted energy, force and virial of the system, respectively. - -You can compile `infer_water.cpp` using `gcc`: -```sh -gcc infer_water.cpp -D HIGH_PREC -L $deepmd_root/lib -L $tensorflow_root/lib -I $deepmd_root/include -I $tensorflow_root/include -Wl,--no-as-needed -ldeepmd_cc -lstdc++ -Wl,-rpath=$deepmd_root/lib -Wl,-rpath=$tensorflow_root/lib -o infer_water -``` -and then run the program: -```sh -./infer_water -``` - -## Run MD - -Note that the model for MD simulations is required to be compatible with the DeePMD-kit package. See [Model compatibility](troubleshooting/model-compatability.md) for details. - -### Run MD with LAMMPS - -#### Enable DeePMD-kit plugin (plugin mode) - -If you are using the plugin mode, enable DeePMD-kit package in LAMMPS with `plugin` command: - -``` -plugin load path/to/deepmd/lib/libdeepmd_lmp.so -``` - -The built-in mode doesn't need this step. - -#### pair_style `deepmd` - -The DeePMD-kit package provides the pair_style `deepmd` - -``` -pair_style deepmd models ... keyword value ... -``` -- deepmd = style of this pair_style -- models = frozen model(s) to compute the interaction. 
If multiple models are provided, then the model deviation will be computed -- keyword = *out_file* or *out_freq* or *fparam* or *atomic* or *relative* -
-    out_file value = filename
-        filename = The file name for the model deviation output. Default is model_devi.out
-    out_freq value = freq
-        freq = Frequency for the model deviation output. Default is 100.
-    fparam value = parameters
-        parameters = one or more frame parameters required for model evaluation.
-    atomic = no value is required. 
-        If this keyword is set, the model deviation of each atom will be output.
-    relative value = level
-        level = The level parameter for computing the relative model deviation
-
- -##### Examples -``` -pair_style deepmd graph.pb -pair_style deepmd graph.pb fparam 1.2 -pair_style deepmd graph_0.pb graph_1.pb graph_2.pb out_file md.out out_freq 10 atomic relative 1.0 -``` - -##### Description -Evaluate the interaction of the system by using [Deep Potential][DP] or [Deep Potential Smooth Edition][DP-SE]. It is noticed that deep potential is not a "pairwise" interaction, but a multi-body interaction. - -This pair style takes the deep potential defined in a model file that usually has the .pb extension. The model can be trained and frozen by package [DeePMD-kit](https://github.com/deepmodeling/deepmd-kit). - -The model deviation evalulate the consistency of the force predictions from multiple models. By default, only the maximal, minimal and averge model deviations are output. If the key `atomic` is set, then the model deviation of force prediction of each atom will be output. - -By default, the model deviation is output in absolute value. If the keyword `relative` is set, then the relative model deviation will be output. The relative model deviation of the force on atom `i` is defined by -```math - |Df_i| -Ef_i = ------------- - |f_i| + level -``` -where `Df_i` is the absolute model deviation of the force on atom `i`, `|f_i|` is the norm of the the force and `level` is provided as the parameter of the keyword `relative`. - -##### Restrictions -- The `deepmd` pair style is provided in the USER-DEEPMD package, which is compiled from the DeePMD-kit, visit the [DeePMD-kit website](https://github.com/deepmodeling/deepmd-kit) for more information. - - -#### Compute tensorial prperties - -The DeePMD-kit package provide the compute `deeptensor/atom` for computing atomic tensorial properties. - -``` -compute ID group-ID deeptensor/atom model_file -``` -- ID: user-assigned name of the computation -- group-ID: ID of the group of atoms to compute -- deeptensor/atom: the style of this compute -- model_file: the name of the binary model file. - -##### Examples -``` -compute dipole all deeptensor/atom dipole.pb -``` -The result of the compute can be dump to trajctory file by -``` -dump 1 all custom 100 water.dump id type c_dipole[1] c_dipole[2] c_dipole[3] -``` - -##### Restrictions -- The `deeptensor/atom` compute is provided in the USER-DEEPMD package, which is compiled from the DeePMD-kit, visit the [DeePMD-kit website](https://github.com/deepmodeling/deepmd-kit) for more information. - - -#### Long-range interaction -The reciprocal space part of the long-range interaction can be calculated by LAMMPS command `kspace_style`. To use it with DeePMD-kit, one writes -```bash -pair_style deepmd graph.pb -pair_coeff -kspace_style pppm 1.0e-5 -kspace_modify gewald 0.45 -``` -Please notice that the DeePMD does nothing to the direct space part of the electrostatic interaction, because this part is assumed to be fitted in the DeePMD model (the direct space cut-off is thus the cut-off of the DeePMD model). The splitting parameter `gewald` is modified by the `kspace_modify` command. - -### Run path-integral MD with i-PI -The i-PI works in a client-server model. The i-PI provides the server for integrating the replica positions of atoms, while the DeePMD-kit provides a client named `dp_ipi` (or `dp_ipi_low` for low precision) that computes the interactions (including energy, force and virial). The server and client communicates via the Unix domain socket or the Internet socket. Installation instructions of i-PI can be found [here](install.md#install-i-pi). 
The client can be started by -```bash -i-pi input.xml & -dp_ipi water.json -``` -It is noted that multiple instances of the client is allow for computing, in parallel, the interactions of multiple replica of the path-integral MD. - -`water.json` is the parameter file for the client `dp_ipi`, and an example is provided: -```json -{ - "verbose": false, - "use_unix": true, - "port": 31415, - "host": "localhost", - "graph_file": "graph.pb", - "coord_file": "conf.xyz", - "atom_type" : { - "OW": 0, - "HW1": 1, - "HW2": 1 - } -} -``` -The option **`use_unix`** is set to `true` to activate the Unix domain socket, otherwise, the Internet socket is used. - -The option **`port`** should be the same as that in input.xml: -```xml -31415 -``` - -The option **`graph_file`** provides the file name of the frozen model. - -The `dp_ipi` gets the atom names from an [XYZ file](https://en.wikipedia.org/wiki/XYZ_file_format) provided by **`coord_file`** (meanwhile ignores all coordinates in it), and translates the names to atom types by rules provided by **`atom_type`**. - -### Use deep potential with ASE - -Deep potential can be set up as a calculator with ASE to obtain potential energies and forces. -```python -from ase import Atoms -from deepmd.calculator import DP - -water = Atoms('H2O', - positions=[(0.7601, 1.9270, 1), - (1.9575, 1, 1), - (1., 1., 1.)], - cell=[100, 100, 100], - calculator=DP(model="frozen_model.pb")) -print(water.get_potential_energy()) -print(water.get_forces()) -``` - -Optimization is also available: -```python -from ase.optimize import BFGS -dyn = BFGS(water) -dyn.run(fmax=1e-6) -print(water.get_positions()) -``` - -## Known limitations -If you use deepmd-kit in a GPU environment, the acceptable value range of some variables are additionally restricted compared to the CPU environment due to the software's GPU implementations: -1. The number of atom type of a given system must be less than 128. -2. The maximum distance between an atom and it's neighbors must be less than 128. It can be controlled by setting the rcut value of training parameters. -3. Theoretically, the maximum number of atoms that a single GPU can accept is about 10,000,000. However, this value is actually limited by the GPU memory size currently, usually within 1000,000 atoms even at the model compression mode. -4. The total sel value of training parameters(in model/descriptor section) must be less than 4096. - -[DP]:https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.143001 -[DP-SE]:https://dl.acm.org/doi/10.5555/3327345.3327356 diff --git a/doc/getting-started/data.md b/doc/getting-started/data.md new file mode 120000 index 0000000000..917d9dc750 --- /dev/null +++ b/doc/getting-started/data.md @@ -0,0 +1 @@ +../data/dpdata.md \ No newline at end of file diff --git a/doc/getting-started/freeze.md b/doc/getting-started/freeze.md new file mode 120000 index 0000000000..fdcca573e1 --- /dev/null +++ b/doc/getting-started/freeze.md @@ -0,0 +1 @@ +../freeze/freeze.md \ No newline at end of file diff --git a/doc/getting-started/index.rst b/doc/getting-started/index.rst new file mode 100644 index 0000000000..d5d5651003 --- /dev/null +++ b/doc/getting-started/index.rst @@ -0,0 +1,15 @@ +Getting Started +=============== + +In this text, we will call the deep neural network that is used to represent the interatomic interactions (Deep Potential) the model. The typical procedure of using DeePMD-kit is + +.. 
toctree:: + :maxdepth: 1 + :numbered: + + install + data + training + freeze + test + lammps \ No newline at end of file diff --git a/doc/getting-started/install.md b/doc/getting-started/install.md new file mode 120000 index 0000000000..01e3e51fa2 --- /dev/null +++ b/doc/getting-started/install.md @@ -0,0 +1 @@ +../install/easy-install.md \ No newline at end of file diff --git a/doc/getting-started/lammps.md b/doc/getting-started/lammps.md new file mode 120000 index 0000000000..4e077c20d1 --- /dev/null +++ b/doc/getting-started/lammps.md @@ -0,0 +1 @@ +../third-party/lammps.md \ No newline at end of file diff --git a/doc/getting-started/test.md b/doc/getting-started/test.md new file mode 120000 index 0000000000..036e05fe94 --- /dev/null +++ b/doc/getting-started/test.md @@ -0,0 +1 @@ +../test/test.md \ No newline at end of file diff --git a/doc/getting-started/training.md b/doc/getting-started/training.md new file mode 120000 index 0000000000..ebf38300a6 --- /dev/null +++ b/doc/getting-started/training.md @@ -0,0 +1 @@ +../train/training.md \ No newline at end of file diff --git a/doc/index.rst b/doc/index.rst index f38e0c6689..7cd8968f01 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -11,41 +11,46 @@ DeePMD-kit is a package written in Python/C++, designed to minimize the effort r .. Important:: The project DeePMD-kit is licensed under `GNU LGPLv3.0 `_. If you use this code in any future publications, please cite this using *Han Wang, Linfeng Zhang, Jiequn Han, and Weinan E. "DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics." Computer Physics Communications 228 (2018): 178-184.* -.. _user-guide: +.. _getting-started: .. toctree:: - :maxdepth: 2 - :caption: User Guide + :maxdepth: 3 + :caption: Getting Started - install - getting-started - tensorboard - troubleshooting/index - + getting-started/index -.. _data-and-parameters: +.. _advanced: .. toctree:: - :maxdepth: 2 - :caption: Data and Parameters - - data-conv - train-input - + :maxdepth: 3 + :numbered: + :caption: Advanced + + install/index + data/index + model/index + train/index + freeze/index + test/index + inference/index + third-party/index + troubleshooting/index .. _developer-guide: .. toctree:: - :maxdepth: 2 + :maxdepth: 3 :caption: Developer Guide - - development/index + :glob: + + development/* + API_CC/api_cc .. _project-details: .. toctree:: - :maxdepth: 2 + :maxdepth: 3 :caption: Project Details license diff --git a/doc/inference/cxx.md b/doc/inference/cxx.md new file mode 100644 index 0000000000..201341d416 --- /dev/null +++ b/doc/inference/cxx.md @@ -0,0 +1,25 @@ +# C++ interface +The C++ interface of DeePMD-kit is also avaiable for model interface, which is considered faster than Python interface. An example `infer_water.cpp` is given below: +```cpp +#include "deepmd/DeepPot.h" + +int main(){ + deepmd::DeepPot dp ("graph.pb"); + std::vector coord = {1., 0., 0., 0., 0., 1.5, 1. ,0. ,3.}; + std::vector cell = {10., 0., 0., 0., 10., 0., 0., 0., 10.}; + std::vector atype = {1, 0, 1}; + double e; + std::vector f, v; + dp.compute (e, f, v, coord, atype, cell); +} +``` +where `e`, `f` and `v` are predicted energy, force and virial of the system, respectively. 
+ +You can compile `infer_water.cpp` using `gcc`: +```sh +gcc infer_water.cpp -D HIGH_PREC -L $deepmd_root/lib -L $tensorflow_root/lib -I $deepmd_root/include -I $tensorflow_root/include -Wl,--no-as-needed -ldeepmd_cc -lstdc++ -Wl,-rpath=$deepmd_root/lib -Wl,-rpath=$tensorflow_root/lib -o infer_water +``` +and then run the program: +```sh +./infer_water +``` \ No newline at end of file diff --git a/doc/inference/index.md b/doc/inference/index.md new file mode 100644 index 0000000000..bf0bf54a3e --- /dev/null +++ b/doc/inference/index.md @@ -0,0 +1,6 @@ +# Inference + +Note that the model for inference is required to be compatible with the DeePMD-kit package. See [Model compatibility](../troubleshooting/model-compatability.html) for details. + +- [Python interface](python.md) +- [C++ interface](cxx.md) \ No newline at end of file diff --git a/doc/inference/index.rst b/doc/inference/index.rst new file mode 100644 index 0000000000..5b591e309c --- /dev/null +++ b/doc/inference/index.rst @@ -0,0 +1,10 @@ +Inference +========= + +Note that the model for inference is required to be compatible with the DeePMD-kit package. See `Model compatibility <../troubleshooting/model-compatability.html>`_ for details. + +.. toctree:: + :maxdepth: 1 + + python + cxx diff --git a/doc/inference/python.md b/doc/inference/python.md new file mode 100644 index 0000000000..bdf9074a86 --- /dev/null +++ b/doc/inference/python.md @@ -0,0 +1,26 @@ +# Python interface + +One may use the python interface of DeePMD-kit for model inference, an example is given as follows +```python +from deepmd.infer import DeepPot +import numpy as np +dp = DeepPot('graph.pb') +coord = np.array([[1,0,0], [0,0,1.5], [1,0,3]]).reshape([1, -1]) +cell = np.diag(10 * np.ones(3)).reshape([1, -1]) +atype = [1,0,1] +e, f, v = dp.eval(coord, cell, atype) +``` +where `e`, `f` and `v` are predicted energy, force and virial of the system, respectively. + +Furthermore, one can use the python interface to calulate model deviation. +```python +from deepmd.infer import calc_model_devi +from deepmd.infer import DeepPot as DP +import numpy as np + +coord = np.array([[1,0,0], [0,0,1.5], [1,0,3]]).reshape([1, -1]) +cell = np.diag(10 * np.ones(3)).reshape([1, -1]) +atype = [1,0,1] +graphs = [DP("graph.000.pb"), DP("graph.001.pb")] +model_devi = calc_model_devi(coord, cell, atype, graphs) +``` \ No newline at end of file diff --git a/doc/install.md b/doc/install.md deleted file mode 100644 index 4d8168d945..0000000000 --- a/doc/install.md +++ /dev/null @@ -1,319 +0,0 @@ -# Installation - -- [Easy installation methods](#easy-installation-methods) -- [Install from source code](#install-from-source-code) -- [Install third-party packages](#install-third-party-packages) -- [Building conda packages](#building-conda-packages) - -## Easy installation methods - -There various easy methods to install DeePMD-kit. Choose one that you prefer. If you want to build by yourself, jump to the next two sections. - -After your easy installation, DeePMD-kit (`dp`) and LAMMPS (`lmp`) will be available to execute. You can try `dp -h` and `lmp -h` to see the help. `mpirun` is also available considering you may want to run LAMMPS in parallel. - -- [Install off-line packages](#install-off-line-packages) -- [Install with conda](#install-with-conda) -- [Install with docker](#install-with-docker) - - -### Install off-line packages -Both CPU and GPU version offline packages are avaiable in [the Releases page](https://github.com/deepmodeling/deepmd-kit/releases). 
- -Some packages are splited into two files due to size limit of GitHub. One may merge them into one after downloading: -```bash -cat deepmd-kit-2.0.0-cuda11.3_gpu-Linux-x86_64.sh.0 deepmd-kit-2.0.0-cuda11.3_gpu-Linux-x86_64.sh.1 > deepmd-kit-2.0.0-cuda11.3_gpu-Linux-x86_64.sh -``` - -### Install with conda -DeePMD-kit is avaiable with [conda](https://github.com/conda/conda). Install [Anaconda](https://www.anaconda.com/distribution/#download-section) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) first. - -One may create an environment that contains the CPU version of DeePMD-kit and LAMMPS: -```bash -conda create -n deepmd deepmd-kit=*=*cpu libdeepmd=*=*cpu lammps-dp -c https://conda.deepmodeling.org -``` - -Or one may want to create a GPU environment containing [CUDA Toolkit](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver): -```bash -conda create -n deepmd deepmd-kit=*=*gpu libdeepmd=*=*gpu lammps-dp cudatoolkit=11.3 -c https://conda.deepmodeling.org -``` -One could change the CUDA Toolkit version from `10.1` or `11.3`. - -One may speficy the DeePMD-kit version such as `2.0.0` using -```bash -conda create -n deepmd deepmd-kit=2.0.0=*cpu libdeepmd=2.0.0=*cpu lammps-dp=2.0.0 -c https://conda.deepmodeling.org -``` - -One may enable the environment using -```bash -conda activate deepmd -``` - -### Install with docker -A docker for installing the DeePMD-kit is available [here](https://github.com/orgs/deepmodeling/packages/container/package/deepmd-kit). - -To pull the CPU version: -```bash -docker pull ghcr.io/deepmodeling/deepmd-kit:2.0.0_cpu -``` - -To pull the GPU version: -```bash -docker pull ghcr.io/deepmodeling/deepmd-kit:2.0.0_cuda10.1_gpu -``` - - -## Install from source code - -Please follow our [github](https://github.com/deepmodeling/deepmd-kit) webpage to download the [latest released version](https://github.com/deepmodeling/deepmd-kit/tree/master) and [development version](https://github.com/deepmodeling/deepmd-kit/tree/devel). - -Or get the DeePMD-kit source code by `git clone` -```bash -cd /some/workspace -git clone --recursive https://github.com/deepmodeling/deepmd-kit.git deepmd-kit -``` -The `--recursive` option clones all [submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules) needed by DeePMD-kit. - -For convenience, you may want to record the location of source to a variable, saying `deepmd_source_dir` by -```bash -cd deepmd-kit -deepmd_source_dir=`pwd` -``` -- [Install the python interaction](#install-the-python-interface) - - [Install the Tensorflow's python interface](#install-the-tensorflows-python-interface) - - [Install the DeePMD-kit's python interface](#install-the-deepmd-kits-python-interface) -- [Install the C++ interface](#install-the-c-interface) - - [Install the Tensorflow's C++ interface](#install-the-tensorflows-c-interface) - - [Install the DeePMD-kit's C++ interface](#install-the-deepmd-kits-c-interface) -- [Install LAMMPS's DeePMD-kit module](#install-lammpss-deepmd-kit-module) - - -### Install the python interface -#### Install the Tensorflow's python interface -First, check the python version on your machine -```bash -python --version -``` - -We follow the virtual environment approach to install the tensorflow's Python interface. The full instruction can be found on [the tensorflow's official website](https://www.tensorflow.org/install/pip). 
Now we assume that the Python interface will be installed to virtual environment directory `$tensorflow_venv` -```bash -virtualenv -p python3 $tensorflow_venv -source $tensorflow_venv/bin/activate -pip install --upgrade pip -pip install --upgrade tensorflow==2.3.0 -``` -It is notice that everytime a new shell is started and one wants to use `DeePMD-kit`, the virtual environment should be activated by -```bash -source $tensorflow_venv/bin/activate -``` -if one wants to skip out of the virtual environment, he/she can do -```bash -deactivate -``` -If one has multiple python interpreters named like python3.x, it can be specified by, for example -```bash -virtualenv -p python3.7 $tensorflow_venv -``` -If one does not need the GPU support of deepmd-kit and is concerned about package size, the CPU-only version of tensorflow should be installed by -```bash -pip install --upgrade tensorflow-cpu==2.3.0 -``` -To verify the installation, run -```bash -python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))" -``` -One should remember to activate the virtual environment every time he/she uses deepmd-kit. - -#### Install the DeePMD-kit's python interface - -Execute -```bash -cd $deepmd_source_dir -pip install . -``` - -One may set the following environment variables before executing `pip`: - -| Environment variables | Allowed value | Default value | Usage | -| --------------------- | ---------------------- | ------------- | -------------------------- | -| DP_VARIANT | `cpu`, `cuda`, `rocm` | `cpu` | Build CPU variant or GPU variant with CUDA or ROCM support. | -| CUDA_TOOLKIT_ROOT_DIR | Path | Detected automatically | The path to the CUDA toolkit directory. | -| ROCM_ROOT | Path | Detected automatically | The path to the ROCM toolkit directory. | - -To test the installation, one should firstly jump out of the source directory -``` -cd /some/other/workspace -``` -then execute -```bash -dp -h -``` -It will print the help information like -```text -usage: dp [-h] {train,freeze,test} ... - -DeePMD-kit: A deep learning package for many-body potential energy -representation and molecular dynamics - -optional arguments: - -h, --help show this help message and exit - -Valid subcommands: - {train,freeze,test} - train train a model - freeze freeze the model - test test the model -``` - -### Install the C++ interface - -If one does not need to use DeePMD-kit with Lammps or I-Pi, then the python interface installed in the previous section does everything and he/she can safely skip this section. - -#### Install the Tensorflow's C++ interface - -Check the compiler version on your machine - -``` -gcc --version -``` - -The C++ interface of DeePMD-kit was tested with compiler gcc >= 4.8. It is noticed that the I-Pi support is only compiled with gcc >= 4.9. - -First the C++ interface of Tensorflow should be installed. It is noted that the version of Tensorflow should be in consistent with the python interface. You may follow [the instruction](install-tf.2.3.md) to install the corresponding C++ interface. - -#### Install the DeePMD-kit's C++ interface - -Now goto the source code directory of DeePMD-kit and make a build place. -```bash -cd $deepmd_source_dir/source -mkdir build -cd build -``` -I assume you want to install DeePMD-kit into path `$deepmd_root`, then execute cmake -```bash -cmake -DTENSORFLOW_ROOT=$tensorflow_root -DCMAKE_INSTALL_PREFIX=$deepmd_root .. -``` -where the variable `tensorflow_root` stores the location where the TensorFlow's C++ interface is installed. 
- -One may add the following arguments to `cmake`: - -| CMake Aurgements | Allowed value | Default value | Usage | -| ------------------------ | ------------------- | ------------- | ------------------------| -| -DTENSORFLOW_ROOT=<value> | Path | - | The Path to TensorFlow's C++ interface. | -| -DCMAKE_INSTALL_PREFIX=<value> | Path | - | The Path where DeePMD-kit will be installed. | -| -DUSE_CUDA_TOOLKIT=<value> | `TRUE` or `FALSE` | `FALSE` | If `TRUE`, Build GPU support with CUDA toolkit. | -| -DCUDA_TOOLKIT_ROOT_DIR=<value> | Path | Detected automatically | The path to the CUDA toolkit directory. | -| -DUSE_ROCM_TOOLKIT=<value> | `TRUE` or `FALSE` | `FALSE` | If `TRUE`, Build GPU support with ROCM toolkit. | -| -DROCM_ROOT=<value> | Path | Detected automatically | The path to the ROCM toolkit directory. | -| -DLAMMPS_VERSION_NUMBER=<value> | Number | `20201029` | Only neccessary for LAMMPS built-in mode. The version number of LAMMPS (yyyymmdd). | -| -DLAMMPS_SOURCE_ROOT=<value> | Path | - | Only neccessary for LAMMPS plugin mode. The path to the LAMMPS source code (later than 8Apr2021). If not assigned, the plugin mode will not be enabled. | - -If the cmake has executed successfully, then -```bash -make -j4 -make install -``` -The option `-j4` means using 4 processes in parallel. You may want to use a different number according to your hardware. - -If everything works fine, you will have the following executable and libraries installed in `$deepmd_root/bin` and `$deepmd_root/lib` -```bash -$ ls $deepmd_root/bin -dp_ipi dp_ipi_low -$ ls $deepmd_root/lib -libdeepmd_cc_low.so libdeepmd_ipi_low.so libdeepmd_lmp_low.so libdeepmd_low.so libdeepmd_op_cuda.so libdeepmd_op.so -libdeepmd_cc.so libdeepmd_ipi.so libdeepmd_lmp.so libdeepmd_op_cuda_low.so libdeepmd_op_low.so libdeepmd.so -``` - -## Install third-party packages -### Install LAMMPS's DeePMD-kit module (built-in mode) -DeePMD-kit provide module for running MD simulation with LAMMPS. Now make the DeePMD-kit module for LAMMPS. If you want to use the plugin mode instead of the built-in mode, you can directly go to the next section. -```bash -cd $deepmd_source_dir/source/build -make lammps -``` -DeePMD-kit will generate a module called `USER-DEEPMD` in the `build` directory. If you need low precision version, move `env_low.sh` to `env.sh` in the directory. Now download the LAMMPS code (`29Oct2020` or later), and uncompress it: -```bash -cd /some/workspace -wget https://github.com/lammps/lammps/archive/stable_29Oct2020.tar.gz -tar xf stable_29Oct2020.tar.gz -``` -The source code of LAMMPS is stored in directory `lammps-stable_29Oct2020`. Now go into the LAMMPS code and copy the DeePMD-kit module like this -```bash -cd lammps-stable_29Oct2020/src/ -cp -r $deepmd_source_dir/source/build/USER-DEEPMD . -``` -Now build LAMMPS -```bash -make yes-kspace -make yes-user-deepmd -make mpi -j4 -``` - -If everything works fine, you will end up with an executable `lmp_mpi`. -```bash -./lmp_mpi -h -``` - -The DeePMD-kit module can be removed from LAMMPS source code by -```bash -make no-user-deepmd -``` - -### Install LAMMPS (plugin mode) -Starting from `8Apr2021`, LAMMPS also provides a plugin mode, allowing one build LAMMPS and a plugin separately. You can skip the section if you are using the built-in mode. 
- -Now download the LAMMPS code (`8Apr2021` or later), and uncompress it: -```bash -cd /some/workspace -wget https://github.com/lammps/lammps/archive/patch_30Jul2021.tar.gz -tar xf patch_30Jul2021.tar.gz -``` -The source code of LAMMPS is stored in directory `lammps-patch_30Jul2021`. Now go into the LAMMPS code and create a directory called `build` -```bash -mkdir -p lammps-patch_30Jul2021/build/ -cd lammps-patch_30Jul2021/build/ -``` -Now build LAMMPS. Note that `PLUGIN` and `KSPACE` package must be enabled, and `BUILD_SHARED_LIBS` must be set to `yes`. You can install any other package you want. -```bash -cmake -D PKG_PLUGIN=ON -D PKG_KSPACE=ON -D LAMMPS_INSTALL_RPATH=ON -D BUILD_SHARED_LIBS=yes -D CMAKE_INSTALL_PREFIX=${deepmd_root} ../cmake -make -j4 -make install -``` - -If everything works fine, you will end up with an executable `${deepmd_root}/lmp`. -```bash -${deepmd_root}/lmp -h -``` - -### Install i-PI -The i-PI works in a client-server model. The i-PI provides the server for integrating the replica positions of atoms, while the DeePMD-kit provides a client named `dp_ipi` that computes the interactions (including energy, force and virial). The server and client communicates via the Unix domain socket or the Internet socket. A full instruction of i-PI can be found [here](http://ipi-code.org/). The source code and a complete installation instructions of i-PI can be found [here](https://github.com/i-pi/i-pi). -To use i-PI with already existing drivers, install and update using Pip: -```bash -pip install -U i-PI -``` - -Test with Pytest: -```bash -pip install pytest -pytest --pyargs ipi.tests -``` - -## Building conda packages - -One may want to keep both convenience and personalization of the DeePMD-kit. To achieve this goal, one can consider builing conda packages. We provide building scripts in [deepmd-kit-recipes organization](https://github.com/deepmd-kit-recipes/). These building tools are driven by [conda-build](https://github.com/conda/conda-build) and [conda-smithy](https://github.com/conda-forge/conda-smithy). - -For example, if one wants to turn on `MPIIO` package in LAMMPS, go to [`lammps-dp-feedstock`](https://github.com/deepmd-kit-recipes/lammps-dp-feedstock/) repository and modify `recipe/build.sh`. `-D PKG_MPIIO=OFF` should be changed to `-D PKG_MPIIO=ON`. Then go to the main directory and executing - -```sh -./build-locally.py -``` - -This requires the Docker has been installed. After the building, the packages will be generated in `build_artifacts/linux-64` and `build_artifacts/noarch`, and then one can install then execuating -```sh -conda create -n deepmd lammps-dp -c file:///path/to/build_artifacts -c https://conda.deepmodeling.org -c nvidia -``` - -One may also upload packages to one's Anaconda channel, so they can be installed on other machines: - -```sh -anaconda upload /path/to/build_artifacts/linux-64/*.tar.bz2 /path/to/build_artifacts/noarch/*.tar.bz2 -``` diff --git a/doc/install/build-conda.md b/doc/install/build-conda.md new file mode 100644 index 0000000000..aae9c64a38 --- /dev/null +++ b/doc/install/build-conda.md @@ -0,0 +1,20 @@ +# Building conda packages + +One may want to keep both convenience and personalization of the DeePMD-kit. To achieve this goal, one can consider builing conda packages. We provide building scripts in [deepmd-kit-recipes organization](https://github.com/deepmd-kit-recipes/). These building tools are driven by [conda-build](https://github.com/conda/conda-build) and [conda-smithy](https://github.com/conda-forge/conda-smithy). 
+ +For example, if one wants to turn on `MPIIO` package in LAMMPS, go to [`lammps-dp-feedstock`](https://github.com/deepmd-kit-recipes/lammps-dp-feedstock/) repository and modify `recipe/build.sh`. `-D PKG_MPIIO=OFF` should be changed to `-D PKG_MPIIO=ON`. Then go to the main directory and executing + +```sh +./build-locally.py +``` + +This requires the Docker has been installed. After the building, the packages will be generated in `build_artifacts/linux-64` and `build_artifacts/noarch`, and then one can install then execuating +```sh +conda create -n deepmd lammps-dp -c file:///path/to/build_artifacts -c https://conda.deepmodeling.org -c nvidia +``` + +One may also upload packages to one's Anaconda channel, so they can be installed on other machines: + +```sh +anaconda upload /path/to/build_artifacts/linux-64/*.tar.bz2 /path/to/build_artifacts/noarch/*.tar.bz2 +``` diff --git a/doc/install/easy-install.md b/doc/install/easy-install.md new file mode 100644 index 0000000000..e1e7c7d7b8 --- /dev/null +++ b/doc/install/easy-install.md @@ -0,0 +1,55 @@ +# Easy install + +There various easy methods to install DeePMD-kit. Choose one that you prefer. If you want to build by yourself, jump to the next two sections. + +After your easy installation, DeePMD-kit (`dp`) and LAMMPS (`lmp`) will be available to execute. You can try `dp -h` and `lmp -h` to see the help. `mpirun` is also available considering you may want to run LAMMPS in parallel. + +- [Install off-line packages](#install-off-line-packages) +- [Install with conda](#install-with-conda) +- [Install with docker](#install-with-docker) + + +## Install off-line packages +Both CPU and GPU version offline packages are avaiable in [the Releases page](https://github.com/deepmodeling/deepmd-kit/releases). + +Some packages are splited into two files due to size limit of GitHub. One may merge them into one after downloading: +```bash +cat deepmd-kit-2.0.0-cuda11.3_gpu-Linux-x86_64.sh.0 deepmd-kit-2.0.0-cuda11.3_gpu-Linux-x86_64.sh.1 > deepmd-kit-2.0.0-cuda11.3_gpu-Linux-x86_64.sh +``` + +## Install with conda +DeePMD-kit is avaiable with [conda](https://github.com/conda/conda). Install [Anaconda](https://www.anaconda.com/distribution/#download-section) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) first. + +One may create an environment that contains the CPU version of DeePMD-kit and LAMMPS: +```bash +conda create -n deepmd deepmd-kit=*=*cpu libdeepmd=*=*cpu lammps-dp -c https://conda.deepmodeling.org +``` + +Or one may want to create a GPU environment containing [CUDA Toolkit](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver): +```bash +conda create -n deepmd deepmd-kit=*=*gpu libdeepmd=*=*gpu lammps-dp cudatoolkit=11.3 -c https://conda.deepmodeling.org +``` +One could change the CUDA Toolkit version from `10.1` or `11.3`. + +One may speficy the DeePMD-kit version such as `2.0.0` using +```bash +conda create -n deepmd deepmd-kit=2.0.0=*cpu libdeepmd=2.0.0=*cpu lammps-dp=2.0.0 -c https://conda.deepmodeling.org +``` + +One may enable the environment using +```bash +conda activate deepmd +``` + +## Install with docker +A docker for installing the DeePMD-kit is available [here](https://github.com/orgs/deepmodeling/packages/container/package/deepmd-kit). 
+ +To pull the CPU version: +```bash +docker pull ghcr.io/deepmodeling/deepmd-kit:2.0.0_cpu +``` + +To pull the GPU version: +```bash +docker pull ghcr.io/deepmodeling/deepmd-kit:2.0.0_cuda10.1_gpu +``` \ No newline at end of file diff --git a/doc/install/index.md b/doc/install/index.md new file mode 100644 index 0000000000..997efe7a35 --- /dev/null +++ b/doc/install/index.md @@ -0,0 +1,7 @@ +# Installation + +- [Easy install](easy-install.md) +- [Install from source code](install-from-source.md) +- [Install LAMMPS](install-lammps.md) +- [Install i-PI](install-ipi.md) +- [Building conda packages](build-conda.md) \ No newline at end of file diff --git a/doc/install/index.rst b/doc/install/index.rst new file mode 100644 index 0000000000..b36b3e68ff --- /dev/null +++ b/doc/install/index.rst @@ -0,0 +1,11 @@ +Installation +============ + +.. toctree:: + :maxdepth: 1 + + easy-install + install-from-source + install-lammps + install-ipi + build-conda diff --git a/doc/install/install-from-source.md b/doc/install/install-from-source.md new file mode 100644 index 0000000000..9a20a565e3 --- /dev/null +++ b/doc/install/install-from-source.md @@ -0,0 +1,152 @@ +# Install from source code + +Please follow our [github](https://github.com/deepmodeling/deepmd-kit) webpage to download the [latest released version](https://github.com/deepmodeling/deepmd-kit/tree/master) and [development version](https://github.com/deepmodeling/deepmd-kit/tree/devel). + +Or get the DeePMD-kit source code by `git clone` +```bash +cd /some/workspace +git clone --recursive https://github.com/deepmodeling/deepmd-kit.git deepmd-kit +``` +The `--recursive` option clones all [submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules) needed by DeePMD-kit. + +For convenience, you may want to record the location of source to a variable, saying `deepmd_source_dir` by +```bash +cd deepmd-kit +deepmd_source_dir=`pwd` +``` + +## Install the python interface +### Install the Tensorflow's python interface +First, check the python version on your machine +```bash +python --version +``` + +We follow the virtual environment approach to install the tensorflow's Python interface. The full instruction can be found on [the tensorflow's official website](https://www.tensorflow.org/install/pip). Now we assume that the Python interface will be installed to virtual environment directory `$tensorflow_venv` +```bash +virtualenv -p python3 $tensorflow_venv +source $tensorflow_venv/bin/activate +pip install --upgrade pip +pip install --upgrade tensorflow==2.3.0 +``` +It is notice that everytime a new shell is started and one wants to use `DeePMD-kit`, the virtual environment should be activated by +```bash +source $tensorflow_venv/bin/activate +``` +if one wants to skip out of the virtual environment, he/she can do +```bash +deactivate +``` +If one has multiple python interpreters named like python3.x, it can be specified by, for example +```bash +virtualenv -p python3.7 $tensorflow_venv +``` +If one does not need the GPU support of deepmd-kit and is concerned about package size, the CPU-only version of tensorflow should be installed by +```bash +pip install --upgrade tensorflow-cpu==2.3.0 +``` +To verify the installation, run +```bash +python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))" +``` +One should remember to activate the virtual environment every time he/she uses deepmd-kit. + +### Install the DeePMD-kit's python interface + +Execute +```bash +cd $deepmd_source_dir +pip install . 
+``` + +One may set the following environment variables before executing `pip`: + +| Environment variables | Allowed value | Default value | Usage | +| --------------------- | ---------------------- | ------------- | -------------------------- | +| DP_VARIANT | `cpu`, `cuda`, `rocm` | `cpu` | Build CPU variant or GPU variant with CUDA or ROCM support. | +| CUDA_TOOLKIT_ROOT_DIR | Path | Detected automatically | The path to the CUDA toolkit directory. | +| ROCM_ROOT | Path | Detected automatically | The path to the ROCM toolkit directory. | + +To test the installation, one should firstly jump out of the source directory +``` +cd /some/other/workspace +``` +then execute +```bash +dp -h +``` +It will print the help information like +```text +usage: dp [-h] {train,freeze,test} ... + +DeePMD-kit: A deep learning package for many-body potential energy +representation and molecular dynamics + +optional arguments: + -h, --help show this help message and exit + +Valid subcommands: + {train,freeze,test} + train train a model + freeze freeze the model + test test the model +``` + +## Install the C++ interface + +If one does not need to use DeePMD-kit with Lammps or I-Pi, then the python interface installed in the previous section does everything and he/she can safely skip this section. + +### Install the Tensorflow's C++ interface + +Check the compiler version on your machine + +``` +gcc --version +``` + +The C++ interface of DeePMD-kit was tested with compiler gcc >= 4.8. It is noticed that the I-Pi support is only compiled with gcc >= 4.9. + +First the C++ interface of Tensorflow should be installed. It is noted that the version of Tensorflow should be in consistent with the python interface. You may follow [the instruction](install-tf.2.3.md) to install the corresponding C++ interface. + +### Install the DeePMD-kit's C++ interface + +Now goto the source code directory of DeePMD-kit and make a build place. +```bash +cd $deepmd_source_dir/source +mkdir build +cd build +``` +I assume you want to install DeePMD-kit into path `$deepmd_root`, then execute cmake +```bash +cmake -DTENSORFLOW_ROOT=$tensorflow_root -DCMAKE_INSTALL_PREFIX=$deepmd_root .. +``` +where the variable `tensorflow_root` stores the location where the TensorFlow's C++ interface is installed. + +One may add the following arguments to `cmake`: + +| CMake Aurgements | Allowed value | Default value | Usage | +| ------------------------ | ------------------- | ------------- | ------------------------| +| -DTENSORFLOW_ROOT=<value> | Path | - | The Path to TensorFlow's C++ interface. | +| -DCMAKE_INSTALL_PREFIX=<value> | Path | - | The Path where DeePMD-kit will be installed. | +| -DUSE_CUDA_TOOLKIT=<value> | `TRUE` or `FALSE` | `FALSE` | If `TRUE`, Build GPU support with CUDA toolkit. | +| -DCUDA_TOOLKIT_ROOT_DIR=<value> | Path | Detected automatically | The path to the CUDA toolkit directory. | +| -DUSE_ROCM_TOOLKIT=<value> | `TRUE` or `FALSE` | `FALSE` | If `TRUE`, Build GPU support with ROCM toolkit. | +| -DROCM_ROOT=<value> | Path | Detected automatically | The path to the ROCM toolkit directory. | +| -DLAMMPS_VERSION_NUMBER=<value> | Number | `20201029` | Only neccessary for LAMMPS built-in mode. The version number of LAMMPS (yyyymmdd). | +| -DLAMMPS_SOURCE_ROOT=<value> | Path | - | Only neccessary for LAMMPS plugin mode. The path to the LAMMPS source code (later than 8Apr2021). If not assigned, the plugin mode will not be enabled. 
| + +If the cmake has executed successfully, then +```bash +make -j4 +make install +``` +The option `-j4` means using 4 processes in parallel. You may want to use a different number according to your hardware. + +If everything works fine, you will have the following executable and libraries installed in `$deepmd_root/bin` and `$deepmd_root/lib` +```bash +$ ls $deepmd_root/bin +dp_ipi dp_ipi_low +$ ls $deepmd_root/lib +libdeepmd_cc_low.so libdeepmd_ipi_low.so libdeepmd_lmp_low.so libdeepmd_low.so libdeepmd_op_cuda.so libdeepmd_op.so +libdeepmd_cc.so libdeepmd_ipi.so libdeepmd_lmp.so libdeepmd_op_cuda_low.so libdeepmd_op_low.so libdeepmd.so +``` \ No newline at end of file diff --git a/doc/install/install-ipi.md b/doc/install/install-ipi.md new file mode 100644 index 0000000000..2317d299f4 --- /dev/null +++ b/doc/install/install-ipi.md @@ -0,0 +1,12 @@ +# Install i-PI +The i-PI works in a client-server model. The i-PI provides the server for integrating the replica positions of atoms, while the DeePMD-kit provides a client named `dp_ipi` that computes the interactions (including energy, force and virial). The server and client communicates via the Unix domain socket or the Internet socket. A full instruction of i-PI can be found [here](http://ipi-code.org/). The source code and a complete installation instructions of i-PI can be found [here](https://github.com/i-pi/i-pi). +To use i-PI with already existing drivers, install and update using Pip: +```bash +pip install -U i-PI +``` + +Test with Pytest: +```bash +pip install pytest +pytest --pyargs ipi.tests +``` \ No newline at end of file diff --git a/doc/install/install-lammps.md b/doc/install/install-lammps.md new file mode 100644 index 0000000000..d86a7e909f --- /dev/null +++ b/doc/install/install-lammps.md @@ -0,0 +1,64 @@ +# Install LAMMPS + +There are two ways to install LAMMPS: the built-in mode and the plugin mode. The built-in mode builds LAMMPS along with the DeePMD-kit and DeePMD-kit will be loaded automatically when running LAMMPS. The plugin mode builds LAMMPS and a plugin separately, so one need to use `plugin load` command to load the DeePMD-kit's LAMMPS plugin library. + +## Install LAMMPS's DeePMD-kit module (built-in mode) +DeePMD-kit provide module for running MD simulation with LAMMPS. Now make the DeePMD-kit module for LAMMPS. + +```bash +cd $deepmd_source_dir/source/build +make lammps +``` +DeePMD-kit will generate a module called `USER-DEEPMD` in the `build` directory. If you need low precision version, move `env_low.sh` to `env.sh` in the directory. Now download the LAMMPS code (`29Oct2020` or later), and uncompress it: +```bash +cd /some/workspace +wget https://github.com/lammps/lammps/archive/stable_29Oct2020.tar.gz +tar xf stable_29Oct2020.tar.gz +``` +The source code of LAMMPS is stored in directory `lammps-stable_29Oct2020`. Now go into the LAMMPS code and copy the DeePMD-kit module like this +```bash +cd lammps-stable_29Oct2020/src/ +cp -r $deepmd_source_dir/source/build/USER-DEEPMD . +``` +Now build LAMMPS +```bash +make yes-kspace +make yes-user-deepmd +make mpi -j4 +``` + +If everything works fine, you will end up with an executable `lmp_mpi`. +```bash +./lmp_mpi -h +``` + +The DeePMD-kit module can be removed from LAMMPS source code by +```bash +make no-user-deepmd +``` + +## Install LAMMPS (plugin mode) +Starting from `8Apr2021`, LAMMPS also provides a plugin mode, allowing one build LAMMPS and a plugin separately. 
+ +Now download the LAMMPS code (`8Apr2021` or later), and uncompress it: +```bash +cd /some/workspace +wget https://github.com/lammps/lammps/archive/patch_30Jul2021.tar.gz +tar xf patch_30Jul2021.tar.gz +``` +The source code of LAMMPS is stored in directory `lammps-patch_30Jul2021`. Now go into the LAMMPS code and create a directory called `build` +```bash +mkdir -p lammps-patch_30Jul2021/build/ +cd lammps-patch_30Jul2021/build/ +``` +Now build LAMMPS. Note that `PLUGIN` and `KSPACE` package must be enabled, and `BUILD_SHARED_LIBS` must be set to `yes`. You can install any other package you want. +```bash +cmake -D PKG_PLUGIN=ON -D PKG_KSPACE=ON -D LAMMPS_INSTALL_RPATH=ON -D BUILD_SHARED_LIBS=yes -D CMAKE_INSTALL_PREFIX=${deepmd_root} ../cmake +make -j4 +make install +``` + +If everything works fine, you will end up with an executable `${deepmd_root}/lmp`. +```bash +${deepmd_root}/lmp -h +``` \ No newline at end of file diff --git a/doc/install-tf.1.12.md b/doc/install/install-tf.1.12.md similarity index 100% rename from doc/install-tf.1.12.md rename to doc/install/install-tf.1.12.md diff --git a/doc/install-tf.1.14-gpu.md b/doc/install/install-tf.1.14-gpu.md similarity index 100% rename from doc/install-tf.1.14-gpu.md rename to doc/install/install-tf.1.14-gpu.md diff --git a/doc/install-tf.1.14.md b/doc/install/install-tf.1.14.md similarity index 100% rename from doc/install-tf.1.14.md rename to doc/install/install-tf.1.14.md diff --git a/doc/install-tf.1.8.md b/doc/install/install-tf.1.8.md similarity index 100% rename from doc/install-tf.1.8.md rename to doc/install/install-tf.1.8.md diff --git a/doc/install-tf.2.3.md b/doc/install/install-tf.2.3.md similarity index 100% rename from doc/install-tf.2.3.md rename to doc/install/install-tf.2.3.md diff --git a/doc/model/index.md b/doc/model/index.md new file mode 100644 index 0000000000..715860f7b1 --- /dev/null +++ b/doc/model/index.md @@ -0,0 +1,10 @@ +# Model + +- [Overall](overall.md) +- [Descriptor `"se_e2_a"`](train-se-e2-a.md) +- [Descriptor `"se_e2_r"`](train-se-e2-r.md) +- [Descriptor `"se_e3"`](train-se-e3.md) +- [Descriptor `"hybrid"`](train-hybrid.md) +- [Fit energy](train-energy.md) +- [Fit `tensor` like `Dipole` and `Polarizability`](train-fitting-tensor.md) +- [Train a Deep Potential model using `type embedding` approach](train-se-e2-a-tebd.md) \ No newline at end of file diff --git a/doc/model/index.rst b/doc/model/index.rst new file mode 100644 index 0000000000..b264d95eca --- /dev/null +++ b/doc/model/index.rst @@ -0,0 +1,14 @@ +Model +===== + +.. toctree:: + :maxdepth: 1 + + overall + train-se-e2-a + train-se-e2-r + train-se-e3 + train-hybrid + train-energy + train-fitting-tensor + train-se-e2-a-tebd \ No newline at end of file diff --git a/doc/model/overall.md b/doc/model/overall.md new file mode 100644 index 0000000000..87827363d7 --- /dev/null +++ b/doc/model/overall.md @@ -0,0 +1,30 @@ +# Overall + +A model has two parts, a descriptor that maps atomic configuration to a set of symmetry invariant features, and a fitting net that takes descriptor as input and predicts the atomic contribution to the target physical property. It's defined in the `model` section of the `input.json`, for example +```json + "model": { + "type_map": ["O", "H"], + "descriptor" :{ + "...": "..." + }, + "fitting_net" : { + "...": "..." + } + } +``` + +Assume that we are looking for a model for water, we will have two types of atoms. The atom types are recorded as integers. In this example, we denote `0` for oxygen and `1` for hydrogen. 
A mapping from the atom type to their names is provided by `type_map`. + +The model has two subsections, `descriptor` and `fitting_net`, which define the descriptor and the fitting net, respectively. The `type_map` is optional; it provides a name (not necessarily the element name) for each of the corresponding atom types. + +DeePMD-kit implements the following descriptors: +1. [`se_e2_a`](train-se-e2-a.md): DeepPot-SE constructed from all information (both angular and radial) of atomic configurations. The embedding takes the distance between atoms as input. +2. [`se_e2_r`](train-se-e2-r.md): DeepPot-SE constructed from radial information of atomic configurations. The embedding takes the distance between atoms as input. +3. [`se_e3`](train-se-e3.md): DeepPot-SE constructed from all information (both angular and radial) of atomic configurations. The embedding takes angles between two neighboring atoms as input. +4. `loc_frame`: Defines a local frame at each atom and computes the descriptor as local coordinates under this frame. +5. [`hybrid`](train-hybrid.md): Concatenates a list of descriptors to form a new descriptor. + +The fitting of the following physical properties is supported: +1. [`ener`](train-energy.md): Fitting the energy of the system. The force (derivative with respect to atom positions) and the virial (derivative with respect to the box tensor) can also be trained. See [the example](train-se-e2-a.md#loss). +2. [`dipole`](train-fitting-tensor.md): The dipole moment. +3. [`polar`](train-fitting-tensor.md): The polarizability. diff --git a/doc/model/train-energy.md b/doc/model/train-energy.md new file mode 100644 index 0000000000..65f80f85e7 --- /dev/null +++ b/doc/model/train-energy.md @@ -0,0 +1,44 @@ +# Fit energy + +In this section, we will take `$deepmd_source_dir/examples/water/se_e2_a/input.json` as an example of the input file. + +## Fitting network + +The construction of the fitting net is given by the section `fitting_net` +```json + "fitting_net" : { + "neuron": [240, 240, 240], + "resnet_dt": true, + "seed": 1 + }, +``` +* `neuron` specifies the size of the fitting net. If two neighboring layers are of the same size, then a [ResNet architecture](https://arxiv.org/abs/1512.03385) is built between them. +* If the option `resnet_dt` is set to `true`, then a timestep is used in the ResNet. +* `seed` gives the random seed that is used to generate random numbers when initializing the model parameters. + +## Loss + +The loss function for training energy is given by +``` +loss = pref_e * loss_e + pref_f * loss_f + pref_v * loss_v +``` +where `loss_e`, `loss_f` and `loss_v` denote the loss in energy, force and virial, respectively. `pref_e`, `pref_f` and `pref_v` give the prefactors of the energy, force and virial losses. The prefactors are not necessarily constants; they change linearly with the learning rate. Taking the force prefactor as an example, at training step `t` it is given by +```math +pref_f(t) = start_pref_f * ( lr(t) / start_lr ) + limit_pref_f * ( 1 - lr(t) / start_lr ) +``` +where `lr(t)` denotes the learning rate at step `t`. `start_pref_f` and `limit_pref_f` specify `pref_f` at the start of the training and in the limit `t -> inf`, respectively.
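To make the schedule concrete, here is a short illustrative Python sketch (not part of DeePMD-kit) that evaluates `pref_f(t)` for an exponentially decaying learning rate; the numerical values are assumptions modelled on the water example and should be replaced by those in your own `input.json`:

```python
# Illustrative sketch of the force-prefactor schedule described above.
# All numbers are assumed example values, not DeePMD-kit defaults.
start_lr, decay_steps, decay_rate = 1.0e-3, 5000, 0.95
start_pref_f, limit_pref_f = 1000.0, 1.0

def lr(t):
    # exponentially decaying learning rate: lr(t) = start_lr * decay_rate ** (t / decay_steps)
    return start_lr * decay_rate ** (t / decay_steps)

def pref_f(t):
    # linear interpolation between start_pref_f and limit_pref_f, driven by lr(t)
    return start_pref_f * lr(t) / start_lr + limit_pref_f * (1.0 - lr(t) / start_lr)

for step in (0, 100_000, 1_000_000):
    print(f"step {step:>9d}  lr {lr(step):.2e}  pref_f {pref_f(step):.2f}")
```

With these example values, `pref_f(t)` moves from `start_pref_f` towards `limit_pref_f` as the learning rate decays, so the force term is weighted heavily early in training and less so towards the end.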
+ +The `loss` section in the `input.json` is +```json + "loss" : { + "start_pref_e": 0.02, + "limit_pref_e": 1, + "start_pref_f": 1000, + "limit_pref_f": 1, + "start_pref_v": 0, + "limit_pref_v": 0 + } +``` +The options `start_pref_e`, `limit_pref_e`, `start_pref_f`, `limit_pref_f`, `start_pref_v` and `limit_pref_v` determine the start and limit prefactors of energy, force and virial, respectively. + +If one does not want to train with virial, then he/she may set the virial prefactors `start_pref_v` and `limit_pref_v` to 0. diff --git a/doc/train-fitting-tensor.md b/doc/model/train-fitting-tensor.md similarity index 93% rename from doc/train-fitting-tensor.md rename to doc/model/train-fitting-tensor.md index 2f301c30b0..d67ad244da 100644 --- a/doc/train-fitting-tensor.md +++ b/doc/model/train-fitting-tensor.md @@ -1,4 +1,4 @@ -# Train a Deep Potential model to fit `tensor` like `Dipole` and `Polarizability` +# Fit `tensor` like `Dipole` and `Polarizability` Unlike `energy` which is a scalar, one may want to fit some high dimensional physical quantity, like `dipole` (vector) and `polarizability` (matrix, shorted as `polar`). Deep Potential has provided different API to allow this. In this example we will show you how to train a model to fit them for a water system. A complete training input script of the examples can be found in @@ -9,19 +9,9 @@ $deepmd_source_dir/examples/water_tensor/polar/polar_input.json The training and validation data are also provided our examples. But note that **the data provided along with the examples are of limited amount, and should not be used to train a productive model.** - - -The directory of this examples: - -- [The training input script](#the-training-input-script) -- [Training Data Preparation](#training-data-preparation) -- [Train the Model](#train-the-model) - -## The training input script - Similar to the `input.json` used in `ener` mode, training json is also divided into `model`, `learning_rate`, `loss` and `training`. Most keywords remains the same as `ener` mode, and their meaning can be found [here](train-se-e2-a.md). To fit a tensor, one need to modify `model.fitting_net` and `loss`. -### Model +## Fitting Network The `fitting_net` section tells DP which fitting net to use. @@ -53,7 +43,7 @@ The json of `polar` type should be provided like - `sel_type` is a list specifying which type of atoms have the quantity you want to fit. For example, in water system, `sel_type` is `[0]` since `0` represents for atom `O`. If left unset, all type of atoms will be fitted. - The rest `args` has the same meaning as they do in `ener` mode. -### Loss +## Loss DP supports a combinational training of global system (only a global `tensor` label, i.e. dipole or polar, is provided in a frame) and atomic system (labels for **each** atom included in `sel_type` are provided). In a global system, each frame has just **one** `tensor` label. For example, when fitting `polar`, each frame will just provide a `1 x 9` vector which gives the elements of the polarizability tensor of that frame in order XX, XY, XZ, YX, YY, YZ, XZ, ZY, ZZ. By contrast, in a atomic system, each atom in `sel_type` has a `tensor` label. For example, when fitting dipole, each frame will provide a `#sel_atom x 3` matrix, where `#sel_atom` is the number of atoms whose type are in `sel_type`. 
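To picture the difference in label dimensionality, a minimal NumPy sketch is given below; the frame and atom counts are made-up illustration values, and the arrays only demonstrate shapes, not the on-disk data format:

```python
import numpy as np

# Illustrative shapes only; nframes and nsel are made-up numbers.
nframes = 100   # frames in the system
nsel = 64       # atoms whose type appears in sel_type (e.g. the oxygens)

global_dipole = np.zeros((nframes, 3))          # one 3-vector per frame
atomic_dipole = np.zeros((nframes, nsel * 3))   # one 3-vector per selected atom in each frame

global_polar = np.zeros((nframes, 9))           # XX, XY, XZ, YX, ... ZZ per frame
atomic_polar = np.zeros((nframes, nsel * 9))    # one 3x3 tensor per selected atom in each frame
```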
@@ -69,7 +59,7 @@ The loss section should be provided like "loss" : { "type": "tensor", "pref": 1.0, - "pref_atomic": 1.0, + "pref_atomic": 1.0 }, ``` diff --git a/doc/train-hybrid.md b/doc/model/train-hybrid.md similarity index 92% rename from doc/train-hybrid.md rename to doc/model/train-hybrid.md index 8c2097caa8..4ae8806867 100644 --- a/doc/train-hybrid.md +++ b/doc/model/train-hybrid.md @@ -1,9 +1,9 @@ -# Train a Deep Potential model using descriptor `"hybrid"` +# Descriptor `"hybrid"` This descriptor hybridize multiple descriptors to form a new descriptor. For example we have a list of descriptor denoted by D_1, D_2, ..., D_N, the hybrid descriptor this the concatenation of the list, i.e. D = (D_1, D_2, ..., D_N). To use the descriptor in DeePMD-kit, one firstly set the `type` to `"hybrid"`, then provide the definitions of the descriptors by the items in the `list`, -```json= +```json "descriptor" :{ "type": "hybrid", "list" : [ diff --git a/doc/train-se-e2-a-tebd.md b/doc/model/train-se-e2-a-tebd.md similarity index 90% rename from doc/train-se-e2-a-tebd.md rename to doc/model/train-se-e2-a-tebd.md index 2179b8b598..82815b6956 100644 --- a/doc/train-se-e2-a-tebd.md +++ b/doc/model/train-se-e2-a-tebd.md @@ -1,12 +1,12 @@ -# Train a Deep Potential model using `type embedding` approach +# Type embedding approach We generate specific type embedding vector for each atom type, so that we can share one descriptor embedding net and one fitting net in total, which decline training complexity largely. The training input script is similar to that of [`se_e2_a`](train-se-e2-a.md#the-training-input-script), but different by adding the `type_embedding` section. -### Type embedding net +## Type embedding net The `model` defines how the model is constructed, adding a section of type embedding net: -```json= +```json "model": { "type_map": ["O", "H"], "type_embedding":{ @@ -23,7 +23,7 @@ The `model` defines how the model is constructed, adding a section of type embed Model will automatically apply type embedding approach and generate type embedding vectors. If type embedding vector is detected, descriptor and fitting net would take it as a part of input. The construction of type embedding net is given by `type_embedding`. An example of `type_embedding` is provided as follows -```json= +```json "type_embedding":{ "neuron": [2, 4, 8], "resnet_dt": false, @@ -39,6 +39,6 @@ A complete training input script of this example can be find in the directory. ```bash $deepmd_source_dir/examples/water/se_e2_a_tebd/input.json ``` -See [here](development/type-embedding.md) for further explanation of `type embedding`. +See [here](../development/type-embedding.md) for further explanation of `type embedding`. **P.S.: You can't apply compression method while using atom type embedding** diff --git a/doc/model/train-se-e2-a.md b/doc/model/train-se-e2-a.md new file mode 100644 index 0000000000..6c17329c0d --- /dev/null +++ b/doc/model/train-se-e2-a.md @@ -0,0 +1,39 @@ +# Descriptor `"se_e2_a"` + +The notation of `se_e2_a` is short for the Deep Potential Smooth Edition (DeepPot-SE) constructed from all information (both angular and radial) of atomic configurations. The `e2` stands for the embedding with two-atoms information. This descriptor was described in detail in [the DeepPot-SE paper](https://arxiv.org/abs/1805.09003). + +In this example we will train a DeepPot-SE model for a water system. A complete training input script of this example can be find in the directory. 
+```bash +$deepmd_source_dir/examples/water/se_e2_a/input.json +``` +With the training input script, data (please read the [warning](#warning)) are also provided in the example directory. One may train the model with the DeePMD-kit from the directory. + +The contents of the example: +- [The training input](#the-training-input-script) +- [Warning](#warning) + + +#### Descriptor +The construction of the descriptor is given by section `descriptor`. An example of the descriptor is provided as follows +```json + "descriptor" :{ + "type": "se_e2_a", + "rcut_smth": 0.50, + "rcut": 6.00, + "sel": [46, 92], + "neuron": [25, 50, 100], + "type_one_side": true, + "axis_neuron": 16, + "resnet_dt": false, + "seed": 1 + } +``` +* The `type` of the descriptor is set to `"se_e2_a"`. +* `rcut` is the cut-off radius for neighbor searching, and the `rcut_smth` gives where the smoothing starts. +* `sel` gives the maximum possible number of neighbors in the cut-off radius. It is a list, the length of which is the same as the number of atom types in the system, and `sel[i]` denote the maximum possible number of neighbors with type `i`. +* The `neuron` specifies the size of the embedding net. From left to right the members denote the sizes of each hidden layer from input end to the output end, respectively. If the outer layer is of twice size as the inner layer, then the inner layer is copied and concatenated, then a [ResNet architecture](https://arxiv.org/abs/1512.03385) is built between them. +* If the option `type_one_side` is set to `true`, then descriptor will consider the types of neighbor atoms. Otherwise, both the types of centric and  neighbor atoms are considered. +* The `axis_neuron` specifies the size of submatrix of the embedding matrix, the axis matrix as explained in the [DeepPot-SE paper](https://arxiv.org/abs/1805.09003) +* If the option `resnet_dt` is set `true`, then a timestep is used in the ResNet. +* `seed` gives the random seed that is used to generate random numbers when initializing the model parameters. + diff --git a/doc/train-se-e2-r.md b/doc/model/train-se-e2-r.md similarity index 92% rename from doc/train-se-e2-r.md rename to doc/model/train-se-e2-r.md index af456f1eaf..997f32f2b9 100644 --- a/doc/train-se-e2-r.md +++ b/doc/model/train-se-e2-r.md @@ -1,4 +1,4 @@ -# Train a Deep Potential model using descriptor `"se_e2_r"` +# Descriptor `"se_e2_r"` The notation of `se_e2_r` is short for the Deep Potential Smooth Edition (DeepPot-SE) constructed from the radial information of atomic configurations. The `e2` stands for the embedding with two-atom information. @@ -8,7 +8,7 @@ $deepmd_source_dir/examples/water/se_e2_r/input.json ``` The training input script is very similar to that of [`se_e2_a`](train-se-e2-a.md#the-training-input-script). The only difference lies in the `descriptor` section -```json= +```json "descriptor": { "type": "se_e2_r", "sel": [46, 92], diff --git a/doc/train-se-e3.md b/doc/model/train-se-e3.md similarity index 92% rename from doc/train-se-e3.md rename to doc/model/train-se-e3.md index e854c34231..36bbd202db 100644 --- a/doc/train-se-e3.md +++ b/doc/model/train-se-e3.md @@ -1,4 +1,4 @@ -# Train a Deep Potential model using descriptor `"se_e3"` +# Descriptor `"se_e3"` The notation of `se_e3` is short for the Deep Potential Smooth Edition (DeepPot-SE) constructed from all information (both angular and radial) of atomic configurations. The embedding takes angles between two neighboring atoms as input (denoted by `e3`). 
@@ -8,7 +8,7 @@ $deepmd_source_dir/examples/water/se_e3/input.json ``` The training input script is very similar to that of [`se_e2_a`](train-se-e2-a.md#the-training-input-script). The only difference lies in the `descriptor` section -```json= +```json "descriptor": { "type": "se_e3", "sel": [40, 80], diff --git a/doc/test/index.md b/doc/test/index.md new file mode 100644 index 0000000000..815989d146 --- /dev/null +++ b/doc/test/index.md @@ -0,0 +1,4 @@ +# Test + +- [Test a model](test.md) +- [Calculate Model Deviation](model-deviation.md) \ No newline at end of file diff --git a/doc/test/index.rst b/doc/test/index.rst new file mode 100644 index 0000000000..ba3da8bef2 --- /dev/null +++ b/doc/test/index.rst @@ -0,0 +1,8 @@ +Test +==== + +.. toctree:: + :maxdepth: 1 + + test + model-deviation diff --git a/doc/test/model-deviation.md b/doc/test/model-deviation.md new file mode 100644 index 0000000000..edc4e199d1 --- /dev/null +++ b/doc/test/model-deviation.md @@ -0,0 +1,38 @@ +# Calculate Model Deviation + +One can also use a subcommand to calculate deviation of prediced forces or virials for a bunch of models in the following way: +```bash +dp model-devi -m graph.000.pb graph.001.pb graph.002.pb graph.003.pb -s ./data -o model_devi.out +``` +where `-m` specifies graph files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results is dumped. Here is more information on this sub-command: +```bash +usage: dp model-devi [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] + [-l LOG_PATH] [-m MODELS [MODELS ...]] [-s SYSTEM] + [-S SET_PREFIX] [-o OUTPUT] [-f FREQUENCY] [-i ITEMS] + +optional arguments: + -h, --help show this help message and exit + -v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}, --log-level {DEBUG,3,INFO,2,WARNING,1,ERROR,0} + set verbosity level by string or number, 0=ERROR, + 1=WARNING, 2=INFO and 3=DEBUG (default: INFO) + -l LOG_PATH, --log-path LOG_PATH + set log file to log messages to disk, if not + specified, the logs will only be output to console + (default: None) + -m MODELS [MODELS ...], --models MODELS [MODELS ...] + Frozen models file to import (default: + ['graph.000.pb', 'graph.001.pb', 'graph.002.pb', + 'graph.003.pb']) + -s SYSTEM, --system SYSTEM + The system directory, not support recursive detection. + (default: .) + -S SET_PREFIX, --set-prefix SET_PREFIX + The set prefix (default: set) + -o OUTPUT, --output OUTPUT + The output file for results of model deviation + (default: model_devi.out) + -f FREQUENCY, --frequency FREQUENCY + The trajectory frequency of the system (default: 1) +``` + +For more details with respect to definition of model deviation and its application, please refer to [Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 253, 107206.](https://doi.org/10.1016/j.cpc.2020.107206) diff --git a/doc/test/test.md b/doc/test/test.md new file mode 100644 index 0000000000..d097ec2223 --- /dev/null +++ b/doc/test/test.md @@ -0,0 +1,32 @@ +# Test a model + +The frozen model can be used in many ways. The most straightforward test can be performed using `dp test`. A typical usage of `dp test` is +```bash +dp test -m graph.pb -s /path/to/system -n 30 +``` +where `-m` gives the tested model, `-s` the path to the tested system and `-n` the number of tested frames. 
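If one prefers to script such a check, roughly the same comparison can be done with the Python inference interface described in the Inference section. The sketch below is illustrative only: the data-file names (`type.raw`, `set.000/coord.npy`, `box.npy`, `energy.npy`) and paths are assumptions following the usual DeePMD-kit data layout and may need to be adapted to your system:

```python
import numpy as np
from deepmd.infer import DeepPot

# Rough, illustrative energy check against reference data (not a replacement for `dp test`).
dp = DeepPot("graph.pb")
atype = np.loadtxt("/path/to/system/type.raw", dtype=int)   # one type index per atom
coord = np.load("/path/to/system/set.000/coord.npy")        # assumed shape (nframes, natoms * 3)
box = np.load("/path/to/system/set.000/box.npy")            # assumed shape (nframes, 9)
e_ref = np.load("/path/to/system/set.000/energy.npy")       # assumed shape (nframes,)

e, f, v = dp.eval(coord, box, atype)
rmse = np.sqrt(np.mean((e.ravel() - e_ref.ravel()) ** 2))
print("energy RMSE:", rmse)
```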
Several other command line options can be passed to `dp test`, which can be checked with +```bash +$ dp test --help +``` +An explanation will be provided +``` +usage: dp test [-h] [-m MODEL] [-s SYSTEM] [-S SET_PREFIX] [-n NUMB_TEST] + [-r RAND_SEED] [--shuffle-test] [-d DETAIL_FILE] + +optional arguments: + -h, --help show this help message and exit + -m MODEL, --model MODEL + Frozen model file to import + -s SYSTEM, --system SYSTEM + The system dir + -S SET_PREFIX, --set-prefix SET_PREFIX + The set prefix + -n NUMB_TEST, --numb-test NUMB_TEST + The number of data for test + -r RAND_SEED, --rand-seed RAND_SEED + The random seed + --shuffle-test Shuffle test data + -d DETAIL_FILE, --detail-file DETAIL_FILE + The file containing details of energy force and virial + accuracy +``` \ No newline at end of file diff --git a/doc/third-party/ase.md b/doc/third-party/ase.md new file mode 100644 index 0000000000..3abb44d997 --- /dev/null +++ b/doc/third-party/ase.md @@ -0,0 +1,24 @@ +# Use deep potential with ASE + +Deep potential can be set up as a calculator with ASE to obtain potential energies and forces. +```python +from ase import Atoms +from deepmd.calculator import DP + +water = Atoms('H2O', + positions=[(0.7601, 1.9270, 1), + (1.9575, 1, 1), + (1., 1., 1.)], + cell=[100, 100, 100], + calculator=DP(model="frozen_model.pb")) +print(water.get_potential_energy()) +print(water.get_forces()) +``` + +Optimization is also available: +```python +from ase.optimize import BFGS +dyn = BFGS(water) +dyn.run(fmax=1e-6) +print(water.get_positions()) +``` \ No newline at end of file diff --git a/doc/third-party/index.md b/doc/third-party/index.md new file mode 100644 index 0000000000..fd38898596 --- /dev/null +++ b/doc/third-party/index.md @@ -0,0 +1,8 @@ +# Integrate with third-party packages + +Note that the model for inference is required to be compatible with the DeePMD-kit package. See [Model compatibility](../troubleshooting/model-compatability.html) for details. + +- [Use deep potential with ASE](ase.md) +- [Running MD with LAMMPS](lammps.md) +- [LAMMPS commands](lammps-command.md) +- [Run path-integral MD with i-PI](ipi.md) \ No newline at end of file diff --git a/doc/third-party/index.rst b/doc/third-party/index.rst new file mode 100644 index 0000000000..6308d9969f --- /dev/null +++ b/doc/third-party/index.rst @@ -0,0 +1,12 @@ +Integrate with third-party packages +=================================== + +Note that the model for inference is required to be compatible with the DeePMD-kit package. See `Model compatibility <../troubleshooting/model-compatability.html>`_ for details. + +.. toctree:: + :maxdepth: 1 + + ase + lammps + lammps-command + ipi diff --git a/doc/third-party/ipi.md b/doc/third-party/ipi.md new file mode 100644 index 0000000000..cd0448ce90 --- /dev/null +++ b/doc/third-party/ipi.md @@ -0,0 +1,34 @@ +### Run path-integral MD with i-PI +The i-PI works in a client-server model. The i-PI provides the server for integrating the replica positions of atoms, while the DeePMD-kit provides a client named `dp_ipi` (or `dp_ipi_low` for low precision) that computes the interactions (including energy, force and virial). The server and client communicates via the Unix domain socket or the Internet socket. Installation instructions of i-PI can be found [here](../install/install-ipi.md). 
The client can be started by +```bash +i-pi input.xml & +dp_ipi water.json +``` +It is noted that multiple instances of the client is allow for computing, in parallel, the interactions of multiple replica of the path-integral MD. + +`water.json` is the parameter file for the client `dp_ipi`, and an example is provided: +```json +{ + "verbose": false, + "use_unix": true, + "port": 31415, + "host": "localhost", + "graph_file": "graph.pb", + "coord_file": "conf.xyz", + "atom_type" : { + "OW": 0, + "HW1": 1, + "HW2": 1 + } +} +``` +The option **`use_unix`** is set to `true` to activate the Unix domain socket, otherwise, the Internet socket is used. + +The option **`port`** should be the same as that in input.xml: +```xml +31415 +``` + +The option **`graph_file`** provides the file name of the frozen model. + +The `dp_ipi` gets the atom names from an [XYZ file](https://en.wikipedia.org/wiki/XYZ_file_format) provided by **`coord_file`** (meanwhile ignores all coordinates in it), and translates the names to atom types by rules provided by **`atom_type`**. \ No newline at end of file diff --git a/doc/third-party/lammps-command.md b/doc/third-party/lammps-command.md new file mode 100644 index 0000000000..805361120b --- /dev/null +++ b/doc/third-party/lammps-command.md @@ -0,0 +1,98 @@ +# LAMMPS commands + +## Enable DeePMD-kit plugin (plugin mode) + +If you are using the plugin mode, enable DeePMD-kit package in LAMMPS with `plugin` command: + +``` +plugin load path/to/deepmd/lib/libdeepmd_lmp.so +``` + +The built-in mode doesn't need this step. + +## pair_style `deepmd` + +The DeePMD-kit package provides the pair_style `deepmd` + +``` +pair_style deepmd models ... keyword value ... +``` +- deepmd = style of this pair_style +- models = frozen model(s) to compute the interaction. If multiple models are provided, then the model deviation will be computed +- keyword = *out_file* or *out_freq* or *fparam* or *atomic* or *relative* +
+    out_file value = filename
+        filename = The file name for the model deviation output. Default is model_devi.out
+    out_freq value = freq
+        freq = Frequency for the model deviation output. Default is 100.
+    fparam value = parameters
+        parameters = one or more frame parameters required for model evaluation.
+    atomic = no value is required. 
+        If this keyword is set, the model deviation of each atom will be output.
+    relative value = level
+        level = The level parameter for computing the relative model deviation
+
+ +### Examples +``` +pair_style deepmd graph.pb +pair_style deepmd graph.pb fparam 1.2 +pair_style deepmd graph_0.pb graph_1.pb graph_2.pb out_file md.out out_freq 10 atomic relative 1.0 +``` + +### Description +Evaluate the interaction of the system by using [Deep Potential][DP] or [Deep Potential Smooth Edition][DP-SE]. It is noticed that deep potential is not a "pairwise" interaction, but a multi-body interaction. + +This pair style takes the deep potential defined in a model file that usually has the .pb extension. The model can be trained and frozen by package [DeePMD-kit](https://github.com/deepmodeling/deepmd-kit). + +The model deviation evalulate the consistency of the force predictions from multiple models. By default, only the maximal, minimal and averge model deviations are output. If the key `atomic` is set, then the model deviation of force prediction of each atom will be output. + +By default, the model deviation is output in absolute value. If the keyword `relative` is set, then the relative model deviation will be output. The relative model deviation of the force on atom `i` is defined by +```math + |Df_i| +Ef_i = ------------- + |f_i| + level +``` +where `Df_i` is the absolute model deviation of the force on atom `i`, `|f_i|` is the norm of the the force and `level` is provided as the parameter of the keyword `relative`. + +### Restrictions +- The `deepmd` pair style is provided in the USER-DEEPMD package, which is compiled from the DeePMD-kit, visit the [DeePMD-kit website](https://github.com/deepmodeling/deepmd-kit) for more information. + + +## Compute tensorial properties + +The DeePMD-kit package provide the compute `deeptensor/atom` for computing atomic tensorial properties. + +``` +compute ID group-ID deeptensor/atom model_file +``` +- ID: user-assigned name of the computation +- group-ID: ID of the group of atoms to compute +- deeptensor/atom: the style of this compute +- model_file: the name of the binary model file. + +### Examples +``` +compute dipole all deeptensor/atom dipole.pb +``` +The result of the compute can be dump to trajctory file by +``` +dump 1 all custom 100 water.dump id type c_dipole[1] c_dipole[2] c_dipole[3] +``` + +### Restrictions +- The `deeptensor/atom` compute is provided in the USER-DEEPMD package, which is compiled from the DeePMD-kit, visit the [DeePMD-kit website](https://github.com/deepmodeling/deepmd-kit) for more information. + + +## Long-range interaction +The reciprocal space part of the long-range interaction can be calculated by LAMMPS command `kspace_style`. To use it with DeePMD-kit, one writes +```bash +pair_style deepmd graph.pb +pair_coeff +kspace_style pppm 1.0e-5 +kspace_modify gewald 0.45 +``` +Please notice that the DeePMD does nothing to the direct space part of the electrostatic interaction, because this part is assumed to be fitted in the DeePMD model (the direct space cut-off is thus the cut-off of the DeePMD model). The splitting parameter `gewald` is modified by the `kspace_modify` command. + +[DP]:https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.143001 +[DP-SE]:https://dl.acm.org/doi/10.5555/3327345.3327356 \ No newline at end of file diff --git a/doc/third-party/lammps.md b/doc/third-party/lammps.md new file mode 100644 index 0000000000..39da05d873 --- /dev/null +++ b/doc/third-party/lammps.md @@ -0,0 +1,9 @@ +# Running MD with LAMMPS + +Running an MD simulation with LAMMPS is simpler. 
In the LAMMPS input file, one needs to specify the pair style as follows + +``` +pair_style deepmd graph.pb +pair_coeff * * +``` +where `graph.pb` is the file name of the frozen model. It should be noted that LAMMPS counts atom types starting from 1, therefore, all LAMMPS atom type will be firstly subtracted by 1, and then passed into the DeePMD-kit engine to compute the interactions. diff --git a/doc/train-se-e2-a.md b/doc/train-se-e2-a.md deleted file mode 100644 index c69b17a574..0000000000 --- a/doc/train-se-e2-a.md +++ /dev/null @@ -1,225 +0,0 @@ -# Train a Deep Potential model using descriptor `"se_e2_a"` - -The notation of `se_e2_a` is short for the Deep Potential Smooth Edition (DeepPot-SE) constructed from all information (both angular and radial) of atomic configurations. The `e2` stands for the embedding with two-atoms information. This descriptor was described in detail in [the DeepPot-SE paper](https://arxiv.org/abs/1805.09003). - -In this example we will train a DeepPot-SE model for a water system. A complete training input script of this example can be find in the directory. -```bash -$deepmd_source_dir/examples/water/se_e2_a/input.json -``` -With the training input script, data (please read the [warning](#warning)) are also provided in the example directory. One may train the model with the DeePMD-kit from the directory. - -The contents of the example: -- [The training input](#the-training-input-script) -- [Train a Deep Potential model](#train-a-deep-potential-model) -- [Warning](#warning) - -## The training input script - -A working training script using descriptor `se_e2_a` is provided as `input.json` in the same directory as this README. - -The `input.json` is divided in several sections, `model`, `learning_rate`, `loss` and `training`. - -For more information, one can find the [a full documentation](https://deepmd.readthedocs.io/en/master/train-input.html) on the training input script. - -### Model -The `model` defines how the model is constructed, for example -```json= - "model": { - "type_map": ["O", "H"], - "descriptor" :{ - ... - }, - "fitting_net" : { - ... - } - } -``` -We are looking for a model for water, so we have two types of atoms. The atom types are recorded as integers. In this example, we denote `0` for oxygen and `1` for hydrogen. A mapping from the atom type to their names is provided by `type_map`. - -The model has two subsections `descritpor` and `fitting_net`, which defines the descriptor and the fitting net, respectively. The `type_map` is optional, which provides the element names (but not necessarily to be the element name) of the corresponding atom types. - -#### Descriptor -The construction of the descriptor is given by section `descriptor`. An example of the descriptor is provided as follows -```json= - "descriptor" :{ - "type": "se_e2_a", - "rcut_smth": 0.50, - "rcut": 6.00, - "sel": [46, 92], - "neuron": [25, 50, 100], - "type_one_side": true, - "axis_neuron": 16, - "resnet_dt": false, - "seed": 1 - } -``` -* The `type` of the descriptor is set to `"se_e2_a"`. -* `rcut` is the cut-off radius for neighbor searching, and the `rcut_smth` gives where the smoothing starts. -* `sel` gives the maximum possible number of neighbors in the cut-off radius. It is a list, the length of which is the same as the number of atom types in the system, and `sel[i]` denote the maximum possible number of neighbors with type `i`. -* The `neuron` specifies the size of the embedding net. 
From left to right the members denote the sizes of each hidden layer from input end to the output end, respectively. If the outer layer is of twice size as the inner layer, then the inner layer is copied and concatenated, then a [ResNet architecture](https://arxiv.org/abs/1512.03385) is built between them. -* If the option `type_one_side` is set to `true`, then descriptor will consider the types of neighbor atoms. Otherwise, both the types of centric and  neighbor atoms are considered. -* The `axis_neuron` specifies the size of submatrix of the embedding matrix, the axis matrix as explained in the [DeepPot-SE paper](https://arxiv.org/abs/1805.09003) -* If the option `resnet_dt` is set `true`, then a timestep is used in the ResNet. -* `seed` gives the random seed that is used to generate random numbers when initializing the model parameters. - - -#### Fitting -The construction of the fitting net is give by section `fitting_net` -```json= - "fitting_net" : { - "neuron": [240, 240, 240], - "resnet_dt": true, - "seed": 1 - }, -``` -* `neuron` specifies the size of the fitting net. If two neighboring layers are of the same size, then a [ResNet architecture](https://arxiv.org/abs/1512.03385) is built between them. -* If the option `resnet_dt` is set `true`, then a timestep is used in the ResNet. -* `seed` gives the random seed that is used to generate random numbers when initializing the model parameters. - -### Learning rate - -The `learning_rate` section in `input.json` is given as follows -```json= - "learning_rate" :{ - "type": "exp", - "start_lr": 0.001, - "stop_lr": 3.51e-8, - "decay_steps": 5000, - "_comment": "that's all" - } -``` -* `start_lr` gives the learning rate at the beginning of the training. -* `stop_lr` gives the learning rate at the end of the training. It should be small enough to ensure that the network parameters satisfactorily converge. -* During the training, the learning rate decays exponentially from `start_lr` to `stop_lr` following the formula. - ``` - lr(t) = start_lr * decay_rate ^ ( t / decay_steps ) - ``` - where `t` is the training step. -        -### Loss - -The loss function of DeePMD-kit is given by -``` -loss = pref_e * loss_e + pref_f * loss_f + pref_v * loss_v -``` -where `loss_e`, `loss_f` and `loss_v` denote the loss in energy, force and virial, respectively. `pref_e`, `pref_f` and `pref_v` give the prefactors of the energy, force and virial losses. The prefectors may not be a constant, rather it changes linearly with the learning rate. Taking the force prefactor for example, at training step `t`, it is given by -```math -pref_f(t) = start_pref_f * ( lr(t) / start_lr ) + limit_pref_f * ( 1 - lr(t) / start_lr ) -``` -where `lr(t)` denotes the learning rate at step `t`. `start_pref_f` and `limit_pref_f` specifies the `pref_f` at the start of the training and at the limit of `t -> inf`. - -The `loss` section in the `input.json` is -```json= - "loss" : { - "start_pref_e": 0.02, - "limit_pref_e": 1, - "start_pref_f": 1000, - "limit_pref_f": 1, - "start_pref_v": 0, - "limit_pref_v": 0 - } -``` -The options `start_pref_e`, `limit_pref_e`, `start_pref_f`, `limit_pref_f`, `start_pref_v` and `limit_pref_v` determine the start and limit prefactors of energy, force and virial, respectively. - -If one does not want to train with virial, then he/she may set the virial prefactors `start_pref_v` and `limit_pref_v` to 0. -    -### Training parameters - -Other training parameters are given in the `training` section. 
-```json= - "training": { - "training_data": { - "systems": ["../data_water/data_0/", "../data_water/data_1/", "../data_water/data_2/"], - "batch_size": "auto" - }, - "validation_data":{ - "systems": ["../data_water/data_3"], - "batch_size": 1, - "numb_btch": 3 - }, - - "numb_step": 1000000, - "seed": 1, - "disp_file": "lcurve.out", - "disp_freq": 100, - "save_freq": 1000 - } -``` -The sections `"training_data"` and `"validation_data"` give the training dataset and validation dataset, respectively. Taking the training dataset for example, the keys are explained below: -* `systems` provide paths of the training data systems. DeePMD-kit allows you to provide multiple systems. This key can be a `list` or a `str`. - * `list`: `systems` gives the training data systems. - * `str`: `systems` should be a valid path. DeePMD-kit will recursively search all data systems in this path. -* At each training step, DeePMD-kit randomly pick `batch_size` frame(s) from one of the systems. The probability of using a system is by default in proportion to the number of batches in the system. More optional are available for automatically determining the probability of using systems. One can set the key `auto_prob` to - * `"prob_uniform"` all systems are used with the same probability. - * `"prob_sys_size"` the probability of using a system is in proportional to its size (number of frames). - * `"prob_sys_size; sidx_0:eidx_0:w_0; sidx_1:eidx_1:w_1;..."` the `list` of systems are divided into blocks. The block `i` has systems ranging from `sidx_i` to `eidx_i`. The probability of using a system from block `i` is in proportional to `w_i`. Within one block, the probability of using a system is in proportional to its size. -* An example of using `"auto_prob"` is given as below. The probability of using `systems[2]` is 0.4, and the sum of the probabilities of using `systems[0]` and `systems[1]` is 0.6. If the number of frames in `systems[1]` is twice as `system[0]`, then the probability of using `system[1]` is 0.4 and that of `system[0]` is 0.2. -```json= - "training_data": { - "systems": ["../data_water/data_0/", "../data_water/data_1/", "../data_water/data_2/"], - "auto_prob": "prob_sys_size; 0:2:0.6; 2:3:0.4", - "batch_size": "auto" - } -``` -* The probability of using systems can also be specified explicitly with key `"sys_prob"` that is a list having the length of the number of systems. For example -```json= - "training_data": { - "systems": ["../data_water/data_0/", "../data_water/data_1/", "../data_water/data_2/"], - "sys_prob": [0.5, 0.3, 0.2], - "batch_size": "auto:32" - } -``` -* The key `batch_size` specifies the number of frames used to train or validate the model in a training step. It can be set to - * `list`: the length of which is the same as the `systems`. The batch size of each system is given by the elements of the list. - * `int`: all systems use the same batch size. - * `"auto"`: the same as `"auto:32"`, see `"auto:N"` - * `"auto:N"`: automatically determines the batch size so that the `batch_size` times the number of atoms in the system is no less than `N`. -* The key `numb_batch` in `validate_data` gives the number of batches of model validation. Note that the batches may not be from the same system - -Other keys in the `training` section are explained below: -* `numb_step` The number of training steps. -* `seed` The random seed for getting frames from the training data set. -* `disp_file` The file for printing learning curve. -* `disp_freq` The frequency of printing learning curve. 
Set in the unit of training steps -* `save_freq` The frequency of saving check point. - - -## Train a Deep Potential model -When the input script is prepared, one may start training by -```bash= -dp train input.json -``` -By default, the verbosity level of the DeePMD-kit is `INFO`, one may see a lot of important information on the code and environment showing on the screen. Among them two pieces of information regarding data systems worth special notice. -```bash= -DEEPMD INFO ---Summary of DataSystem: training ----------------------------------------------- -DEEPMD INFO found 3 system(s): -DEEPMD INFO system natoms bch_sz n_bch prob pbc -DEEPMD INFO ../data_water/data_0/ 192 1 80 0.250 T -DEEPMD INFO ../data_water/data_1/ 192 1 160 0.500 T -DEEPMD INFO ../data_water/data_2/ 192 1 80 0.250 T -DEEPMD INFO -------------------------------------------------------------------------------------- -DEEPMD INFO ---Summary of DataSystem: validation ----------------------------------------------- -DEEPMD INFO found 1 system(s): -DEEPMD INFO system natoms bch_sz n_bch prob pbc -DEEPMD INFO ../data_water/data_3 192 1 80 1.000 T -DEEPMD INFO -------------------------------------------------------------------------------------- -``` -The DeePMD-kit prints detailed informaiton on the training and validation data sets. The data sets are defined by `"training_data"` and `"validation_data"` defined in the `"training"` section of the input script. The training data set is composed by three data systems, while the validation data set is composed by one data system. The number of atoms, batch size, number of batches in the system and the probability of using the system are all shown on the screen. The last column presents if the periodic boundary condition is assumed for the system. - -During the training, the error of the model is tested every `disp_freq` training steps with the batch used to train the model and with `numb_btch` batches from the validating data. The training error and validation error are printed correspondingly in the file `disp_file`. The batch size can be set in the input script by the key `batch_size` in the corresponding sections for training and validation data set. An example of the output -```bash= -# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr - 0 3.33e+01 3.41e+01 1.03e+01 1.03e+01 8.39e-01 8.72e-01 1.0e-03 - 100 2.57e+01 2.56e+01 1.87e+00 1.88e+00 8.03e-01 8.02e-01 1.0e-03 - 200 2.45e+01 2.56e+01 2.26e-01 2.21e-01 7.73e-01 8.10e-01 1.0e-03 - 300 1.62e+01 1.66e+01 5.01e-02 4.46e-02 5.11e-01 5.26e-01 1.0e-03 - 400 1.36e+01 1.32e+01 1.07e-02 2.07e-03 4.29e-01 4.19e-01 1.0e-03 - 500 1.07e+01 1.05e+01 2.45e-03 4.11e-03 3.38e-01 3.31e-01 1.0e-03 -``` -The file contains 8 columns, form right to left, are the training step, the validation loss, training loss, root mean square (RMS) validation error of energy, RMS training error of energy, RMS validation error of force, RMS training error of force and the learning rate. The RMS error (RMSE) of the energy is normalized by number of atoms in the system. - -## Warning -It is warned that the example water data (in folder `examples/water/data`) is of very limited amount, is provided only for testing purpose, and should not be used to train a productive model. 
-
-
-
diff --git a/doc/train/gpu-limitations.md b/doc/train/gpu-limitations.md
new file mode 100644
index 0000000000..a28fe7d400
--- /dev/null
+++ b/doc/train/gpu-limitations.md
@@ -0,0 +1,6 @@
+# Known limitations of using GPUs
+If you use DeePMD-kit in a GPU environment, the acceptable value ranges of some variables are additionally restricted compared to the CPU environment, due to the software's GPU implementation:
+1. The number of atom types in a given system must be less than 128.
+2. The maximum distance between an atom and its neighbors must be less than 128. It can be controlled by setting the `rcut` value in the training parameters.
+3. Theoretically, the maximum number of atoms that a single GPU can accept is about 10,000,000. However, this value is currently limited by the GPU memory size, usually to within 1,000,000 atoms even in model compression mode.
+4. The total `sel` value of the training parameters (in the model/descriptor section) must be less than 4096.
\ No newline at end of file
diff --git a/doc/train/index.md b/doc/train/index.md
new file mode 100644
index 0000000000..b9086d8322
--- /dev/null
+++ b/doc/train/index.md
@@ -0,0 +1,8 @@
+# Training
+
+- [Training a model](training.md)
+- [Advanced options](training-advanced.md)
+- [Parallel training](parallel-training.md)
+- [TensorBoard Usage](tensorboard.md)
+- [Known limitations of using GPUs](gpu-limitations.md)
+- [Training Parameters](../train-input-auto.rst)
diff --git a/doc/train/index.rst b/doc/train/index.rst
new file mode 100644
index 0000000000..a3114dc844
--- /dev/null
+++ b/doc/train/index.rst
@@ -0,0 +1,12 @@
+Training
+========
+
+.. toctree::
+   :maxdepth: 1
+
+   training
+   training-advanced
+   train-input
+   parallel-training
+   tensorboard
+   gpu-limitations
\ No newline at end of file
diff --git a/doc/train/parallel-training.md b/doc/train/parallel-training.md
new file mode 100644
index 0000000000..609dc8721d
--- /dev/null
+++ b/doc/train/parallel-training.md
@@ -0,0 +1,43 @@
+# Parallel training
+
+Currently, parallel training is enabled in a synchronized way with the help of [Horovod](https://github.com/horovod/horovod). DeePMD-kit will decide whether to use parallel training according to the MPI context, so there is no difference in your JSON/YAML input file.
+
+Testing `examples/water/se_e2_a` on an 8-GPU host, linear acceleration can be observed with an increasing number of cards.
+| Num of GPU cards | Seconds every 100 samples | Samples per second | Speed up |
+| -- | -- | -- | -- |
+| 1 | 1.6116 | 62.05 | 1.00 |
+| 2 | 1.6310 | 61.31 | 1.98 |
+| 4 | 1.6168 | 61.85 | 3.99 |
+| 8 | 1.6212 | 61.68 | 7.95 |
+
+To experience this powerful feature, please install Horovod and [mpi4py](https://github.com/mpi4py/mpi4py) first. For better performance on GPUs, please follow the tuning steps in [Horovod on GPU](https://github.com/horovod/horovod/blob/master/docs/gpus.rst).
+```bash
+# By default, MPI is used as communicator.
+HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_TENSORFLOW=1 pip install horovod mpi4py
+```
+
+Horovod works in data-parallel mode, resulting in a larger global batch size. For example, the real batch size is 8 when `batch_size` is set to 2 in the input file and you launch 4 workers. Thus, `learning_rate` is automatically scaled by the number of workers for better convergence (see the sketch below). Technical details of this heuristic rule are discussed in [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677).
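+
+As a rough illustration of this scaling rule, the sketch below computes the effective global batch size and the scaled learning rate. The helper is hypothetical and for illustration only, not part of DeePMD-kit; linear learning-rate scaling is assumed, as described above.
+```py
+def effective_settings(batch_size, start_lr, n_workers):
+    """Global batch size and linearly scaled learning rate under data parallelism."""
+    return batch_size * n_workers, start_lr * n_workers
+
+# batch_size 2 with 4 workers -> global batch size 8, learning rate scaled by 4
+print(effective_settings(2, 0.001, 4))  # (8, 0.004)
+```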
+
+With dependencies installed, have a quick try!
+```bash
+# Launch 4 processes on the same host
+CUDA_VISIBLE_DEVICES=4,5,6,7 horovodrun -np 4 \
+    dp train --mpi-log=workers input.json
+```
+
+Note that the environment variable `CUDA_VISIBLE_DEVICES` must be set to control parallelism on the occupied host, where one process is bound to one GPU card.
+
+In addition, two command-line arguments are defined to control the logging behavior.
+```
+optional arguments:
+  -l LOG_PATH, --log-path LOG_PATH
+                        set log file to log messages to disk, if not
+                        specified, the logs will only be output to console
+                        (default: None)
+  -m {master,collect,workers}, --mpi-log {master,collect,workers}
+                        Set the manner of logging when running with MPI.
+                        'master' logs only on main process, 'collect'
+                        broadcasts logs from workers to master and 'workers'
+                        means each process will output its own log (default:
+                        master)
+```
\ No newline at end of file
diff --git a/doc/tensorboard.md b/doc/train/tensorboard.md
similarity index 85%
rename from doc/tensorboard.md
rename to doc/train/tensorboard.md
index cf1d47d283..64f7cb344f 100644
--- a/doc/tensorboard.md
+++ b/doc/train/tensorboard.md
@@ -63,24 +63,24 @@ tensorboard --logdir path/to/logs
 ### Tracking and visualizing loss metrics(red:train, blue:test)
-![ALT](./images/l2_loss.png "l2 loss")
+![ALT](../images/l2_loss.png "l2 loss")
-![ALT](./images/l2_energy_loss.png "l2 energy loss")
+![ALT](../images/l2_energy_loss.png "l2 energy loss")
-![ALT](./images/l2_force_loss.png "l2 force loss")
+![ALT](../images/l2_force_loss.png "l2 force loss")
 ### Visualizing deepmd-kit model graph
-![ALT](./images/tensorboard-graph.png "deepmd-kit graph")
+![ALT](../images/tensorboard-graph.png "deepmd-kit graph")
 ### Viewing histograms of weights, biases, or other tensors as they change over time
-![ALT](./images/tensorboard-histograms.png "deepmd-kit histograms")
+![ALT](../images/tensorboard-histograms.png "deepmd-kit histograms")
-![ALT](./images/tensorboard-distribution.png "deepmd-kit distribution")
+![ALT](../images/tensorboard-distribution.png "deepmd-kit distribution")
 ### Viewing summaries of trainable variables
-![ALT](./images/tensorboard-scalar.png "deepmd-kit scalar")
+![ALT](../images/tensorboard-scalar.png "deepmd-kit scalar")
 ## Attention
diff --git a/doc/train-input.rst b/doc/train/train-input.rst
similarity index 61%
rename from doc/train-input.rst
rename to doc/train/train-input.rst
index 0a12239597..612cdf3cfa 100644
--- a/doc/train-input.rst
+++ b/doc/train/train-input.rst
@@ -1,3 +1,3 @@
 Training Parameters
 ======================================
-.. include:: train-input-auto.rst
+.. include:: ../train-input-auto.rst
diff --git a/doc/train/training-advanced.md b/doc/train/training-advanced.md
new file mode 100644
index 0000000000..b1e8b73e1c
--- /dev/null
+++ b/doc/train/training-advanced.md
@@ -0,0 +1,121 @@
+# Advanced options
+
+In this section, we will take `$deepmd_source_dir/examples/water/se_e2_a/input.json` as an example of the input file.
+
+## Learning rate
+
+The `learning_rate` section in `input.json` is given as follows
+```json
+    "learning_rate" :{
+        "type": "exp",
+        "start_lr": 0.001,
+        "stop_lr": 3.51e-8,
+        "decay_steps": 5000,
+        "_comment": "that's all"
+    }
+```
+* `start_lr` gives the learning rate at the beginning of the training.
+* `stop_lr` gives the learning rate at the end of the training. It should be small enough to ensure that the network parameters satisfactorily converge.
+* During the training, the learning rate decays exponentially from `start_lr` to `stop_lr` following the formula
+  ```
+  lr(t) = start_lr * decay_rate ^ ( t / decay_steps )
+  ```
+  where `t` is the training step.
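+
+As a quick sanity check of this schedule, the snippet below evaluates the formula. It is a minimal sketch for illustration, not DeePMD-kit's internal implementation; it assumes `decay_rate` is chosen so that the learning rate reaches `stop_lr` at the last training step.
+```py
+def lr_schedule(t, start_lr=0.001, stop_lr=3.51e-8, decay_steps=5000, numb_steps=1000000):
+    """Exponentially decayed learning rate: lr(t) = start_lr * decay_rate ** (t / decay_steps)."""
+    decay_rate = (stop_lr / start_lr) ** (decay_steps / numb_steps)  # assumed choice of decay_rate
+    return start_lr * decay_rate ** (t / decay_steps)
+
+for step in (0, 5000, 500000, 1000000):
+    print(step, lr_schedule(step))  # decays from 1.0e-3 down to ~3.5e-8
+```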
+
+## Training parameters
+
+Other training parameters are given in the `training` section.
+```json
+    "training": {
+        "training_data": {
+            "systems": ["../data_water/data_0/", "../data_water/data_1/", "../data_water/data_2/"],
+            "batch_size": "auto"
+        },
+        "validation_data": {
+            "systems": ["../data_water/data_3"],
+            "batch_size": 1,
+            "numb_btch": 3
+        },
+
+        "numb_step": 1000000,
+        "seed": 1,
+        "disp_file": "lcurve.out",
+        "disp_freq": 100,
+        "save_freq": 1000
+    }
+```
+The sections `"training_data"` and `"validation_data"` give the training dataset and validation dataset, respectively. Taking the training dataset as an example, the keys are explained below:
+* `systems` provides the paths of the training data systems. DeePMD-kit allows you to provide multiple systems. This key can be a `list` or a `str`.
+    * `list`: `systems` gives the training data systems.
+    * `str`: `systems` should be a valid path. DeePMD-kit will recursively search all data systems in this path.
+* At each training step, DeePMD-kit randomly picks `batch_size` frame(s) from one of the systems. The probability of using a system is by default proportional to the number of batches in the system. More options are available for automatically determining the probability of using systems. One can set the key `auto_prob` to
+    * `"prob_uniform"`: all systems are used with the same probability.
+    * `"prob_sys_size"`: the probability of using a system is proportional to its size (number of frames).
+    * `"prob_sys_size; sidx_0:eidx_0:w_0; sidx_1:eidx_1:w_1;..."`: the `list` of systems is divided into blocks. Block `i` has systems ranging from `sidx_i` to `eidx_i`. The probability of using a system from block `i` is proportional to `w_i`. Within one block, the probability of using a system is proportional to its size.
+* An example of using `"auto_prob"` is given below. The probability of using `systems[2]` is 0.4, and the sum of the probabilities of using `systems[0]` and `systems[1]` is 0.6. If the number of frames in `systems[1]` is twice that of `systems[0]`, then the probability of using `systems[1]` is 0.4 and that of `systems[0]` is 0.2.
+```json
+    "training_data": {
+        "systems": ["../data_water/data_0/", "../data_water/data_1/", "../data_water/data_2/"],
+        "auto_prob": "prob_sys_size; 0:2:0.6; 2:3:0.4",
+        "batch_size": "auto"
+    }
+```
+* The probability of using systems can also be specified explicitly with the key `"sys_prob"`, which is a list whose length equals the number of systems. For example
+```json
+    "training_data": {
+        "systems": ["../data_water/data_0/", "../data_water/data_1/", "../data_water/data_2/"],
+        "sys_prob": [0.5, 0.3, 0.2],
+        "batch_size": "auto:32"
+    }
+```
+* The key `batch_size` specifies the number of frames used to train or validate the model in a training step. It can be set to
+    * `list`: a list whose length is the same as that of `systems`. The batch size of each system is given by the elements of the list.
+    * `int`: all systems use the same batch size.
+    * `"auto"`: the same as `"auto:32"`; see `"auto:N"` below.
+    * `"auto:N"`: automatically determines the batch size so that `batch_size` times the number of atoms in the system is no less than `N` (see the sketch after this list).
+* The key `numb_btch` in `validation_data` gives the number of batches used for model validation. Note that the batches may not be from the same system.
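+
+As a rough illustration of the `"auto:N"` rule, the sketch below resolves the batch size from `N` and the number of atoms per frame. This is not DeePMD-kit's actual implementation; the helper name and example atom counts are only for illustration.
+```py
+import math
+
+def auto_batch_size(n, natoms):
+    """Smallest batch size whose total atom count (batch_size * natoms) is no less than n."""
+    return max(1, math.ceil(n / natoms))
+
+# The water systems above contain 192 atoms per frame,
+# so "auto:32" resolves to a batch size of 1 frame per step.
+print(auto_batch_size(32, 192))    # -> 1
+print(auto_batch_size(1024, 192))  # -> 6  (6 * 192 = 1152 >= 1024)
+```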
+
+Other keys in the `training` section are explained below:
+* `numb_step` The number of training steps.
+* `seed` The random seed for getting frames from the training data set.
+* `disp_file` The file for printing the learning curve.
+* `disp_freq` The frequency of printing the learning curve, in units of training steps.
+* `save_freq` The frequency of saving checkpoints.
+
+## Options and environment variables
+
+Several command-line options can be passed to `dp train`, which can be checked with
+```bash
+$ dp train --help
+```
+An explanation will be provided
+```
+positional arguments:
+  INPUT                 the input json database
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --init-model INIT_MODEL
+                        Initialize a model by the provided checkpoint
+  --restart RESTART     Restart the training from the provided checkpoint
+```
+
+**`--init-model model.ckpt`** initializes the model training with an existing model that is stored in the checkpoint `model.ckpt`; the network architectures should match.
+
+**`--restart model.ckpt`** continues the training from the checkpoint `model.ckpt`.
+
+On some resource-limited machines, one may want to control the number of threads used by DeePMD-kit. This is achieved by three environment variables: `OMP_NUM_THREADS`, `TF_INTRA_OP_PARALLELISM_THREADS` and `TF_INTER_OP_PARALLELISM_THREADS`. `OMP_NUM_THREADS` controls the multithreading of DeePMD-kit implemented operations. `TF_INTRA_OP_PARALLELISM_THREADS` and `TF_INTER_OP_PARALLELISM_THREADS` control `intra_op_parallelism_threads` and `inter_op_parallelism_threads`, which are TensorFlow configurations for multithreading. An explanation can be found [here](https://stackoverflow.com/questions/41233635/meaning-of-inter-op-parallelism-threads-and-intra-op-parallelism-threads).
+
+For example, if you wish to use 3 cores on each of 2 CPUs on one node, you may set the environment variables and run DeePMD-kit as follows:
+```bash
+export OMP_NUM_THREADS=6
+export TF_INTRA_OP_PARALLELISM_THREADS=3
+export TF_INTER_OP_PARALLELISM_THREADS=2
+dp train input.json
+```
+
+One can also set other environment variables:
+
+| Environment variable | Allowed values | Default value | Usage |
+| -------------------- | -------------- | ------------- | ----- |
+| DP_INTERFACE_PREC | `high`, `low` | `high` | Control high (double) or low (float) precision of training. |
\ No newline at end of file
diff --git a/doc/train/training.md b/doc/train/training.md
new file mode 100644
index 0000000000..b9c6bd4cb8
--- /dev/null
+++ b/doc/train/training.md
@@ -0,0 +1,62 @@
+# Training a model
+
+Several examples of training can be found in the `examples` directory:
+```bash
+$ cd $deepmd_source_dir/examples/water/se_e2_a/
+```
+
+After switching to that directory, the training can be invoked by
+```bash
+$ dp train input.json
+```
+where `input.json` is the name of the input script.
+
+By default, the verbosity level of DeePMD-kit is `INFO`; one may see a lot of important information about the code and environment printed on the screen. Among them, two pieces of information regarding the data systems are worth special notice.
+```bash
+DEEPMD INFO    ---Summary of DataSystem: training   ------------------------------------------------
+DEEPMD INFO    found 3 system(s):
+DEEPMD INFO                        system  natoms  bch_sz  n_bch  prob  pbc
+DEEPMD INFO         ../data_water/data_0/     192       1     80  0.250    T
+DEEPMD INFO         ../data_water/data_1/     192       1    160  0.500    T
+DEEPMD INFO         ../data_water/data_2/     192       1     80  0.250    T
+DEEPMD INFO    --------------------------------------------------------------------------------------
+DEEPMD INFO    ---Summary of DataSystem: validation   ----------------------------------------------
+DEEPMD INFO    found 1 system(s):
+DEEPMD INFO                        system  natoms  bch_sz  n_bch  prob  pbc
+DEEPMD INFO          ../data_water/data_3     192       1     80  1.000    T
+DEEPMD INFO    --------------------------------------------------------------------------------------
+```
+DeePMD-kit prints detailed information on the training and validation data sets. The data sets are specified by `"training_data"` and `"validation_data"` in the `"training"` section of the input script. The training data set is composed of three data systems, while the validation data set is composed of one data system. The number of atoms, batch size, number of batches in the system and the probability of using the system are all shown on the screen. The last column indicates whether the periodic boundary condition is assumed for the system.
+
+During the training, the error of the model is tested every `disp_freq` training steps with the batch used to train the model and with `numb_btch` batches from the validation data. The training error and validation error are printed correspondingly in the file `disp_file` (default is `lcurve.out`). The batch size can be set in the input script by the key `batch_size` in the corresponding sections for the training and validation data sets. An example of the output
+```bash
+#  step  rmse_val  rmse_trn  rmse_e_val  rmse_e_trn  rmse_f_val  rmse_f_trn      lr
+      0  3.33e+01  3.41e+01    1.03e+01    1.03e+01    8.39e-01    8.72e-01  1.0e-03
+    100  2.57e+01  2.56e+01    1.87e+00    1.88e+00    8.03e-01    8.02e-01  1.0e-03
+    200  2.45e+01  2.56e+01    2.26e-01    2.21e-01    7.73e-01    8.10e-01  1.0e-03
+    300  1.62e+01  1.66e+01    5.01e-02    4.46e-02    5.11e-01    5.26e-01  1.0e-03
+    400  1.36e+01  1.32e+01    1.07e-02    2.07e-03    4.29e-01    4.19e-01  1.0e-03
+    500  1.07e+01  1.05e+01    2.45e-03    4.11e-03    3.38e-01    3.31e-01  1.0e-03
+```
+The file contains 8 columns; from left to right, they are the training step, the validation loss, the training loss, the root mean square (RMS) validation error of the energy, the RMS training error of the energy, the RMS validation error of the force, the RMS training error of the force, and the learning rate. The RMS error (RMSE) of the energy is normalized by the number of atoms in the system. One can visualize this file with a simple Python script:
+
+```py
+import numpy as np
+import matplotlib.pyplot as plt
+
+data = np.genfromtxt("lcurve.out", names=True)
+for name in data.dtype.names[1:-1]:
+    plt.plot(data['step'], data[name], label=name)
+plt.legend()
+plt.xlabel('Step')
+plt.ylabel('Loss')
+plt.xscale('symlog')
+plt.yscale('symlog')
+plt.grid()
+plt.show()
+```
+
+Checkpoints will be written to files with the prefix specified by `save_ckpt` every `save_freq` training steps.
+
+## Warning
+It is warned that the example water data (in the folder `examples/water/data`) is of very limited amount, is provided only for testing purposes, and should not be used to train a production model.
\ No newline at end of file
diff --git a/doc/troubleshooting/index.md b/doc/troubleshooting/index.md
index 1c7d642355..dcb8775501 100644
--- a/doc/troubleshooting/index.md
+++ b/doc/troubleshooting/index.md
@@ -1,16 +1,14 @@
 # FAQs
-In consequence of various differences of computers or systems, problems may occur. Some common circumstances are listed as follows.
+
+As a consequence of various differences between computers or systems, problems may occur. Some common circumstances are listed as follows. In addition, some frequently asked questions about parameter settings are also listed.
-If other unexpected problems occur, you're welcome to contact us for help.
+If other unexpected problems occur, you’re welcome to contact us for help.
-## Trouble shooting
+- [Model compatibility](model-compatability.md)
 - [Installation](installation.md)
 - [The temperature undulates violently during early stages of MD](md-energy-undulation.md)
 - [MD: cannot run LAMMPS after installing a new version of DeePMD-kit](md-version-compatibility.md)
-- [Model compatability](model-compatability.md)
-
-## Parameters setting
-- [How to tune Fitting/embedding-net size ?](howtoset_netsize.md)
-- [How to control the number of nodes used by a job ?](howtoset_num_nodes.md)
-- [Do we need to set rcut < half boxsize ?](howtoset_rcut.md)
-- [How to set sel ?](howtoset_sel.md)
+- [Do we need to set rcut < half boxsize?](howtoset-rcut.md)
+- [How to set sel?](howtoset-sel.md)
+- [How to control the number of nodes used by a job?](howtoset_num_nodes.md)
+- [How to tune Fitting/embedding-net size?](howtoset_netsize.md)
\ No newline at end of file
diff --git a/doc/troubleshooting/index.rst b/doc/troubleshooting/index.rst
new file mode 100644
index 0000000000..603309f302
--- /dev/null
+++ b/doc/troubleshooting/index.rst
@@ -0,0 +1,14 @@
+FAQs
+====
+As a consequence of various differences between computers or systems, problems may occur. Some common circumstances are listed as follows.
+In addition, some frequently asked questions about parameter settings are also listed.
+If other unexpected problems occur, you're welcome to contact us for help.
+
+.. _trouble:
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Troubleshooting
+   :glob:
+
+   ./*