Add docs for qlib.rl #1322

lwwang1995 · 2022-10-19T15:24:15Z

No description provided.

docs/component/rl.rst

you-n-g · 2022-10-20T08:38:53Z

docs/component/rl.rst

+In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary interfaces of gym.Env. Any classes or pipelines that accept gym.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper:
+
+- `Simulator`
+    The simulator is the core component responsible for the environment simulation. Developers could implement all the logic that is directly related to the environment simulation in the Simulator in any way they like. In QlibRL, there are already two implementations of Simulator: 1) ``SingleAssetOrderExecution``, which is built based on Qlib's backtest toolkits. 2) ``SimpleSingleAssetOrderExecution``, which is built based on naive simulation logic.


is built based on naive simulation logic

a simplified trading simulator, which ignores a lot of details (e.g. trading limitations, rounding) but is quite fast.

you-n-g · 2022-10-20T08:39:45Z

docs/component/rl.rst

+In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary interfaces of gym.Env. Any classes or pipelines that accept gym.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper:
+
+- `Simulator`
+    The simulator is the core component responsible for the environment simulation. Developers could implement all the logic that is directly related to the environment simulation in the Simulator in any way they like. In QlibRL, there are already two implementations of Simulator: 1) ``SingleAssetOrderExecution``, which is built based on Qlib's backtest toolkits. 2) ``SimpleSingleAssetOrderExecution``, which is built based on naive simulation logic.


already two implementations of Simulator

already two implementations of Simulator for single asset trading.

you-n-g · 2022-10-20T08:41:17Z

docs/component/rl.rst

+In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary interfaces of gym.Env. Any classes or pipelines that accept gym.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper:
+
+- `Simulator`
+    The simulator is the core component responsible for the environment simulation. Developers could implement all the logic that is directly related to the environment simulation in the Simulator in any way they like. In QlibRL, there are already two implementations of Simulator: 1) ``SingleAssetOrderExecution``, which is built based on Qlib's backtest toolkits. 2) ``SimpleSingleAssetOrderExecution``, which is built based on naive simulation logic.


which is built based on Qlib's backtest toolkits

which is built based on Qlib's backtest toolkits and hence considers a lot of practical trading details but is slow.
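To make the trade-off raised in the three simulator comments above concrete, here is a minimal sketch. It does not use Qlib: both classes, their names, and the `lot_size` / `volume_cap` parameters are hypothetical stand-ins. It only illustrates how a simulator that honours trading details can fill a different amount than a naive one (it says nothing about speed).

```python
class SimpleToySimulator:
    """Hypothetical fast simulator: fills whatever is requested, up to what remains."""

    def __init__(self, order_amount: int) -> None:
        self.remaining = order_amount  # shares still to be executed

    def step(self, amount: int) -> int:
        filled = min(amount, self.remaining)
        self.remaining -= filled
        return filled

    def done(self) -> bool:
        return self.remaining <= 0


class DetailedToySimulator(SimpleToySimulator):
    """Hypothetical detailed simulator: adds lot rounding and a per-step volume cap."""

    def __init__(self, order_amount: int, lot_size: int = 100, volume_cap: int = 300) -> None:
        super().__init__(order_amount)
        self.lot_size = lot_size      # assumed minimum tradable unit
        self.volume_cap = volume_cap  # assumed per-step trading limit

    def step(self, amount: int) -> int:
        capped = min(amount, self.remaining, self.volume_cap)
        filled = (capped // self.lot_size) * self.lot_size  # round down to whole lots
        self.remaining -= filled
        return filled


simple = SimpleToySimulator(1000)
detailed = DetailedToySimulator(1000)
print(simple.step(250))    # fills the full request
print(detailed.step(250))  # fills only whole lots
```

A policy trained against the simple variant may therefore behave differently once rounding and limits apply, which is presumably why the docs should state the difference explicitly.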

you-n-g · 2022-10-20T08:45:57Z

docs/component/rl.rst

+
+Portfolio Construction
+------------
+Portfolio construction is a process of selecting securities optimally by taking a minimum risk to achieve maximum returns. With an RL-based solution, an agent allocates stocks at every time step by obtaining information for each stock and the market. The key is to develop of policy for building a portfolio and make the policy able to pick the optimal portfolio. 


RL-based portfolio construction learning will be released in the future.

you-n-g · 2022-10-20T08:48:39Z

docs/component/rl.rst

+------------
+As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. Essentially, the goal of order execution is twofold: it not only requires to fulfill the whole order but also targets a more economical execution with maximizing profit gain (or minimizing capital loss). The order execution with only one order of liquidation or acquirement is called single-asset order execution.
+
+Considering stock investment always aim to pursue long-term maximized profits, is usually behaved in the form of a sequential process of continuously adjusting the asset portfolio, execution for multiple orders, including order of liquidation and acquirement, brings more constraints and making the sequence of execution for different orders should be considered, e.g. before executing an order to buy some stocks, we have to sell at least one stock. The order execution with multiple assets is called multi-asset order execution. 


is usually behaved?

weird grammar

you-n-g · 2022-10-20T08:52:46Z

docs/component/rl.rst

+According to the order execution’s trait of sequential decision making, an RL-based solution could be applied to solve the order execution. With an RL-based solution, an agent optimizes execution strategy through interacting with the market environment. 
+
+With QlibRL, the RL algorithm in the above scenarios can be easily implemented.
+


I think we can add an extra section for nested Portfolio Construction & Order Execution
and emphasize the difference from traditional methods.

you-n-g · 2022-10-20T08:54:13Z

docs/component/rl.rst

+
+Example
+============
+QlibRL provides a set of APIs for developers to further simplify their development. For example, if developers have already defined their simulator / interpreters / reward function / policy, they could launch the training pipeline by simply running:


I think we can link each part to the example instead of only introducing how to call the training API

you-n-g · 2022-10-22T09:41:59Z

docs/component/highfreq.rst

@@ -15,15 +15,17 @@ In order to support the joint backtest strategies in multiple levels, a correspo

 Besides backtesting, the optimization of strategies from different levels is not standalone and can be affected by each other.
 For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may becomes a better choice when we improve the order execution strategies).
-To achieve the overall good performance , it is necessary to consider the interaction of strategies in different level.
+To achieve the overall good performance , it is necessary to consider the interaction of strategies in different level. 


Please remove the extra trailing whitespace.

you-n-g · 2022-10-22T09:44:00Z

docs/component/highfreq.rst

-The frequency of trading algorithm, decision content and execution environment can be customized by users (e.g. intraday trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with finer-grained trading algorithm and execution environment inside (i.e. sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The flexibility of nested decision execution framework makes it easy for users to explore the effects of combining different levels of trading strategies and break down the optimization barriers between different levels of trading algorithm.
+The frequency of trading algorithm, decision content and execution environment can be customized by users (e.g. intraday trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with finer-grained trading algorithm and execution environment inside (i.e. sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The flexibility of nested decision execution framework makes it easy for users to explore the effects of combining different levels of trading strategies and break down the optimization barriers between different levels of trading algorithm. 
+
+The optimization for the nested decision execution framework can be implemented with an RL-based method, which can be supported by `qlib.rl<https://github.com/microsoft/qlib/tree/main/examples/rl>`_.


I think the reference to the docs will be better than an example.
I think keeping the example will also be helpful

you-n-g · 2022-10-22T09:46:27Z

docs/component/rl.rst

@@ -79,7 +99,7 @@ QlibRL provides a set of APIs for developers to further simplify their developme
        policy=policy,  
        reward=PAPenaltyReward(),  
        vessel_kwargs={
-            "episode_per_iter": 100, 
+            "episode_per_iter": 100, 6


What does the 6 mean here?

you-n-g · 2022-10-24T04:18:32Z

docs/component/highfreq.rst


-The optimization for the nested decision execution framework can be implemented with an RL-based method, which can be supported by `qlib.rl<https://github.com/microsoft/qlib/tree/main/examples/rl>`_.
+The optimization for the nested decision execution framework can be implemented with the support of QlibRL. To know more about how to use the QlibRL, go to API Reference: `RL API <../reference/api.html#rl>`_. 


A reference to the RL docs would be better than one to the RL API.

It has been fixed.

docs/component/rl.rst

Co-authored-by: you-n-g <[email protected]>

ultmaster · 2022-10-28T06:12:48Z

docs/component/rl.rst

+As demonstrated in the following figure, an RL system consists of four elements, 1)the agent 2) the environment the agent interacts with 3) the policy that the agent follows to take actions on the environment and 4)the reward signal from the environment to the agent. 
+In general, the agent can perceive and interpret its environment, take actions and learn through reward, to seek long-term and maximum overall reward to achieve an optimal solution.
+
+.. image:: ../_static/img/RL_framework.png


I think the image might be too small. Have you checked it in the rendered document?
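Independent of the figure, the four elements named in the quoted paragraph (agent, environment, policy, reward) can be sketched as a short loop. This is a toy two-action example with hypothetical class names, not QlibRL code; the epsilon-greedy policy and the fixed rewards are assumptions chosen only to keep it small.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: action 1 pays 1.0, action 0 pays 0.0."""

    def step(self, action: int) -> float:
        return 1.0 if action == 1 else 0.0  # reward signal sent back to the agent


class ToyAgent:
    """Hypothetical agent that follows an epsilon-greedy policy over action values."""

    def __init__(self, n_actions: int = 2, epsilon: float = 0.2, seed: int = 0) -> None:
        self.values = [0.0] * n_actions  # running mean reward per action
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self) -> int:
        # Policy: explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action: int, reward: float) -> None:
        # Incremental mean update of the chosen action's value estimate.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(n_steps: int = 500):
    env, agent = ToyEnvironment(), ToyAgent()
    total = 0.0
    for _ in range(n_steps):
        action = agent.act()         # agent acts on the environment
        reward = env.step(action)    # environment returns a reward
        agent.learn(action, reward)  # agent learns from the reward
        total += reward
    return agent, total
```

Calling `run()` returns the trained agent and the accumulated reward; once the agent has explored action 1, its greedy choice settles on it.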

ultmaster · 2022-10-28T06:13:37Z

docs/component/rl.rst

+Reinforcement Learning in Quantitative Trading
+========================================================================
+.. currentmodule:: qlib
+


Suggest adding a summary upfront to describe what kind of problem we intend to solve.

ultmaster · 2022-10-28T06:14:06Z

docs/component/rl.rst

+
+According to the order execution’s trait of sequential decision-making, an RL-based solution could be applied to solve the order execution. With an RL-based solution, an agent optimizes execution strategy by interacting with the market environment. 
+
+With ``QlibRL``, the RL algorithm in the above scenarios can be easily implemented.


Is QlibRL a term?

ultmaster · 2022-10-28T06:14:47Z

docs/component/rl.rst

+``QlibRL`` makes it possible to jointly optimize different levels of strategies/models/agents. Take `Nested Decision Execution Framework <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution>`_ as an example, the optimization of order execution strategy and portfolio management strategies can interact with each other to maximize returns.
+
+
+Quick Start


Suggest putting quick start into another separate file. Otherwise the file would look too long.

ultmaster · 2022-10-28T06:15:10Z

docs/component/rl.rst

+        buy: ["current", "$close"]
+        sell: ["current", "$close"]
+    strategies: 
+    30min: 


I think the indent is wrong?

ultmaster · 2022-10-28T06:15:36Z

docs/component/rl.rst

+            data_dim: 6
+            data_ticks: 240
+            max_step: 8
+            processed_data_provider:


Suggest adding a per-line explanation of what each configuration option means.

ultmaster · 2022-10-28T06:16:21Z

docs/component/rl.rst

+
+In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary interfaces of gym.Env. Any classes or pipelines that accept gym.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper:
+
+- `Simulator`


Link to class reference with :class:`~qlib.rl.Simulator` .

ultmaster · 2022-10-28T06:17:41Z

docs/component/rl.rst

+
+    $ python qlib/rl/contrib/backtest.py --config_path backtest_config.yml
+
+In that case, `qlib.rl.order_execution.simulator_qlib.SingleAssetOrderExecution <https://github.com/microsoft/qlib/blob/main/qlib/rl/order_execution/simulator_qlib.py>`_ and `qlib.rl.order_execution.simulator_simple.SingleAssetOrderExecutionSimple <https://github.com/microsoft/qlib/blob/main/qlib/rl/order_execution/simulator_simple.py>`_ as examples for simulator, `StateInterpreter <https://github.com/microsoft/qlib/blob/main/qlib/rl/order_execution/interpreter.py>`_ and `ActionInterpreter <https://github.com/microsoft/qlib/blob/main/qlib/rl/order_execution/interpreter.py>`_ as examples for interpreter, and `qlib.rl.order_execution.reward.PAPenaltyReward <https://github.com/microsoft/qlib/blob/main/qlib/rl/order_execution/reward.py>`_ as an example for reward.


Use :class: to reference class.

ultmaster · 2022-10-28T06:18:05Z

docs/component/rl.rst

+============
+``Qlib`` provides a set of APIs for developers to further simplify their development such as base classes for Interpreter, Simulator, Reward.
+
+.. automodule:: qlib.rl


I imagine this will be very long. Put it into another file please.

matluster · 2022-11-07T03:13:24Z

docs/component/rl/api.rst

+
+``Qlib`` provides a set of APIs for developers to further simplify their development such as base classes for Interpreter, Simulator and Reward.
+
+.. autoclass:: qlib.rl.simulator.Simulator


Could we use automodule?

Have you checked the rendered results?

matluster · 2022-11-07T03:13:45Z

docs/component/rl/framework.rst

+
+As you may have noticed, a training vessel itself holds all the required components to build an EnvWrapper rather than holding an instance of EnvWrapper directly. This allows the training vessel to create duplicates of EnvWrapper dynamically when necessary (for example, under parallel training).
+
+With a training vessel, the trainer could finally launch the training pipeline by simple, Scikit-learn-like interfaces (i.e., `trainer.fit()`).


Use double backtick for inline code-block.
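The design point in the quoted paragraph (the vessel stores the components and builds environment wrappers on demand, instead of storing one wrapper instance) can be sketched as a small factory. Everything below is hypothetical: `ToyEnvWrapper`, `ToyVessel`, `simulator_fn` and `make_env` are illustrative names and are not taken from the QlibRL API.

```python
from dataclasses import dataclass
from typing import Callable


class ToyEnvWrapper:
    """Hypothetical wrapper that bundles one simulator instance with a reward function."""

    def __init__(self, simulator: object, reward_fn: Callable[[float], float]) -> None:
        self.simulator = simulator
        self.reward_fn = reward_fn


@dataclass
class ToyVessel:
    """Hypothetical vessel: keeps the recipe for an environment, not the environment itself."""

    simulator_fn: Callable[[], object]   # builds a fresh simulator each time
    reward_fn: Callable[[float], float]  # stateless, so it can be shared

    def make_env(self) -> ToyEnvWrapper:
        # Each call yields an independent wrapper with its own simulator state.
        return ToyEnvWrapper(self.simulator_fn(), self.reward_fn)


vessel = ToyVessel(simulator_fn=dict, reward_fn=lambda pnl: pnl)
envs = [vessel.make_env() for _ in range(4)]  # e.g. one wrapper per parallel worker
```

Because each wrapper owns its own simulator object, parallel workers in this sketch cannot overwrite each other's state.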

matluster · 2022-11-07T03:14:12Z

docs/component/rl/quickstart.rst

+
+QlibRL provides an example of an implementation of a single asset order execution task and the following is an example of the config file to train with QlibRL.
+
+.. code-block:: text


matluster · 2022-11-07T03:14:47Z

docs/component/rl/quickstart.rst

+
+.. code-block:: console
+
+    $ python qlib/rl/contrib/train_onpolicy.py --config_path train_config.yml


Use python -m qlib.rl.contrib.train_onpolicy. Otherwise users must clone qlib to run this.
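The reason `-m` avoids the need for a clone is that the interpreter looks the module up on the import path of the installed package rather than on a relative file path. A minimal sketch of that lookup, using only the standard library; `locate_module` is a hypothetical helper name, and the Qlib line prints `None` unless Qlib is installed in the current environment.

```python
import importlib.util
from typing import Optional


def locate_module(dotted_name: str) -> Optional[str]:
    """Return the file that a dotted module name resolves to, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except ModuleNotFoundError:
        # Raised when a parent package of the dotted name is missing.
        return None
    return None if spec is None else spec.origin


# A standard-library module that is commonly run with `python -m json.tool`.
print(locate_module("json.tool"))

# Resolves only when Qlib is installed; no repository checkout is involved.
print(locate_module("qlib.rl.contrib.train_onpolicy"))
```

Note that the argument is a dotted module name without the `.py` suffix, which is also why a slash-separated path does not work with `-m`.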

matluster · 2022-11-07T03:18:19Z

docs/component/rl/quickstart.rst

+                kwargs: 
+                    lr: 1.0e-4
+                    # the path for the latest model in the training process
+                    weight_file: ./checkpoints/latest.pth


How do I download this?

latest.pth is generated during training, so there is no need to download it. The comment already says this, but maybe we could make it clearer. @lwwang1995

I suggest commenting out this line by default.
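One way to make an optional checkpoint harmless on a first run is to treat a missing file as "start from scratch". The helper below is a hypothetical sketch using only the standard library; it is not how QlibRL handles `weight_file`, and the name `resolve_weight_file` is an assumption made for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_weight_file(path: Optional[str]) -> Optional[Path]:
    """Return the checkpoint path if it exists on disk, else None (train from scratch)."""
    if path is None:
        return None  # option left unset or commented out in the config
    candidate = Path(path)
    return candidate if candidate.is_file() else None


# On a first run no checkpoint has been written yet, so this resolves to None.
checkpoint = resolve_weight_file("./checkpoints/latest.pth")
print("resume from" if checkpoint else "start fresh", checkpoint)
```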

matluster · 2022-11-07T03:19:00Z

docs/component/rl.rst

@@ -0,0 +1,278 @@
+.. _rl:


This file can be deleted?

ultmaster · 2022-11-07T07:22:50Z

docs/component/rl/quickstart.rst

+
+.. code-block:: console
+
+    $ python -m qlib/rl/contrib/train_onpolicy.py --config_path train_config.yml


python -m qlib.rl.contrib.train_onpolicy

ultmaster · 2022-11-07T07:23:00Z

examples/rl/README.md

@@ -49,7 +49,7 @@ After training, checkpoints will be stored under `checkpoints/`.
 ## Run backtest

 ```
-python ../../qlib/rl/contrib/backtest.py --config_path ./experiment_config/backtest/config.py
+python ../../qlib/rl/contrib/backtest.py --config_path ./experiment_config/backtest/config.yml


* Add docs for qlib.rl * Update docs for qlib.rl * Add homepage introduct to RL framework * Update index Link * Fix Icon * typo * Update catelog * Update docs for qlib.rl * Update docs for qlib.rl * Update figure * Update docs for qlib.rl * Update setup.py * FIx setup.py * Update docs and fix some typos * Fix the reference to RL docs * Update framework.svg * Update framework.svg * Update framework.svg * Update docs for qlibrl. * Update docs for qlibrl. * Update docs for Qlibrl. * Update docs for qlibrl. * Update docs for qlibrl. * Update docs for qlibrl. * Add new framework * Update jpg * Update framework.svg * Update framework.svg * Update Qlib framework and description * Update grammar * Update README.md * Update README.md * Update docs/component/rl.rst Co-authored-by: you-n-g <[email protected]> * Update docs/component/rl.rst Co-authored-by: you-n-g <[email protected]> * Update docs for qlib.rl * Change theme for docs. * Update docs for qlib.rl * Update docs for qlib.rl * Update docs for qlib.rl * Update docs for qlib.rl. * Update docs for qlib.rl * Update docs for qlib.rl * Update docs for qlib.rl Co-authored-by: Young <[email protected]> Co-authored-by: you-n-g <[email protected]>

lwwang1995 and others added 7 commits October 19, 2022 23:16

Add docs for qlib.rl

ddbbcc6

Update docs for qlib.rl

9dd860c

Add homepage introduct to RL framework

c7b68b1

Update index Link

d13262f

Fix Icon

8cacc96

typo

c3577e1

Merge remote-tracking branch 'origin/main' into HEAD

b139e7d

you-n-g reviewed Oct 20, 2022


you-n-g and others added 8 commits October 20, 2022 17:48

Update catelog

a0d3621

Update docs for qlib.rl

1a62af9

Update docs for qlib.rl

c9b2198

Update figure

5a75c9d

Update docs for qlib.rl

77fbb16

Update setup.py

5d2f21f

Merge remote-tracking branch 'origin/main' into HEAD

160f951

FIx setup.py

4b705a9

you-n-g reviewed Oct 22, 2022


Update docs and fix some typos

145414b

you-n-g reviewed Oct 24, 2022


lwwang1995 and others added 11 commits October 24, 2022 13:29

Fix the reference to RL docs

f215418

Update framework.svg

b688db7

Update framework.svg

5035af1

Update framework.svg

3b182e1

Update docs for qlibrl.

834c4f4

Update docs for qlibrl.

0b17397

Update docs for Qlibrl.

7bfc937

Update docs for qlibrl.

21b765d

Update docs for qlibrl.

1703492

Update docs for qlibrl.

4d73676

Add new framework

f7713e2

you-n-g and others added 7 commits October 28, 2022 00:14

Update jpg

a59f844

Update framework.svg

47667a7

Update framework.svg

129c1a8

Update Qlib framework and description

db543fc

Update grammar

34e2bc4

Update README.md

8d7df20

Update README.md

b3eec1c

you-n-g reviewed Oct 28, 2022


lwwang1995 and others added 2 commits October 28, 2022 11:23

Update docs/component/rl.rst

946177d

Co-authored-by: you-n-g <[email protected]>

Update docs/component/rl.rst

04a9b8f

Co-authored-by: you-n-g <[email protected]>

ultmaster reviewed Oct 28, 2022


lwwang1995 added 3 commits November 2, 2022 20:08

Update docs for qlib.rl

e248066

Change theme for docs.

6020b86

Update docs for qlib.rl

5db5ea9

matluster reviewed Nov 7, 2022


lwwang1995 added 3 commits November 7, 2022 11:47

Update docs for qlib.rl

7b84f49

Update docs for qlib.rl

c47a460

Update docs for qlib.rl.

cf3642d

ultmaster reviewed Nov 7, 2022


lwwang1995 added 3 commits November 8, 2022 20:00

Update docs for qlib.rl

d723685

Update docs for qlib.rl

6484cfa

Update docs for qlib.rl

0db199c

you-n-g merged commit e182124 into microsoft:main Nov 10, 2022

you-n-g added the documentation Improvements or additions to documentation label Dec 9, 2022

