
[RLlib] Cleanup examples folder #01. #44067

Merged
merged 34 commits into master from cleanup_examples_folder on Apr 2, 2024
Commits (34)
8a740d2
wip
sven1977 Mar 6, 2024
d043044
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Mar 16, 2024
79ee728
Merge branch 'master' of https://github.com/ray-project/ray into appo…
sven1977 Mar 16, 2024
91b6eab
wip
sven1977 Mar 17, 2024
dc34656
Rock-paper-scissors example working. NOT for non-shared vf net, b/c o…
sven1977 Mar 17, 2024
e924eeb
wip
sven1977 Mar 18, 2024
0d0f8c2
wip
sven1977 Mar 18, 2024
69200be
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Mar 18, 2024
7eeffed
wip
sven1977 Mar 18, 2024
5711bba
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Mar 18, 2024
204a4fd
wip
sven1977 Mar 18, 2024
92de3dd
wip
sven1977 Mar 18, 2024
7a7a360
wip
sven1977 Mar 19, 2024
32fa511
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Mar 19, 2024
651b1f1
wip
sven1977 Mar 19, 2024
67f51c4
wip
sven1977 Mar 19, 2024
661d046
wip
sven1977 Mar 19, 2024
b80a93f
wip
sven1977 Mar 19, 2024
15bbdbf
Apply suggestions from code review
sven1977 Mar 20, 2024
01cdb80
Merge branch 'master' into cleanup_examples_folder
sven1977 Mar 20, 2024
b7be627
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Apr 2, 2024
14d7cd2
wip
sven1977 Apr 2, 2024
48d66c1
wip
sven1977 Apr 2, 2024
2b7e640
Merge remote-tracking branch 'origin/cleanup_examples_folder' into cl…
sven1977 Apr 2, 2024
84158c0
wip
sven1977 Apr 2, 2024
42fe35f
Apply suggestions from code review
sven1977 Apr 2, 2024
8c6971a
wip
sven1977 Apr 2, 2024
4243bed
Merge remote-tracking branch 'origin/cleanup_examples_folder' into cl…
sven1977 Apr 2, 2024
3d188f4
wip
sven1977 Apr 2, 2024
49d68af
wip
sven1977 Apr 2, 2024
79ec502
wip
sven1977 Apr 2, 2024
767c7ff
Merge branch 'master' into cleanup_examples_folder
sven1977 Apr 2, 2024
73bff05
wip
sven1977 Apr 2, 2024
aace3d5
Merge remote-tracking branch 'origin/cleanup_examples_folder' into cl…
sven1977 Apr 2, 2024
11 changes: 0 additions & 11 deletions .buildkite/rllib.rayci.yml
@@ -106,17 +106,6 @@ steps:
--test-env=RLLIB_NUM_GPUS=1
depends_on: rllibgpubuild

- label: ":brain: rllib: rlmodule tests"
tags: rllib_directly
instance_type: large
commands:
- bazel run //ci/ray_ci:test_in_docker -- //rllib/... rllib
--parallelism-per-worker 3
--only-tags rlm
--test-env RLLIB_ENABLE_RL_MODULE=1
--test-env RAY_USE_MULTIPROCESSING_CPU_COUNT=1
depends_on: rllibbuild

- label: ":brain: rllib: data tests"
Collaborator:
What's actually the meaning of brain here? Learning tests?

Contributor (author):
:) It defines the little icon shown in buildkite
[image]

if: build.branch != "master"
tags: data
2 changes: 1 addition & 1 deletion doc/source/ray-overview/getting-started.md
@@ -303,7 +303,7 @@ pip install -U "ray[rllib]" tensorflow # or torch
```
````

```{literalinclude} ../../../rllib/examples/documentation/rllib_on_ray_readme.py
```{literalinclude} ../rllib/doc_code/rllib_on_ray_readme.py
:end-before: __quick_start_end__
:language: python
:start-after: __quick_start_begin__
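
For orientation while reading this diff, the included quick-start file boils down to roughly the following sketch (assuming the Ray 2.x `PPOConfig` builder API; the worker and iteration counts are illustrative and not taken from the included file):

```python
# Minimal RLlib quick-start sketch (assumed API, not the literal file contents).
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=2)  # illustrative worker count
)
algo = config.build()

for i in range(3):
    result = algo.train()
    print(f"iter {i}: episode_reward_mean={result['episode_reward_mean']}")

algo.stop()
```
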
6 changes: 3 additions & 3 deletions doc/source/rllib/package_ref/env.rst
@@ -29,11 +29,11 @@ For example, if you provide a custom `gym.Env <https://github.com/openai/gym>`_

Here is a simple example:

.. literalinclude:: ../../../../rllib/examples/documentation/custom_gym_env.py
.. literalinclude:: ../doc_code/custom_gym_env.py
:language: python

.. start-after: __sphinx_doc_model_construct_1_begin__
.. end-before: __sphinx_doc_model_construct_1_end__
.. start-after: __rllib-custom-gym-env-begin__
.. end-before: __rllib-custom-gym-env-end__
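
As a rough illustration of what such a custom `gym.Env` setup looks like, here is a sketch assuming the Gymnasium API and `register_env`; the corridor environment below is illustrative, not the contents of `doc_code/custom_gym_env.py`:

```python
import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env


class SimpleCorridor(gym.Env):
    """Walk right along a 1-D corridor to reach the goal state."""

    def __init__(self, config=None):
        self.end_pos = (config or {}).get("corridor_length", 5)
        self.cur_pos = 0
        self.action_space = gym.spaces.Discrete(2)  # 0 = left, 1 = right
        self.observation_space = gym.spaces.Box(0.0, self.end_pos, (1,), np.float32)

    def reset(self, *, seed=None, options=None):
        self.cur_pos = 0
        return np.array([self.cur_pos], np.float32), {}

    def step(self, action):
        self.cur_pos = max(self.cur_pos + (1 if action == 1 else -1), 0)
        terminated = self.cur_pos >= self.end_pos
        reward = 1.0 if terminated else -0.1
        return np.array([self.cur_pos], np.float32), reward, terminated, False, {}


register_env("corridor", lambda cfg: SimpleCorridor(cfg))
algo = PPOConfig().environment("corridor", env_config={"corridor_length": 5}).build()
print(algo.train()["episode_reward_mean"])
```
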

However, you may also conveniently sub-class any of the other supported RLlib-specific
environment types. The automated paths from those env types (or callables returning instances of those types) to
4 changes: 2 additions & 2 deletions doc/source/rllib/rllib-connector.rst
@@ -236,7 +236,7 @@ With connectors essentially checkpointing all the transformations used during training,
policies can be easily restored without the original algorithm for local inference,
as demonstrated by the following Cartpole example:

.. literalinclude:: ../../../rllib/examples/connectors/v1/run_connector_policy.py
.. literalinclude:: ../../../rllib/examples/_old_api_stack/connectors/run_connector_policy.py
:language: python
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
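
The gist of that example, sketched with the generic `Policy` API (assumed; the checkpoint path is a placeholder, and the actual example additionally routes observations through the policy's stored agent/action connectors via an RLlib helper rather than calling `compute_single_action` directly):

```python
import gymnasium as gym
from ray.rllib.policy.policy import Policy

# Restore a policy trained with connectors enabled; no Algorithm object needed.
policy = Policy.from_checkpoint("/tmp/cartpole_policy_checkpoint")  # placeholder path

env = gym.make("CartPole-v1")
obs, _ = env.reset()
terminated = truncated = False
total_reward = 0.0
while not (terminated or truncated):
    # compute_single_action() returns (action, rnn_state_outs, extra_fetches).
    action, _, _ = policy.compute_single_action(obs, explore=False)
    obs, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
print(f"Episode reward: {total_reward}")
```
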
@@ -255,7 +255,7 @@ different environments to work together at the same time.
Here is an example demonstrating adaptation of a policy trained for the standard Cartpole environment
for a new mock Cartpole environment that returns additional features and requires extra action inputs.

.. literalinclude:: ../../../rllib/examples/connectors/v1/adapt_connector_policy.py
.. literalinclude:: ../../../rllib/examples/_old_api_stack/connectors/adapt_connector_policy.py
:language: python
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
56 changes: 28 additions & 28 deletions doc/source/rllib/rllib-examples.rst
@@ -14,23 +14,8 @@ Tuned Examples
--------------

- `Tuned examples <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples>`__:
Collection of tuned hyperparameters by algorithm.
- `MuJoCo and Atari benchmarks <https://github.com/ray-project/rl-experiments>`__:
Collection of reasonably optimized Atari and MuJoCo results.
Collection of tuned hyperparameters sorted by algorithm.

Blog Posts
----------

- `Attention Nets and More with RLlib’s Trajectory View API <https://medium.com/distributed-computing-with-ray/attention-nets-and-more-with-rllibs-trajectory-view-api-d326339a6e65>`__:
This blog describes RLlib's new "trajectory view API" and how it enables implementations of GTrXL (attention net) architectures.
- `Reinforcement Learning with RLlib in the Unity Game Engine <https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d>`__:
A how-to on connecting RLlib with the Unity3D game engine for running visual- and physics-based RL experiments.
- `Lessons from Implementing 12 Deep RL Algorithms in TF and PyTorch <https://medium.com/distributed-computing-with-ray/lessons-from-implementing-12-deep-rl-algorithms-in-tf-and-pytorch-1b412009297d>`__:
Discussion on how we ported 12 of RLlib's algorithms from TensorFlow to PyTorch and what we learnt on the way.
- `Scaling Multi-Agent Reinforcement Learning <http://bair.berkeley.edu/blog/2018/12/12/rllib>`__:
This blog post is a brief tutorial on multi-agent RL and its design in RLlib.
- `Functional RL with Keras and TensorFlow Eager <https://medium.com/riselab/functional-rl-with-keras-and-tensorflow-eager-7973f81d6345>`__:
Exploration of a functional paradigm for implementing reinforcement learning (RL) algorithms.

Environments and Adapters
-------------------------
@@ -47,7 +32,7 @@ Environments and Adapters
Custom- and Complex Models
--------------------------

- `Custom Keras model <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_keras_model.py>`__:
- `Custom Keras model <https://github.com/ray-project/ray/blob/master/rllib/examples/_old_api_stack/custom_keras_model.py>`__:
Example of using a custom Keras model.
- `Registering a custom model with supervised loss <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_model_loss_and_metrics.py>`__:
Example of defining and registering a custom model with a supervised loss.
@@ -83,9 +68,9 @@ Training Workflows

Evaluation:
-----------
- `Custom evaluation function <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_eval.py>`__:
- `Custom evaluation function <https://github.com/ray-project/ray/blob/master/rllib/examples/evaluation/custom_evaluation.py>`__:
Example of how to write a custom evaluation function that is called instead of the default behavior, which is running with the evaluation worker set through n episodes.
- `Parallel evaluation and training <https://github.com/ray-project/ray/blob/master/rllib/examples/parallel_evaluation_and_training.py>`__:
- `Parallel evaluation and training <https://github.com/ray-project/ray/blob/master/rllib/examples/evaluation/evaluation_parallel_to_training.py>`__:
Example showing how the evaluation workers and the "normal" rollout workers can run (to some extend) in parallel to speed up training.


@@ -113,23 +98,23 @@ Serving and Offline
Multi-Agent and Hierarchical
----------------------------

- `Simple independent multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_independent_learning.py>`__:
- `Simple independent multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/independent_learning.py>`__:
Setup RLlib to run any algorithm in (independent) multi-agent mode against a multi-agent environment.
- `More complex (shared-parameter) multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_parameter_sharing.py>`__:
- `More complex (shared-parameter) multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/parameter_sharing.py>`__:
Setup RLlib to run any algorithm in (shared-parameter) multi-agent mode against a multi-agent environment.
- `Rock-paper-scissors <https://github.com/ray-project/ray/blob/master/rllib/examples/rock_paper_scissors_multiagent.py>`__:
- `Rock-paper-scissors <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/rock_paper_scissors.py>`__:
Example of different heuristic and learned policies competing against each other in rock-paper-scissors.
- `Two-step game <https://github.com/ray-project/ray/blob/master/rllib/examples/two_step_game.py>`__:
- `Two-step game <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/two_step_game.py>`__:
Example of the two-step game from the `QMIX paper <https://arxiv.org/pdf/1803.11485.pdf>`__.
- `PettingZoo multi-agent example <https://github.com/Farama-Foundation/PettingZoo/blob/master/tutorials/Ray/rllib_pistonball.py>`__:
Example on how to use RLlib to learn in `PettingZoo <https://www.pettingzoo.ml>`__ multi-agent environments.
- `PPO with centralized critic on two-step game <https://github.com/ray-project/ray/blob/master/rllib/examples/centralized_critic.py>`__:
Example of customizing PPO to leverage a centralized value function.
- `Centralized critic in the env <https://github.com/ray-project/ray/blob/master/rllib/examples/centralized_critic_2.py>`__:
A simpler method of implementing a centralized critic by augmentating agent observations with global information.
- `Hand-coded policy <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_custom_policy.py>`__:
- `Hand-coded policy <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/custom_heuristic_rl_module.py>`__:
Example of running a custom hand-coded policy alongside trainable policies.
- `Weight sharing between policies <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_cartpole.py>`__:
- `Weight sharing between policies <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/multi_agent_cartpole.py>`__:
Example of how to define weight-sharing layers between two different policies.
- `Multiple algorithms <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_two_trainers.py>`__:
Example of alternating training between DQN and PPO.
@@ -140,11 +125,11 @@
Special Action- and Observation Spaces
--------------------------------------

- `Nested action spaces <https://github.com/ray-project/ray/blob/master/rllib/examples/nested_action_spaces.py>`__:
- `Nested action spaces <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/connector_v2_nested_action_spaces.py>`__:
Learning in arbitrarily nested action spaces.
- `Parametric actions <https://github.com/ray-project/ray/blob/master/rllib/examples/parametric_actions_cartpole.py>`__:
- `Parametric actions <https://github.com/ray-project/ray/blob/master/rllib/examples/_old_api_stack/parametric_actions_cartpole.py>`__:
Example of how to handle variable-length or parametric action spaces.
- `Using the "Repeated" space of RLlib for variable lengths observations <https://github.com/ray-project/ray/blob/master/rllib/examples/complex_struct_space.py>`__:
- `Using the "Repeated" space of RLlib for variable lengths observations <https://github.com/ray-project/ray/blob/master/rllib/examples/_old_api_stack/complex_struct_space.py>`__:
How to use RLlib's `Repeated` space to handle variable length observations.
- `Autoregressive action distribution example <https://github.com/ray-project/ray/blob/master/rllib/examples/autoregressive_action_dist.py>`__:
Learning with auto-regressive action dependencies (e.g. 2 action components; distribution for 2nd component depends on the 1st component's actually sampled value).
@@ -185,3 +170,18 @@ Community Examples
Example of training in StarCraft2 maps with RLlib / multi-agent.
- `Traffic Flow <https://berkeleyflow.readthedocs.io/en/latest/flow_setup.html>`__:
Example of optimizing mixed-autonomy traffic simulations with RLlib / multi-agent.


Blog Posts
----------

- `Attention Nets and More with RLlib’s Trajectory View API <https://medium.com/distributed-computing-with-ray/attention-nets-and-more-with-rllibs-trajectory-view-api-d326339a6e65>`__:
Blog describing RLlib's new "trajectory view API" and how it enables implementations of GTrXL (attention net) architectures.
- `Reinforcement Learning with RLlib in the Unity Game Engine <https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d>`__:
How-To guide about connecting RLlib with the Unity3D game engine for running visual- and physics-based RL experiments.
- `Lessons from Implementing 12 Deep RL Algorithms in TF and PyTorch <https://medium.com/distributed-computing-with-ray/lessons-from-implementing-12-deep-rl-algorithms-in-tf-and-pytorch-1b412009297d>`__:
Discussion on how the Ray Team ported 12 of RLlib's algorithms from TensorFlow to PyTorch and the lessons learned.
- `Scaling Multi-Agent Reinforcement Learning <http://bair.berkeley.edu/blog/2018/12/12/rllib>`__:
Blog post of a brief tutorial on multi-agent RL and its design in RLlib.
- `Functional RL with Keras and TensorFlow Eager <https://medium.com/riselab/functional-rl-with-keras-and-tensorflow-eager-7973f81d6345>`__:
Exploration of a functional paradigm for implementing reinforcement learning (RL) algorithms.
10 changes: 5 additions & 5 deletions doc/source/rllib/rllib-replay-buffers.rst
@@ -71,7 +71,7 @@ Here are three ways of specifying a type:
.. dropdown:: **Changing a replay buffer configuration**
:animate: fade-in-slide-down

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_type_specification__begin__
:end-before: __sphinx_doc_replay_buffer_type_specification__end__
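
The three variants amount to roughly the following (a sketch assuming the `replay_buffer_config` dict accepted by off-policy algorithm configs such as `DQNConfig`; the capacity value is illustrative):

```python
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.utils.replay_buffers import MultiAgentPrioritizedReplayBuffer

# 1) Type given as the class name (string).
config = DQNConfig().training(
    replay_buffer_config={"type": "MultiAgentPrioritizedReplayBuffer", "capacity": 50_000}
)

# 2) Type given as the Python class itself.
config = DQNConfig().training(
    replay_buffer_config={"type": MultiAgentPrioritizedReplayBuffer, "capacity": 50_000}
)

# 3) Type given as a full module path (string).
config = DQNConfig().training(
    replay_buffer_config={
        "type": "ray.rllib.utils.replay_buffers.multi_agent_prioritized_replay_buffer."
                "MultiAgentPrioritizedReplayBuffer",
        "capacity": 50_000,
    }
)
```
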
@@ -102,7 +102,7 @@ Advanced buffer types add functionality while trying to retain compatibility thr
The following is an example of the most basic scheme of interaction with a :py:class:`~ray.rllib.utils.replay_buffers.replay_buffer.ReplayBuffer`.


.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_basic_interaction__begin__
:end-before: __sphinx_doc_replay_buffer_basic_interaction__end__
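
In essence (a sketch assuming the `ReplayBuffer.add()` / `sample()` API; the batch contents are illustrative):

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer

buffer = ReplayBuffer(capacity=1000)  # stores single timesteps by default

# Add a (tiny, hand-made) batch of experiences ...
buffer.add(SampleBatch({
    SampleBatch.OBS: [0.0],
    SampleBatch.ACTIONS: [1],
    SampleBatch.REWARDS: [1.0],
    SampleBatch.NEXT_OBS: [1.0],
}))

# ... and sample from the buffer later on.
print(buffer.sample(num_items=1))
```
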
@@ -113,7 +113,7 @@

Here is an example of how to implement your own toy example of a ReplayBuffer class and make SimpleQ use it:

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_own_buffer__begin__
:end-before: __sphinx_doc_replay_buffer_own_buffer__end__
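
A minimal sketch of the pattern (the counting buffer below is a made-up toy, and `DQNConfig` stands in here for the SimpleQ setup of the referenced demo):

```python
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer


class CountingReplayBuffer(ReplayBuffer):
    """Toy buffer that counts how often it gets sampled."""

    def __init__(self, capacity=10_000, **kwargs):
        super().__init__(capacity=capacity, **kwargs)
        self.sample_calls = 0

    def sample(self, num_items, **kwargs):
        self.sample_calls += 1
        return super().sample(num_items, **kwargs)


# Plug the custom class into the algorithm via its replay_buffer_config.
config = DQNConfig().training(
    replay_buffer_config={"type": CountingReplayBuffer, "capacity": 10_000}
)
```
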
@@ -132,7 +132,7 @@ When later calling the ``sample()`` method, num_items will relate to said storag

Here is a full example of how to modify the storage_unit and interact with a custom buffer:

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_advanced_usage_storage_unit__begin__
:end-before: __sphinx_doc_replay_buffer_advanced_usage_storage_unit__end__
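
For instance (a sketch assuming the `StorageUnit` enum and the `storage_unit` constructor argument):

```python
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer, StorageUnit

# Store whole episodes instead of single timesteps; batches added to this buffer
# must then contain complete episodes, and sample(num_items=2) returns
# up to two full episodes rather than two timesteps.
buffer = ReplayBuffer(capacity=100, storage_unit=StorageUnit.EPISODES)
```
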
@@ -145,7 +145,7 @@
Here is an example of how to create an :py:class:`~ray.rllib.utils.replay_buffers.multi_agent_replay_buffer.MultiAgentReplayBuffer` with an alternative underlying :py:class:`~ray.rllib.utils.replay_buffers.replay_buffer.ReplayBuffer`.
The :py:class:`~ray.rllib.utils.replay_buffers.multi_agent_replay_buffer.MultiAgentReplayBuffer` can stay the same. We only need to specify our own buffer along with a default call argument:

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_advanced_usage_underlying_buffers__begin__
:end-before: __sphinx_doc_replay_buffer_advanced_usage_underlying_buffers__end__