Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Cleanup examples folder #01. #44067

Merged
merged 34 commits into from
Apr 2, 2024

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Mar 16, 2024

Cleanup examples folder no. 01:

  • Move some old-API only scripts into examples/_old_api_stack.
  • Create new sub directories and move scripts into these as appropriate:
    • evaluation
    • multi_agent_and_self_play
    • gpu_training
  • Left all old script names as-is, in case of old web-links pointing to these, but they error out now and provide information on where to find their new versions.
  • Some of the existing scripts were not just moved, but also altered to work with the new API stack. In particular, these include all multi-agent scripts as well as all evaluation-related scripts.
  • Adjust docs and BUILD accordingly.
  • A few (very minor) bug fixes in the ConnectorV2 API are included.

TODO (in follow-up PRs):

  • Continue emptying old scripts, translating them to the new API stack, and moving them to better, more comprehensive sub-directories.
  • Consistently move examples classes (as opposed to example scripts) into special subdirectories within examples, e.g. examples/rl_module/classes/ or examples/env/classes and leave the space in the direct sub-directories for entire scripts only (i.e. inside examples/env, there should be all the example scripts demo'ing env/env-runner/custom env stuff; only within the sub-dir classes should all the example envs go; same with models, rl_module, learner, etc..).

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>
…_on_new_api_stack_w_env_runner_and_connectorv2

Signed-off-by: sven1977 <[email protected]>

# Conflicts:
#	rllib/algorithms/algorithm.py
#	rllib/utils/actor_manager.py
Signed-off-by: sven1977 <[email protected]>
…f some bug in the rlmodule specs.

Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Copy link
Collaborator

@simonsays1980 simonsays1980 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Some more infos would be helpful here and there to give a user/developer the big picture and the why. Awesome example updates!

--test-env RLLIB_ENABLE_RL_MODULE=1
--test-env RAY_USE_MULTIPROCESSING_CPU_COUNT=1
depends_on: rllibbuild

- label: ":brain: rllib: data tests"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's actually the meaning of brain here? Learning tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) It defines the little icon shown in buildkite
image

module_specs=(
self.rl_module_spec.module_specs
if isinstance(self.rl_module_spec, MultiAgentRLModuleSpec)
else set(self.policies)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we can soon get rid of the policy/ies naming. This is still confusing in the module setups.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. We need to unify this soon and fully adapt to the new stack terminology. Some ideas:

  • Have user explicitly enable multi-agent (otherwise, will error if multi-agent components are used).
  • config.multi_agent(policies) should no longer be necessary (already kind of replaced by config.rl_module)
  • policy_mapping_fn -> agent_to_module_mapping_fn
  • etc..

),
data,
)
for column, column_data in data.copy().items():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make for each connector either a doc string that tells us the input and output shape?

We could also create something like the RLModules have: get_input_specs, get_output_specs and then check modules (also user modules) in the pipeline if they "fit".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also let's add a comment about the bigger picture here: what do the recurrent modules expect and what do we feed them.

@@ -69,7 +69,7 @@ class AgentToModuleMapping(ConnectorV2):

# Create our connector piece.
connector = AgentToModuleMapping(
modules=["module0", "module1"],
module_specs={"module0", "module1"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be: {"module0": SingleAgentModuleSpec(....), "module1": ...}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both is possible. Sometimes, users don't specify the individual SingleAgentRLModuleSpec (RLlib then uses the algo's default ones), thus they also do NOT provide space/class/config information for individual modules. The connector needs to be ok with that and fall back to only having the IDs of the modules w/o any further information.

@@ -615,6 +615,7 @@ def foreach_batch_item_change_in_place(
func: Callable[[Any, int, AgentID, ModuleID], Any],
) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a doc string that explains when to use it and how?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. You are right, this one is completely missing a docstring :|

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added docstring and thorough .. testcode::.

We define a custom evaluation method that does the following:
- It changes the corridor length of all environments used on the evaluation EnvRunners.
- It runs a defined number of episodes for evaluation purposes.
- It collects the metrics from those runs, summarizes these metrics and return them.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny typo :) "return" -> "returns"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed :)

func=lambda worker: (worker.sample(), worker.get_metrics())[1],
local_worker=False,
)
for metrics_per_worker in metrics_all_workers:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we show here maybe how to sort the metrics from different corridor lengths in the results dict? (e.g. such that they show up in different diagrams in wandb/tensorboard)

See: https://pettingzoo.farama.org/environments/sisl/waterworld/
for more details on the environment.

Note that this example is different from the old API stack scripts:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah here they are. Awesome!

`examples/centralized_critic.py` and `examples/centralized_critic_2.py` in the
sense that here, a true shared value function is used via the new
`MultiAgentRLModule` class as opposed to both of the old API stack scripts, which
do NOT use a single central value function, but 2: One for each policy learnt.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "learnt" -> "learned"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

.multi_agent(
policies=policies,
# Exact 1:1 mapping from AgentID to ModuleID.
policy_mapping_fn=(lambda aid, *args, **kwargs: aid),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean that they all share in addition the policy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, this example is NOT done yet. I'll finish it as discussed above.
Good call! :)

Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Releasing first batch of comments to review a higher priority PR first.

doc/source/rllib/rllib-examples.rst Outdated Show resolved Hide resolved
doc/source/rllib/rllib-examples.rst Outdated Show resolved Hide resolved
doc/source/rllib/rllib-examples.rst Outdated Show resolved Hide resolved
doc/source/rllib/rllib-examples.rst Outdated Show resolved Hide resolved
rllib/algorithms/algorithm_config.py Outdated Show resolved Hide resolved
rllib/examples/_old_api_stack/sb2rllib_sb_example.py Outdated Show resolved Hide resolved
rllib/examples/_old_api_stack/two_trainer_workflow.py Outdated Show resolved Hide resolved
rllib/examples/_old_api_stack/two_trainer_workflow.py Outdated Show resolved Hide resolved
Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I reviewed more than you intended. Sorry for the delay. Hope this helps.


ray.init(local_mode=args.local_mode)

# Simple environment with 4 independent cartpole entities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Simple environment with 4 independent cartpole entities
# Simple environment with 4 independent cartpole entities.

"""Example of customizing the evaluation procedure for an RLlib algorithm.

Note, that you should only choose to provide a custom eval function, in case the already
built-in eval options are not sufficient. Normally, though, RLlib's eval utilities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
built-in eval options are not sufficient. Normally, though, RLlib's eval utilities
built-in eval options aren't sufficient. Normally, though, RLlib's eval utilities

This script uses the SimpleCorridor environment, a simple 1D gridworld, in which
the agent can only walk left (action=0) or right (action=1). The goal is at the end of
the (1D) corridor. The env exposes an API to change the length of the corridor
on-the-fly. We use this API here to extend the size of the corridor for the evaluation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
on-the-fly. We use this API here to extend the size of the corridor for the evaluation
on-the-fly. This API extends the size of the corridor for the evaluation

on-the-fly. We use this API here to extend the size of the corridor for the evaluation
runs.

We define a custom evaluation method that does the following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We define a custom evaluation method that does the following:
A custom evaluation method does the following:

runs.

We define a custom evaluation method that does the following:
- It changes the corridor length of all environments used on the evaluation EnvRunners.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- It changes the corridor length of all environments used on the evaluation EnvRunners.
- It changes the corridor length of all environments RLlib uses on the evaluation EnvRunners.


For debugging, use the following additional command line options
`--no-tune --num-env-runners=0`
Which should allow you to set breakpoints anywhere in the RLlib code and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Which should allow you to set breakpoints anywhere in the RLlib code and
which should allow you to set breakpoints anywhere in the RLlib code and

@sven1977 sven1977 merged commit cb05540 into ray-project:master Apr 2, 2024
4 of 5 checks passed
@sven1977 sven1977 deleted the cleanup_examples_folder branch April 9, 2024 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants