
TypeError when running R2D2 using CUDA #561

Closed

MarcoMeter opened this issue Dec 20, 2022 · 11 comments

Labels
bug Something isn't working

Comments

@MarcoMeter

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
v0.4.5 (DI-engine), 1.12.1 (PyTorch), 3.7.15 (Python; default, Nov 24 2022, 21:12:53) [GCC 11.2.0], linux

Log

/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:21: UserWarning: WARN: It seems a Box observation space is an image but the `dtype` is not `np.uint8`, actual type: float32. If the Box observation space is not an image, we recommend flattening the observation to have only a 1D vector.
  f"It seems a Box observation space is an image but the `dtype` is not `np.uint8`, actual type: {observation_space.dtype}. "
/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:26: UserWarning: WARN: It seems a Box observation space is an image but the upper and lower bounds are not in [0, 255]. Generally, CNN policies assume observations are within that range, so you may encounter an issue if the observation values are not.
  "It seems a Box observation space is an image but the upper and lower bounds are not in [0, 255]. "
Env Space Information:
	Observation Space: Box(0.0, 1.0, (3, 84, 84), float32)
	Action Space: Discrete(4)
	Reward Space: Box(-inf, inf, (1,), float32)
Evaluation: Train Iter(0)	Env Step(0)	Episode Return(0.031)
Traceback (most recent call last):
  File "r2d2_memorygym.py", line 166, in <module>
    main()
  File "r2d2_memorygym.py", line 162, in main
    task.run()
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 168, in run
    self.forward(fn)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 48, in runtime_handler
    return func(task, *args, **kwargs)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 232, in forward
    g = fn(ctx)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 201, in forward
    g = self.forward(fn, ctx, async_mode=False)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 48, in runtime_handler
    return func(task, *args, **kwargs)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 232, in forward
    g = fn(ctx)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/middleware/learner.py", line 55, in __call__
    self._trainer(ctx)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 201, in forward
    g = self.forward(fn, ctx, async_mode=False)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 48, in runtime_handler
    return func(task, *args, **kwargs)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 232, in forward
    g = fn(ctx)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/middleware/functional/trainer.py", line 34, in _train
    train_output = policy.forward(ctx.train_data)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/policy/r2d2.py", line 260, in _forward_learn
    data = self._data_preprocess_learn(data)  # output datatype: Dict
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/policy/r2d2.py", line 193, in _data_preprocess_learn
    data = to_device(data, self._device)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/torch_utils/data_helper.py", line 54, in to_device
    raise TypeError("not support item type: {}".format(type(item)))
TypeError: not support item type: <class 'treetensor.torch.tensor.Tensor'>
srun: error: cgpu03-001: task 0: Exited with exit code 1

Steps to reproduce:
Run the CartPole R2D2 example while setting cfg["policy"]["cuda"] = True.

@PaParaZz1 added the bug label Dec 21, 2022
@PaParaZz1
Member

I have fixed this bug in the commit above; you can test this demo again.
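
For anyone pinned to an older version, the failure mode is ding's to_device type dispatch not recognizing treetensor tensors. Conceptually, the missing branch looks like the sketch below (an assumption about the shape of the fix, not the actual commit); note that treetensor's Tensor already implements a treelized .to(device), as the next traceback in this thread shows:

```python
# Sketch only (assumed shape of the fix, not the actual commit):
# treetensor's Tensor supports .to(device), so to_device can dispatch
# on it instead of falling through to the TypeError.
import treetensor.torch as ttorch

def to_device_sketch(item, device):
    if isinstance(item, ttorch.Tensor):
        return item.to(device)  # treelized: moves every leaf tensor
    raise TypeError("not support item type: {}".format(type(item)))
```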

@MarcoMeter
Author

Thanks for investigating this issue so quickly!

I'm now running into another (probably related) exception:

Traceback (most recent call last):
  File "r2d2_memorygym.py", line 166, in <module>
    main()
  File "r2d2_memorygym.py", line 162, in main
    task.run()
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 205, in run
    self.forward(fn)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 51, in runtime_handler
    return func(task, *args, **kwargs)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 273, in forward
    g = fn(ctx)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 237, in forward
    g = self.forward(fn, ctx, async_mode=False)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 51, in runtime_handler
    return func(task, *args, **kwargs)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 273, in forward
    g = fn(ctx)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/middleware/learner.py", line 60, in __call__
    self._trainer(ctx)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 237, in forward
    g = self.forward(fn, ctx, async_mode=False)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 51, in runtime_handler
    return func(task, *args, **kwargs)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/task.py", line 273, in forward
    g = fn(ctx)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/framework/middleware/functional/trainer.py", line 31, in _train
    train_output = policy.forward(ctx.train_data)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/policy/r2d2.py", line 260, in _forward_learn
    data = self._data_preprocess_learn(data)  # output datatype: Dict
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/policy/r2d2.py", line 193, in _data_preprocess_learn
    data = to_device(data, self._device)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/ding/torch_utils/data_helper.py", line 34, in to_device
    return item.to(device)
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/treevalue/tree/func/func.py", line 134, in _new_method
    _result = _treelized(self, *args, **kwargs)
  File "treevalue/tree/func/cfunc.pyx", line 176, in treevalue.tree.func.cfunc._w_func_treelize_run
  File "treevalue/tree/func/cfunc.pyx", line 151, in treevalue.tree.func.cfunc._c_func_treelize_run
  File "treevalue/tree/func/cfunc.pyx", line 75, in treevalue.tree.func.cfunc._c_func_treelize_run
  File "/work/mmarplei/anaconda3/envs/ding/lib/python3.7/site-packages/treetensor/torch/tensor.py", line 225, in to
    return stream_call(self.to, *args, **kwargs)
AttributeError: 'list' object has no attribute 'to'

@PaParaZz1
Member

I fixed this problem in the processing of the prev_state field in R2D2's training data.
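
For context: the AttributeError appears because prev_state (the recurrent hidden states) is stored as a plain Python list, which treetensor's treelized .to() cannot traverse. A generic element-wise transfer illustrates the idea (an illustration of the concept, not the exact commit):

```python
import torch

def move_nested(x, device):
    # Recursively move tensors nested inside lists/tuples to a device;
    # leave non-tensor leaves untouched.
    if isinstance(x, torch.Tensor):
        return x.to(device)
    if isinstance(x, (list, tuple)):
        return type(x)(move_nested(v, device) for v in x)
    return x
```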

@MarcoMeter
Author

Thanks for this fix! The CartPole training now runs for some time, but eventually hits a new exception:

INFO:root:Training: Train Iter(800)	Env Step(216384)	Loss(0.020)
INFO:root:Evaluation: Train Iter(810)	Env Step(217728)	Episode Return(14.125)
INFO:root:Evaluation: Train Iter(840)	Env Step(225792)	Episode Return(9.375)

<ipython-input-5-ad3710b91483> in main()
     41         task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))
     42         task.use(CkptSaver(cfg, policy, train_freq=100))
---> 43         task.run()
     44 
     45 

/usr/local/lib/python3.8/dist-packages/ding/framework/task.py in run(self, max_step)
    203         for i in range(max_step):
    204             for fn in self._middleware:
--> 205                 self.forward(fn)
    206             # Sync should be called before backward, otherwise it is possible
    207             # that some generators have not been pushed to backward_stack.

/usr/local/lib/python3.8/dist-packages/ding/framework/task.py in runtime_handler(task, async_mode, *args, **kwargs)
     49             return task
     50         else:
---> 51             return func(task, *args, **kwargs)
     52 
     53     return runtime_handler

/usr/local/lib/python3.8/dist-packages/ding/framework/task.py in forward(self, fn, ctx)
    271         if not ctx:
    272             ctx = self.ctx
--> 273         g = fn(ctx)
    274         if isinstance(g, GeneratorType):
    275             try:

/usr/local/lib/python3.8/dist-packages/ding/framework/task.py in forward(ctx)
    235                     g = self.forward(fn, ctx, async_mode=False)
    236             else:
--> 237                 g = self.forward(fn, ctx, async_mode=False)
    238 
    239             def backward():

/usr/local/lib/python3.8/dist-packages/ding/framework/task.py in runtime_handler(task, async_mode, *args, **kwargs)
     49             return task
     50         else:
---> 51             return func(task, *args, **kwargs)
     52 
     53     return runtime_handler

/usr/local/lib/python3.8/dist-packages/ding/framework/task.py in forward(self, fn, ctx)
    271         if not ctx:
    272             ctx = self.ctx
--> 273         g = fn(ctx)
    274         if isinstance(g, GeneratorType):
    275             try:

/usr/local/lib/python3.8/dist-packages/ding/framework/middleware/learner.py in __call__(self, ctx)
     53         train_output_queue = []
     54         for _ in range(self.cfg.policy.learn.update_per_collect):
---> 55             self._fetcher(ctx)
     56             if ctx.train_data is None:
     57                 break

/usr/local/lib/python3.8/dist-packages/ding/framework/task.py in forward(ctx)
    235                     g = self.forward(fn, ctx, async_mode=False)
    236             else:
--> 237                 g = self.forward(fn, ctx, async_mode=False)
    238 
    239             def backward():

/usr/local/lib/python3.8/dist-packages/ding/framework/task.py in runtime_handler(task, async_mode, *args, **kwargs)
     49             return task
     50         else:
---> 51             return func(task, *args, **kwargs)
     52 
     53     return runtime_handler

/usr/local/lib/python3.8/dist-packages/ding/framework/task.py in forward(self, fn, ctx)
    274         if isinstance(g, GeneratorType):
    275             try:
--> 276                 next(g)
    277                 self._backward_stack[id(g)] = g
    278                 return g

/usr/local/lib/python3.8/dist-packages/ding/framework/middleware/functional/data_processor.py in _fetch(ctx)
    129             if isinstance(buffer_, Buffer):
    130                 if unroll_len > 1:
--> 131                     buffered_data = buffer_.sample(
    132                         cfg.policy.learn.batch_size, groupby="env", unroll_len=unroll_len, replace=True
    133                     )

/usr/local/lib/python3.8/dist-packages/ding/data/buffer/buffer.py in handler(buffer, *args, **kwargs)
     32                 return func(func_name, chain, *args, **kwargs)
     33 
---> 34             return wrap_handler(buffer._middleware, *args, **kwargs)
     35 
     36         return handler

/usr/local/lib/python3.8/dist-packages/ding/data/buffer/buffer.py in wrap_handler(middleware, *args, **kwargs)
     24             def wrap_handler(middleware, *args, **kwargs):
     25                 if len(middleware) == 0:
---> 26                     return base_func(buffer, *args, **kwargs)
     27 
     28                 def chain(*args, **kwargs):

/usr/local/lib/python3.8/dist-packages/ding/data/buffer/deque_buffer.py in sample(self, size, indices, replace, sample_range, ignore_insufficient, groupby, unroll_len)
    142             sampled_data = [hashed_data[index] for index in indices]
    143         elif groupby:
--> 144             sampled_data = self._sample_by_group(
    145                 size=size, groupby=groupby, replace=replace, unroll_len=unroll_len, storage=storage
    146             )

/usr/local/lib/python3.8/dist-packages/ding/data/buffer/deque_buffer.py in _sample_by_group(self, size, groupby, replace, unroll_len, storage)
    330         sampled_groups = []
    331         if replace:
--> 332             sampled_groups = random.choices(group_names, k=size)
    333         else:
    334             try:

/usr/lib/python3.8/random.py in choices(self, population, weights, cum_weights, k)
    397                 _int = int
    398                 n += 0.0    # convert to float for a small speed improvement
--> 399                 return [population[_int(random() * n)] for i in _repeat(None, k)]
    400             cum_weights = list(_accumulate(weights))
    401         elif weights is not None:

/usr/lib/python3.8/random.py in <listcomp>(.0)
    397                 _int = int
    398                 n += 0.0    # convert to float for a small speed improvement
--> 399                 return [population[_int(random() * n)] for i in _repeat(None, k)]
    400             cum_weights = list(_accumulate(weights))
    401         elif weights is not None:

IndexError: list index out of range

@PaParaZz1
Member

Can you always reproduce this IndexError?

@MarcoMeter
Author

This bug is not always reproducible.

@sailxjx
Member

sailxjx commented Dec 23, 2022

This means that there is no group in the buffer.

Another question: why use group sampling in CartPole training? Can you provide your main file?

@MarcoMeter
Author

MarcoMeter commented Dec 23, 2022

R2D2_CartPole_v1_0.zip

Here is the Jupyter Notebook that I'm running on Colab.

# !pip install git+https://github.com/opendilab/DI-engine.git@main#egg=DI-engine

import ding
import gym
from ditk import logging
from ding.model import DRQN
from ding.policy import R2D2Policy
from ding.envs import DingEnvWrapper, BaseEnvManagerV2
from ding.data import DequeBuffer
from ding.config import compile_config
from ding.framework import task
from ding.framework.context import OnlineRLContext
from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, \
    eps_greedy_handler, CkptSaver, nstep_reward_enhancer
from ding.utils import set_pkg_seed
from dizoo.classic_control.cartpole.config.cartpole_r2d2_config import main_config, create_config


def main():
    logging.getLogger().setLevel(logging.INFO)
    cfg = compile_config(main_config, create_cfg=create_config, auto=True)
    cfg["policy"]["cuda"] = True
    with task.start(async_mode=False, ctx=OnlineRLContext()):
        collector_env = BaseEnvManagerV2(
            env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.collector_env_num)],
            cfg=cfg.env.manager
        )
        evaluator_env = BaseEnvManagerV2(
            env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.evaluator_env_num)],
            cfg=cfg.env.manager
        )

        set_pkg_seed(cfg.seed, use_cuda=cfg.policy.cuda)

        model = DRQN(**cfg.policy.model)
        buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
        policy = R2D2Policy(cfg.policy, model=model)

        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))
        task.use(eps_greedy_handler(cfg))
        task.use(StepCollector(cfg, policy.collect_mode, collector_env))
        task.use(nstep_reward_enhancer(cfg))
        task.use(data_pusher(cfg, buffer_, group_by_env=True))
        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))
        task.use(CkptSaver(cfg, policy, train_freq=100))
        task.run()


if __name__ == "__main__":
    main()

@sailxjx
Member

sailxjx commented Dec 23, 2022

unroll_len is set to 42 in the R2D2 configuration, which may be too large for some environments, so not enough samples are collected and stored in the buffer. When sampling, groups with fewer than unroll_len transitions are actively filtered out, which can trigger the exception above (see the sketch below). The solution is to reduce the value of unroll_len.
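
A small illustration of the mechanism (illustrative numbers and filtering logic, not DI-engine's actual buffer code): when every per-env group holds fewer than unroll_len transitions, the candidate list handed to random.choices is empty, and indexing into an empty population raises exactly the IndexError shown in the traceback above.

```python
import random

# Illustrative per-env group sizes after a short collection phase.
group_sizes = {"env_0": 35, "env_1": 18, "env_2": 27}
unroll_len = 42

# Groups without at least unroll_len transitions are filtered out.
group_names = [g for g, n in group_sizes.items() if n >= unroll_len]

# With unroll_len=42 no group qualifies, so this raises
# IndexError: list index out of range (cf. random.py line 399 above).
sampled_groups = random.choices(group_names, k=8)
```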

@PaParaZz1
Copy link
Member

@MarcoMeter Are you still dealing with this issue?

@MarcoMeter
Author

I ran the CartPole training three times using an unroll_len of 20 (set as shown below); the exception did not occur again. How can I determine an unroll_len that does not cause this exception?
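
For reference, the override mirrors the cfg["policy"]["cuda"] pattern in the main file above; the exact key path is an assumption based on the R2D2 config rather than a documented API:

```python
# Assumed key path: unroll_len is a policy-level key in the R2D2 config.
# Applied to main_config before compile_config, mirroring the cuda override.
main_config.policy.unroll_len = 20
```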
