
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead. #1198

Closed
Killpit opened this issue May 26, 2023 · 8 comments · Fixed by #1472

Killpit commented May 26, 2023

I tried following the RL tutorial from here. Even though it targets CUDA, I use MPS, and when I tried to apply the same code with MPS logic I couldn't make it work; it works perfectly without the has_mps check, using the original CUDA logic.


device = "cpu" if not torch.has_mps else "mps:0"
num_cells = 256  #number of cells in each layer
lr = 3e-4
max_grad_norm = 1.0

frame_skip = 1
frames_per_batch = 1000 // frame_skip
#For a complete training, bring the number of frames up to 1M
total_frames = 50_000 // frame_skip

sub_batch_size = 64  #cardinality of the sub-samples gathered from the current
#data in the inner loop
num_epochs = 10  #optimization steps per batch of data collected
clip_epsilon = (0.2)  #clip value for PPO loss
gamma = 0.99
lmbda = 0.95
entropy_eps = 1e-4

base_env = GymEnv("InvertedDoublePendulum-v4", device=device, frame_skip=frame_skip)

env = TransformedEnv(
    base_env,
    Compose(
        #normalize observations
        ObservationNorm(in_keys=["observation"]),
        DoubleToFloat(in_keys=["observation"]),
        StepCounter(),
    ),
)

env.transform[0].init_stats(num_iter=1000, reduce_dim=0, cat_dim=0)

print("normalization constant shape:", env.transform[0].loc.shape)

vmoens commented May 27, 2023

We currently have limited coverage of MPS, but it's a good point.
Let me see how we can make sure that we support that too!
Here it seems that the error occurs when we convert a float64 tensor from CPU to MPS while reading it from gym. Without the full error stack it's hard to say exactly what's going on, but changing the corresponding env spec to use float32 instead of float64 could solve it (my guess is that this occurs during a call to env.observation_spec.encode).

Killpit commented May 28, 2023

This is the full error:

Traceback (most recent call last):
  File "/Users/atatekeli/PycharmProjects/PyTorchProjects/torch_rl.py", line 38, in <module>
    base_env = GymEnv("InvertedDoublePendulum-v4", device=device, frame_skip=frame_skip)
  File "/Users/atatekeli/PycharmProjects/PyTorchProjects/venv/lib/python3.10/site-packages/torchrl/envs/libs/gym.py", line 589, in __init__
    super().__init__(**kwargs)
  File "/Users/atatekeli/PycharmProjects/PyTorchProjects/venv/lib/python3.10/site-packages/torchrl/envs/libs/gym.py", line 373, in __init__
    super().__init__(**kwargs)
  File "/Users/atatekeli/PycharmProjects/PyTorchProjects/venv/lib/python3.10/site-packages/torchrl/envs/common.py", line 967, in __init__
    self._make_specs(self._env)  # writes the self._env attribute
  File "/Users/atatekeli/PycharmProjects/PyTorchProjects/venv/lib/python3.10/site-packages/torchrl/envs/libs/gym.py", line 522, in _make_specs
    observation_spec = _gym_to_torchrl_spec_transform(
  File "/Users/atatekeli/PycharmProjects/PyTorchProjects/venv/lib/python3.10/site-packages/torchrl/envs/libs/gym.py", line 226, in _gym_to_torchrl_spec_transform
    low = torch.tensor(spec.low, device=device, dtype=dtype)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
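For reference, the failing line can be reproduced outside torchrl (a minimal illustration on an MPS-enabled PyTorch build; the array size is just an example):

import numpy as np
import torch

# The gym spec conversion does roughly this: build a float64 tensor directly on MPS.
low = np.zeros(11, dtype=np.float64)  # stands in for the Box.low bound coming from gym
torch.tensor(low, device="mps", dtype=torch.float64)
# -> TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework
#    doesn't support float64. Please use float32 instead.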

vmoens commented May 30, 2023

I see.
I believe the proper way to work around this would be to provide you with two transforms: one that maps double to float (we already have that, DoubleToFloatTransform) and another that casts the content to MPS.
The second one should not be too difficult to come up with.
I'll get this done soon, stay tuned.

@jonahclarsen

@vmoens Any update?

vmoens linked a pull request Aug 30, 2023 that will close this issue

vmoens commented Aug 30, 2023

This should be fixed:
Add a DoubleToFloat transform after creating the env on cpu and then map the data with DeviceCastTransform to the MPS device.
Feel free to reopen if needed
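A minimal sketch of what that recipe could look like (an illustration using current class names, not code from the linked PR; depending on the torchrl version, device handling in some transforms may still misbehave, as the follow-up comments below show):

import torch

from torchrl.envs import Compose, DoubleToFloat, ObservationNorm, StepCounter, TransformedEnv
from torchrl.envs.libs.gym import GymEnv
from torchrl.envs.transforms import DeviceCastTransform

device = "mps" if torch.backends.mps.is_available() else "cpu"

# Build the gym env on CPU so the float64 specs never have to live on MPS.
base_env = GymEnv("InvertedDoublePendulum-v4", device="cpu")

env = TransformedEnv(
    base_env,
    Compose(
        ObservationNorm(in_keys=["observation"]),
        DoubleToFloat(),                                        # cast float64 -> float32 while still on CPU
        StepCounter(),
        DeviceCastTransform(device=device, orig_device="cpu"),  # then move the data to MPS
    ),
)
env.transform[0].init_stats(num_iter=1000, reduce_dim=0, cat_dim=0)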

@EkaterinaAbramova

Could you please provide a working example? I get the MPS float64 error when running base_env = GymEnv("InvertedDoublePendulum-v4", device=device, frame_skip=frame_skip). Thank you so much!

@EkaterinaAbramova

Hi vmoens, I am back to trying to resolve this issue. I'd appreciate it if you could provide a tutorial with Python code that works on MPS. It is not at all clear what the workaround is, as I am still getting errors unfortunately. I have run this code:

import torch
print(torch.__version__)                  # GPU acceleration is available in this version
print(torch.backends.mps.is_available())  # macOS is 12.3+ and an MPS device is available
print(torch.backends.mps.is_built())      # this PyTorch build has MPS support

import torchrl
print(torchrl.__version__)

import tensordict
print(tensordict.__version__)
'''
TensorDict is like a Python dictionary with some extra tensor features. 
Many modules need to be told what key to read (in_keys) and what key to write (out_keys) in the tensordict they will receive. 
Usually, if out_keys is omitted, it is assumed that the in_keys entries will be updated in-place. 
'''

from collections import defaultdict

import matplotlib.pyplot as plt

from tqdm import tqdm

from tensordict.nn import TensorDictModule
from tensordict.nn.distributions import NormalParamExtractor

from torch import nn

from torchrl.collectors import SyncDataCollector

from torchrl.data.replay_buffers import ReplayBuffer
from torchrl.data.replay_buffers.samplers import SamplerWithoutReplacement
from torchrl.data.replay_buffers.storages import LazyTensorStorage

from torchrl.envs import (Compose, DoubleToFloat, ObservationNorm, StepCounter, TransformedEnv)
from torchrl.envs.libs.gym import GymEnv
from torchrl.envs.libs.gym import set_gym_backend
from torchrl.envs.utils import check_env_specs, set_exploration_mode
from torchrl.envs.transforms import DoubleToFloat, DeviceCastTransform

from torchrl.modules import ProbabilisticActor, TanhNormal, ValueOperator

from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE
 
device = "mps" 
frame_skip = 1 # action to be executed in current time-step only

with set_gym_backend("gym"):
    base_env = GymEnv("InvertedDoublePendulum-v4", device="cpu", frame_skip=frame_skip)

env = TransformedEnv(
    base_env,
    Compose(
        ObservationNorm(in_keys=["observation"]), # normalise observations (make it about Standard Normal)
        DoubleToFloat(),   
        StepCounter(),                            # count the number of steps before the environment is terminated
        DeviceCastTransform(device=device, orig_device="cpu"),
    ),
)

env.transform[0].init_stats(num_iter=1000, reduce_dim=0, cat_dim=0) 

check_env_specs(env)

And I get an error during the check:

check_env_specs(env)
Traceback (most recent call last):

  Cell In[10], line 1
    check_env_specs(env)

  File ~/anaconda3/envs/gpu-torchrl-latest/lib/python3.10/site-packages/torchrl/envs/utils.py:435 in check_env_specs
    real_tensordict = env.rollout(3, return_contiguous=return_contiguous)

  File ~/anaconda3/envs/gpu-torchrl-latest/lib/python3.10/site-packages/torchrl/envs/common.py:1797 in rollout
    tensordict = self.reset()

  File ~/anaconda3/envs/gpu-torchrl-latest/lib/python3.10/site-packages/torchrl/envs/common.py:1480 in reset
    tensordict_reset = self._reset(tensordict, **kwargs)

  File ~/anaconda3/envs/gpu-torchrl-latest/lib/python3.10/site-packages/torchrl/envs/transforms/transforms.py:760 in _reset
    tensordict_reset = self.transform._reset(tensordict, tensordict_reset)

  File ~/anaconda3/envs/gpu-torchrl-latest/lib/python3.10/site-packages/torchrl/envs/transforms/transforms.py:1020 in _reset
    tensordict_reset = t._reset(tensordict, tensordict_reset)

  File ~/anaconda3/envs/gpu-torchrl-latest/lib/python3.10/site-packages/torchrl/envs/transforms/transforms.py:5077 in _reset
    step_count = torch.where(~expand_as_right(reset, step_count), step_count, 0)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices.

Please could you resolve the issue?

vmoens commented Jan 22, 2024

#1827 should fix this issue!
Do not hesitate to open a separate issue next time :)
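For readers who cannot upgrade yet, a simple fallback that sidesteps the problem entirely is to keep the whole env pipeline on CPU and only move the collected data to MPS for the network side; a minimal sketch (not part of #1827):

import torch

from torchrl.envs import Compose, DoubleToFloat, ObservationNorm, StepCounter, TransformedEnv
from torchrl.envs.libs.gym import GymEnv

# Everything env-side stays on CPU, so no float64 tensor ever has to be created on MPS.
env = TransformedEnv(
    GymEnv("InvertedDoublePendulum-v4", device="cpu"),
    Compose(
        ObservationNorm(in_keys=["observation"]),
        DoubleToFloat(),
        StepCounter(),
    ),
)
env.transform[0].init_stats(num_iter=1000, reduce_dim=0, cat_dim=0)

# Collected data is a TensorDict; move it to MPS only where the neural networks need it.
rollout = env.rollout(3)
rollout = rollout.to("mps" if torch.backends.mps.is_available() else "cpu")
print(rollout)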
