Hivemind does not write how to connect to it #16

drimeF0 · 2023-05-04T14:48:01Z

🐛 Bug

ValueError                                Traceback (most recent call last)
     46 train_loader = utils.data.DataLoader(dataset)
     47 
---> 48 trainer = Trainer(
     49   max_epochs=4,
     50   accelerator="auto",

/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py in _check_config_and_set_final_flags(self, strategy, accelerator, precision, plugins, sync_batchnorm)
    207 
    208         if strategy != "auto" and strategy not in self._registered_strategies and not isinstance(strategy, Strategy):
--> 209             raise ValueError(
    210                 f"You selected an invalid strategy name: `strategy={strategy!r}`."
    211                 " It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`."

ValueError: You selected an invalid strategy name: `strategy=<lightning_hivemind.strategy.HivemindStrategy object at 0x7f3d206726e0>`. It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`. Example choices: auto, ddp, ddp_spawn, deepspeed, ... Find a complete list of options in our documentation at https://lightning.ai/

Code sample

import os
from torch import optim, nn, utils, Tensor
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import lightning.pytorch as pl
from lightning_hivemind.strategy import HivemindStrategy

from pytorch_lightning import Trainer

import torch

# define any number of nn.Modules (or use your current ones)
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))


# define the LightningModule
class LitAutoEncoder(pl.LightningModule):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # it is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        # Logging to TensorBoard (if installed) by default
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# init the autoencoder
autoencoder = LitAutoEncoder(encoder, decoder)


dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)

trainer = Trainer(
    max_epochs=4,
    accelerator="auto",
    devices=1 if torch.cuda.is_available() else None,
    strategy=HivemindStrategy(target_batch_size=8192),
)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)

Environment

PyTorch Version (e.g., 1.0): 2.0.0+cu118
OS (e.g., Linux): Ubuntu 20.04.5 LTS
How you installed PyTorch (conda, pip, source): pip
Python version: Python 3.10.11
CUDA/cuDNN version: CUDA Version: 12.0
GPU models and configuration: Tesla T4


github-actions · 2023-05-04T14:48:40Z

Hi! Thanks for your contribution! Great first issue!

drimeF0 · 2023-05-04T14:54:34Z

I fixed the previous error by replacing

from pytorch_lightning import Trainer

with

from lightning import Trainer

but now I get another error:

May 04 14:49:11.323 [ERROR] [go-libp2p-daemon/daemon.go:190] error accepting connection: accept unix /tmp/hivemind-p2pd-XYPwBtT8_Qw.sock: use of closed network connection

and the console does not print any information about where other peers should connect.
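A note on why the import change resolves the first error: the check shown in the traceback accepts a strategy object only if it is an instance of the `Strategy` base class belonging to the same package as the `Trainer`. The sketch below reproduces that check with stand-in classes. All class names here are hypothetical placeholders, and it is an assumption (consistent with the fix reported above, but not verified against the library source) that `HivemindStrategy` subclasses the `lightning.pytorch` base rather than the `pytorch_lightning` one.

```python
# Stand-ins for two unrelated base classes. The names are hypothetical; the
# real classes live in `pytorch_lightning.strategies` and
# `lightning.pytorch.strategies`.
class LegacyStrategy:
    """Plays the role of pytorch_lightning.strategies.Strategy."""


class UnifiedStrategy:
    """Plays the role of lightning.pytorch.strategies.Strategy."""


class HivemindLikeStrategy(UnifiedStrategy):
    """Plays the role of HivemindStrategy (assumed to use the unified base)."""


# Illustrative subset of registered strategy names.
REGISTERED = {"ddp", "ddp_spawn", "deepspeed"}


def check_strategy(strategy, base_cls):
    """Apply the same condition as the traceback, for a given Trainer's base class."""
    if (
        strategy != "auto"
        and strategy not in REGISTERED
        and not isinstance(strategy, base_cls)
    ):
        raise ValueError(
            f"You selected an invalid strategy name: `strategy={strategy!r}`."
        )
    return strategy


strategy = HivemindLikeStrategy()

# Trainer and strategy from the same package family: accepted.
check_strategy(strategy, UnifiedStrategy)

# Trainer from the other package: the isinstance check fails.
try:
    check_strategy(strategy, LegacyStrategy)
except ValueError as err:
    print("rejected:", err)
```

Under that assumption, importing both `Trainer` and the `LightningModule` base from the same package (`lightning`) keeps the `isinstance` check consistent, which matches the behavior described in this comment.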

drimeF0 added the bug and help wanted labels May 4, 2023

drimeF0 closed this as completed May 4, 2023

drimeF0 reopened this May 4, 2023

drimeF0 changed the title ~~Trainer does not accept Hivemind strategy~~ Hivemind does not write how to connect to it May 4, 2023

drimeF0 closed this as completed Aug 3, 2023
