
Hivemind does not write how to connect to it #16

Closed
drimeF0 opened this issue May 4, 2023 · 2 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)


drimeF0 commented May 4, 2023

🐛 Bug

ValueError                                Traceback (most recent call last)
     46 train_loader = utils.data.DataLoader(dataset)
     47 
---> 48 trainer = Trainer(
     49   max_epochs=4,
     50   accelerator="auto",

/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py in _check_config_and_set_final_flags(self, strategy, accelerator, precision, plugins, sync_batchnorm)
    207 
    208         if strategy != "auto" and strategy not in self._registered_strategies and not isinstance(strategy, Strategy):
--> 209             raise ValueError(
    210                 f"You selected an invalid strategy name: `strategy={strategy!r}`."
    211                 " It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`."

ValueError: You selected an invalid strategy name: `strategy=<lightning_hivemind.strategy.HivemindStrategy object at 0x7f3d206726e0>`. It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`. Example choices: auto, ddp, ddp_spawn, deepspeed, ... Find a complete list of options in our documentation at https://lightning.ai/

Code sample

import os
from torch import optim, nn, utils, Tensor
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import lightning.pytorch as pl
from lightning_hivemind.strategy import HivemindStrategy

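# Trainer comes from the standalone `pytorch_lightning` package, while the
# LightningModule (and HivemindStrategy) use `lightning.pytorch`; this mismatch
# triggers the ValueError above (fixed later with `from lightning import Trainer`).
from pytorch_lightning import Trainer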

import torch

# define any number of nn.Modules (or use your current ones)
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))


# define the LightningModule
class LitAutoEncoder(pl.LightningModule):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # it is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        # Logging to TensorBoard (if installed) by default
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# init the autoencoder
autoencoder = LitAutoEncoder(encoder, decoder)


dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)

trainer = Trainer(
  max_epochs=4,
  accelerator="auto",
  devices=1 if torch.cuda.is_available() else None,
  strategy=HivemindStrategy(target_batch_size=8192),
)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
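A minimal sketch of why the first error occurs: the sample above imports `LightningModule` via `lightning.pytorch` but `Trainer` from the standalone `pytorch_lightning` package. These two packages ship separate copies of the Lightning source, so each defines its own `Strategy` class, and the `isinstance` check shown in the traceback fails for a strategy built on the other copy. The stand-in modules and `FakeHivemindStrategy` below are hypothetical, and `check_strategy` is a simplified version of the check at line 208 of the traceback (it omits the registered-strategy-name lookup).

```python
import types


def make_standin_package(name: str) -> types.ModuleType:
    """Create a module that defines its own `Strategy` base class.

    Executing the same source in two modules mimics two installed copies of
    the same library: the code is identical, but the class objects differ.
    """
    module = types.ModuleType(name)
    exec("class Strategy:\n    pass\n", module.__dict__)
    return module


# Hypothetical stand-ins for the two separately installed packages.
standalone = make_standin_package("standalone_pytorch_lightning")
unified = make_standin_package("unified_lightning_pytorch")


class FakeHivemindStrategy(unified.Strategy):
    """Hypothetical strategy built against the `unified` copy's base class."""


def check_strategy(strategy, strategy_base):
    """Simplified form of the check in the traceback (name lookup omitted)."""
    if strategy != "auto" and not isinstance(strategy, strategy_base):
        raise ValueError(f"You selected an invalid strategy name: `strategy={strategy!r}`.")
    return strategy


strategy = FakeHivemindStrategy()

# Passes: the check uses the same base class the strategy subclasses.
print(check_strategy(strategy, unified.Strategy) is strategy)  # → True

# Fails: an identically named class from the other copy is a different object.
try:
    check_strategy(strategy, standalone.Strategy)
except ValueError as err:
    print("ValueError:", err)
```

Importing everything from one namespace (for example `from lightning import Trainer`, as in the follow-up comment) makes both sides share one `Strategy` class, so the check passes.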

Environment

  • PyTorch Version (e.g., 1.0): 2.0.0+cu118
  • OS (e.g., Linux): Ubuntu 20.04.5 LTS
  • How you installed PyTorch (conda, pip, source): pip
  • Python version: Python 3.10.11
  • CUDA/cuDNN version: CUDA Version: 12.0
  • GPU models and configuration: Tesla T4
drimeF0 added the bug (Something isn't working) and help wanted (Extra attention is needed) labels May 4, 2023

github-actions bot commented May 4, 2023

Hi! Thanks for your contribution! Great first issue!


drimeF0 commented May 4, 2023

I fixed the previous error by replacing

from pytorch_lightning import Trainer

with

from lightning import Trainer

but now I get another error:

May 04 14:49:11.323 [ERROR] [go-libp2p-daemon/daemon.go:190] error accepting connection: accept unix /tmp/hivemind-p2pd-XYPwBtT8_Qw.sock: use of closed network connection

and the console output does not show any information about how other peers can connect.

[Image]

drimeF0 closed this as completed May 4, 2023
drimeF0 reopened this May 4, 2023
drimeF0 changed the title from "Trainer does not accept Hivemind strategy" to "Hivemind does not write how to connect to it" May 4, 2023
drimeF0 closed this as completed Aug 3, 2023