Resolve some bugs #121

tchaton · 2024-05-07T07:44:48Z

Before submitting

Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?

What does this PR do?

Optimize

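from litdata import optimize
import torch

def fn(*_):
    # Ignore the input index and emit a constant (1, 32) tensor per sample.
    return torch.zeros((1, 32))

# Write 1000 samples to "output_dir" in chunks of up to 64MB.
optimize(
    fn,
    inputs=list(range(1000)),
    output_dir="output_dir",
    chunk_bytes="64MB",
)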
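
For a sense of scale, the raw tensor data written by the snippet above is far smaller than a single chunk. The following is a back-of-the-envelope sketch, not litdata's own accounting: it assumes float32 tensors (PyTorch's default dtype), reads "64MB" as decimal megabytes, and ignores any serialization overhead. The helper name `item_nbytes` is illustrative only.

```python
# Rough size estimate for the dataset produced by the `optimize` call above.
# Assumptions (not stated in the PR): float32 items, decimal megabytes,
# and no per-item serialization overhead.
ITEM_SHAPE = (1, 32)
BYTES_PER_FLOAT32 = 4
NUM_ITEMS = 1000
CHUNK_BYTES = 64 * 1000**2  # "64MB" read as decimal megabytes (assumption)


def item_nbytes(shape, itemsize):
    """Return the raw byte size of a dense tensor with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


item_bytes = item_nbytes(ITEM_SHAPE, BYTES_PER_FLOAT32)
total_bytes = item_bytes * NUM_ITEMS
fits_in_one_chunk = total_bytes <= CHUNK_BYTES

print(item_bytes, total_bytes, fits_in_one_chunk)
```

Under these assumptions the whole dataset is about 128 kB of raw tensor data, well below the 64MB chunk limit.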

Training

from lightning.pytorch import LightningModule, Trainer
from litdata import StreamingDataset
import torch
from torch import Tensor
from typing import Optional, Any
from torch.utils.data import DataLoader


class BoringModel(LightningModule):

    def __init__(self) -> None:
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x: Tensor) -> Tensor:
        return self.layer(x)

    def loss(self, preds: Tensor, labels: Optional[Tensor] = None) -> Tensor:
        if labels is None:
            labels = torch.ones_like(preds)
        # An arbitrary loss to have a loss that updates the model weights during `Trainer.fit` calls
        return torch.nn.functional.mse_loss(preds, labels)

    def step(self, batch: Any) -> Tensor:
        output = self(batch)
        return self.loss(output)

    def training_step(self, batch: Any, batch_idx: int):
        return {"loss": self.step(batch)}

    def validation_step(self, batch: Any, batch_idx: int):
        return {"x": self.step(batch)}

    def test_step(self, batch: Any, batch_idx: int):
        return {"y": self.step(batch)}

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return [optimizer], [lr_scheduler]

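    def train_dataloader(self) -> DataLoader:
        # Stream the chunks written by `optimize` above, shuffled for training.
        return DataLoader(StreamingDataset(input_dir="output_dir", shuffle=True))

    def val_dataloader(self) -> DataLoader:
        # Reuse the same optimized dataset for validation, without shuffling.
        return DataLoader(StreamingDataset(input_dir="output_dir", shuffle=False))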


model = BoringModel()
trainer = Trainer()
trainer.fit(model)

Using a machine with 4 T4 GPUs.

Fixes #70 #112

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

tchaton and others added 7 commits May 7, 2024 07:44

update

7ae4cc8

[pre-commit.ci] auto fixes from pre-commit.com hooks

66c8e2d

for more information, see https://pre-commit.ci

update

0e9a6b4

Merge branch 'main' into resolve_some_bugs

d157ef6

update

64ea6c4

update

8a10976

update

29ac238

tchaton marked this pull request as ready for review May 7, 2024 08:48

tchaton requested a review from awaelchli as a code owner May 7, 2024 08:48

update

8c92981

tchaton merged commit bc0366d into main May 7, 2024
32 checks passed

tchaton deleted the resolve_some_bugs branch May 7, 2024 11:49
