Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[App] Fix bug when using structures with works #15911

Merged
merged 16 commits into from
Dec 8, 2022

Conversation

ethanwharris
Copy link
Member

@ethanwharris ethanwharris commented Dec 5, 2022

What does this PR do?

There is a bug where works in structures can cause failures and an app retry loop when running in the cloud.

E.g. this app fails with the latest release:

# app.py
# !curl https://raw.githubusercontent.com/Lightning-AI/lightning/master/examples/app_multi_node/pl_boring_script.py -o pl_boring_script.py
import lightning as L
from lightning.app.components.training import LightningTrainerScript

# run script that trains PyTorch with the Lightning Trainer
model_script = 'pl_boring_script.py'
component = LightningTrainerScript(
   model_script,
   num_nodes=1,
   cloud_compute=L.CloudCompute("gpu")
)
app = L.LightningApp(component)

The reason for the failure is that the LightningTrainerScript component populates the structure on init, before a backend is attached. As a result, the works get added without being connected to the flow, this triggers a validation error when work.run is called.

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@ethanwharris ethanwharris added bug Something isn't working app (removed) Generic label for Lightning App package app:structures labels Dec 5, 2022
@ethanwharris ethanwharris added this to the v1.8.x milestone Dec 5, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Dec 5, 2022

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow
Check ID Status
pl-cpu (macOS-11, pytorch, 3.8, 1.11) success
pl-cpu (macOS-11, pytorch, 3.9, 1.12) success
pl-cpu (macOS-11, pytorch, 3.10, 1.13) success
pl-cpu (macOS-11, pytorch, 3.8, 1.10, oldest) success
pl-cpu (ubuntu-20.04, pytorch, 3.8, 1.10) success
pl-cpu (ubuntu-20.04, pytorch, 3.9, 1.11) success
pl-cpu (ubuntu-20.04, pytorch, 3.10, 1.12) success
pl-cpu (ubuntu-20.04, pytorch, 3.10, 1.13) success
pl-cpu (ubuntu-20.04, pytorch, 3.7, 1.10, oldest) success
pl-cpu (windows-2022, pytorch, 3.9, 1.11) success
pl-cpu (windows-2022, pytorch, 3.10, 1.12) success
pl-cpu (windows-2022, pytorch, 3.10, 1.13) success
pl-cpu (windows-2022, pytorch, 3.7, 1.10, oldest) success
pl-cpu (slow, macOS-11, pytorch, 3.7, 1.11) success
pl-cpu (slow, ubuntu-20.04, pytorch, 3.7, 1.11) success
pl-cpu (slow, windows-2022, pytorch, 3.7, 1.11) success
pl-cpu (macOS-11, lightning, 3.8, 1.13) success
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.13) success
pl-cpu (windows-2022, lightning, 3.8, 1.13) success

These checks are required after the changes to requirements/lite/base.txt, requirements/lite/examples.txt, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

🟢 pytorch_lightning: Azure GPU
Check ID Status
pytorch-lightning (GPUs) success

These checks are required after the changes to requirements/pytorch/base.txt, requirements/pytorch/examples.txt, requirements/lite/base.txt, requirements/lite/examples.txt.

🟢 pytorch_lightning: Benchmarks
Check ID Status
pytorch-lightning.Benchmark success

These checks are required after the changes to requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

🟢 pytorch_lightning: Azure HPU
Check ID Status
pytorch-lightning (HPUs) success

These checks are required after the changes to requirements/lite/base.txt, requirements/lite/examples.txt, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

🟢 pytorch_lightning: Azure IPU
Check ID Status
pytorch-lightning (IPUs) success

These checks are required after the changes to requirements/lite/base.txt, requirements/lite/examples.txt, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

🟢 pytorch_lightning: Docs
Check ID Status
make-doctest (pytorch) success
make-html (pytorch) success

These checks are required after the changes to requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

🟢 pytorch_lightning: Docker
Check ID Status
build-cuda (3.9, 1.10, 11.3.1) success
build-cuda (3.9, 1.11, 11.3.1) success
build-cuda (3.9, 1.12, 11.6.1) success
build-cuda (3.9, 1.13, 11.6.1) success
build-hpu (1.5.0, 1.11.0) success
build-ipu (3.9, 1.10) success
build-NGC success
build-pl (3.9, 1.10, 11.3.1) success
build-pl (3.9, 1.11, 11.3.1) success
build-pl (3.9, 1.12, 11.6.1) success
build-pl (3.9, 1.13, 11.6.1) success
build-xla (3.7, 1.12) success

These checks are required after the changes to requirements/pytorch/base.txt, requirements/pytorch/examples.txt, requirements/lite/base.txt, requirements/lite/examples.txt.

🟢 lightning_lite: CPU workflow
Check ID Status
lite-cpu (macOS-11, lite, 3.8, 1.11) success
lite-cpu (macOS-11, lite, 3.9, 1.12) success
lite-cpu (macOS-11, lite, 3.10, 1.13) success
lite-cpu (macOS-11, lite, 3.7, 1.10, oldest) success
lite-cpu (ubuntu-20.04, lite, 3.8, 1.10) success
lite-cpu (ubuntu-20.04, lite, 3.9, 1.11) success
lite-cpu (ubuntu-20.04, lite, 3.10, 1.12) success
lite-cpu (ubuntu-20.04, lite, 3.10, 1.13) success
lite-cpu (ubuntu-20.04, lite, 3.7, 1.10, oldest) success
lite-cpu (windows-2022, lite, 3.9, 1.11) success
lite-cpu (windows-2022, lite, 3.10, 1.12) success
lite-cpu (windows-2022, lite, 3.10, 1.13) success
lite-cpu (windows-2022, lite, 3.7, 1.10, oldest) success
lite-cpu (macOS-11, lightning, 3.8, 1.13) success
lite-cpu (ubuntu-20.04, lightning, 3.8, 1.13) success
lite-cpu (windows-2022, lightning, 3.8, 1.13) success

These checks are required after the changes to requirements/lite/base.txt, requirements/lite/examples.txt.

🟢 lightning_lite: Azure GPU
Check ID Status
lightning-lite (GPUs) success

These checks are required after the changes to requirements/lite/base.txt, requirements/lite/examples.txt.

🟢 lightning_app: Tests workflow
Check ID Status
app-pytest (macOS-11, app, 3.8, latest) success
app-pytest (macOS-11, app, 3.8, oldest) success
app-pytest (macOS-11, lightning, 3.9, latest) success
app-pytest (ubuntu-20.04, app, 3.8, latest) success
app-pytest (ubuntu-20.04, app, 3.8, oldest) success
app-pytest (ubuntu-20.04, lightning, 3.9, latest) success
app-pytest (windows-2022, app, 3.8, latest) success
app-pytest (windows-2022, app, 3.8, oldest) success
app-pytest (windows-2022, lightning, 3.8, latest) success

These checks are required after the changes to src/lightning_app/structures/dict.py, src/lightning_app/structures/list.py, tests/tests_app/structures/test_structures.py, requirements/app/base.txt.

🟢 lightning_app: Examples
Check ID Status
app-examples (macOS-11, app, 3.9, latest) success
app-examples (macOS-11, app, 3.9, oldest) success
app-examples (macOS-11, lightning, 3.9, latest) success
app-examples (ubuntu-20.04, app, 3.9, latest) success
app-examples (ubuntu-20.04, app, 3.9, oldest) success
app-examples (ubuntu-20.04, lightning, 3.9, latest) success
app-examples (windows-2022, app, 3.9, latest) success
app-examples (windows-2022, app, 3.9, oldest) success
app-examples (windows-2022, lightning, 3.9, latest) success

These checks are required after the changes to src/lightning_app/structures/dict.py, src/lightning_app/structures/list.py, requirements/app/base.txt.

🟢 lightning_app: Azure
Check ID Status
App.cloud-e2e success

These checks are required after the changes to src/lightning_app/structures/dict.py, src/lightning_app/structures/list.py, requirements/app/base.txt.

🟢 lightning_app: Docs
Check ID Status
make-doctest (app) success
make-html (app) success

These checks are required after the changes to src/lightning_app/structures/dict.py, src/lightning_app/structures/list.py, requirements/app/base.txt.

🟢 mypy
Check ID Status
mypy success

These checks are required after the changes to requirements/app/base.txt, requirements/lite/base.txt, requirements/lite/examples.txt, requirements/pytorch/base.txt, requirements/pytorch/examples.txt, src/lightning_app/structures/dict.py, src/lightning_app/structures/list.py.

🟢 install
Check ID Status
install-pkg (ubuntu-22.04, app, 3.7) success
install-pkg (ubuntu-22.04, app, 3.10) success
install-pkg (ubuntu-22.04, lite, 3.7) success
install-pkg (ubuntu-22.04, lite, 3.10) success
install-pkg (ubuntu-22.04, pytorch, 3.7) success
install-pkg (ubuntu-22.04, pytorch, 3.10) success
install-pkg (ubuntu-22.04, lightning, 3.7) success
install-pkg (ubuntu-22.04, lightning, 3.10) success
install-pkg (macOS-12, app, 3.7) success
install-pkg (macOS-12, app, 3.10) success
install-pkg (macOS-12, lite, 3.7) success
install-pkg (macOS-12, lite, 3.10) success
install-pkg (macOS-12, pytorch, 3.7) success
install-pkg (macOS-12, pytorch, 3.10) success
install-pkg (macOS-12, lightning, 3.7) success
install-pkg (macOS-12, lightning, 3.10) success
install-pkg (windows-2022, app, 3.7) success
install-pkg (windows-2022, app, 3.10) success
install-pkg (windows-2022, lite, 3.7) success
install-pkg (windows-2022, lite, 3.10) success
install-pkg (windows-2022, pytorch, 3.7) success
install-pkg (windows-2022, pytorch, 3.10) success
install-pkg (windows-2022, lightning, 3.7) success
install-pkg (windows-2022, lightning, 3.10) success

These checks are required after the changes to src/lightning_app/structures/dict.py, src/lightning_app/structures/list.py, requirements/app/base.txt, requirements/lite/base.txt, requirements/lite/examples.txt, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.


Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.

@ethanwharris ethanwharris changed the title Fix bug when using structures with works [App] Fix bug when using structures with works Dec 5, 2022
@awaelchli
Copy link
Contributor

@ethanwharris For my understanding, how can we produce this bug? If you can show us an example, we could turn it into a test case.

@awaelchli
Copy link
Contributor

awaelchli commented Dec 6, 2022

Here is a test that produces a failure on master and passes on your PR. I hope it demonstrates the intended behavior.

def test_structures_have_name_on_init():
    """Test that the children in structures have the correct name assigned upon initialization."""
    class ChildWork(LightningWork):
        def run(self):
            pass

    class Collection(EmptyFlow):
        def __init__(self):
            super().__init__()
            self.list_structure = List()
            self.list_structure.append(ChildWork())

            self.dict_structure = Dict()
            self.dict_structure["dict_child"] = ChildWork()

    flow = Collection()
    LightningApp(flow)  # wrap in app to init all component names
    assert flow.list_structure[0].name == "root.list_structure.0"
    assert flow.dict_structure["dict_child"].name == "root.dict_structure.dict_child"

I recommend adding it to test_structures.py.

Copy link
Contributor

@tchaton tchaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@mergify mergify bot added the ready PRs ready to be merged label Dec 6, 2022
@Borda Borda enabled auto-merge (squash) December 7, 2022 13:38
@github-actions github-actions bot removed the app (removed) Generic label for Lightning App package label Dec 7, 2022
@Borda Borda merged commit 1283226 into master Dec 8, 2022
@Borda Borda deleted the bugfix/structures_child_names branch December 8, 2022 00:54
Borda pushed a commit that referenced this pull request Dec 8, 2022
* Fix bug when using structures with works
* Add test
* Update CHANGELOG.md

(cherry picked from commit 1283226)
@Borda
Copy link
Member

Borda commented Dec 8, 2022

@ethanwharris why all these requirement changes?

@@ -1,4 +1,4 @@
# NOTE: the upper bound for the package version is only set for CI stability, and it is dropped while installing this package
# in case you want to preserve/enforce restrictions on the latest compatible version, add "strict" as an in-line comment

torchvision>=0.10.*, <=0.13.0
torchvision>=0.10.0, <=0.13.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shall be TV 0.11 as to be paired with PT 1.10, right?

@@ -1,6 +1,6 @@
# NOTE: the upper bound for the package version is only set for CI stability, and it is dropped while installing this package
# in case you want to preserve/enforce restrictions on the latest compatible version, add "strict" as an in-line comment

torchvision>=0.11.*, <=0.14.0
torchvision>=0.11.1, <=0.14.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is TV 0.11.1 compatible with PT 1.10.0?

Copy link
Member Author

@ethanwharris ethanwharris Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 0.11.0 was yanked, 0.11.1 is recommended for PT 1.10.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -52,23 +50,21 @@ def __init__(self, *items: T):

self._name: t.Optional[str] = ""
self._last_index = 0
self._backend: Optional[Backend] = None
self._backend: t.Optional[Backend] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why import with t. and not from typing import ...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we already have import typing as t in there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no I see the weird typing hack :D

williamFalcon pushed a commit that referenced this pull request Dec 8, 2022
* [App] Remove `SingleProcessRuntime` (#15933)

* Remove SingleProcessRuntime
* Remove unused queues
* Docs

(cherry picked from commit e250dfe)

* [App] Fix bug when using structures with works (#15911)

* Fix bug when using structures with works
* Add test
* Update CHANGELOG.md

(cherry picked from commit 1283226)

* [App] Wait for full file to be transferred in Path / Payload (#15934)

* Wait for full file to be transferred in Path / Payload
* Fixes

(cherry picked from commit b8c7018)

Co-authored-by: Ethan Harris <[email protected]>
justusschock added a commit that referenced this pull request Dec 9, 2022
* Simplify enabling CPU offload in FSDP (#15832)


Co-authored-by: Jirka Borovec <[email protected]>

* [App] Enable running with spawn context (#15923)

* Fix compiler support test (#15927)

* Enable back inference mode support with hpu & update links (#15918)

* Enable back inference mode support with hpu
* Remove unused
* Update document link and address comment

Signed-off-by: Jerome <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [App] Introduce auto scaler (#15769)

* Exlucde __pycache__ in setuptools

* Add load balancer example

* wip

* Update example

* rename

* remove prints

* _LoadBalancer -> LoadBalancer

* AutoScaler(work)

* change var name

* remove locust

* Update docs

* include autoscaler in api ref

* docs typo

* docs typo

* docs typo

* docs typo

* remove unused loadtest

* remove unused device_type

* clean up

* clean up

* clean up

* Add docstring

* type

* env vars to args

* expose an API for users to override to customise autoscaling logic

* update example

* comment

* udpate var name

* fix scale mechanism and clean up

* Update exampl

* ignore mypy

* Add test file

* .

* update impl and update tests

* Update changlog

* .

* revert docs

* update test

* update state to keep calling 'flow.run()'

Co-authored-by: Aniket Maurya <[email protected]>

* Add aiohttp to base requirements

* Update docs

Co-authored-by: Luca Antiga <[email protected]>

* Use deserializer utility

* fake trigger

* wip: protect /system/* with basic auth

* read password at runtime

* Change env var name

* import torch as optional

* Don't overcreate works

* simplify imports

* Update example

* aiohttp

* Add work_args work_kwargs

* More docs

* remove FIXME

* Apply Jirka's suggestions

Co-authored-by: Jirka Borovec <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean example device

* add comment on init threshold value

* bad merge

* nit: logging format

* {in,out}put_schema -> {in,out}put_type

* lowercase

* docs on seconds

* process_time -> processing_time

* Dont modify work state from flow

* Update tests

* worker_url -> endpoint

* fix exampl

* Fix default scale logic

* Fix default scale logic

* Fix num_pending_works

* Update num_pending_works

* Fix bug creating too many works

* Remove up/downscale_threshold args

* Update example

* Add typing

* Fix example in docstring

* Fix default scale logic

* Update src/lightning_app/components/auto_scaler.py

Co-authored-by: Noha Alon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename method

* rename locvar

* Add todo

* docs ci

* docs ci

* asdfafsdasdf pls docs

* Apply suggestions from code review

Co-authored-by: Ethan Harris <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* .

* doc

* Update src/lightning_app/components/auto_scaler.py

Co-authored-by: Noha Alon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit 24983a0.

* Revert "Update src/lightning_app/components/auto_scaler.py"

This reverts commit 56ea78b.

* Remove redefinition

* Remove load balancer run blocker

* raise RuntimeError

* remove has_sent

* lower the default timeout_batching from 10 to 1

* remove debug

* update the default timeout_batching

* .

* tighten condition

* fix endpoint

* typo in runtimeerror cond

* async lock update severs

* add a test

* {in,out}put_type typing

* Update examples/app_server_with_auto_scaler/app.py

Co-authored-by: Jirka Borovec <[email protected]>

* Update .actions/setup_tools.py

Co-authored-by: Aniket Maurya <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Noha Alon <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: thomas chaton <[email protected]>

* ENG-627: Docs for CloudCompute Mount Argument (#15182)

fixed conflicts

* Fix LRScheduler import for PyTorch 2.0 (#15940)

* Fix LRScheduler import for PyTorch 2.0
* Add comment for posterity

* CI: fix pypi flow (#15944)

* CI: fixing pypi syntax (#15943)
* connect
* input

* [App] Remove `SingleProcessRuntime` (#15933)

* Remove SingleProcessRuntime
* Remove unused queues
* Docs

* [App] Fix bug when using structures with works (#15911)

* Fix bug when using structures with works
* Add test
* Update CHANGELOG.md

* [App] Wait for full file to be transferred in Path / Payload (#15934)

* Wait for full file to be transferred in Path / Payload
* Fixes

* [docs] Include all components in the API reference (#15805)

* Update docs

Co-authored-by: Jirka Borovec <[email protected]>

* Bump playwright from 1.27.1 to 1.28.0 in /requirements (#15903)

* Bump playwright from 1.27.1 to 1.28.0 in /requirements

Bumps [playwright](https://github.com/Microsoft/playwright-python) from 1.27.1 to 1.28.0.
- [Release notes](https://github.com/Microsoft/playwright-python/releases)
- [Commits](microsoft/playwright-python@v1.27.1...v1.28.0)

---
updated-dependencies:
- dependency-name: playwright
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* 1.28

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>

* [App] Add `configure_layout` method for works (#15926)

* Add `configure_layout` method for works
* Check for api access availability
* Updates from review
* Update CHANGELOG.md
* Apply suggestions from code review

Co-authored-by: Sherin Thomas <[email protected]>

* Make gradients available for all_gather on TPU (#15003)

* Make gradients available for all_gather on TPU
* Modify switch and tests
* Apply suggestions from code review
* Modify tests
* Fix test
* Drop test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>

* Don't try to aggregate `requirements/__pycache__/base.txt` in setuptools (#15775)

Exlucde __pycache__ in setuptools

* [App] Multiprocessing-safe work pickling (#15836)

* Upgrade to HPU release 1.7.1 (#15956)

* Upgrade to HPU release 1.7.1
Update torch version check for hpu

Signed-off-by: Jerome <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Multinode on MPS (#15748)

* Fix restarting attribute for lr finder
* update lite executor
* update trainer executor
* update spawn executor
* add multinode component tests
* add testing helpers
* add lite tests
* add trainer tests
* update changelog
* update trainer
* update workflow
* update tests
* debug
* add reason for skipif
* Apply suggestions from code review
* switch skipif

Co-authored-by: Jirka <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>

* [App] Resolve PythonServer on M1 (#15949)


Co-authored-by: thomas <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Lite: Fix DataLoader shuffling when using DistributedSampler (#15931)


Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [App] Temporarily disable ready (#15958)

* Fix restarting attribute for lr finder (#15620)

* [App] Improve pdb for multiprocessing (#15950)


Co-authored-by: thomas <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [App] Improve debug triggering (#15951)

* [App] Add automatic conversion to structures (#15961)

* Make LightningModule torch.jit.script-able again (#15947)

* Make LightningModule torch.jit.script-able again
* remove skip

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* refactor: simplify Tensor import (#15959)

* Fix ImportErrors on Multinode if package not present (#15963)

* Fix typo in definition of world size in docs (#15954)

* [App] Enable running an app from the Gallery (#15941)


Co-authored-by: thomas <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Jirka <[email protected]>

* Apply dynamo to training_step, validation_step, test_step, predict_step (#15957)

* Apply dynamo to training_step, validation_step, test_step, predict_step

* Add entry to CHANGELOG.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix merge conflict

* rename tpu workflow

Signed-off-by: Jerome <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Jerome Anand <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Aniket Maurya <[email protected]>
Co-authored-by: Noha Alon <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Rick Izzo <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: Sherin Thomas <[email protected]>
Co-authored-by: stekiri <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Justus Schock <[email protected]>
Co-authored-by: thomas <[email protected]>
carmocca added a commit that referenced this pull request Jan 4, 2023
* Simplify enabling CPU offload in FSDP (#15832)

Co-authored-by: Jirka Borovec <[email protected]>

* [App] Enable running with spawn context (#15923)

* Fix compiler support test (#15927)

* Enable back inference mode support with hpu & update links (#15918)

* Enable back inference mode support with hpu
* Remove unused
* Update document link and address comment

Signed-off-by: Jerome <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [App] Introduce auto scaler (#15769)

* Exlucde __pycache__ in setuptools

* Add load balancer example

* wip

* Update example

* rename

* remove prints

* _LoadBalancer -> LoadBalancer

* AutoScaler(work)

* change var name

* remove locust

* Update docs

* include autoscaler in api ref

* docs typo

* docs typo

* docs typo

* docs typo

* remove unused loadtest

* remove unused device_type

* clean up

* clean up

* clean up

* Add docstring

* type

* env vars to args

* expose an API for users to override to customise autoscaling logic

* update example

* comment

* udpate var name

* fix scale mechanism and clean up

* Update exampl

* ignore mypy

* Add test file

* .

* update impl and update tests

* Update changlog

* .

* revert docs

* update test

* update state to keep calling 'flow.run()'

Co-authored-by: Aniket Maurya <[email protected]>

* Add aiohttp to base requirements

* Update docs

Co-authored-by: Luca Antiga <[email protected]>

* Use deserializer utility

* fake trigger

* wip: protect /system/* with basic auth

* read password at runtime

* Change env var name

* import torch as optional

* Don't overcreate works

* simplify imports

* Update example

* aiohttp

* Add work_args work_kwargs

* More docs

* remove FIXME

* Apply Jirka's suggestions

Co-authored-by: Jirka Borovec <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean example device

* add comment on init threshold value

* bad merge

* nit: logging format

* {in,out}put_schema -> {in,out}put_type

* lowercase

* docs on seconds

* process_time -> processing_time

* Dont modify work state from flow

* Update tests

* worker_url -> endpoint

* fix exampl

* Fix default scale logic

* Fix default scale logic

* Fix num_pending_works

* Update num_pending_works

* Fix bug creating too many works

* Remove up/downscale_threshold args

* Update example

* Add typing

* Fix example in docstring

* Fix default scale logic

* Update src/lightning_app/components/auto_scaler.py

Co-authored-by: Noha Alon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename method

* rename locvar

* Add todo

* docs ci

* docs ci

* asdfafsdasdf pls docs

* Apply suggestions from code review

Co-authored-by: Ethan Harris <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* .

* doc

* Update src/lightning_app/components/auto_scaler.py

Co-authored-by: Noha Alon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit 24983a0.

* Revert "Update src/lightning_app/components/auto_scaler.py"

This reverts commit 56ea78b.

* Remove redefinition

* Remove load balancer run blocker

* raise RuntimeError

* remove has_sent

* lower the default timeout_batching from 10 to 1

* remove debug

* update the default timeout_batching

* .

* tighten condition

* fix endpoint

* typo in runtimeerror cond

* async lock update severs

* add a test

* {in,out}put_type typing

* Update examples/app_server_with_auto_scaler/app.py

Co-authored-by: Jirka Borovec <[email protected]>

* Update .actions/setup_tools.py

Co-authored-by: Aniket Maurya <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Noha Alon <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: thomas chaton <[email protected]>

* ENG-627: Docs for CloudCompute Mount Argument (#15182)

fixed conflicts

* Fix LRScheduler import for PyTorch 2.0 (#15940)

* Fix LRScheduler import for PyTorch 2.0
* Add comment for posterity

* CI: fix pypi flow (#15944)

* CI: fixing pypi syntax (#15943)
* connect
* input

* [App] Remove `SingleProcessRuntime` (#15933)

* Remove SingleProcessRuntime
* Remove unused queues
* Docs

* [App] Fix bug when using structures with works (#15911)

* Fix bug when using structures with works
* Add test
* Update CHANGELOG.md

* [App] Wait for full file to be transferred in Path / Payload (#15934)

* Wait for full file to be transferred in Path / Payload
* Fixes

* [docs] Include all components in the API reference (#15805)

* Update docs

Co-authored-by: Jirka Borovec <[email protected]>

* Bump playwright from 1.27.1 to 1.28.0 in /requirements (#15903)

* Bump playwright from 1.27.1 to 1.28.0 in /requirements

Bumps [playwright](https://github.com/Microsoft/playwright-python) from 1.27.1 to 1.28.0.
- [Release notes](https://github.com/Microsoft/playwright-python/releases)
- [Commits](microsoft/playwright-python@v1.27.1...v1.28.0)

---
updated-dependencies:
- dependency-name: playwright
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* 1.28

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>

* [App] Add `configure_layout` method for works (#15926)

* Add `configure_layout` method for works
* Check for api access availability
* Updates from review
* Update CHANGELOG.md
* Apply suggestions from code review

Co-authored-by: Sherin Thomas <[email protected]>

* Make gradients available for all_gather on TPU (#15003)

* Make gradients available for all_gather on TPU
* Modify switch and tests
* Apply suggestions from code review
* Modify tests
* Fix test
* Drop test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>

* Don't try to aggregate `requirements/__pycache__/base.txt` in setuptools (#15775)

Exlucde __pycache__ in setuptools

* [App] Multiprocessing-safe work pickling (#15836)

* Upgrade to HPU release 1.7.1 (#15956)

* Upgrade to HPU release 1.7.1
Update torch version check for hpu

Signed-off-by: Jerome <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Multinode on MPS (#15748)

* Fix restarting attribute for lr finder
* update lite executor
* update trainer executor
* update spawn executor
* add multinode component tests
* add testing helpers
* add lite tests
* add trainer tests
* update changelog
* update trainer
* update workflow
* update tests
* debug
* add reason for skipif
* Apply suggestions from code review
* switch skipif

Co-authored-by: Jirka <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>

* [App] Resolve PythonServer on M1 (#15949)

Co-authored-by: thomas <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Lite: Fix DataLoader shuffling when using DistributedSampler (#15931)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [App] Temporarily disable ready (#15958)

* Fix restarting attribute for lr finder (#15620)

* [App] Improve pdb for multiprocessing (#15950)

Co-authored-by: thomas <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [App] Improve debug triggering (#15951)

* [App] Add automatic conversion to structures (#15961)

* Make LightningModule torch.jit.script-able again (#15947)

* Make LightningModule torch.jit.script-able again
* remove skip

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* refactor: simplify Tensor import (#15959)

* Fix ImportErrors on Multinode if package not present (#15963)

* Fix typo in definition of world size in docs (#15954)

* [App] Enable running an app from the Gallery (#15941)

Co-authored-by: thomas <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Jirka <[email protected]>

* Apply dynamo to training_step, validation_step, test_step, predict_step (#15957)

* Apply dynamo to training_step, validation_step, test_step, predict_step

* Add entry to CHANGELOG.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix merge conflict

* rename tpu workflow

Signed-off-by: Jerome <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Jerome Anand <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Aniket Maurya <[email protected]>
Co-authored-by: Noha Alon <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Rick Izzo <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: Sherin Thomas <[email protected]>
Co-authored-by: stekiri <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Justus Schock <[email protected]>
Co-authored-by: thomas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants