[App] Introduce auto scaler #15769

Merged on Dec 7, 2022 (127 commits)
Changes from 102 commits
Commits
e0cb13e
Exlucde __pycache__ in setuptools
akihironitta Nov 22, 2022
6e5a030
Merge branch 'master' into bugfix/dont-collect-pycache
akihironitta Nov 23, 2022
f162e96
Add load balancer example
akihironitta Nov 22, 2022
324d41d
Merge branch 'master' into bugfix/dont-collect-pycache
akihironitta Nov 24, 2022
0725439
wip
akihironitta Nov 23, 2022
0d67fd2
Merge branch 'bugfix/dont-collect-pycache' into feat/load-balancer-co…
akihironitta Nov 24, 2022
9cc237f
Update example
akihironitta Nov 24, 2022
9594371
rename
akihironitta Nov 24, 2022
fc64ceb
remove prints
akihironitta Nov 24, 2022
c4b5ac0
_LoadBalancer -> LoadBalancer
akihironitta Nov 24, 2022
4c61501
AutoScaler(work)
akihironitta Nov 24, 2022
dc72f1a
change var name
akihironitta Nov 24, 2022
57943ac
remove locust
akihironitta Nov 24, 2022
8aa68d9
Merge branch 'master' into feat/load-balancer-component
akihironitta Nov 24, 2022
b6a9918
Update docs
akihironitta Nov 24, 2022
744ddbe
Merge branch 'docs/add-missing-components' into feat/load-balancer-co…
akihironitta Nov 24, 2022
16f7333
include autoscaler in api ref
akihironitta Nov 24, 2022
cd9929c
docs typo
akihironitta Nov 24, 2022
f33874e
docs typo
akihironitta Nov 24, 2022
12d12b4
docs typo
akihironitta Nov 24, 2022
656b0b6
docs typo
akihironitta Nov 24, 2022
a5859f7
remove unused loadtest
akihironitta Nov 24, 2022
1bdf1bc
remove unused device_type
akihironitta Nov 24, 2022
1cb366f
clean up
akihironitta Nov 24, 2022
fb4d2e5
clean up
akihironitta Nov 24, 2022
c0ba351
clean up
akihironitta Nov 24, 2022
666918b
Add docstring
akihironitta Nov 24, 2022
6f0f43f
type
akihironitta Nov 24, 2022
24cb9c0
Merge branch 'master' into feat/load-balancer-component
akihironitta Nov 24, 2022
9cda544
env vars to args
akihironitta Nov 25, 2022
609eb10
expose an API for users to override to customise autoscaling logic
akihironitta Nov 25, 2022
4e779a1
update example
akihironitta Nov 25, 2022
92737e4
comment
akihironitta Nov 25, 2022
04c3a72
udpate var name
akihironitta Nov 25, 2022
52d240e
fix scale mechanism and clean up
akihironitta Nov 25, 2022
6d18f24
Update exampl
akihironitta Nov 25, 2022
5c5197e
ignore mypy
akihironitta Nov 25, 2022
98d56ad
Add test file
akihironitta Nov 25, 2022
b34076e
.
akihironitta Nov 25, 2022
dfc5dff
Merge branch 'master' into docs/add-missing-components
akihironitta Nov 25, 2022
c139ede
Merge branch 'docs/add-missing-components' into feat/load-balancer-co…
akihironitta Nov 25, 2022
26ca77d
update impl and update tests
akihironitta Nov 28, 2022
5082d44
Merge branch 'master' into feat/load-balancer-component
akihironitta Nov 28, 2022
c230254
Update changlog
akihironitta Nov 28, 2022
80e6b7d
.
akihironitta Nov 28, 2022
2aeec1c
revert docs
akihironitta Nov 28, 2022
8091ca9
update test
akihironitta Nov 28, 2022
a2bfaed
update state to keep calling 'flow.run()'
akihironitta Nov 28, 2022
eb784fe
Add aiohttp to base requirements
akihironitta Nov 28, 2022
9208959
Update docs
akihironitta Nov 28, 2022
15dc3ae
Use deserializer utility
akihironitta Nov 28, 2022
7ffd45a
fake trigger
akihironitta Nov 28, 2022
10627e9
wip: protect /system/* with basic auth
akihironitta Nov 28, 2022
f79d16b
read password at runtime
akihironitta Nov 29, 2022
ae7f300
Change env var name
akihironitta Nov 29, 2022
8ea25d1
import torch as optional
akihironitta Nov 29, 2022
a0fb484
Merge branch 'master' into feat/load-balancer-component
akihironitta Nov 29, 2022
9e66136
Don't overcreate works
akihironitta Nov 29, 2022
94600d7
simplify imports
akihironitta Nov 29, 2022
15dca21
Update example
akihironitta Nov 29, 2022
8d65628
aiohttp
Borda Nov 29, 2022
fe5c0f4
Add work_args work_kwargs
akihironitta Nov 29, 2022
77faca5
More docs
akihironitta Nov 29, 2022
cf32733
Merge remote-tracking branch 'origin/feat/load-balancer-component' in…
akihironitta Nov 29, 2022
39de0ba
remove FIXME
akihironitta Nov 29, 2022
a49766d
Apply Jirka's suggestions
akihironitta Nov 29, 2022
6861444
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 29, 2022
2b95111
clean example device
akihironitta Nov 29, 2022
5af2569
add comment on init threshold value
akihironitta Nov 29, 2022
69ec4c3
bad merge
akihironitta Nov 29, 2022
64d99e4
Merge remote-tracking branch 'origin/feat/load-balancer-component' in…
akihironitta Nov 29, 2022
61433b3
nit: logging format
akihironitta Nov 29, 2022
f9debb6
{in,out}put_schema -> {in,out}put_type
akihironitta Nov 29, 2022
e534e09
lowercase
akihironitta Nov 29, 2022
87e2882
docs on seconds
akihironitta Nov 29, 2022
7f2731d
process_time -> processing_time
akihironitta Nov 29, 2022
a933247
Dont modify work state from flow
akihironitta Nov 29, 2022
5d7d1c3
Update tests
akihironitta Nov 29, 2022
73cf389
worker_url -> endpoint
akihironitta Nov 29, 2022
8840b85
fix exampl
akihironitta Nov 29, 2022
b7301e6
Fix default scale logic
akihironitta Nov 30, 2022
c8d0c86
Fix default scale logic
akihironitta Nov 30, 2022
ad5b8e5
Fix num_pending_works
akihironitta Nov 30, 2022
9a43d31
Update num_pending_works
akihironitta Dec 1, 2022
ce4d257
Fix bug creating too many works
akihironitta Dec 1, 2022
9bebb86
Remove up/downscale_threshold args
akihironitta Dec 1, 2022
8b14154
Update example
akihironitta Dec 1, 2022
611077e
Add typing
akihironitta Dec 1, 2022
7a627ac
Merge branch 'master' into feat/load-balancer-component
akihironitta Dec 1, 2022
4e93898
Fix example in docstring
akihironitta Dec 1, 2022
1e42f55
Fix default scale logic
akihironitta Dec 1, 2022
0b6153a
Update src/lightning_app/components/auto_scaler.py
akihironitta Dec 1, 2022
6d677c1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 1, 2022
67bbe49
rename method
akihironitta Dec 1, 2022
e524702
rename locvar
akihironitta Dec 1, 2022
2f24422
Add todo
akihironitta Dec 1, 2022
a20797d
docs ci
akihironitta Dec 1, 2022
a8a8aaa
docs ci
akihironitta Dec 1, 2022
09dfda5
asdfafsdasdf pls docs
akihironitta Dec 1, 2022
11842b0
Apply suggestions from code review
akihironitta Dec 2, 2022
29059a0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 2, 2022
a35b4c8
Merge branch 'master' into feat/load-balancer-component
Dec 5, 2022
4285506
.
akihironitta Dec 5, 2022
72a6f13
doc
akihironitta Dec 5, 2022
56ea78b
Update src/lightning_app/components/auto_scaler.py
akihironitta Dec 5, 2022
24983a0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2022
27431f4
Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
akihironitta Dec 5, 2022
b7dd2c1
Revert "Update src/lightning_app/components/auto_scaler.py"
akihironitta Dec 5, 2022
ebcfc51
Merge branch 'master' into feat/load-balancer-component
akihironitta Dec 6, 2022
a634446
Remove redefinition
akihironitta Dec 6, 2022
64a1960
Remove load balancer run blocker
akihironitta Dec 6, 2022
fba7a3c
raise RuntimeError
akihironitta Dec 6, 2022
4ccc38c
remove has_sent
akihironitta Dec 6, 2022
aa1785c
lower the default timeout_batching from 10 to 1
akihironitta Dec 6, 2022
7c09716
remove debug
akihironitta Dec 6, 2022
ff2009a
update the default timeout_batching
akihironitta Dec 6, 2022
839734d
.
akihironitta Dec 6, 2022
6a553b9
tighten condition
akihironitta Dec 6, 2022
506e192
fix endpoint
akihironitta Dec 6, 2022
205c0af
typo in runtimeerror cond
akihironitta Dec 6, 2022
6d76b0d
async lock update severs
akihironitta Dec 6, 2022
2233098
add a test
akihironitta Dec 6, 2022
4526496
{in,out}put_type typing
akihironitta Dec 6, 2022
00fed69
Merge branch 'master' into feat/load-balancer-component
tchaton Dec 6, 2022
468b626
Update examples/app_server_with_auto_scaler/app.py
akihironitta Dec 7, 2022
5b2b69f
Update .actions/setup_tools.py
akihironitta Dec 7, 2022
2c41e38
Merge branch 'master' into feat/load-balancer-component
akihironitta Dec 7, 2022
2 changes: 1 addition & 1 deletion .actions/setup_tools.py
@@ -191,7 +191,7 @@ def _load_aggregate_requirements(req_dir: str = "requirements", freeze_requireme
         load_requirements(d, file_name="base.txt", unfreeze=not freeze_requirements)
         for d in glob.glob(os.path.join(req_dir, "*"))
         # skip empty folder as git artefacts, and resolving Will's special issue
-        if os.path.isdir(d) and len(glob.glob(os.path.join(d, "*"))) > 0
+        if os.path.isdir(d) and len(glob.glob(os.path.join(d, "*"))) > 0 and "__pycache__" not in d
     ]
     if not requires:
         return None
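The effect of the extra `"__pycache__" not in d` condition can be checked in isolation. The following is a minimal sketch, assuming a throwaway directory tree; `requirement_dirs` and `demo` are hypothetical names used only for illustration and are not part of `.actions/setup_tools.py`.

```python
import glob
import os
import tempfile


def requirement_dirs(req_dir: str) -> list:
    """Return sub-directories of ``req_dir`` that are non-empty and not bytecode caches.

    Hypothetical helper: it mirrors only the filter condition from the diff,
    not the surrounding ``_load_aggregate_requirements`` function.
    """
    return sorted(
        d
        for d in glob.glob(os.path.join(req_dir, "*"))
        if os.path.isdir(d) and len(glob.glob(os.path.join(d, "*"))) > 0 and "__pycache__" not in d
    )


def demo() -> list:
    """Build a temporary tree with three folders and return the names that pass the filter."""
    layout = {"app": ["base.txt"], "empty": [], "__pycache__": ["x.pyc"]}
    with tempfile.TemporaryDirectory() as root:
        for name, files in layout.items():
            os.makedirs(os.path.join(root, name))
            for file_name in files:
                # Create an empty placeholder file.
                open(os.path.join(root, name, file_name), "w").close()
        return [os.path.basename(d) for d in requirement_dirs(root)]


print(demo())
```

Only `app` passes: `empty` is dropped by the existing non-empty check and `__pycache__` by the new condition. Note that the substring test runs against the whole path, so a directory nested under a parent named `__pycache__` would be excluded as well.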
1 change: 1 addition & 0 deletions docs/source-app/api_references.rst
@@ -37,6 +37,7 @@ ___________________
     ~training.LightningTrainerScript
     ~serve.gradio.ServeGradio
     ~serve.serve.ModelInferenceAPI
+    ~auto_scaler.AutoScaler

 ----

86 changes: 86 additions & 0 deletions examples/app_server_with_auto_scaler/app.py
@@ -0,0 +1,86 @@
from typing import Any, List

import torch
import torchvision
from pydantic import BaseModel

import lightning as L


class RequestModel(BaseModel):
    image: str


class BatchRequestModel(BaseModel):
    inputs: List[RequestModel]


class BatchResponse(BaseModel):
    outputs: List[Any]


class PyTorchServer(L.app.components.PythonServer):
    def __init__(self, *args, **kwargs):
        super().__init__(
            port=L.app.utilities.network.find_free_network_port(),
            input_type=BatchRequestModel,
            output_type=BatchResponse,
            cloud_compute=L.CloudCompute("gpu"),
        )

    def setup(self):
        self._device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        self._model = torchvision.models.resnet18(pretrained=True).to(self._device)

    def predict(self, requests: BatchRequestModel):
        transforms = torchvision.transforms.Compose(
            [
                torchvision.transforms.Resize(224),
                torchvision.transforms.ToTensor(),
                torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        )
        images = []
        for request in requests.inputs:
            image = L.app.components.serve.types.image.Image.deserialize(request.image)
            image = transforms(image).unsqueeze(0)
            images.append(image)
        images = torch.cat(images)
        images = images.to(self._device)
        predictions = self._model(images)
        results = predictions.argmax(1).cpu().numpy().tolist()
        return BatchResponse(outputs=[{"prediction": pred} for pred in results])


class MyAutoScaler(L.app.components.AutoScaler):
    def scale(self, replicas: int, metrics: dict) -> int:
        """The default scaling logic that users can override."""
        # scale out if the number of pending requests exceeds max batch size.
        max_requests_per_work = self.max_batch_size
        pending_requests_per_running_or_pending_work = metrics["pending_requests"] / (
            replicas + metrics["pending_works"]
        )
        if pending_requests_per_running_or_pending_work >= max_requests_per_work:
            return replicas + 1

        # scale in if the number of pending requests is below 25% of max_requests_per_work
        min_requests_per_work = max_requests_per_work * 0.25
        pending_requests_per_running_work = metrics["pending_requests"] / replicas
        if pending_requests_per_running_work < min_requests_per_work:
            return replicas - 1

        return replicas


app = L.LightningApp(
    MyAutoScaler(
        PyTorchServer,
        min_replicas=2,
        max_replicas=4,
        autoscale_interval=10,
        endpoint="predict",
        input_type=RequestModel,
        output_type=Any,
        timeout_batching=1,
    )
)
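The decision rule in `MyAutoScaler.scale` can be traced with concrete numbers without starting an app. The sketch below restates the same arithmetic as a plain function. The standalone signature, the default `max_batch_size` of 8, and the sample metric values are assumptions made for illustration; it also does not apply the `min_replicas`/`max_replicas` bounds passed in the example.

```python
def scale(replicas: int, pending_requests: int, pending_works: int, max_batch_size: int = 8) -> int:
    """Standalone restatement of the scale() rule from the example (hypothetical signature).

    ``replicas`` must be greater than zero, otherwise the scale-in check divides by zero.
    ``max_batch_size=8`` is an assumed value for this illustration.
    """
    # Scale out when each running-or-pending work would face a full batch or more.
    max_requests_per_work = max_batch_size
    per_running_or_pending_work = pending_requests / (replicas + pending_works)
    if per_running_or_pending_work >= max_requests_per_work:
        return replicas + 1

    # Scale in when each running work has less than 25% of a batch waiting.
    min_requests_per_work = max_requests_per_work * 0.25
    per_running_work = pending_requests / replicas
    if per_running_work < min_requests_per_work:
        return replicas - 1

    return replicas


# (replicas, pending_requests, pending_works) -> suggested replica count
cases = [(2, 20, 0), (2, 20, 1), (2, 6, 0), (2, 2, 0)]
for replicas, pending_requests, pending_works in cases:
    print(replicas, pending_requests, pending_works, "->", scale(replicas, pending_requests, pending_works))
```

With 2 replicas and 20 pending requests the rule asks for a third replica (10 requests per work against a threshold of 8). Once one work is already pending, the same load gives 20 / 3 ≈ 6.7 per work and the count stays at 2, which is how counting pending works avoids creating too many. Six pending requests leave the count unchanged, and two pending requests (1 per work, under the 25% floor of 2) suggest scaling in to 1.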
1 change: 1 addition & 0 deletions pyproject.toml
@@ -80,6 +80,7 @@ module = [
     "lightning_app.components.serve.types.type",
     "lightning_app.components.serve.python_server",
     "lightning_app.components.training",
+    "lightning_app.components.auto_scaler",
     "lightning_app.core.api",
     "lightning_app.core.app",
     "lightning_app.core.flow",
1 change: 1 addition & 0 deletions requirements/app/base.txt
@@ -12,3 +12,4 @@ beautifulsoup4>=4.8.0, <4.11.2
 inquirer>=2.10.0
 psutil<5.9.4
 click<=8.1.3
+aiohttp>=3.8.0, <=3.8.3
2 changes: 2 additions & 0 deletions src/lightning_app/CHANGELOG.md
@@ -15,6 +15,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 - Added a CloudMultiProcessBackend which enables running a child App from within the Flow in the cloud ([#15800](https://github.com/Lightning-AI/lightning/pull/15800))

+- Added `AutoScaler` component ([#15769](https://github.com/Lightning-AI/lightning/pull/15769))
+

 ### Changed

2 changes: 2 additions & 0 deletions src/lightning_app/components/__init__.py
@@ -1,3 +1,4 @@
+from lightning_app.components.auto_scaler import AutoScaler
 from lightning_app.components.database.client import DatabaseClient
 from lightning_app.components.database.server import Database
 from lightning_app.components.multi_node import (
@@ -15,6 +16,7 @@
 from lightning_app.components.training import LightningTrainerScript, PyTorchLightningScriptRunner

 __all__ = [
+    "AutoScaler",
     "DatabaseClient",
     "Database",
     "PopenPythonScript",