This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

fix: Use fixed event dispatcher node ID (#513)
* fix: Use hostname or explicitly configured node ID as event dispatcher node ID
* setup: Upgrade aiotools to 1.4.0

Backported-From: main (22.03)
Backported-To: 21.03
achimnol committed Jan 10, 2022
1 parent a52a875 commit cc7571f
Showing 6 changed files with 11 additions and 1 deletion.
2 changes: 2 additions & 0 deletions changes/513.fix.md
@@ -0,0 +1,2 @@
Use a fixed value as the node ID in `EventDispatcher` instances, either auto-generated from the hostname or manually configured `manager.id` value of `manager.toml`.
- **IMPORTANT: An explicit admin/developer action is required** to fix up the corrupted Redis database and configuration. Check out the description of [lablup/backend.ai-manager#513](https://github.com/lablup/backend.ai-manager/pull/513) for details.
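The fallback rule described in this change note can be sketched in a few lines of Python. This is only an illustration, not the code from this commit: `resolve_node_id` and its `hostname` parameter are hypothetical names, and treating an empty `id` the same as a missing one is an assumption taken from the wording of the `config/sample.toml` comment ("If empty or unspecified"). The commit itself sets the default through the config schema in `config.py`.

```python
import socket
from typing import Any, Mapping, Optional


def resolve_node_id(
    manager_config: Mapping[str, Any],
    hostname: Optional[str] = None,
) -> str:
    """Return the node ID for a manager (hypothetical helper, not project code).

    An explicitly configured, non-empty ``id`` wins. Otherwise the ID is
    built from the hostname with an "i-" prefix, as the sample config describes.
    """
    explicit = manager_config.get("id")
    if explicit:
        # Explicit configuration keeps the ID stable even if the hostname changes.
        return str(explicit)
    # The hostname parameter exists only to make the sketch testable.
    host = hostname if hostname is not None else socket.gethostname()
    return f"i-{host}"


if __name__ == "__main__":
    print(resolve_node_id({"id": "manager-01"}))
    print(resolve_node_id({}))
```

The value stays the same across restarts as long as the hostname or the configured `id` does not change, which is what the sample config means by handling event bus messages consistently.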
2 changes: 2 additions & 0 deletions config/sample.toml
@@ -58,6 +58,8 @@ heartbeat-timeout = 30.0
# Override the name of this manager node.
# If empty or unspecified, the agent builds this from the hostname by prefixing it with "i-",
# like "i-hostname". The "i-" prefix is not mandatory, though.
# Explicit configuration may be required if the hostname changes frequently,
# to handle the event bus messages consistently.
# This affects the per-node configuration scope.
# id = ""

2 changes: 1 addition & 1 deletion setup.cfg
@@ -43,7 +43,7 @@ install_requires =
aiojobs~=0.3.0
aiomonitor~=0.4.5
aioredis[hiredis]~=2.0
-aiotools~=1.2.2
+aiotools~=1.4.0
alembic~=1.6.5
async_timeout~=3.0
asyncache>=0.1.1
3 changes: 3 additions & 0 deletions src/ai/backend/manager/config.py
@@ -205,6 +205,7 @@
from pathlib import Path
from pprint import pformat
import secrets
import socket
import sys
from typing import (
Any,
@@ -282,6 +283,8 @@
}),
t.Key('manager'): t.Dict({
t.Key('num-proc', default=_max_cpu_count): t.Int[1:_max_cpu_count],
t.Key('id', default=f"i-{socket.gethostname()}"): t.String,
t.Key('user', default=None): tx.UserID(default_uid=_file_perm.st_uid),
t.Key('user', default=None): tx.UserID(default_uid=_file_perm.st_uid),
t.Key('group', default=None): tx.GroupID(default_gid=_file_perm.st_gid),
t.Key('service-addr', default=('0.0.0.0', 8080)): tx.HostPortPair,
1 change: 1 addition & 0 deletions src/ai/backend/manager/server.py
@@ -324,6 +324,7 @@ async def event_dispatcher_ctx(root_ctx: RootContext) -> AsyncIterator[None]:
root_ctx.shared_config.data['redis'],
db=REDIS_STREAM_DB,
log_events=root_ctx.local_config['debug']['log-events'],
node_id=root_ctx.local_config['manager']['id'],
)
yield
await root_ctx.event_dispatcher.close()
2 changes: 2 additions & 0 deletions tests/test_distributed.py
@@ -85,6 +85,7 @@ async def _tick(context: Any, source: AgentId, event: NoopEvent) -> None:
event_dispatcher = await EventDispatcher.new(
self.shared_config.data['redis'],
db=REDIS_STREAM_DB,
node_id=self.local_config['manager']['id'],
)
event_producer = await EventProducer.new(
self.shared_config.data['redis'],
@@ -161,6 +162,7 @@ async def _tick(context: Any, source: AgentId, event: NoopEvent) -> None:
event_dispatcher = await EventDispatcher.new(
shared_config.data['redis'],
db=REDIS_STREAM_DB,
node_id=local_config['manager']['id'],
)
event_producer = await EventProducer.new(
shared_config.data['redis'],
