This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

fix: Use fixed event dispatcher node ID #513

Merged
achimnol merged 5 commits into main from fix/support-fixed-consumer-id-in-event-dispatcher on Jan 10, 2022

Conversation

achimnol
Member

@achimnol achimnol commented Jan 10, 2022

fixes lablup/backend.ai#345

After applying an update including this patch, all users MUST follow the instructions here:

  • Shut down all manager instances.
  • Check the result of python -c 'import socket;print(socket.gethostname())' on all manager nodes.
    • If the hostnames are all different and persist across reboots, you don't need to do anything.
    • If they are not persistent (e.g., they include DHCP-assigned IP addresses) or not unique among the nodes, set the manager.id config value in manager.toml.
    • For development setups on laptops, it is HIGHLY RECOMMENDED to configure manager.id manually because it may change depending on where you are (e.g., different WiFi networks may assign different hostnames).
  • Open a redis-cli shell of the Redis server used by agents and managers and execute:
    • select 4 to switch to the event streaming database
    • flushdb to delete all existing (randomized) consumers and pending messages inside their PEL entries
    • quit
    • NOTE: This Redis cleanup procedure is also required when you change the number of manager instances (e.g., the number of manager nodes or the number of manager worker processes) to prevent leaking consumers.
  • Restart all manager instances.
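
The node ID rule described in the steps above (an explicitly configured manager.id wins, otherwise the hostname is used) can be sketched roughly as follows. This is an illustrative sketch only, not the code merged in this PR: the function names resolve_node_id and find_duplicate_ids and the plain-dict config shape are assumptions made for the example.

```python
import socket
from collections import Counter
from typing import Iterable, List, Mapping


def resolve_node_id(config: Mapping[str, Mapping[str, str]]) -> str:
    """Pick a stable node ID (hypothetical helper, not the manager's API).

    Prefer the explicitly configured manager.id; fall back to the
    hostname, which is what the check in the steps above prints.
    """
    configured = config.get("manager", {}).get("id")
    if configured:
        return configured
    return socket.gethostname()


def find_duplicate_ids(node_ids: Iterable[str]) -> List[str]:
    """Return the node IDs that appear more than once, sorted.

    A non-empty result means at least two managers would share an ID,
    so manager.id should be set explicitly on those nodes.
    """
    counts = Counter(node_ids)
    return sorted(nid for nid, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Explicit ID, as it might be loaded from manager.toml (assumed shape).
    print(resolve_node_id({"manager": {"id": "manager-1"}}))
    # No explicit ID: falls back to this machine's hostname.
    print(resolve_node_id({"manager": {}}))
    # Hostnames collected by hand from each manager node.
    print(find_duplicate_ids(["mgr-a", "mgr-b", "mgr-a"]))
```

Collecting the hostnames from every manager node and passing them to the duplicate check is one way to decide whether the manual manager.id setting is needed before restarting.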

@achimnol achimnol added this to the 21.03 milestone Jan 10, 2022
@achimnol achimnol self-assigned this Jan 10, 2022
@codecov

codecov bot commented Jan 10, 2022

Codecov Report

Merging #513 (89cee1f) into main (2f194f9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 89cee1f differs from pull request most recent head aec1ed8. Consider uploading reports for the commit aec1ed8 to get more accurate results

@@           Coverage Diff           @@
##             main     #513   +/-   ##
=======================================
  Coverage   48.83%   48.84%           
=======================================
  Files          54       54           
  Lines        8969     8970    +1     
=======================================
+ Hits         4380     4381    +1     
  Misses       4589     4589           
Impacted Files                      Coverage Δ
src/ai/backend/manager/server.py    59.73% <ø> (ø)
src/ai/backend/manager/config.py    44.03% <100.00%> (+0.17%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2f194f9...aec1ed8.

@achimnol achimnol merged commit 79ceccb into main Jan 10, 2022
@achimnol achimnol deleted the fix/support-fixed-consumer-id-in-event-dispatcher branch January 10, 2022 14:44
achimnol added a commit that referenced this pull request Jan 10, 2022
* fix: Use hostname or explicitly configured node ID as event dispatcher node ID
* setup: Upgrade aiotools to 1.4.0

Backported-From: main (22.03)
Backported-To: 21.09
achimnol added a commit that referenced this pull request Jan 10, 2022
* fix: Use hostname or explicitly configured node ID as event dispatcher node ID
* setup: Upgrade aiotools to 1.4.0

Backported-From: main (22.03)
Backported-To: 21.03
Development

Successfully merging this pull request may close these issues.

Fix accumulation of event bus consumers upon restarting managers