Commit

Do not close worker on comm error in heartbeat
fjetter authored and hendrikmakait committed Oct 20, 2022
1 parent 8899e3c commit 7ca7e26
Showing 1 changed file with 5 additions and 10 deletions.
distributed/worker.py: 15 changes (5 additions, 10 deletions)
@@ -1237,17 +1237,12 @@ async def heartbeat(self) -> None:
             )
             self.bandwidth_workers.clear()
             self.bandwidth_types.clear()
-        except CommClosedError:
-            logger.warning("Heartbeat to scheduler failed", exc_info=True)
-            await self.close()
         except OSError as e:
-            # Scheduler is gone. Respect distributed.comm.timeouts.connect
-            if "Timed out trying to connect" in str(e):
-                logger.info("Timed out while trying to connect during heartbeat")
-                await self.close()
-            else:
-                logger.exception(e)
-                raise e
+            logger.exception(e)
+        except Exception as e:
+            logger.exception("Unexpected exception during heartbeat. Closing worker.")
+            await self.close()
+            raise e
         finally:
             self.heartbeat_active = False
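A minimal, self-contained sketch of the handler order after this change. `SketchWorker`, `_send_heartbeat`, `demo`, and the local `CommClosedError` are hypothetical stand-ins, not distributed's API. The sketch assumes, as in distributed, that `CommClosedError` is a subclass of `OSError` (via `IOError`), which is why the remaining `except OSError` branch now also absorbs closed-comm errors instead of closing the worker.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("heartbeat_sketch")


class CommClosedError(OSError):
    """Hypothetical stand-in for distributed's CommClosedError (assumed OSError subclass)."""


class SketchWorker:
    """Hypothetical worker reproducing only the post-commit heartbeat error handling."""

    def __init__(self, failure: Optional[Exception] = None) -> None:
        self.failure = failure  # exception the fake scheduler call raises, if any
        self.closed = False
        self.heartbeat_active = False

    async def close(self) -> None:
        self.closed = True

    async def _send_heartbeat(self) -> None:
        # Placeholder for the real heartbeat call to the scheduler.
        if self.failure is not None:
            raise self.failure

    async def heartbeat(self) -> None:
        self.heartbeat_active = True
        try:
            await self._send_heartbeat()
        except OSError as e:
            # Comm failures (CommClosedError included) are only logged;
            # the worker is not closed.
            logger.exception(e)
        except Exception as e:
            # Any other error closes the worker and is re-raised.
            logger.exception("Unexpected exception during heartbeat. Closing worker.")
            await self.close()
            raise e
        finally:
            self.heartbeat_active = False


async def demo() -> None:
    alive = SketchWorker(CommClosedError("scheduler unreachable"))
    await alive.heartbeat()
    assert not alive.closed  # comm error: logged, worker kept alive

    broken = SketchWorker(RuntimeError("bug in heartbeat"))
    try:
        await broken.heartbeat()
    except RuntimeError:
        pass
    assert broken.closed  # unexpected error: worker closed, error re-raised


if __name__ == "__main__":
    asyncio.run(demo())
```

For contrast, before this commit a `CommClosedError` or a connect timeout closed the worker, any other `OSError` was logged and re-raised without closing, and non-`OSError` exceptions were not caught at all.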
