Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve connect protocol to take timeout #48

Closed
coretl opened this issue Nov 1, 2023 · 1 comment · Fixed by #97
Closed

Improve connect protocol to take timeout #48

coretl opened this issue Nov 1, 2023 · 1 comment · Fixed by #97
Assignees

Comments

@coretl
Copy link
Collaborator

coretl commented Nov 1, 2023

At the moment we have:

async def connect(self, sim: bool = False):
"""Connect self and all child Devices.
Parameters
----------
sim:
If True then connect in simulation mode.
"""
coros = {
name: child_device.connect(sim) for name, child_device in self.children()
}
if coros:
await wait_for_connection(**coros)

This doesn't take timeout, so to add timeout on top we wrap with asyncio.wait or similar (like other primitives in the asyncio library). Unfortunately this makes logging the error difficult as connect calls recursively connect child devices. The way we report reasonable errors at the moment is to catch the CancelledError that is injected when the task times out, and raise NotConnectedError with the name of the signal in question, accumulating them in the parent until we produce a single top level NotConnectedError with all the failing signals:

async def wait_for_connection(**coros: Awaitable[None]):
"""Call many underlying signals, accumulating `NotConnected` exceptions
Raises
------
`NotConnected` if cancelled
"""
ts = {k: asyncio.create_task(c) for (k, c) in coros.items()} # type: ignore
try:
done, pending = await asyncio.wait(ts.values())
except asyncio.CancelledError:
for t in ts.values():
t.cancel()
lines: List[str] = []
for k, t in ts.items():
try:
await t
except NotConnected as e:
if len(e.lines) == 1:
lines.append(f"{k}: {e.lines[0]}")
else:
lines.append(f"{k}:")
lines += [f" {line}" for line in e.lines]
raise NotConnected(*lines)
else:
# Wait for everything to foreground the exceptions
for f in list(done) + list(pending):
await f

This is horrible. It also bites people who await device.connect() rather than using asyncio.wait like in DiamondLightSource/dodal#223.

@callumforrester suggested a better way, pass the timeout down, then let each child produce a class ConnectTimeoutError(TimeoutError) when it times out, then assemble it at the top level with an asyncio.gather(*coros, return_exceptions=True), squashing ConnectTimeoutErrors into a single one.

This would change the signature to:

async def connect(self, sim: bool = False, timeout: float = DEFAULT_TIMEOUT):  

with each concrete class raising ConnectTimeoutError if it times out, then wait_for_connection doing the squashing of ConnectTimeoutErrors.

@olliesilvester
Copy link
Contributor

An ophyd_async device timed out on Hyperion yesterday, very difficult to diagnose without knowing what device / signal timed out, so this change will be very useful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants