Improve `connect` protocol to take timeout #48

coretl · 2023-11-01T16:02:43Z

At the moment we have:

ophyd-async/src/ophyd_async/core/device.py

Lines 53 to 65 in f1b4aba

    async def connect(self, sim: bool = False):
        """Connect self and all child Devices.

        Parameters
        ----------
        sim:
            If True then connect in simulation mode.
        """
        coros = {
            name: child_device.connect(sim) for name, child_device in self.children()
        }
        if coros:
            await wait_for_connection(**coros)

This doesn't take a timeout, so to add one on top we wrap the call with asyncio.wait_for or similar (like other primitives in the asyncio library). Unfortunately this makes logging the error difficult, as connect recursively connects child devices. The way we report reasonable errors at the moment is to catch the CancelledError that is injected when the task times out and raise NotConnected with the name of the signal in question, accumulating them in the parent until we produce a single top-level NotConnected with all the failing signals:

ophyd-async/src/ophyd_async/core/utils.py

Lines 26 to 53 in f1b4aba

    async def wait_for_connection(**coros: Awaitable[None]):
        """Call many underlying signals, accumulating `NotConnected` exceptions

        Raises
        ------
        `NotConnected` if cancelled
        """
        ts = {k: asyncio.create_task(c) for (k, c) in coros.items()}  # type: ignore
        try:
            done, pending = await asyncio.wait(ts.values())
        except asyncio.CancelledError:
            for t in ts.values():
                t.cancel()
            lines: List[str] = []
            for k, t in ts.items():
                try:
                    await t
                except NotConnected as e:
                    if len(e.lines) == 1:
                        lines.append(f"{k}: {e.lines[0]}")
                    else:
                        lines.append(f"{k}:")
                        lines += [f"  {line}" for line in e.lines]
            raise NotConnected(*lines)
        else:
            # Wait for everything to foreground the exceptions
            for f in list(done) + list(pending):
                await f

This is horrible. It also bites people who call await device.connect() directly rather than wrapping it in asyncio.wait_for, as in DiamondLightSource/dodal#223.
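To make the pitfall concrete, here is a minimal sketch of what the caller has to do today. `HangingDevice` and `connect_with_timeout` are hypothetical names made up for illustration, not part of ophyd-async; the only assumption is a `connect` coroutine that never returns when a signal is unreachable.

```python
import asyncio


class HangingDevice:
    """Hypothetical stand-in for a device whose signal never connects."""

    async def connect(self, sim: bool = False) -> None:
        # Simulate a signal that never responds: this wait never finishes.
        await asyncio.Event().wait()


async def connect_with_timeout(device, timeout: float) -> bool:
    """Return True if the device connected within `timeout` seconds."""
    try:
        # The caller has to supply the timeout. A bare
        # `await device.connect()` would block forever here.
        await asyncio.wait_for(device.connect(), timeout=timeout)
    except asyncio.TimeoutError:
        return False
    return True


connected = asyncio.run(connect_with_timeout(HangingDevice(), timeout=0.05))
```

Nothing in the `connect` signature tells the caller that this wrapping is needed, which is how the bare `await` slips through.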

@callumforrester suggested a better way: pass the timeout down, let each child raise a class ConnectTimeoutError(TimeoutError) when it times out, then assemble the results at the top level with an asyncio.gather(*coros, return_exceptions=True), squashing the ConnectTimeoutErrors into a single one.

This would change the signature to:

    async def connect(self, sim: bool = False, timeout: float = DEFAULT_TIMEOUT):

with each concrete class raising ConnectTimeoutError if it times out, then wait_for_connection doing the squashing of ConnectTimeoutErrors.
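As a rough sketch of how that proposal could fit together (not the implementation that was eventually merged): the `Signal` and `Device` classes, the `delay` parameter, the `ca://...` source names, and the value of `DEFAULT_TIMEOUT` below are all assumptions made for illustration. The `lines` attribute and the indentation scheme are borrowed from the existing `NotConnected` handling.

```python
import asyncio
from typing import Awaitable, List

DEFAULT_TIMEOUT = 10.0  # assumed value, for illustration only


class ConnectTimeoutError(TimeoutError):
    """Raised when one or more signals fail to connect in time."""

    def __init__(self, *lines: str):
        super().__init__("\n".join(lines))
        self.lines: List[str] = list(lines)


async def wait_for_connection(**coros: Awaitable[None]) -> None:
    """Await every coroutine, squashing ConnectTimeoutErrors into one."""
    results = await asyncio.gather(*coros.values(), return_exceptions=True)
    lines: List[str] = []
    for name, result in zip(coros, results):
        if isinstance(result, ConnectTimeoutError):
            if len(result.lines) == 1:
                lines.append(f"{name}: {result.lines[0]}")
            else:
                lines.append(f"{name}:")
                lines += [f"  {line}" for line in result.lines]
        elif isinstance(result, BaseException):
            raise result  # anything other than a timeout propagates unchanged
    if lines:
        raise ConnectTimeoutError(*lines)


class Signal:
    """Hypothetical leaf that takes `delay` seconds to connect."""

    def __init__(self, source: str, delay: float):
        self.source = source
        self.delay = delay

    async def connect(self, sim: bool = False, timeout: float = DEFAULT_TIMEOUT):
        try:
            await asyncio.wait_for(asyncio.sleep(self.delay), timeout)
        except asyncio.TimeoutError as exc:
            # The leaf knows its own name, so it reports it here.
            raise ConnectTimeoutError(self.source) from exc


class Device:
    """Hypothetical parent that forwards the timeout to its children."""

    def __init__(self, **children):
        self._children = children

    def children(self):
        return self._children.items()

    async def connect(self, sim: bool = False, timeout: float = DEFAULT_TIMEOUT):
        coros = {name: child.connect(sim, timeout) for name, child in self.children()}
        if coros:
            await wait_for_connection(**coros)


async def main() -> List[str]:
    device = Device(
        x=Signal("ca://X", delay=0.0),
        stage=Device(y=Signal("ca://Y", delay=1.0), z=Signal("ca://Z", delay=1.0)),
    )
    try:
        await device.connect(timeout=0.05)
    except ConnectTimeoutError as exc:
        return exc.lines
    return []


lines = asyncio.run(main())
# lines == ["stage:", "  y: ca://Y", "  z: ca://Z"]
```

Because each leaf applies the timeout itself, a plain `await device.connect()` can no longer hang, and no `CancelledError` handling is needed to recover the names of the failing signals.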


olliesilvester · 2024-02-16T13:33:09Z

An ophyd_async device timed out on Hyperion yesterday. It was very difficult to diagnose without knowing which device or signal timed out, so this change will be very useful.

rosesyrett mentioned this issue Jan 4, 2024

Modify error handling for timed out signals when connecting. #97

Merged

rosesyrett self-assigned this Jan 4, 2024

abbiemery assigned tpoliaw Jan 10, 2024

evalott100 closed this as completed in #97 Feb 16, 2024
