Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: increase the default timeout to help TMO load properly #112

Merged
merged 1 commit into from
Nov 3, 2022

Conversation

ZLLentz
Copy link
Member

@ZLLentz ZLLentz commented Nov 3, 2022

TMO was complaining about their typhos screens launched from lucid not being quite right. The enum value selection on some of the state devices would often come up with no options at all.

It looked similar to issues we saw in lightpath so I decided to boost the EpicsSignal timeouts for the application, like we did in lightpath. This does fix the issue and you can make the issue worse (more devices effected) if you drop the timeout to be arbitrarily low.

This probably shouldn't be the final form of this PR- I think timeout should be configurable, right? I'm not sure if it's best as a cli arg as part of the toolbar yaml or somewhere else. This is a reasonable band-aid though.

REF https://jira.slac.stanford.edu/browse/LCLSECSD-1659

@ZLLentz ZLLentz requested review from klauer and tangkong November 3, 2022 18:58
timeout = 10
EpicsSignalBase.set_defaults(
timeout=timeout,
connection_timeout=timeout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd recommend the auto monitor option as well:

https://github.com/klauer/las-dispersion-scan/blob/176df2006080523d4ae56ae50c04c0ccd8acbdc5/las_dispersion_scan/bin/gui.py#L64-L72

And we have this in so many repos... probably time for a pcdsutils inclusion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have it in a shared spot like pcdsutils, we can look at general env vars for tweakability:
PCDS_GUI_CA_TIMEOUT (PCDSUTILS_GUI_CA_TIMEOUT?) or something similar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm 👍 on a pcdsutils inclusion.

I can put on the auto_monitor option but I'm actually not sure the purpose/result of that. Specifically, we'll be hitting ophyd any time any PV updates? That might be a lot of updates. Something to keep in mind is that the PyDM widgets are separately monitoring as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For background: auto_monitor is a pyepics thing that sets up its subscription mechanism automatically: https://pyepics.github.io/pyepics/pv.html#pv-automonitor-label

I had found that it reduced the amount of timeouts I got from interacting with ophyd signals/devices via .get() .put() and .set() in the above project. You'll also see it used in EpicsMotor, avoiding different-but-similar timeouts that Pete was seeing (https://github.com/bluesky/ophyd/blob/0e8acb3df3d17e0ab7aa2cc924831f4a0c580449/ophyd/epics_motor.py#L42-L64)

You're definitely right that it could affect performance. Perhaps it has no place in the general LUCID config at the moment, unless we continue to have timeout issues after this PR. That then leads to Robert's point as to "LUCID wants this EpicsSignal configuration but las-dispersion-scan and btms-ui want a different configuration"... Which then leads me to think we shouldn't have this in pcdsutils 😁

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to try it and see if it feels any different at all (and check cpu usage etc.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto_monitor=True had the following affects on the TMO lucid:

  • Slightly smaller cpu usage spike during startup
  • Higher cpu usage in general after startup (4% -> 7% with no typhos open on psbuild-rhel7-03)

I'm going to err on the side of leaving it as-is because we haven't done a lot of typhos/lucid testing with auto_monitor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to say: the app was exactly as responsive as before RE button clicks etc., but keep in mind that a lot of the PyDM stuff sidesteps ophyd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on the testing. I agree with your assessment 👍

Copy link
Contributor

@tangkong tangkong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought I had is that the timeout settings could be stuffed into an env variable, but then they might conflict across apps (maybe lightpath wants a different timeout than lucid? 🤷 )

This seems fine as a band-aid to me. My lucid dots seemed to fill in fine, but maybe I'm looking at the wrong thing.

In lightpath we also solved some of these issues by punting some of the requests to a run-later queue. I'm not sure if that pattern would be good here but it's a (more involved) thought

@ZLLentz ZLLentz merged commit 3b53413 into pcdshub:master Nov 3, 2022
@ZLLentz ZLLentz deleted the maint_default_timeout branch November 3, 2022 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants