Sometimes, a running workflow displays as stopped #383

Closed
MetRonnie opened this issue Jun 8, 2022 · 9 comments · Fixed by #463
Labels
bug Something isn't working

Comments

@MetRonnie
Member

MetRonnie commented Jun 8, 2022

Describe the bug

This is difficult to reproduce. I have noticed it a few times. Possibly it becomes more common if you have a lot of workflows in ~/cylc-run.

Sometimes, I will click play or play a workflow from the CLI, yet it will remain apparently stopped in the UI. But cylc scan shows it is indeed running.

Additional context

This may turn out to be a UI Server bug, but the UIServer log shows that connect_workflow does happen as it should.

Pull requests welcome!
This is an Open Source project - please consider contributing a bug fix
yourself (please read CONTRIBUTING.md before starting any work though).

@MetRonnie MetRonnie added the bug Something isn't working label Jun 8, 2022
@MetRonnie MetRonnie added this to the 1.4.0 milestone Jun 8, 2022
@oliver-sanders
Member

If the problem goes away when you reload the UI it is most likely a UI bug, otherwise it is most likely a UIS bug.

If it's a UIS bug, there's a reasonable chance that there will be an associated traceback in the UIS output, e.g. an error connecting or some internal issue.

If it's a UI bug, try inspecting the subscription in the network tab of the web inspector. Hard to do when there are lots of events going on but you should be able to find the corresponding added/updated deltas in there (which would confirm it's a UI bug).

@MetRonnie
Member Author

Just experienced this again. The difficulty is that you need to have had devtools open already in order to see the subscriptions in the network tab.

@MetRonnie
Member Author

MetRonnie commented Sep 1, 2022

Reproduced again. Once again I didn't have devtools open, so I couldn't investigate.

Edit:

This time the UI Server log did not show connect_workflow after logging the cylc play command (for reference, I have also included a previous run where it did show as expected):

# First run
[I 10:07:02 CylcUIServer] $ cylc play --color=never --mode live Mist
[I 10:07:05 CylcHubApp log:189] 200 POST /user/rdutta/cylc/graphql (rdutta@::ffff:XX.XXX.XX.XXX) 2244.79ms
[I 10:07:06 CylcUIServer] [data-store] connect_workflow('~rdutta/Mist', <dict>)
[I 10:07:27 CylcHubApp log:189] 200 POST /user/rdutta/cylc/graphql (rdutta@::ffff:XX.XXX.XX.XXX) 118.07ms
[I 10:07:42 CylcHubApp log:189] 200 POST /user/rdutta/cylc/graphql (rdutta@::ffff:XX.XXX.XX.XXX) 63.68ms
[I 10:07:47 CylcUIServer] [data-store] disconnect_workflow('~rdutta/Mist')
10:09:38 [ConfigProxy] info: 200 GET /api/routes 

# Second run (deleted workflow DB after first run)
[I 10:09:45 CylcUIServer] $ cylc play --color=never --mode live Mist
[I 10:09:47 CylcHubApp log:189] 200 POST /user/rdutta/cylc/graphql (rdutta@::ffff:XX.XXX.XX.XXX) 1342.64ms
[I 10:09:57 JupyterHub log:189] 200 POST /hub/api/users/rdutta/activity ([email protected]) 26.71ms
[I 10:10:11 JupyterHub log:189] 200 GET /hub/api/user ([email protected]) 27.83ms
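
For anyone trying to spot this pattern in a longer UI Server log, here is a rough sketch of a check for `cylc play` commands that are never followed by a matching connect_workflow message. The regular expressions are assumptions based only on the line formats in the excerpt above, and `unconnected_plays` is a hypothetical helper, not part of cylc-uiserver.

```python
import re
from typing import List

# Patterns assumed from the log excerpt in this issue; other versions or
# configurations may format these lines differently.
PLAY = re.compile(r"CylcUIServer\] \$ cylc play .* (\S+)$")
CONNECT = re.compile(r"\[data-store\] connect_workflow\('~[^/]+/([^']+)'")


def unconnected_plays(lines: List[str]) -> List[str]:
    """Return workflow names played but never followed by connect_workflow."""
    waiting: List[str] = []
    for raw in lines:
        line = raw.rstrip()
        played = PLAY.search(line)
        if played:
            # A play command was logged; wait for the matching connect.
            waiting.append(played.group(1))
            continue
        connected = CONNECT.search(line)
        if connected and connected.group(1) in waiting:
            # The data store connected, so this play is accounted for.
            waiting.remove(connected.group(1))
    return waiting
```

Run over the two runs above, this would flag the second `cylc play` of Mist only.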

@oliver-sanders
Member

What was your scan interval?

If the workflow doesn't start quickly then the UIS won't attempt to connect until the next scan interval.
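
To illustrate the timing: with periodic scanning, a workflow that comes up just after one scan is not seen until the next one. The sketch below only models that arithmetic. The function name and the interval values are made up for illustration and are not taken from the UIS code or its defaults.

```python
import math


def first_scan_seeing(start_time: float, scan_interval: float,
                      first_scan: float = 0.0) -> float:
    """Time of the first periodic scan at or after the workflow start.

    Hypothetical model: scans happen at first_scan, first_scan + interval, ...
    """
    if scan_interval <= 0:
        raise ValueError("scan_interval must be positive")
    if start_time <= first_scan:
        return first_scan
    # Round up to the next scan boundary after the workflow has started.
    scans_needed = math.ceil((start_time - first_scan) / scan_interval)
    return first_scan + scans_needed * scan_interval
```

For example, with a hypothetical 5 second interval, a workflow that becomes visible at t=7 would be picked up at t=10. That accounts for a delay of up to one interval, but not for a workflow that stays missing across several scans.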

@MetRonnie
Member Author

The default, presumably; I haven't configured it anywhere. The workflow was left running, stalled, for several minutes on the second run.

@oliver-sanders
Member

From Ronnie's description:

  • The workflow passed through multiple scans.
  • It showed up when the UIS was restarted.
  • The connect_workflow log message didn't occur.

This points the finger at the UIS, not the UI, so I'm transferring the issue there.

I think this means that one or more of the following must be true:

  • The workflows manager didn't detect the state change.
  • The UIS failed to connect to the workflow.
    • The log message goes to the debug level as the UIS will automatically retry.
    • This was on NFS; it's entirely possible that the FS was messing about with the key files, blocking connection for a while.
  • A small possibility that the WorkflowRuntimeClient hung during init somehow?
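
To make the first two possibilities concrete, here is a minimal sketch of a scan-and-connect cycle. It is not the actual workflows manager code: `detect_started`, `connect_pending` and the state strings are hypothetical names chosen for illustration. It shows how a failed connection that is only logged at debug level leaves no trace at the default log level, and why a missed state change means no connection is ever attempted.

```python
import logging
from typing import Callable, Dict, Set

log = logging.getLogger("uis-sketch")  # hypothetical logger name


def detect_started(previous: Dict[str, str], current: Dict[str, str]) -> Set[str]:
    """Workflows that are running now but were not running at the last scan."""
    return {
        wid for wid, state in current.items()
        if state == "running" and previous.get(wid) != "running"
    }


def connect_pending(pending: Set[str], connect: Callable[[str], None]) -> Set[str]:
    """Try to connect to each pending workflow; return those still pending."""
    still_pending: Set[str] = set()
    for wid in sorted(pending):
        try:
            connect(wid)
        except Exception as exc:
            # Debug level because the attempt is repeated on the next scan,
            # so nothing appears in the log at the default (info) level.
            log.debug("could not connect to %s: %s", wid, exc)
            still_pending.add(wid)
        else:
            log.info("connect_workflow(%r)", wid)
    return still_pending
```

In this model, if `detect_started` misses the stopped-to-running transition, the workflow never enters the pending set and is never connected. If the connection keeps failing (for example because key files are not yet readable), it stays pending silently unless debug logging is on.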

@oliver-sanders oliver-sanders transferred this issue from cylc/cylc-ui Sep 1, 2022
@oliver-sanders oliver-sanders modified the milestones: 1.4.0, 1.2.0 Sep 1, 2022
@oliver-sanders oliver-sanders modified the milestones: 1.2.0, pending Dec 12, 2022
@oliver-sanders
Member

Needs reproducing with latest versions and a reproducible-ish example.

@oliver-sanders
Member

@MetRonnie, possibly closed by #463?

@MetRonnie
Member Author

Most probably.

@MetRonnie MetRonnie modified the milestones: pending, 1.3.0 Jul 17, 2023
@MetRonnie MetRonnie linked a pull request Jul 17, 2023 that will close this issue
2 participants