Creating Alias and Outputs on startup are slow #213

Closed
Atticus1806 opened this issue Oct 29, 2024 · 2 comments · Fixed by #214

Comments

@Atticus1806
Contributor

I am currently running into the issue that my manager startup is slowed down by updating all aliases and outputs every time. Is this even required for the manager to work properly? I guess skipping it can potentially cause files to be "missing" from output and alias, but the behaviour itself should be safe, right?

create_aliases(self.sis_graph.jobs())

For context, I am currently looking at 52 secs until the config is loaded, 280 secs for aliases and 1370 secs for outputs. While this is probably also related to a slow fs and to creating quite a number of outputs/aliases, these are hard for me to fix right now. If the behaviour is not endangered, I would create a PR adding a flag to disable the full update on startup.

My first test shows that this should work, but since I am not that familiar with the manager loop I want to make sure this does not implicitly break anything.
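
To make the proposal concrete, here is a minimal sketch of what such a flag could look like. It uses a toy stand-in class rather than the real manager, and the name `update_all_on_startup` is purely hypothetical (it is not an existing Sisyphus setting); only the gating logic is the point.

```python
class ToyManager:
    """Stand-in for the real manager; it only models the startup step."""

    def __init__(self, jobs, update_all_on_startup=True):
        self.jobs = list(jobs)
        # Hypothetical flag, assumed for illustration only.
        self.update_all_on_startup = update_all_on_startup
        self.linked = []

    def create_aliases(self, jobs):
        # Placeholder for the expensive per-job filesystem work.
        for job in jobs:
            self.linked.append(job)

    def startup(self):
        # Skip the full update when the flag is disabled.
        if self.update_all_on_startup:
            self.create_aliases(self.jobs)
        return len(self.linked)


fast = ToyManager(["job_a", "job_b"], update_all_on_startup=False)
full = ToyManager(["job_a", "job_b"], update_all_on_startup=True)
fast_count = fast.startup()
full_count = full.startup()
```

With the flag disabled, nothing is linked at startup, so aliases and outputs may be stale or missing until some later update runs.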

@michelwi
Contributor

> is this even required for the manager to work properly? I guess this can potentially cause files to be "missing" from output and alias, but behaviour itself should be safe right?

I think I would agree; in principle the manager could already start without all outputs in place. Unless, of course, you are a naughty person and define tk.Paths into your output folder.

> disable the full update on startup.

I am not sure when else the full update will be happening.

I guess in cases where you kill the manager to clear Jobs that went into an error state, there is not much use in updating everything every time.
But when you kill the manager, change your graph and the outputs and then restart it, we would need to update on startup; otherwise all aliases and outputs would still point to the old versions from before the change, and the outputs would only be updated once the manager finishes

self.check_output(write_output=self.link_outputs, update_all_outputs=True, force_update=True)

(assuming you are not impatient like me and hit ctrl+c a couple of times to get the shell back quicker)

Maybe the update could be pushed into a thread that runs in parallel to the manager loop?
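
A rough sketch of that idea, using only the standard `threading` module. The helper `start_background_update` and the toy `slow_link_all` function are made up for illustration and are not part of Sisyphus; in the manager, the target would presumably be the alias/output update call.

```python
import threading


def start_background_update(update_fn, *args):
    """Run update_fn(*args) in a daemon thread and return the thread."""
    thread = threading.Thread(
        target=update_fn, args=args, daemon=True, name="startup-update"
    )
    thread.start()
    return thread


linked = []


def slow_link_all(names):
    # Placeholder for the slow alias/output creation on the filesystem.
    for name in names:
        linked.append(name)


worker = start_background_update(slow_link_all, ["job_a", "job_b"])
# ... the manager loop would run here while the links are being updated ...
worker.join()  # wait for the background update before any final update
```

The thread is a daemon here so that it does not keep the process alive after ctrl+c, which means an interrupted run can leave the links partially updated. Joining before the final update avoids two writers touching the same links at once.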

@JackTemaki
Contributor

> Maybe the update could be pushed into a thread that runs in parallel to the manager loop?

This sounds like a good idea.
