Add option to always keep temporary files #199

Yoshanuikabundi · 2022-10-11T09:25:45Z

Description

Troubleshooting BespokeFit is very frustrating, in part because temporary files are aggressively cleaned up. This PR attempts to add a setting, BEFLOW_KEEP_TMP_FILES, that ensures that temporary files are kept (though perhaps not in an obvious location).

Todos

Notable points that this PR has either accomplished or will accomplish.

Add setting
Add new temporary_cd() implementation using setting
Replace uses of temporary_cd()
See if it works
Investigate other changes needed to keep files
Try to keep temporary file creation inside the requested working directory
Rebase after Support OpenFF Toolkit v0.11+ #198 is merged

Status

Ready to go

codecov · 2022-10-11T09:56:48Z

Codecov Report

Merging #199 (9642a26) into main (2af0a1a) will increase coverage by 0.13%.
The diff coverage is 97.43%.

Additional details and impacted files

openff/bespokefit/executor/services/_settings.py

jthorton · 2022-10-11T10:09:25Z

See here for how this currently works. I think this more general solution looks better and more useful, so feel free to remove the old logic.

I wonder if it would be worth extending the results schema to include a file_path to the location of files so users can work out which directory corresponds to which optimisation?

Yoshanuikabundi · 2022-10-11T13:07:33Z

Yeah, I wrote the current implementation because I was frustrated with losing my files if the optimisation errored out before it got to the copy step. I agree that my implementation is delightfully general and useful, but unfortunately it has the small downside of not working 😅 The temporary directories themselves are preserved, but not their contents. I'll keep working on it when I get a chance, but I need to focus on the workshop at the moment.

mattwthompson · 2023-01-21T01:29:39Z

Is this closer to 0.1.3 or 0.2.0? This might be very useful when working on the step that interacts with ForceBalance if it can bypass the need to generate data each run.

Yoshanuikabundi · 2023-01-21T04:10:25Z

Yeah this will be super useful for debugging but it's a ways off. It ran into the usual problem - unit tests weren't capturing the bugs it introduced. So it's probably a post-0.2.0 feature.

The goal was not to do any caching, just to keep the files around - BespokeFit already has a caching feature for the QC step, it's probably simpler to extend that to optimisation.

jthorton · 2023-03-16T12:28:29Z

Thinking about this some more, I think it might be best to reverse how this works. What about doing the work in a local folder whose name is the Celery task ID, and then removing the files depending on the exposed settings? We would then add the Celery task ID to the relevant stage schema, so that when inspecting a running task users can query the task ID of the current stage, check the folder, and see the progress. In cases like #237, this would help users check whether the task is actually running or stuck.

Yoshanuikabundi · 2023-03-22T05:13:44Z

I think that's the approach this PR takes? I (tried to) refactor the temporary_cd function to create a temporary directory, and then only clean it up if the appropriate setting is on. If the setting is off, the directory is created but not cleaned up. Naming things after the task id is a great idea but I've gotta get this working first 🙃
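
For readers following the thread, here is a minimal sketch of the idea being discussed: a context manager that creates a temporary directory, changes into it, and deletes it on exit only when files are not meant to be kept. This is not the code from this PR. The name `temporary_cd_sketch` and the `keep_files` argument are hypothetical, and the real function reads its behaviour from the BespokeFit settings rather than from an argument.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_cd_sketch(keep_files: bool = False):
    """Create a temporary directory, cd into it, and restore the old cwd on exit.

    Hypothetical illustration only: the directory is deleted afterwards
    unless ``keep_files`` is true.
    """
    old_cwd = Path.cwd()
    tmp_dir = Path(tempfile.mkdtemp(prefix="bespokefit-sketch-"))
    try:
        os.chdir(tmp_dir)
        yield tmp_dir
    finally:
        # Always leave the directory before (possibly) deleting it,
        # even if the body raised an exception.
        os.chdir(old_cwd)
        if not keep_files:
            shutil.rmtree(tmp_dir, ignore_errors=True)
```

Because the cleanup sits in the `finally` clause, files written before an exception are still kept when `keep_files` is true, which is the situation that motivated this PR.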

Yoshanuikabundi · 2023-03-22T09:44:25Z

OK I think I mostly got this working!! It seems to keep all the optimizer files at least. It does not keep the QC files, but that's apparently just how QCEngine works - I tried keeping the QCE scratch space but there's nothing very interesting in it. I don't think other steps produce any files so I think it works.

I'd like to name folders after their IDs like you suggested, @jthorton, but I don't know what the best way to do that would be. Do you know if there's some place you could stick a with temporary_cd(f"id-{id}"): or something that would cover all the workers? My very limited understanding of Celery suggests that might not be possible... I'll have a crack at seeing if I can come up with an alternative I like tomorrow :)

jthorton · 2023-03-22T10:13:24Z

I'd like to name folders after their IDs

That's great, this will help a lot. I think we will have to change each worker's call to the delay function to include a task ID, like the example here, and also store this in the task schema under a new id field, so that the task and the results folder have a common ID. We can then have the worker functions use the new temporary cd. Maybe these changes would be better handled in another PR, as they change the scope of this one by updating the schemas. Happy to take that on!

Yoshanuikabundi · 2023-03-23T04:36:49Z

That makes sense to me! If you could take a crack at that, that'd be great - I think we should merge this, #239, and #241, release 0.2.1 next week, and then we can remove the old BEFLOW_OPTIMIZER_KEEP_FILES setting and name temporary folders after the task ID in the next release. Something something breaking changes something something let's just get a release that supports toolkit 0.11+ out.

If you could also formally review/approve this PR that'd be great!
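
As a rough illustration of what keeping the old setting as a deprecated alias could look like, here is a standard-library sketch. Only the two environment variable names come from this thread. The function name `keep_tmp_files`, the set of accepted truthy strings, and the rule that the new variable takes precedence are all assumptions, and the actual logic in `_settings.py` is not shown here and may differ.

```python
import os
import warnings

# Assumption: which strings count as "on" is chosen for this sketch only.
_TRUTHY = {"1", "true", "yes", "on"}


def keep_tmp_files(environ=None) -> bool:
    """Return whether temporary files should be kept (hypothetical helper).

    Checks BEFLOW_KEEP_TMP_FILES first, then falls back to the older
    BEFLOW_OPTIMIZER_KEEP_FILES with a deprecation warning.
    """
    env = os.environ if environ is None else environ

    if "BEFLOW_KEEP_TMP_FILES" in env:
        return env["BEFLOW_KEEP_TMP_FILES"].strip().lower() in _TRUTHY

    if "BEFLOW_OPTIMIZER_KEEP_FILES" in env:
        warnings.warn(
            "BEFLOW_OPTIMIZER_KEEP_FILES is deprecated; "
            "use BEFLOW_KEEP_TMP_FILES instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return env["BEFLOW_OPTIMIZER_KEEP_FILES"].strip().lower() in _TRUTHY

    # Neither variable is set: clean up temporary files as before.
    return False
```

Passing a plain dictionary as `environ` makes the behaviour easy to check without touching the real environment, which also sidesteps the problem of a stray deprecated variable leaking into tests.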

openff/bespokefit/executor/services/_settings.py

openff/bespokefit/utilities/tempcd.py

jthorton · 2023-03-23T11:53:36Z

I had a look into setting the schema id to match the task, and it turns out we already do this! So we could add it to this PR: the task id is extracted and set here, so it would just need to be passed to the temp cd here, and then we can remove the lines at the bottom which copy the tree, as this would all be handled by the new tmp cd function. I tested this locally and it works: I can watch the ForceBalance optimisation output in real time, which is great!
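
The task-ID approach can be sketched roughly as follows, assuming the task ID is already available as a string. The name `task_directory` and its `root` and `keep_files` arguments are hypothetical, and how the worker obtains the ID from Celery is not shown in this thread, so it is left out.

```python
import os
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def task_directory(task_id: str, root: Path, keep_files: bool):
    """Run the body inside ``root/task_id`` (hypothetical illustration).

    Naming the directory after the task ID lets a user find the files
    for a given task while it is still running.
    """
    # Resolve up front so the path stays valid after changing directory.
    work_dir = (Path(root) / task_id).resolve()
    work_dir.mkdir(parents=True, exist_ok=True)
    old_cwd = Path.cwd()
    try:
        os.chdir(work_dir)
        yield work_dir
    finally:
        os.chdir(old_cwd)
        if not keep_files:
            shutil.rmtree(work_dir, ignore_errors=True)
```

Since the files are written directly into the named directory, no separate copy step is needed at the end, and output can be inspected while the task runs.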

Yoshanuikabundi · 2023-03-30T06:26:18Z

I shoulda just tried out the task ID thing before commenting everywhere 😅 I love how using the task ID consolidates all the logic - this'll make removing the old setting easy in the future. We just need to remember to tempcd to the task id whenever we write temporary files to disk.

I'll merge this on Monday unless you find something else to fix!

jthorton

Thanks for this @Yoshanuikabundi, looks fantastic!

jthorton reviewed Oct 11, 2022

openff/bespokefit/executor/services/_settings.py (resolved)

This was referenced Jan 26, 2023

Use openff-forcebalance for optimizations #221

Closed

The future of BespokeFit #224

Open

Yoshanuikabundi and others added 11 commits March 22, 2023 18:52

Add option to always keep temporary files

05f34f0

Simplify tempcd

21a8f0a

Specify type of BEFLOW_KEEP_TMP_FILES

ee62780

Add tests for temporary_cd

2506e63

Update temporary_cd to pass tests

309dc71

[pre-commit.ci] auto fixes from pre-commit.com hooks

3b719ba

for more information, see https://pre-commit.ci

Fix test warnings

7f31ba3

Add comment

f59cf9b

Support multi-part paths without parents argument

746d1ec

Remove print statements from temporary_cd

18662d4

Changes to optimizer worker to support KEEP_TMP_FILES

8d8a773

Yoshanuikabundi force-pushed the keep_tmp_files branch from 3100534 to 8d8a773 Compare March 22, 2023 07:57

Yoshanuikabundi added 3 commits March 22, 2023 19:26

Try to keep QCEngine's scratch files

ff8dc2e

Deprecate BEFLOW_OPTIMIZER_KEEP_FILES

c701813

Revert "Try to keep QCEngine's scratch files"

dcbc58a

This reverts commit ff8dc2e.

Yoshanuikabundi marked this pull request as ready for review March 23, 2023 04:31

Yoshanuikabundi requested a review from jthorton March 23, 2023 04:36

Yoshanuikabundi added this to the v0.2.1 milestone Mar 23, 2023

Revert tests to use openff.utilities' temporary_cd

4c17076

jthorton reviewed Mar 23, 2023

openff/bespokefit/executor/services/_settings.py (resolved)

jthorton reviewed Mar 23, 2023

openff/bespokefit/utilities/tempcd.py (outdated, resolved)

Yoshanuikabundi added 2 commits March 30, 2023 17:19

Consolidate various methods of deleting temporary files

5aad257

Ensure deprecated alias of setting is unset in tests

9642a26

jthorton approved these changes Mar 30, 2023

Yoshanuikabundi merged commit 27813ab into main Mar 30, 2023

mattwthompson mentioned this pull request Feb 22, 2024

Document environment variables #326

Closed
