docs: how_to_resume_a_run_after_a_crash (#919)
Task: PHS-600 Co-authored-by: FelixFehse <[email protected]>
1 parent
1237e88
commit ae94ea9
Showing 4 changed files with 131 additions and 1 deletion.
111 changes: 111 additions & 0 deletions
src/documentation/how_tos/how_to_resume_a_run_after_a_crash.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import pytest\n", | ||
"from example_data import DummyTaskCanFail, example_data\n", | ||
"\n", | ||
"from intelligence_layer.evaluation.run.in_memory_run_repository import (\n", | ||
" InMemoryRunRepository,\n", | ||
")\n", | ||
"from intelligence_layer.evaluation.run.runner import Runner\n", | ||
"\n", | ||
"my_example_data = example_data()\n", | ||
"\n", | ||
"dataset_repository = my_example_data.dataset_repository\n", | ||
"run_repository = InMemoryRunRepository()\n", | ||
"task = DummyTaskCanFail()\n", | ||
"\n", | ||
"runner = Runner(task, dataset_repository, run_repository, \"MyRunDescription\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# How to resume a run after a crash\n", | ||
"\n", | ||
"0. Run a task on a dataset, see [here](./how_to_run_a_task_on_a_dataset.ipynb).\n", | ||
"1. A crash occurs.\n", | ||
"2. Re-run the task on the same dataset with `resume_from_recovery_data` set to `True`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Steps 0 & 1: Run task for dataset\n", | ||
"with pytest.raises(Exception): # noqa: B017\n", | ||
" run_overview = runner.run_dataset(my_example_data.dataset.id, abort_on_error=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"A failure has occurred. In practice, this could be a crash of the computer or an unexpected, uncaught exception.\n", | ||
"\n", | ||
"For demonstration purposes, we set `abort_on_error=True` so that an exception is raised. We catch the exception with `pytest.raises` only so that the notebook passes in our CI. Feel free to remove the `pytest.raises` block when running this notebook locally.\n", | ||
"\n", | ||
"Even though the run crashed, the `RunRepository` stores recovery data, so `run_dataset` can continue when `resume_from_recovery_data` is set to `True`. Outputs that were already computed successfully are not recalculated; only the missing examples are processed:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Step 2: Re-run the same run with `resume_from_recovery_data` enabled\n", | ||
"run_overview = runner.run_dataset(\n", | ||
" my_example_data.dataset.id, abort_on_error=True, resume_from_recovery_data=True\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"print(run_overview)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Note: The `FileSystemRepository` persists the recovery data in the file system. The run can therefore be resumed even after a complete crash of the program or the computer.\n", | ||
"\n", | ||
"In contrast, the `InMemoryRunRepository` retains the recovery data only as long as the repository resides in memory. If the process crashes, the recovery data is lost and all examples have to be recalculated." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "intelligence-layer-dgcJwC7l-py3.11", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.8" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
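The resume behaviour described in the notebook can be illustrated with a small standalone sketch. It does not use or reproduce the `intelligence_layer` implementation: `RecoveryStore`, `FlakyTask`, and this `run_dataset` function are hypothetical stand-ins, and the JSON file layout is an assumption made only for the example. The sketch shows the general idea of persisting each finished output so that a second call with `resume_from_recovery_data=True` processes only the missing examples.

```python
import json
import tempfile
from pathlib import Path


class RecoveryStore:
    """Hypothetical file-backed recovery data (not the intelligence_layer API)."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        # Return previously saved outputs, or an empty dict if none exist.
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def save(self, outputs):
        self.path.write_text(json.dumps(outputs))


class FlakyTask:
    """Hypothetical task that raises once on a chosen input, then works."""

    def __init__(self, fail_on):
        self.fail_on = fail_on
        self.failed = False
        self.calls = []  # inputs that were computed successfully

    def __call__(self, value):
        if value == self.fail_on and not self.failed:
            self.failed = True
            raise RuntimeError("simulated crash")
        self.calls.append(value)
        return value * value


def run_dataset(examples, task, store, resume_from_recovery_data=False):
    """Run `task` on every example, saving recovery data after each success."""
    outputs = store.load() if resume_from_recovery_data else {}
    for example_id, value in examples.items():
        if example_id in outputs:
            continue  # already computed in an earlier, interrupted run
        outputs[example_id] = task(value)
        store.save(outputs)
    return outputs


examples = {"a": 1, "b": 2, "c": 3, "d": 4}

with tempfile.TemporaryDirectory() as tmp:
    recovery_file = Path(tmp) / "recovery.json"
    task = FlakyTask(fail_on=3)

    # Steps 0 & 1: the first run crashes on example "c".
    try:
        run_dataset(examples, task, RecoveryStore(recovery_file))
    except RuntimeError:
        pass
    calls_before_resume = list(task.calls)

    # Step 2: a fresh store object reads the file, as a restarted process would.
    resumed = run_dataset(
        examples,
        task,
        RecoveryStore(recovery_file),
        resume_from_recovery_data=True,
    )
    calls_after_resume = task.calls[len(calls_before_resume):]

print(calls_before_resume, calls_after_resume, resumed)
```

Because the recovery data lives in a file here, the second call can pick it up through a new `RecoveryStore` object. An in-memory variant of this sketch would lose the saved outputs together with the process, which mirrors the distinction drawn in the notebook's closing note.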