As discussed in #78, there are (at least) two forms of optimisation that would be relatively straightforward to facilitate in some capacity, but which require more consideration and are unlikely to be the default options (which is why they are not included in the referenced PR):
**Model freezing**

- From our benchmarking (see the "FTorch with/without gradients and/or frozen models" sections), freezing the model can make more modest, but not insignificant, improvements (in most cases)
- Tests were carried out by replacing `scripted_model.save(filename)` with `frozen_model = torch.jit.freeze(scripted_model)`, and then `frozen_model.save(filename)`, in `pt2ts.py`
- While not a problem for the use of FTorch, it's also worth noting that running the same TorchScript via Forpy (on CPU or GPU) seemed to give similar errors to those `optimize_for_inference` (which is currently broken, unfortunately) can give, e.g. `AttributeError: 'RecursiveScriptModule' object has no attribute 'training'`
- Freezing the model appears to lead to numerical errors (~10^-6) for the ResNet benchmark, raising a `RuntimeError` when saving, but this doesn't seem to be the case for the cgdrag benchmark, and it is unclear why
- The "guidance" part of the title is perhaps most relevant here, as this is less about the main FTorch library and more about how we enable users to use tools like `pt2ts.py` as part of a workflow involving FTorch
- This is somewhat dependent on how familiar potential FTorch users typically are with the processes involved in saving to TorchScript
- Note: `trace_to_torchscript` currently uses model freezing. It would be preferable to have a shared setting and/or behaviour, unless there is a clear reason to use freezing in only one of the functions
- Any guidelines on `trace_to_torchscript` compared with `script_to_torchscript` may also be useful, as there is currently no clear motivation not to use the "default" `script_to_torchscript`
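The freezing change tested above can be sketched in isolation. The `Net` module and filename below are illustrative stand-ins for whatever `pt2ts.py` is actually saving:

```python
# Sketch of the pt2ts.py change: freeze the scripted model before saving.
import torch


class Net(torch.nn.Module):
    """Hypothetical stand-in for the model being exported."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)


# Freezing requires the module to be in eval mode, otherwise it raises
# a RuntimeError.
model = Net().eval()
scripted_model = torch.jit.script(model)

# Instead of scripted_model.save(filename):
frozen_model = torch.jit.freeze(scripted_model)
frozen_model.save("model_frozen.pt")

# The frozen module should produce the same outputs, up to small numerical
# differences (as seen with the ResNet benchmark).
x = torch.randn(3, 4)
assert torch.allclose(scripted_model(x), frozen_model(x), atol=1e-5)
```

Freezing inlines submodules, parameters, and attributes into the graph (which is also why the frozen module no longer has a `training` attribute, matching the Forpy error above).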
**`InferenceMode`**

- From our benchmarking (see the "FTorch with InferenceMode and NoGradMode" sections), the benefits were less clear, but in general it is expected to be at least as fast as `NoGradMode`
- Tests were carried out by replacing `torch::AutoGradMode enable_grad(requires_grad);` with `c10::InferenceMode guard(requires_grad);` in all `ctorch.cpp` functions, but ideally both options would be presented to users
- This mode was only added (as a beta) in PyTorch 1.9, so we would need to consider support for older versions
- The mode is also much stricter than `NoGradMode`, so it cannot be used in all cases
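The strictness difference can be illustrated from Python, where `torch.inference_mode()` and `torch.no_grad()` are the counterparts of the C++ guards above: tensors created under `InferenceMode` are "inference tensors" and cannot be used in autograd afterwards, whereas `NoGradMode` outputs can.

```python
# Sketch contrasting InferenceMode with NoGradMode (Python equivalents of
# the c10::InferenceMode / torch::AutoGradMode guards used in ctorch.cpp).
import torch

x = torch.ones(3)

with torch.no_grad():
    # Ordinary tensor: can still participate in autograd later.
    y_nograd = x * 2

with torch.inference_mode():
    # "Inference tensor": stricter, with less autograd bookkeeping.
    y_inf = x * 2

w = torch.ones(3, requires_grad=True)

# The NoGradMode output can later be used in a differentiated graph:
loss = (y_nograd * w).sum()
loss.backward()  # fine

# The InferenceMode output cannot: using it where autograd would need to
# record it raises a RuntimeError.
try:
    (y_inf * w).sum().backward()
except RuntimeError as err:
    print("inference tensor rejected:", err)
```

This is why `InferenceMode` cannot simply replace `NoGradMode` everywhere: it is only safe when the outputs are guaranteed never to re-enter autograd, which fits FTorch's inference-only call path but not every use case.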