Make responses start faster by removing unnecessary cleanup calls #6625

oobabooga · 2025-01-01T21:33:15Z

The clear_torch_cache() function takes about 0.08 seconds to run because it includes a call to gc.collect(). Previously, this function was called twice before each generation to address memory leaks in Transformers during text streaming.

Changes made:

Removed all clear_torch_cache() calls for loaders other than Transformers, saving approximately 0.2 seconds per generation and making replies start faster both in the UI and the API.
Reduced the calls to clear_torch_cache() for Transformers from two to one, cutting the time spent on this function by half.
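
The idea behind these changes can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the PR description only confirms that clear_torch_cache() calls gc.collect(), so the CUDA cache release, the cleanup_before_generation() helper, the time_cleanup() helper, and the "llama.cpp" loader name used in the usage note are all assumptions made for the example.

```python
import gc
import time

try:
    import torch  # optional; the sketch still runs without PyTorch installed
except ImportError:
    torch = None


def clear_torch_cache():
    """Run a full garbage collection, then release cached CUDA memory if available.

    Assumption: only the gc.collect() call is confirmed by the PR description;
    the empty_cache() step is a guess at what a cache-clearing helper would do.
    """
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def cleanup_before_generation(loader_name):
    """Hypothetical helper: clean up once, and only for the Transformers loader.

    Returns the number of cleanup calls made, so the behavior is easy to check.
    """
    if loader_name == "Transformers":
        clear_torch_cache()
        return 1
    return 0  # other loaders skip the cleanup entirely


def time_cleanup(repeats=5):
    """Hypothetical helper: average wall-clock seconds per clear_torch_cache() call."""
    start = time.perf_counter()
    for _ in range(repeats):
        clear_torch_cache()
    return (time.perf_counter() - start) / repeats
```

Calling cleanup_before_generation("Transformers") performs one cleanup, while a call such as cleanup_before_generation("llama.cpp") performs none. Running time_cleanup() on a given machine gives a rough sense of the per-call cost that the PR removes from the start of each reply; the measured value depends on how many live objects the garbage collector has to scan.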


oobabooga added 2 commits January 1, 2025 13:11

Reduce the number of clear_torch_cache() calls

fad6b79

Add clear_torch_cache() when loading a model

c17cdb1

oobabooga merged commit 7b88724 into dev Jan 1, 2025

oobabooga deleted the faster-reply branch January 5, 2025 14:59

jfmherokiller pushed a commit to jfmherokiller/text-generation-webui that referenced this pull request Jan 15, 2025

Make responses start faster by removing unnecessary cleanup calls (oo…

7f3ec34

…babooga#6625)
