Prefix caching improvements #758

popovaan · 2024-08-09T09:33:05Z

Applied comments from #675

src/cpp/src/sequence_group.hpp

src/cpp/src/continuous_batching_pipeline.cpp

src/cpp/src/sequence_group.hpp

src/cpp/src/sequence_group.cpp

ilya-lavrenov · 2024-08-13T09:21:53Z

src/cpp/src/sequence_group.cpp

+namespace ov {
+namespace genai {
+size_t Sequence::_make_hash(size_t content_length) {
+        auto sequence_group = get_sequence_group_ptr();


Do we need to add an assert that content_length corresponds to the last uncomputed block? E.g. if we have 1 block with a hash, but content_length is 3x block_size.

E.g. block_start_idx / block_size == m_prefix_hashes.size()

There is a case where block_start_idx / block_size < m_prefix_hashes.size().
When we restore the blocks of a prompt, we first check the hash of the full block, and that hash is saved in m_prefix_hashes. If the hash of the full block is not found in cached_blocks, we then check the hashes of the partially filled content of the same block. So content_length in this case is less than m_prefix_hashes.size() * block_size.

So I added the assert block_start_idx / block_size <= m_prefix_hashes.size().
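
To make the condition above concrete, here is a minimal sketch of block-wise prefix hashing with a per-block hash cache. It is not the code from this PR: `PrefixHasher`, `combine`, and the member names are hypothetical, and the hash mixing is a generic hash-combine chosen only for illustration. It assumes that the hash of a block depends on the hashes of all preceding full blocks plus the tokens of the current block, as the thread describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a sequence that caches per-block prefix hashes.
struct PrefixHasher {
    std::size_t block_size;
    std::vector<std::int64_t> tokens;
    std::vector<std::size_t> prefix_hashes;  // hashes of full blocks computed so far

    // Generic hash-combine step (illustrative, not the mixing used in the PR).
    static void combine(std::size_t& seed, std::size_t value) {
        seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    // Hash of the first content_length tokens.
    std::size_t make_hash(std::size_t content_length) {
        assert(content_length > 0 && content_length <= tokens.size());
        // Start of the block that holds the last token of the content.
        const std::size_t block_start_idx = ((content_length - 1) / block_size) * block_size;
        const std::size_t prefixes_needed = block_start_idx / block_size;
        // All preceding full blocks must already be hashed. "<=" rather than "=="
        // because the cache may already hold the hash of the current full block
        // when a partially filled variant of that block is hashed afterwards.
        assert(prefixes_needed <= prefix_hashes.size());

        std::size_t seed = 0;
        for (std::size_t i = 0; i < prefixes_needed; ++i) {
            combine(seed, prefix_hashes[i]);
        }
        for (std::size_t i = block_start_idx; i < content_length; ++i) {
            combine(seed, static_cast<std::size_t>(tokens[i]));
        }
        // Cache the hash only for a full block that is the next one in order.
        if (content_length % block_size == 0 && prefixes_needed == prefix_hashes.size()) {
            prefix_hashes.push_back(seed);
        }
        return seed;
    }
};
```

With `block_size` 4, calling `make_hash(4)` and then `make_hash(3)` reproduces the case from the reply: the second call needs zero prefix hashes while one is already cached, so an equality check would fail where `<=` passes.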

popovaan added 4 commits August 7, 2024 10:33

Applied review comments.

c97e69c

Minor correction.

d869221

Optimized hash computation.

7f3e4f6

Moved restoring of blocks to the requests creation method.

73de969

ilya-lavrenov reviewed Aug 9, 2024

src/cpp/src/sequence_group.hpp (outdated)

Merge remote-tracking branch 'upstream/master' into prefix_caching_im…

7e6e2e5

…provements

ilya-lavrenov self-assigned this Aug 9, 2024

ilya-lavrenov added this to the 2024.4 milestone Aug 9, 2024

popovaan added 4 commits August 12, 2024 16:32

Added weak_ptr to sequence group in Sequence.

f9fe637

Merge remote-tracking branch 'upstream/master' into prefix_caching_im…

9fd7dd3

…provements

Code format.

9fcf2eb

Code format.

adfd7e6

ilya-lavrenov reviewed Aug 13, 2024

popovaan added 3 commits August 13, 2024 17:36

Tests cmake fix.

acd443b

Minor correction.

9f0bed2

Removed enable_prefix_caching flag from Sequence, changed conditions.

006b144

ilya-lavrenov approved these changes Aug 14, 2024

ilya-lavrenov added this pull request to the merge queue Aug 14, 2024

Merged via the queue into openvinotoolkit:master with commit 762fc93 Aug 14, 2024
33 checks passed

ilya-lavrenov added the category: continuous batching Continuous batching label Oct 15, 2024

popovaan commented Aug 9, 2024

ilya-lavrenov Aug 13, 2024

popovaan Aug 14, 2024
