-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial preemption for groups with multiple sequences #574
Partial preemption for groups with multiple sequences #574
Conversation
LGTM |
total_num_released_blocks += released_blocks; | ||
prev_blocks_count = m_block_manager.num_free_blocks(); | ||
|
||
if (num_running_sequences > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to distinguish such cases? Looks like multiple sequences within a group is more generic case and should cover single sequence case as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary to distinguish. But for single sequence case we can release blocks more efficiently than in general case using resize and not release blocks layer by layer:
openvino.genai/src/cpp/src/block_manager.hpp
Line 413 in 9d35767
m_block_table[seq_id].resize(m_block_table[seq_id].size() - block_num); |
So it was distinguished only in terms of efficiency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed distinguishing between these cases in PR as discussed.
const bool free_group_partially_multiple_runnning_sequence(SequenceGroup::Ptr sequence_group, size_t num_required_blocks, size_t& phisical_blocks_released, size_t& logical_blocks_released) { | ||
phisical_blocks_released = 0; | ||
logical_blocks_released = 0; | ||
while (num_required_blocks > phisical_blocks_released) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if case we need to preempt very long sequence, such loop can be expensive.
If it's possible, it would be great to compute a number of preempted blocks based on required number of blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified this method using formula blocks_to_remove_per_sequence == ceil(num_required_blocks / sequence_num)
.
@@ -110,6 +110,78 @@ class BlockManager { | |||
return m_block_table[seq_id]; | |||
} | |||
|
|||
const size_t free_rightest_blocks(SequenceGroup::Ptr sequence_group) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this (or some other) methods are not planned to be used as public API, let's move them to private section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this method.
Ticket: 140648