-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a choice of how to end streaming from callback: STOP or CANCEL #1476
base: master
Are you sure you want to change the base?
Conversation
TODO: add CANCEL for ContinuousBatching |
454cdd9
to
1592ed0
Compare
10a755b
to
d18fe16
Compare
done |
2758f6b
to
03ca3ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add tests for new functionality.
print(subword, end='', flush=True) | ||
# Return flag corresponds whether generation should be stopped. | ||
# False means continue generation. | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, should we also support callback w/o return value?
E.g. when user don't care about any stop / cancellation
std::cout << word << std::flush; | ||
// Return flag corresponds whether generation should be stopped. | ||
// false means continue generation. | ||
return false; | ||
|
||
return ov::genai::StreamerRunningStatus::RUNNING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return ov::genai::StreamerRunningStatus::RUNNING; | |
return ov::genai::StreamingStatus::CONTINUE; |
@@ -30,6 +31,9 @@ struct EncodedGenerationResult { | |||
|
|||
// Status of generation | |||
GenerationStatus m_status = GenerationStatus::RUNNING; | |||
|
|||
// Status of streaming | |||
StreamerRunningStatus m_streaming_status = ov::genai::StreamerRunningStatus::UNDEF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can extend GenerationStatus
? E.g. DROPPED_BY_HANDLE means STOP in its current implementation, while for CANCEL we can add a new value.
BTW, looks like we can drop DROPPED_BY_PIPELINE
as unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More thoughts:
- maybe we should deprecated drop() method and introduce stop() instead
- similarly for GenerationStatus
- and extend both GenerationHandle and GenerationStatus with cancel() functionality
In this case CB and LLM pipelines logic / naming will be aligned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to GenerationStatus
@@ -15,7 +15,7 @@ | |||
RawPerfMetrics, | |||
PerfMetrics, | |||
StreamerBase, | |||
get_version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
@@ -11,12 +11,17 @@ namespace genai { | |||
|
|||
class TextCallbackStreamer: public StreamerBase { | |||
public: | |||
StreamerRunningStatus streaming_status = StreamerRunningStatus::UNDEF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I see StreamerBase
already contains this field ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
CANCEL = 3 // Stop generate, drop last prompt and all generated tokens from history, KV cache include history but last step | ||
}; | ||
|
||
using CallbackTypeVariant = std::variant<bool, StreamerRunningStatus>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using CallbackTypeVariant = std::variant<bool, StreamerRunningStatus>; | |
using CallbackTypeVariant = std::variant<void, bool, StreamerRunningStatus>; |
to support callback which just "prints"
@@ -22,6 +34,10 @@ class OPENVINO_GENAI_EXPORTS StreamerBase { | |||
/// @brief end is called at the end of generation. It can be used to flush cache if your own streamer has one | |||
virtual void end() = 0; | |||
|
|||
virtual StreamerRunningStatus get_finish_streaming_reason() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
virtual StreamerRunningStatus get_finish_streaming_reason() { | |
StreamingStatus get_streaming_status() { |
... | ||
@typing.overload | ||
def generate(self, prompts: list[str], generation_config: list[GenerationConfig], streamer: typing.Callable[[str], bool] | StreamerBase | None = None) -> list[GenerationResult]: | ||
def generate(self, prompts: list[str], generation_config: list[GenerationConfig], streamer: typing.Callable[[str], bool] | typing.Callable[[str], ...] | StreamerBase | None = None) -> list[GenerationResult]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we propagate StreamingStatus to Python API? to use enum instead of str
|
||
namespace ov { | ||
namespace genai { | ||
|
||
enum class StreamerRunningStatus { | ||
UNDEF = 0, // Streaming is not run | ||
RUNNING = 1, // Continue to run of inference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RUNNING and UNDEF seem to be equivalent. In that case you should keep only one of them. Moreover callback should never return UNDEF, so merging them fixes the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it , moved to GenerationStatus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging with GenerationStatus
allows a callback to return FINISHED and IGNORED which aren't related to this. I'd guess #1476 (comment) was about aligning API but not merging. @ilya-lavrenov, is that so?
17a9501
to
8975221
Compare
No description provided.