
Add a choice of how to end streaming from callback: STOP or CANCEL #1476

Open

sbalandi wants to merge 5 commits into master from callback

Conversation

@sbalandi (Contributor) commented Jan 3, 2025:

No description provided.

@github-actions bot added labels: category: visual language, category: continuous batching, category: LLM, category: speculative decoding, category: GenAI C++ API, no-match-files, category: prompt lookup (Jan 3, 2025)
@sbalandi (Contributor, Author) commented Jan 3, 2025:

TODO: add CANCEL for ContinuousBatching

@ilya-lavrenov added this to the 2025.0 milestone (Jan 4, 2025)
@ilya-lavrenov self-assigned this (Jan 6, 2025)
@sbalandi force-pushed the callback branch 5 times, most recently from 454cdd9 to 1592ed0 (January 8, 2025)
@github-actions bot added labels: category: Python API, category: samples (Jan 8, 2025)
@sbalandi force-pushed the callback branch 3 times, most recently from 10a755b to d18fe16 (January 8, 2025)
@sbalandi (Contributor, Author) commented Jan 8, 2025:

> TODO: add CANCEL for ContinuousBatching

Done.

@sbalandi marked this pull request as ready for review (January 8, 2025)
@sbalandi force-pushed the callback branch 3 times, most recently from 2758f6b to 03ca3ce (January 9, 2025)
@ilya-lavrenov (Contributor) left a comment:

Please add tests for the new functionality.

print(subword, end='', flush=True)
# The returned flag indicates whether generation should be stopped.
# False means continue generation.
return False
Contributor:

BTW, should we also support a callback without a return value? E.g. when the user doesn't care about stop / cancellation.

std::cout << word << std::flush;
// Return flag corresponds whether generation should be stopped.
// false means continue generation.
return false;
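
For illustration only (not part of this PR's diff): a void-returning, print-only callback could be adapted into the bool-returning form shown above, so the pipeline keeps a single signature internally. A minimal C++ sketch, all names hypothetical:

#include <functional>
#include <iostream>
#include <string>

// Hypothetical adapter: wraps a print-only callback into the
// bool-returning form the pipeline expects.
using BoolCallback = std::function<bool(const std::string&)>;
using VoidCallback = std::function<void(const std::string&)>;

BoolCallback adapt(VoidCallback cb) {
    return [cb = std::move(cb)](const std::string& word) {
        cb(word);
        return false;  // a print-only callback never requests a stop
    };
}

int main() {
    BoolCallback streamer = adapt([](const std::string& word) {
        std::cout << word << std::flush;
    });
    streamer("hello");  // prints "hello" and returns false (continue)
}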

return ov::genai::StreamerRunningStatus::RUNNING;
Contributor:

Suggested change:
- return ov::genai::StreamerRunningStatus::RUNNING;
+ return ov::genai::StreamingStatus::CONTINUE;

@@ -30,6 +31,9 @@ struct EncodedGenerationResult {

// Status of generation
GenerationStatus m_status = GenerationStatus::RUNNING;

// Status of streaming
StreamerRunningStatus m_streaming_status = ov::genai::StreamerRunningStatus::UNDEF;
Contributor:

Maybe we can extend GenerationStatus? E.g. DROPPED_BY_HANDLE means STOP in its current implementation, while for CANCEL we can add a new value.

BTW, it looks like we can drop DROPPED_BY_PIPELINE as unused.
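
A minimal sketch of what such an extended enum could look like (the numbering is assumed; FINISHED and IGNORED are existing values mentioned later in this review, STOP/CANCEL are the proposed additions):

enum class GenerationStatus {
    RUNNING = 0,   // generation is in progress
    FINISHED = 1,  // generation completed normally
    IGNORED = 2,   // request was ignored
    STOP = 3,      // would replace DROPPED_BY_HANDLE: stop, keep generated history
    CANCEL = 4     // new: stop and drop the last prompt/answer from history
};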

Contributor:

More thoughts:

  • maybe we should deprecate the drop() method and introduce stop() instead
  • similarly for GenerationStatus
  • and extend both GenerationHandle and GenerationStatus with cancel() functionality

In this case the CB and LLM pipelines' logic / naming will be aligned.

Contributor (Author):

Moved to GenerationStatus.

@@ -15,7 +15,7 @@
RawPerfMetrics,
PerfMetrics,
StreamerBase,
get_version,
Contributor:

Can we keep it?

Contributor (Author):

Added.

@@ -11,12 +11,17 @@ namespace genai {

class TextCallbackStreamer: public StreamerBase {
public:
StreamerRunningStatus streaming_status = StreamerRunningStatus::UNDEF;
Contributor:

As I see it, StreamerBase already contains this field?

Contributor (Author):

Removed.

CANCEL = 3 // Stop generation; drop the last prompt and all generated tokens from history; the KV cache keeps the history except the last step
};

using CallbackTypeVariant = std::variant<bool, StreamerRunningStatus>;
Contributor:

Suggested change:
- using CallbackTypeVariant = std::variant<bool, StreamerRunningStatus>;
+ using CallbackTypeVariant = std::variant<void, bool, StreamerRunningStatus>;

to support a callback which just "prints".
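
One caveat with this suggestion: std::variant cannot hold void, so a compilable form of the idea would typically use std::monostate as the "no value" alternative. A sketch (the STOP value is assumed; the PR's diff shows UNDEF = 0, RUNNING = 1 and CANCEL = 3):

#include <variant>

enum class StreamerRunningStatus { UNDEF = 0, RUNNING = 1, STOP = 2, CANCEL = 3 };

// std::monostate stands in for "the callback returned nothing".
using CallbackTypeVariant = std::variant<std::monostate, bool, StreamerRunningStatus>;

bool should_continue(const CallbackTypeVariant& r) {
    if (std::holds_alternative<std::monostate>(r)) return true;  // no opinion: continue
    if (const bool* stop = std::get_if<bool>(&r)) return !*stop; // false means continue
    return std::get<StreamerRunningStatus>(r) == StreamerRunningStatus::RUNNING;
}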

@@ -22,6 +34,10 @@ class OPENVINO_GENAI_EXPORTS StreamerBase {
/// @brief end is called at the end of generation. It can be used to flush cache if your own streamer has one
virtual void end() = 0;

virtual StreamerRunningStatus get_finish_streaming_reason() {
Contributor:

Suggested change:
- virtual StreamerRunningStatus get_finish_streaming_reason() {
+ StreamingStatus get_streaming_status() {
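
Under the suggested naming, the base class could expose the status roughly as follows (a sketch only: put()/end() mirror the existing StreamerBase interface, while the enum values and the non-virtual accessor are assumed from this review):

#include <cstdint>

enum class StreamingStatus { CONTINUE = 0, STOP = 1, CANCEL = 2 };  // values assumed

class StreamerBase {
public:
    // Existing interface: put() consumes a token, end() flushes at the end of generation.
    virtual bool put(int64_t token) = 0;
    virtual void end() = 0;
    virtual ~StreamerBase() = default;

    // Suggested accessor: reports why streaming finished.
    StreamingStatus get_streaming_status() const { return m_streaming_status; }

protected:
    StreamingStatus m_streaming_status = StreamingStatus::CONTINUE;
};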

...
@typing.overload
- def generate(self, prompts: list[str], generation_config: list[GenerationConfig], streamer: typing.Callable[[str], bool] | StreamerBase | None = None) -> list[GenerationResult]:
+ def generate(self, prompts: list[str], generation_config: list[GenerationConfig], streamer: typing.Callable[[str], bool] | typing.Callable[[str], ...] | StreamerBase | None = None) -> list[GenerationResult]:
Contributor:

Should we propagate StreamingStatus to the Python API, to use an enum instead of str?

@andrei-kochin modified the milestones: 2025.0 → 2025.1 (Jan 13, 2025)
src/cpp/include/openvino/genai/streamer_base.hpp (outdated comments resolved)
src/cpp/src/visual_language/pipeline.cpp (outdated comments resolved)

namespace ov {
namespace genai {

enum class StreamerRunningStatus {
UNDEF = 0, // Streaming has not run
RUNNING = 1, // Continue running inference
Collaborator:

RUNNING and UNDEF seem to be equivalent. In that case you should keep only one of them. Moreover, the callback should never return UNDEF, so merging them fixes the API.

Contributor (Author):

Removed it; moved to GenerationStatus.

Collaborator:

Merging with GenerationStatus allows a callback to return FINISHED and IGNORED, which aren't related to this. I'd guess #1476 (comment) was about aligning the API, not merging. @ilya-lavrenov, is that so?

@sbalandi force-pushed the callback branch 11 times, most recently from 17a9501 to 8975221 (January 21, 2025)