-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Continuous batching] Add finish reason to generation output #725
Conversation
@@ -32,6 +32,12 @@ struct EncodedGenerationResult { | |||
GenerationStatus m_status = GenerationStatus::RUNNING; | |||
}; | |||
|
|||
enum class GenerationFinishReason { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this new enum? I mean that GenerationStatus is more generic status, which can be extended to support FINISHED_BY_LENGHT and FINISHED_BY_STOP instead of generic FINISHED.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenerationStatus
is a status for the whole request while GenerationFinishReason
is per sequence, so for example when we have beam search or n > 1 we can read generation status for entire request (that would be finished) and get finish reason for every output separately (in case some of the beams hit max_new_tokens and some of them stopped naturally due to EOS token for example).
I also considered using SequenceStatus
, but this class is internal and we need a one that's available for the user.
Introducing additional information about generation finish reason to generation outputs. This allows supporting
finish_reason
field in OpenAI completion and chat completion response in OVMS.