
How to fix Gemini stream mode? #8998

Closed
AAEE86 opened this issue Oct 3, 2024 · 9 comments

Labels
🐞 bug Something isn't working

@AAEE86
Contributor

AAEE86 commented Oct 3, 2024

@CXwudi Applying the following to config_kwargs:

config_kwargs.pop("stream", None)

silences the error, but the output is still not streamed. The _handle_generate_stream_response code seems to have a problem.

Originally posted by @AAEE86 in #8678 (comment)
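For context, a minimal sketch of that workaround as a standalone helper, assuming config_kwargs is the parameter dict built from model_parameters before it reaches the SDK's generation config (the helper name and surrounding details are assumptions, not the actual Dify code):

def build_config_kwargs(model_parameters: dict) -> dict:
    # Sketch only: copy so the caller's dict is left untouched
    config_kwargs = dict(model_parameters)
    # `stream` must not reach the generation config, which does not accept it;
    # whether to stream is decided separately by the invoke call
    config_kwargs.pop("stream", None)
    return config_kwargs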


dosubot bot commented Oct 3, 2024

To address the issue with the Gemini stream mode not resulting in stream output, you should focus on the _handle_chat_generate_stream_response method. This method is responsible for processing the streaming response. Here are a few things to check and potentially fix:

  1. Ensure the Response is Properly Iterated: The method iterates over the response object, which should be an Iterator. Make sure that the response object passed to this method is indeed an iterator and is yielding data as expected.

  2. Check the Data Parsing: The method attempts to parse each line of the response as JSON. If the response lines are not in the expected format, this could cause issues. Ensure that the response lines are correctly formatted JSON strings (a defensive parsing sketch follows the snippet below).

  3. Handle the End of Stream: The method checks for a line that equals "[DONE]" to determine the end of the stream. Ensure that this condition is correctly handled and that the stream is properly closed or returned when this line is encountered.

  4. Yielding the Stream Data: The method yields LLMResultChunk objects for each piece of data. Ensure that the choice["delta"]["content"] is not empty before yielding, as the current code skips empty content.

Here is a snippet of the relevant code for reference:

def _handle_chat_generate_stream_response(
    self,
    model: str,
    prompt_messages: list[PromptMessage],
    credentials: dict,
    response: Iterator,
) -> Generator:
    for line in response:
        if not line:
            continue
        line = line.decode("utf-8")
        # remove the first `data: ` prefix
        if line.startswith("data:"):
            line = line[5:].strip()
        try:
            data = json.loads(line)
        except Exception as e:
            if line.strip() == "[DONE]":
                return
        choices = data.get("choices", [])

        stop_reason = ""
        for choice in choices:
            if choice.get("finish_reason"):
                stop_reason = choice["finish_reason"]

            if len(choice["delta"]["content"]) == 0:
                continue
            yield LLMResultChunk(
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
                    index=0,
                    message=AssistantPromptMessage(content=choice["delta"]["content"], tool_calls=[]),
                    finish_reason=stop_reason,
                ),
            )
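One detail worth noting in the snippet above: if json.loads fails on a line that is not "[DONE]", the loop falls through with data either unset or left over from the previous iteration. A self-contained sketch of a more defensive line parser (parse_stream_line is a hypothetical helper, not part of the codebase):

import json
from typing import Optional

def parse_stream_line(raw: bytes) -> Optional[dict]:
    """Defensively parse one line of the SSE-style stream shown above.

    Returns the decoded JSON payload, or None for anything that should not be
    processed (blank lines, the "[DONE]" sentinel, malformed JSON), so the
    caller never falls through with `data` unset or stale.
    """
    line = raw.decode("utf-8").strip()
    if not line:
        return None
    if line.startswith("data:"):
        line = line[5:].strip()
    if line == "[DONE]":
        return None
    try:
        return json.loads(line)
    except ValueError:  # not valid JSON: skip rather than crash
        return None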

Ensure that the response from the model is correctly formatted and that the streaming logic is correctly implemented in the _handle_chat_generate_stream_response method [1].

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added the 🐞 bug Something isn't working label Oct 3, 2024
@Hisir0909
Contributor

We need to know how the stream parameter was initially passed in. I can override this stream inside the function to fix the stream=True case, but when stream is False the non-streaming return value still won't be displayed, because the external stream parameter defaults to True.

@Hisir0909
Contributor

When I changed the return value of _handle_generate_response to also be in stream format, the stream=False issue was also fixed. This indicates that the handling of the return value has a conditional branch based on the stream value. However, I'm currently unsure how to pass the original stream value to the modified function.
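A rough illustration of that idea: wrap the blocking result into a one-chunk generator so both code paths hand back the same shape. This reuses the LLMResultChunk / LLMResultChunkDelta entities from the snippet earlier in the thread; whether the delta accepts a usage field is an assumption.

# Sketch only: adapt a blocking LLMResult into the generator shape produced by
# the streaming path, so downstream code that only handles generators still
# receives the non-streaming output.
def _as_stream(self, model, prompt_messages, result):
    yield LLMResultChunk(
        model=model,
        prompt_messages=prompt_messages,
        delta=LLMResultChunkDelta(
            index=0,
            message=result.message,   # the assistant message from the blocking call
            usage=result.usage,       # assumption: usage rides on the final chunk
            finish_reason="stop",
        ),
    )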

@Hisir0909
Contributor

    def _handle_invoke_result(
        self, invoke_result: LLMResult | Generator
    ) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
        """
        Handle invoke result
        :param invoke_result: invoke result
        :return:
        """
        if isinstance(invoke_result, LLMResult):
            return

Hmm, okay, I found the place where it is handled. They didn't consider the stream=False case at all... Even though they require non-streaming handling when adding LLMs, they completely ignore the non-streaming return value.
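For illustration, a sketch of how that branch could emit a completion event instead of silently returning — the ModelInvokeCompleted field names here are guesses based on the discussion, not verified against the codebase:

# Sketch only: handle the blocking (stream=False) result instead of dropping it.
def _handle_invoke_result(
    self, invoke_result: LLMResult | Generator
) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
    if isinstance(invoke_result, LLMResult):
        # emit a single completion event carrying the full text and usage
        yield ModelInvokeCompleted(
            text=invoke_result.message.content,  # assumption: the generated text lives here
            usage=invoke_result.usage,
            finish_reason=None,
        )
        return
    # ... existing handling of the streaming generator continues here ...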

@AAEE86
Contributor Author

AAEE86 commented Oct 8, 2024

I don't quite understand these details. Can you fix this issue? @Hisir0909

@CXwudi
Contributor

CXwudi commented Oct 8, 2024

Hi @Hisir0909, I am just adding my two cents here: _handle_generate_response has nothing to do with it when stream=False. Based on

if stream:
    return self._handle_generate_stream_response(model, credentials, response, prompt_messages)
return self._handle_generate_response(model, credentials, response, prompt_messages)

_handle_generate_stream_response is the method that handles stream output. However, it correctly returns a generator type, which really confuses me.

@Hisir0909
Contributor


What do you mean? When stream=False, is _handle_generate_stream_response still being called here? My current solution is to use a local variable to override the function's stream parameter, because the LLM node always calls it with stream=True, while the user's actual stream setting from the YAML configuration ends up in model_parameters.
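A rough sketch of that local override — the _generate signature, parameter names, and the _request helper below are assumptions for illustration, not the actual provider code:

# Sketch only: the LLM node always invokes with stream=True, so take the user's
# real choice from model_parameters (where the YAML `stream` setting lands).
def _generate(self, model, credentials, prompt_messages, model_parameters,
              tools=None, stop=None, stream=True, user=None):
    stream = model_parameters.pop("stream", stream)  # local override of the argument
    response = self._request(model, credentials, prompt_messages, model_parameters, stream)
    if stream:
        return self._handle_generate_stream_response(model, credentials, response, prompt_messages)
    return self._handle_generate_response(model, credentials, response, prompt_messages)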

@CXwudi
Contributor

CXwudi commented Oct 9, 2024

I see, my bad. I misunderstood your previous statement.

@Hisir0909
Contributor

@AAEE86 @CXwudi Please take a look at my submission. Can it resolve the issue, and are my modifications reasonable? 🐸
