
Non-ASCII response from Gemini is partially broken in VScode extension #1492

Closed · 3 tasks done
reosablo opened this issue Jun 15, 2024 · 2 comments
Labels
ide:vscode Relates specifically to VS Code extension kind:bug Indicates an unexpected problem or unintended behavior

Comments

@reosablo
Contributor

Before submitting your bug report

Relevant environment info

- OS: Windows 11 23H2
- Continue: v0.8.40
- IDE: VSCode 1.91.0-insider
- Model: gemini-1.5-pro-latest

Description

A description of the bug

I'm encountering an issue where responses from Gemini containing non-ASCII characters are garbled. This doesn't seem to happen with responses from Groq.

What you expected to happen

Non-ASCII character responses should be displayed correctly, without any garbled characters or replacement characters (like "�").

What actually happened

Currently, non-ASCII characters are replaced with the replacement character "�" (U+FFFD). This happens consistently.

For example, the following response is affected:

zh-cn.ts���ァイルの一部ですね。 これは中国語の簡体字で������れ��コードで、音声操作に関するUIのテキストのようです。

日本語のメッセージにする場合、どのような文脈で表示されるかを考慮する必要があります。 例えば、以下のように状況を想定して、より自然で適切な日本語���を検討できます。

(Approximate translation, with "�" marking the corrupted bytes: "This is part of the zh-cn.ts file. It is code in Simplified Chinese and appears to be UI text related to voice operations. To turn it into a Japanese message, you need to consider the context in which it will be displayed. For example, by imagining situations like the following, you can work out more natural and appropriate Japanese.")

Screenshots or videos

[screenshot of the garbled chat response]

Possible solutions

I suspect this is because the stream buffer is handled as a string rather than as raw bytes. Chunk boundaries in Gemini's streamed responses can fall in the middle of a multi-byte UTF-8 sequence, so decoding each chunk as a standalone string turns the split character into replacement characters.
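To illustrate the failure mode outside the extension, here is a minimal sketch (plain Node, not the extension code): the three UTF-8 bytes of "あ" split across two chunks, as can happen at a stream chunk boundary.

```typescript
// The three UTF-8 bytes of "あ" (0xe3 0x81 0x82), split mid-character.
const bytes = new TextEncoder().encode("あ");
const first = bytes.slice(0, 2);
const second = bytes.slice(2);

// Decoding each chunk independently flushes the decoder each time, so the
// incomplete sequence becomes U+FFFD replacement characters.
const naive =
  new TextDecoder().decode(first) + new TextDecoder().decode(second);

// With { stream: true }, the decoder buffers the partial sequence until the
// next call and reassembles the character correctly.
const decoder = new TextDecoder();
const streamed =
  decoder.decode(first, { stream: true }) + decoder.decode(second);

console.log(naive); // contains "\uFFFD"
console.log(streamed); // "あ"
```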

Solution 1: change the buffer from a string to raw bytes (e.g. an ArrayBuffer) in the streamChatGemini function, which currently accumulates decoded text:

let buffer = "";

Solution 2: use TextDecoderStream instead of TextDecoder in the streamResponse function. The current implementation decodes each chunk in isolation, which discards partial UTF-8 sequences at chunk boundaries:

const stream = response.body as any;
const decoder = new TextDecoder("utf-8");
for await (const chunk of stream) {
  // decode() without { stream: true } flushes the decoder on every chunk,
  // turning any split multi-byte sequence into U+FFFD.
  yield decoder.decode(chunk);
}

This could be replaced with:

yield* response.body.pipeThrough(new TextDecoderStream());
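A related minimal alternative, sketched here under the assumption that the body is an async iterable of byte chunks (the function name is mine, not the extension's): keep TextDecoder but pass { stream: true } so partial UTF-8 sequences survive chunk boundaries.

```typescript
// Hypothetical helper, not the extension's actual code.
async function* decodeChunks(
  body: AsyncIterable<Uint8Array>,
): AsyncGenerator<string> {
  const decoder = new TextDecoder("utf-8");
  for await (const chunk of body) {
    // { stream: true } holds back a trailing incomplete byte sequence
    // until the next chunk arrives.
    yield decoder.decode(chunk, { stream: true });
  }
  // A final decode() call flushes anything still buffered.
  const tail = decoder.decode();
  if (tail) yield tail;
}
```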

To reproduce

Ask some questions in the chat panel in Japanese.

Log output

No output during chat response.
@reosablo reosablo added the bug label Jun 15, 2024
@reosablo
Contributor Author

I tried TextDecoderStream and it seems to work fine.

I'll create a PR.

// core/llm/stream.ts
export async function* streamResponse(
  response: Response,
): AsyncGenerator<string> {
  if (response.status !== 200) {
    throw new Error(await response.text());
  }

  if (!response.body) {
    throw new Error("No response body returned.");
  }

  // `response` doesn't appear to be an instance of globalThis.Response, and
  // TypeScript doesn't know about the static ReadableStream.from() method.
  const stream = (ReadableStream as any).from(response.body);

  // `stream` is typed as `any` rather than ReadableStream, so the
  // "DOM.AsyncIterable" lib isn't needed for the line below.
  yield* stream.pipeThrough(new TextDecoderStream());
}
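As a hypothetical smoke check (not part of the PR; the Response construction here is my own test scaffolding): a body that splits a multi-byte character across two chunks should still decode cleanly through TextDecoderStream.

```typescript
// Build a Response whose body cuts a Japanese character in half.
const bytes = new TextEncoder().encode("ありがとう"); // 15 bytes, 3 per char
const body = new ReadableStream<Uint8Array>({
  start(controller) {
    controller.enqueue(bytes.slice(0, 7)); // 2 full chars + 1 stray byte
    controller.enqueue(bytes.slice(7));
    controller.close();
  },
});

// Pipe the body through TextDecoderStream and collect the decoded text.
const reader = new Response(body)
  .body!.pipeThrough(new TextDecoderStream())
  .getReader();

let out = "";
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  out += value;
}
console.log(out); // "ありがとう", with no U+FFFD characters
```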

@sestinj
Contributor

sestinj commented Jun 16, 2024

Thanks for the PR @reosablo !

@sestinj sestinj closed this as completed Jun 16, 2024
@dosubot dosubot bot added kind:bug Indicates an unexpected problem or unintended behavior ide:vscode Relates specifically to VS Code extension labels Jul 8, 2024