Give Leo information about when text is being truncated so it can better formulate an answer. #33006

Closed
Tracked by #35733
bbondy opened this issue Sep 14, 2023 · 7 comments
Assignees
Labels
closed/wontfix OS/Android Fixes related to Android browser functionality OS/Desktop priority/P3 The next thing for us to work on. It'll ride the trains.

Comments

@bbondy
Member

bbondy commented Sep 14, 2023

Could we do better at making the model aware that it didn’t consume the full content? I didn’t verify but I’m assuming it stopped at 13 because the text got truncated?
https://music.youtube.com/playlist?list=OLAK5uy_lm33KX5L-19O27gjy9IrZCQJEFxx00WMQ

Screenshot 2023-09-13 at 4 05 41 PM

@bbondy bbondy added priority/P3 The next thing for us to work on. It'll ride the trains. OS/Android Fixes related to Android browser functionality OS/Desktop labels Sep 14, 2023
@bbondy bbondy moved this to Todo in Browser AI Sep 14, 2023
@petemill
Member

Is it enough to make the user aware it's cut off? #31405

@bbondy
Member Author

bbondy commented Sep 27, 2023

For MVP? I think so

@bbondy bbondy moved this from Todo to Important / Polish in Browser AI Oct 26, 2023
@stevelaskaridis

stevelaskaridis commented Nov 28, 2023

After some testing, here are my findings:

List of links tested

Methodology

Tested summary plus a few questions about the context inside and outside of scope.

Prompt changes tested:

(changes signified in bold)

  • Mode 1 (prepend): This is part of an article/transcript within <article> tags: [...]
  • Mode 2 (append): [...] You have only seen part of the input.
  • Mode 3 (seed): [/INST] You have only read part of this input. Here is your response:
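
For readers who want to picture the three modes, here is a minimal sketch of how each one could change a Llama-2-chat style prompt. It is not the code that was tested: `truncate`, `build_prompt`, `MAX_CHARS`, the baseline intro wording, and the exact placement of the article and question are all assumptions made for illustration. Only the three truncation sentences are taken from the list above.

```python
from typing import Tuple

# Hypothetical character budget; the real limit depends on the model's context window.
MAX_CHARS = 200


def truncate(text: str, max_chars: int = MAX_CHARS) -> Tuple[str, bool]:
    """Return the text cut to max_chars and whether anything was dropped."""
    return text[:max_chars], len(text) > max_chars


def build_prompt(article: str, question: str, mode: int = 0,
                 max_chars: int = MAX_CHARS) -> str:
    """Build a Llama-2-chat style prompt. Mode 0 is a truncation-unaware baseline."""
    body, was_truncated = truncate(article, max_chars)
    if not was_truncated:
        mode = 0  # nothing was cut off, so no truncation hint is needed

    # Assumed baseline wording; Mode 1 swaps in the "part of" variant.
    intro = "This is an article/transcript within <article> tags:"
    if mode == 1:
        intro = "This is part of an article/transcript within <article> tags:"

    user = f"{intro}\n<article>{body}</article>\n{question}"
    if mode == 2:
        # Mode 2 appends the hint to the end of the user turn.
        user += " You have only seen part of the input."

    prompt = f"[INST] {user} [/INST]"
    if mode == 3:
        # Mode 3 seeds the start of the assistant's reply.
        prompt += " You have only read part of this input. Here is your response:"
    return prompt
```

Calling `build_prompt(page_text, "Summarize this page.", mode=2)` on a page longer than the budget would yield a user turn ending in the Mode 2 sentence, while a short page falls back to the baseline prompt.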

Models checked

  • Llama-2-13B-chat
  • Llama-2-70B-chat

Results

Mode 1 - Prepend

Works okay, but does not make much of a difference unless you explicitly ask whether the whole content was consumed. The 13B model does not answer that correctly (it assumes the whole article was read).

Mode 2 - Append

Works okay. It conditions the model more strongly on the fact that the whole input was not consumed.
Sometimes it causes the model not to respond properly.

Mode 3 - Seed

This did not work at all. It affects the task completion significantly.

Recommendation

Making the prompt truncation-aware does not consistently improve responses. For some queries, however, the model may attribute a gap in its answer to not having consumed the whole content rather than to, e.g., its training cutoff or a lack of access to real information. The downside is that I also saw the model refuse to respond due to "ethics, etc.".

For Claude-Instant, I did not test thoroughly as the context limit is quite high anyways.

If we are to integrate it, I would vote for Mode 2. I can push the changes I made. QA can then test the impact of this prompt change more widely.

@bbondy
Member Author

bbondy commented Nov 29, 2023

An option is to just close this with that investigation too. Do you recommend that or would you prefer Mode 2?

@bbondy
Member Author

bbondy commented Nov 29, 2023

Keeping in mind that we do warn the user about it now even if the model doesn't know.

@stevelaskaridis

Given that Llama-13B does not particularly "care" about this prompt change anyways (which is our public offering), I propose that we close this for now.

@bbondy
Member Author

bbondy commented Nov 29, 2023

thanks for checking 👍

@bbondy bbondy closed this as completed Nov 29, 2023
@github-project-automation github-project-automation bot moved this from Important / Polish to Done in Browser AI Nov 29, 2023