Chat: Evaluate prompt changes for smart apply #16
cc @abeatrix @jtibshirani Here are the eval results from the smart apply prompt changes. I made a small tweak to the prompt here: sourcegraph/cody#5250 (comment)
## Description
Enables the "smart apply" feature by default, removing the feature flag.

Chat Eval: sourcegraph/cody-leaderboard#16
Looker Dashboard: https://sourcegraph.looker.com/looks/2016

## Test plan
Test Smart Apply on:
- [x] PLG
- [x] Enterprise (with different model variants)
- [x] JetBrains (with WebView enabled)
  - Note: Further testing will be done by the JetBrains team when the WebView is fully implemented

<!-- Required. See https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles. -->
thanks for adding this!
@@ -2,9 +2,24 @@ | |||
# CHAT | |||
# CHAT_QUESTION what is squirrel? | |||
|
|||
# CHAT_REPLY Based on the provided code context, Squirrel is an interface that represents a code intelligence API. It has a single method `symbolInformation` that takes a `file` string as input and returns an object of type `CodeIntelligence`, which contains an array of strings called `symbols`. | |||
# CHAT_REPLY ```typescript:mocked/enhanced_context/squirrel.ts |
Did anything change here besides your prompt? Like how we pass context? Compared to before, it looks like we're much more likely to return quoted code snippets in the response. This response now contains code snippets, as well as several subsequent ones. (That's not necessarily bad, just trying to understand!)
I think this is because we're including this at the end of a transcript, kinda similar to how we append "Answer positively without apologizing."
I'm experimenting with moving this to the system prompt here: sourcegraph/cody#5290
I'll run a few more evals and update this branch
So I've updated the prompt and eval here and, although this response is pretty similar, I think it's much better at providing code snippets when they are relevant.
There were a bunch of issues with the previous prompt where the LLM would prioritise returning a code snippet unnecessarily. It seems much better now (still going through the diff though).
# CHAT_REPLY | ||
# CHAT_REPLY ```regex:mocked/enhanced_context/regex.txt |
Here's another example where we're inventing a new file/code snippet. It's not super helpful in this case, and it makes the answer more verbose 🤔
# CHAT_REPLY | ||
# CHAT_REPLY The eval.go file contains the core search functionality of Zoekt. It defines various data structures and functions for simplifying and evaluating search queries, gathering matches from indexed documents, scoring and ranking search results, and handling different types of queries (e.g., regular expressions, substring matches, branch filters). | ||
# CHAT_REPLY 1. `mocked/enhanced_context/vscode/src/external-services.ts`: This file seems to be a TypeScript file, but the provided context is incomplete, so it's difficult to determine its purpose. |
I think the LLM is likely keener to mention files by their full file path with this change. I don't think it's necessarily a bad thing, as ideally we should link these files up in the chat response. (e.g. I know Copilot does this)
output/chat/gemini-1.5-flash-latest/input/chat/question_018/question.yaml
@jtibshirani Thanks for looking through this! I've made some more iterations through this PR. I think it is much better with this new prompt: it still reliably produces file paths, but does not put as much of a priority on producing a code block as the previous one did. That being said, I did notice some things:
0586f19 to ff4763a
…#5290)

## Description
This PR improves the smart apply prompt tweak so that:
- We use the system prompt when available
- We use the same desired Markdown syntax when _we_ include codebase context, which helps steer the LLM onto the right path.

I have run an eval on this here: sourcegraph/cody-leaderboard#16

## Test plan
- [x] Tested creating code blocks on all major models

<!-- Required. See https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles. -->

---------

Co-authored-by: Beatrix <[email protected]>
The latest results look good! I noticed a couple of places where we surprisingly regressed, but I am not super concerned, because Claude Sonnet 3.5 still looks good for these questions.
# CHAT_REPLY The README also mentions that this is a fork of the original Google Zoekt repository (https://github.com/google/zoekt) by Sourcegraph (https://github.com/sourcegraph/zoekt). It outlines some of the differences between this fork and the original, such as the addition of a `zoekt-sourcegraph-indexserver` command, exposing the API via `keegancsmith/rpc`, support for searching across branches/repos, and returning symbol/ctag data in results. | ||
# CHAT_REPLY | ||
# CHAT_REPLY However, the actual source code files are not provided in the context. The README only gives an overview of the project and instructions for using the various components. | ||
# CHAT_REPLY Based on the provided README file, I do not have direct access to any code files. The README provides an overview of the Zoekt code search engine project and instructions for downloading, indexing, searching, and running the Zoekt services. However, it does not include the actual source code files. |
This is a slight regression, but seems okay. (Weird that this would change in the first place 🤔)
# CHAT_REPLY This regex pattern would match strings like `"nb"`, `"nc"`, `"nd"`, `"nf"`, etc. | ||
# CHAT_REPLY | ||
# CHAT_REPLY Without more context about the intended use case, it's difficult to determine the exact meaning of the provided regex pattern. | ||
# CHAT_REPLY Unfortunately, there is no regex present in the provided code context. The files contain TypeScript interface definitions, Go code related to rate limiting and concurrency handling, and some comments explaining the purpose of the code. Without a specific regex pattern provided, I cannot analyze its functionality. |
This is also a regression. This question tests whether we can answer general programming questions even if there is irrelevant context. Now, we refuse to answer the question. I wonder why this changed!
# CHAT_REPLY | ||
# CHAT_REPLY The README.md file provides an overview of Zoekt, including instructions for downloading, indexing, and searching code repositories using Zoekt. It also mentions that Zoekt supports symbol search and can be integrated with a web interface or JSON API. | ||
# CHAT_REPLY 1. `mocked/enhanced_context/vscode/src/external-services.ts`: This appears to be a TypeScript file, but the provided context doesn't give enough information to determine its purpose. |
Looks like we're including code blocks more often, but the examples I'm seeing are totally sensible and feel like an improvement.
Description
Evaluating the smart-apply prompt changes.
This prompt change adds the following prefix to chat messages:
The benefit of this is that it allows us to link code blocks with files in the user's file system.
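As a rough illustration of how that linking could work, the replies in this eval open code fences with an info string of the form `language:path` (for example `typescript:mocked/enhanced_context/squirrel.ts`). The sketch below splits that info string into a language and a file path. `FenceInfo`, `parseFenceInfo`, and `extractFenceInfos` are hypothetical names used only for this example and are not taken from the Cody codebase; the actual client may parse replies differently.

```typescript
// Hypothetical shape for the parsed fence info string (not from the Cody codebase).
interface FenceInfo {
    language: string
    filePath?: string
}

// Splits an info string such as "typescript:path/to/file.ts" at the first colon.
// An info string without a colon (e.g. "go") yields a language and no file path.
function parseFenceInfo(info: string): FenceInfo {
    const trimmed = info.trim()
    const separator = trimmed.indexOf(':')
    if (separator === -1) {
        return { language: trimmed }
    }
    const language = trimmed.slice(0, separator)
    const filePath = trimmed.slice(separator + 1).trim()
    return filePath ? { language, filePath } : { language }
}

// Collects the info string of every opening fence in a Markdown reply.
// Assumes fences start at the beginning of a line with three backticks and are not nested.
function extractFenceInfos(markdown: string): FenceInfo[] {
    const infos: FenceInfo[] = []
    let insideFence = false
    for (const line of markdown.split('\n')) {
        if (!line.startsWith('```')) {
            continue
        }
        if (!insideFence) {
            infos.push(parseFenceInfo(line.slice(3)))
        }
        insideFence = !insideFence
    }
    return infos
}
```

A client could then look up each `filePath` in the workspace and, if the file exists, offer to apply the block to it. Blocks without a path would fall back to plain rendering.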