Create Evals #130

mathewpareles · 2024-10-28T12:19:14Z

We want to create evals for judging how well our LLMs perform on inline completions (ctrl+K), whole-file edits (ctrl+L), and autocomplete (tab).

A good starting task is to find an open-source eval for judging ctrl+L whole-file edits (where the LLM rewrites a file given instructions). If you know of a high-quality dataset for any of these tasks, we'd love to hear about it.
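To make the ctrl+L task concrete, here is a rough sketch of what one eval loop for whole-file rewrites could look like. Everything in it is an assumption for illustration: `EvalCase`, `score`, and `run_eval` are hypothetical names, the model is any callable taking `(original, instructions)` and returning the rewritten file, and exact match plus a `difflib` similarity ratio are placeholder metrics, not a recommendation of how edits should ultimately be judged.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """One hypothetical whole-file edit example."""
    instructions: str  # what the user asked for
    original: str      # file contents before the edit
    expected: str      # reference file contents after the edit


def normalize(text: str) -> str:
    # Drop trailing whitespace on each line and trailing blank lines,
    # so purely cosmetic differences do not count as failures.
    lines = [line.rstrip() for line in text.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines)


def score(output: str, expected: str) -> Dict[str, float]:
    out, exp = normalize(output), normalize(expected)
    return {
        "exact": 1.0 if out == exp else 0.0,
        # Character-level similarity in [0, 1]; a crude stand-in metric.
        "similarity": SequenceMatcher(None, out, exp).ratio(),
    }


def run_eval(model: Callable[[str, str], str], cases: List[EvalCase]) -> Dict[str, float]:
    if not cases:
        raise ValueError("need at least one eval case")
    results = [score(model(c.original, c.instructions), c.expected) for c in cases]
    n = len(results)
    return {
        "exact_match_rate": sum(r["exact"] for r in results) / n,
        "mean_similarity": sum(r["similarity"] for r in results) / n,
    }
```

A real eval would likely need something stronger than text similarity (for example, running tests against the rewritten file), but a dataset with these three fields per case would be enough to plug into a loop like this.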

andrewpareles added the "new feature" label (New feature or request) Oct 28, 2024

mathewpareles self-assigned this Nov 4, 2024
