Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CodeChange code #91

Closed
PCSwingle opened this issue Sep 12, 2023 · 4 comments
Closed

Refactor CodeChange code #91

PCSwingle opened this issue Sep 12, 2023 · 4 comments
Assignees

Comments

@PCSwingle
Copy link
Member

Refactoring CodeChange into a concrete and abstract format; the concrete format would manage parsing the model's output and would be able to convert into an abstract change. The abstract change would use an internal format and would handle actually changing files. This would allow us to easily create new formats and test them out, without having to change anything outside of the parsing (and maybe streaming?) logic. After talking with @biobootloader here is the current plan:

Overview

Concrete Change: Contains logic for parsing different model formats; can be converted to an abstract change. Can also be converted from an abstract change so that we can automatically generate examples for different models (and possibly tests) from a single example set.

Additions/Deletions: Make up an Abstract Change. These will be the simplest possible changes, either adding a block of code at a certain line or deleting a certain block of code. Ideally any conflicts will be resolved by the concrete change parser, but even if they aren't the Additions and Deletions will be processed in order and so conflicts shouldn't be an issue.

Abstract Change: Will be stored per file; each file will contain a list of Additions and Deletions; additionally, each Addition and Deletion will be marked with a specific Concrete Change; this will allow the user to use interactive mode to accept or reject (and eventually maybe ask the model to change?) specific changes that it outputs.

Plan

  1. Create Abstract Changes, Additions/Deletions, and modify CodeFileManager to use them when applying changes
  2. Create tests for Abstract Changes
  3. Create Concrete Change basic interface
  4. Look into seeing if possible to have unified streaming logic for different Concrete Change formats; if so, create it
  5. Migrate current format parsing and streaming over to Concrete Change
  6. Test current format Concrete Change
  7. Create two-way conversion between Abstract Change and Concrete Change
  8. Add testing for conversion
  9. Add un-parsing for current format Concrete Change -- this will be used for tests and prompt examples
  10. Create Abstract Change examples and tests, and use automatic conversion to Concrete Change all the way to pre-parsed text for both prompt examples and tests

Possible alternative?

Another design choice we could have made is to have Abstract Changes represent an entire file, and simply be the text of the file after applying all changes rather than having Additions and Deletions; this would resolve some problems with conflicts and renaming, but wouldn't let us be able to reference specific Concrete Changes from the Abstract Changes, meaning we would mostly likely end up simply using hunks (groups of changed lines) in interactive mode. For now, @biobootloader and I have decided to go with the previous plan to allow interactive mode to be more powerful.

@PCSwingle PCSwingle self-assigned this Sep 12, 2023
@PCSwingle
Copy link
Member Author

PCSwingle commented Sep 12, 2023

Slight change to the plan for AbstractChange after thinking and working on it for a bit: Planning on having Additions, Deletions, and Rename subchanges -- the Rename change will have a new name where None will represent a deletion, and a Rename when the file does not exist will represent a creation

@PCSwingle PCSwingle mentioned this issue Sep 13, 2023
5 tasks
@jakethekoenig
Copy link
Member

I really like the idea of decoupling the LLM specified file change format and our internal file change object. I wonder if instead of calling them ConcreteChange and AbstractChange we might want to call them something like LlmResponseFormat (or LlmChangeParser?) and FileChange. As I understand it the responsibilities/differences of the two objects are:

ConcreteChange or LlmResponseFormat

  • Provide a prompt which describes the format to specify a change in. Like is currently done here.
  • Provide a function which parses responses which have been received from an Llm prompted with the LlmResponseFormat's prompt block and returns a collection of FileChange objects.
  • We may make many of these and experiment with different prompts for different Llms and evaluate things like how consistently Llms can respond in the correct format, how many tokens they use both in prompt and response and if they affect response quality.
    (Maybe you imagine the ConcreteChange isn't the LlmResponseFormat which is a third thing but is the actual response describing a file change received from the Llm?)

AbstractChange or FileChange

  • Represents a change to a file precisely with all the information you'd need to make the change e.g. path, line #, text to insert.
  • Has a function to pretty print it for a user to accept/reject.
  • Has a function which actually makes the edit.
  • Once we settle on a design for this object it's unlikely to change much compared to LlmResponseFormat where there's a lot of experimentation to do.

Let me know it my summary and understanding aren't quite right but I think the dichotomy is more like Raw/Parsed than Concrete/Abstract.

@PCSwingle
Copy link
Member Author

@jakethekoenig completely agree! Last week while I was making these changes I realized that concrete change wasn't really needed, and mapped out a completely different architecture that I really like with @biobootloader. I'll keep this issue as-is, but in #94 I lay out the new structure we agreed upon; it pretty much turns the AbstractChange into FileEdit (although FileChange would be a good name too) and removes the ConcreteChange completely.

@waydegg
Copy link
Contributor

waydegg commented Oct 4, 2023

Can we close this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants