Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor changes #94

Merged
merged 17 commits into from
Sep 21, 2023
Merged

Refactor changes #94

merged 17 commits into from
Sep 21, 2023

Conversation

PCSwingle
Copy link
Member

@PCSwingle PCSwingle commented Sep 13, 2023

Refactor CodeChange into AbstractChange and ConcreteChange as outlined in #91

Edit: After talking with @biobootloader, we've decided to change the structure quite a bit. This is now the general layout:

Model output -> Parser (this parser can be injected in, so any parser will work) -> new FileEdit format that groups by file and only has 'Replacement' subedits -> User filter (make these changes, keep these ones, don't use these ones, etc.) -> Conflict Resolution (handled by FileEdit; no conflict resoution needed on a per-parser basis). The final list of FileEdits can be converted into a final file's code lines, which is written to the file by the CodeFileManager.

Instead of completely changing the Parser and CodeChange files, I've extracted them into OriginalFormatParser and OriginalFormatChange classes; while we don't necessarily need intermediate CodeChanges for every parser, in this case it helps us use our previous parsing code instead of re-writing it all.

TODO:

  • Make Parser interface and make it easily injectable
  • Add prompts connected to Parser
  • Add tests for FileEdit, especially conflict resolution
  • Fix all TODO's scattered around the code
  • Restructure tests between format specific tests and non-format specific tests

Copy link
Member

@jakethekoenig jakethekoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the direction here. Thanks!

mentat/conversation.py Show resolved Hide resolved
mentat/parsers/block_parser.py Show resolved Hide resolved
from mentat.prompts import block_parser_prompt


class BlockParser(Parser):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called block because the changes are given in "blocks" like this?

@@start
{
    "file": "core/script.py",
    "action": "insert",
    "insert-after-line": 3,
    "insert-before-line": 4
}
@@code
    if name == "Bob":
        print("Nice to see you again!")
@@end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha that is the reasoning; I don't like the name but I spent too long trying to think of a better name! Definitely let me know if you have any better ideas!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is fine. I was worried future formats could also be reasonably called "block" but thinking about it it seems like a lot of future formats we will try will be easier to name e.g. GitDiffParser, FullFileParser

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe FileBlockParser or PartialFileParser are better names? If we want to support inline edits in the future we can also have InlineFileParser to keep the naming consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure I understand what the File means in this context? Most likely all of our formats will need the model to specify the file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OriginalFormatParser would be unambiguous, but BlockParser seems fine for now.

The name really just matters if we end up keeping this format around, which we'd only do if it is the best for some models / scenarios. I don't expect that but if it does happen we'll find a name that distinguishes it well from the other formats

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure I understand what the File means in this context?

"File" would mean a literal file on the file system. Technically GitDiffParser would be pulling from a file too :/

@PCSwingle
Copy link
Member Author

This PR is almost ready to merge; I'm going to wait on #101 to merge and then I'll resolve conflicts.

@PCSwingle PCSwingle marked this pull request as ready for review September 20, 2023 19:19
@jakethekoenig
Copy link
Member

This PR is almost ready to merge; I'm going to wait on #101 to merge and then I'll resolve conflicts.

Are the todos still in file_edit for a future PR?

@PCSwingle
Copy link
Member Author

This PR is almost ready to merge; I'm going to wait on #101 to merge and then I'll resolve conflicts.

Are the todos still in file_edit for a future PR?

The Replacement 'owner' TODO yes; the current parsing format has a one to one 'change' to Replacement ratio, but eventually we might have a format that creates 2 replacements that we want to represent as one 'change' that the user can filter. It isn't a big addition though and just complicates a few things, so I figured we might as well add that when we need it. As for the conflict resolution TODO's, I personally don't think user input for resolving them is super important, but if you or @biobootloader think we should get that done before merging this PR I'm 100% ok with doing it now.

else:
stored_lines = []

if file_edit.rename_file_path is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to use git mv <oldname> <newname> here, instead of adding and deleting. With the current method, GitHub would show a bunch of deletions and additions, instead of just a rename.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not entirely sure about this because that stages the rename; I don't think we want mentat to stage anything unless the user specifically asks us to; and of course when the user eventually stages the changes git will recognize the rename either way. It is nice having git recognize it immediately though. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah we don't want to stage. I believe git mv forces git to recognize it as a rename (which it sometimes fails, if similarity isn't high enough). Let's leave as is for now so we don't stage. Perhaps in the future we could have Mentat "remember" a rename and so when it commits for the user it'll inform git

mentat/conversation.py Show resolved Hide resolved
other.ending_line = replacement.starting_line
other.starting_line = min(other.starting_line, other.ending_line)

def get_file_lines(self, file_lines: list[str]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more descriptive name might be helpful here, maybe like edit_file_lines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that makes it sound a bit like it's editing the file; how about get_new_file_lines? Or get_edited_file_lines?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_updated_file_lines ?

pyrightconfig.json Show resolved Hide resolved
mentat/parsers/block_parser.py Show resolved Hide resolved
from mentat.prompts import block_parser_prompt


class BlockParser(Parser):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe FileBlockParser or PartialFileParser are better names? If we want to support inline edits in the future we can also have InlineFileParser to keep the naming consistent.

mentat/parsers/file_edit.py Show resolved Hide resolved
mentat/parsers/original_format/original_format_change.py Outdated Show resolved Hide resolved
mentat/parsers/file_edit.py Show resolved Hide resolved
mentat/parsers/block_parser.py Show resolved Hide resolved
Copy link
Contributor

@waydegg waydegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments!

@waydegg waydegg dismissed their stale review September 21, 2023 01:07

Didn't mean to request changes, just wanted to leave general feedback

@biobootloader
Copy link
Member

Thanks @PCSwingle for this important and complicated refactor! And thanks everyone for the quick reviews 🚀

I'll take a closer look later today after you've had a chance to address comments, let me know when you're ready for that!

I'm excited to try some new output formats!

Comment on lines 177 to 179
# TODO: Ask user for conflict resolution
other.ending_line = replacement.starting_line
other.starting_line = min(other.starting_line, other.ending_line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's print something here so the user knows what happened. Perhaps just say "these two changes conflicted, they've been auto-merged back to back, double check: {the changes back to back}". From the user's perspective they may have seen each change individually and thought everything looked good.

self.replacements.sort(reverse=True)
for index, replacement in enumerate(self.replacements):
for other in self.replacements[index + 1 :]:
# TODO: another type of conflict (not caught here) would be both replacements being inserts on same line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what happens in that scenario now? does Mentat crash? or does it just randomly put one first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it just puts the insertion first in the list first and the next insertion last

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we treat that scenario the same as the other conflict scenario then, and alert / display to the user what's happening

Copy link
Member

@biobootloader biobootloader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice stuff! Feel free to merge after addressing comments

@PCSwingle PCSwingle merged commit f791458 into main Sep 21, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants