-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor changes #94
Refactor changes #94
Conversation
3f001f8
to
93ef90e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the direction here. Thanks!
from mentat.prompts import block_parser_prompt | ||
|
||
|
||
class BlockParser(Parser): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's called block because the changes are given in "blocks" like this?
@@start
{
"file": "core/script.py",
"action": "insert",
"insert-after-line": 3,
"insert-before-line": 4
}
@@code
if name == "Bob":
print("Nice to see you again!")
@@end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha that is the reasoning; I don't like the name but I spent too long trying to think of a better name! Definitely let me know if you have any better ideas!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is fine. I was worried future formats could also be reasonably called "block" but thinking about it it seems like a lot of future formats we will try will be easier to name e.g. GitDiffParser, FullFileParser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe FileBlockParser
or PartialFileParser
are better names? If we want to support inline edits in the future we can also have InlineFileParser
to keep the naming consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure I understand what the File means in this context? Most likely all of our formats will need the model to specify the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OriginalFormatParser
would be unambiguous, but BlockParser
seems fine for now.
The name really just matters if we end up keeping this format around, which we'd only do if it is the best for some models / scenarios. I don't expect that but if it does happen we'll find a name that distinguishes it well from the other formats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure I understand what the File means in this context?
"File" would mean a literal file on the file system. Technically GitDiffParser would be pulling from a file too :/
This PR is almost ready to merge; I'm going to wait on #101 to merge and then I'll resolve conflicts. |
Are the todos still in |
The Replacement 'owner' TODO yes; the current parsing format has a one to one 'change' to Replacement ratio, but eventually we might have a format that creates 2 replacements that we want to represent as one 'change' that the user can filter. It isn't a big addition though and just complicates a few things, so I figured we might as well add that when we need it. As for the conflict resolution TODO's, I personally don't think user input for resolving them is super important, but if you or @biobootloader think we should get that done before merging this PR I'm 100% ok with doing it now. |
3b74801
to
60d6b82
Compare
else: | ||
stored_lines = [] | ||
|
||
if file_edit.rename_file_path is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to use git mv <oldname> <newname>
here, instead of adding and deleting. With the current method, GitHub would show a bunch of deletions and additions, instead of just a rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not entirely sure about this because that stages the rename; I don't think we want mentat to stage anything unless the user specifically asks us to; and of course when the user eventually stages the changes git will recognize the rename either way. It is nice having git recognize it immediately though. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm yeah we don't want to stage. I believe git mv
forces git to recognize it as a rename (which it sometimes fails, if similarity isn't high enough). Let's leave as is for now so we don't stage. Perhaps in the future we could have Mentat "remember" a rename and so when it commits for the user it'll inform git
mentat/parsers/file_edit.py
Outdated
other.ending_line = replacement.starting_line | ||
other.starting_line = min(other.starting_line, other.ending_line) | ||
|
||
def get_file_lines(self, file_lines: list[str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more descriptive name might be helpful here, maybe like edit_file_lines
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that makes it sound a bit like it's editing the file; how about get_new_file_lines? Or get_edited_file_lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_updated_file_lines
?
from mentat.prompts import block_parser_prompt | ||
|
||
|
||
class BlockParser(Parser): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe FileBlockParser
or PartialFileParser
are better names? If we want to support inline edits in the future we can also have InlineFileParser
to keep the naming consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments!
Didn't mean to request changes, just wanted to leave general feedback
Thanks @PCSwingle for this important and complicated refactor! And thanks everyone for the quick reviews 🚀 I'll take a closer look later today after you've had a chance to address comments, let me know when you're ready for that! I'm excited to try some new output formats! |
mentat/parsers/file_edit.py
Outdated
# TODO: Ask user for conflict resolution | ||
other.ending_line = replacement.starting_line | ||
other.starting_line = min(other.starting_line, other.ending_line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's print something here so the user knows what happened. Perhaps just say "these two changes conflicted, they've been auto-merged back to back, double check: {the changes back to back}". From the user's perspective they may have seen each change individually and thought everything looked good.
mentat/parsers/file_edit.py
Outdated
self.replacements.sort(reverse=True) | ||
for index, replacement in enumerate(self.replacements): | ||
for other in self.replacements[index + 1 :]: | ||
# TODO: another type of conflict (not caught here) would be both replacements being inserts on same line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so what happens in that scenario now? does Mentat crash? or does it just randomly put one first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now it just puts the insertion first in the list first and the next insertion last
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we treat that scenario the same as the other conflict scenario then, and alert / display to the user what's happening
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice stuff! Feel free to merge after addressing comments
Refactor CodeChange into AbstractChange and ConcreteChange as outlined in #91
Edit: After talking with @biobootloader, we've decided to change the structure quite a bit. This is now the general layout:
Model output -> Parser (this parser can be injected in, so any parser will work) -> new FileEdit format that groups by file and only has 'Replacement' subedits -> User filter (make these changes, keep these ones, don't use these ones, etc.) -> Conflict Resolution (handled by FileEdit; no conflict resoution needed on a per-parser basis). The final list of FileEdits can be converted into a final file's code lines, which is written to the file by the CodeFileManager.
Instead of completely changing the Parser and CodeChange files, I've extracted them into OriginalFormatParser and OriginalFormatChange classes; while we don't necessarily need intermediate CodeChanges for every parser, in this case it helps us use our previous parsing code instead of re-writing it all.
TODO: