-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script to move files to the framework #145
base: main
Are you sure you want to change the base?
Add script to move files to the framework #145
Conversation
Signed-off-by: David Horstmann <[email protected]>
This is useful for building file-moves on top of other not-yet-merged file moves. Signed-off-by: David Horstmann <[email protected]>
When there are no dirs to make we try anyway. This is bad and causes an error. Fix this. Signed-off-by: David Horstmann <[email protected]>
To allow branches to be built on other branches, resolve conflicts 'manually' when the 2 branches delete each others' files. Signed-off-by: David Horstmann <[email protected]>
Allow passing the --except flag, which stops a file from being moved (and perform the necessary path manipulation acrobatics to make that work). Created to be able to move tests/include except for tests/include/alt-dummy, like so: --path tests/include:include --except tests/include/alt-dummy Signed-off-by: David Horstmann <[email protected]>
This was failing if no exceptions were supplied. Signed-off-by: David Horstmann <[email protected]>
This causes the script to not perform the operation, but simply print the self._file_map dictionary of files to rename (this is useful for debugging the file-moving logic, especially with combinations of --path and --except options). Signed-off-by: David Horstmann <[email protected]>
When files are moved to the exact same location in the framework repo as they were in Mbed TLS, there is a problem. Since no git change has taken place on the Mbed TLS side, the most recent change to the file is its deletion when a previous set of files were moved to the framework (since all other files are deleted in this kind of move). As a result the moved files are deleted and do not appear in the final branch. Deal with this edge case explicitly by taking files which are not renamed from the file map and checking them out to the state in the temporary branch, before amending the resulting commit. Signed-off-by: David Horstmann <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a mostly full code review, more than we usually do in mbedtls-docs
. Arguably this script could live in the framework repository; for that level of quality, I'd want most of my comments to be handled. For the level of quality in mbedtls-docs
, all my comments are optional except one: this script seems to mess up if the destination repo is the framework
subdirectory of the source.
tools/bin/mbedtls-move-to-framework
Outdated
def add_file_move(self, src_path: str, dst_path: str) -> None: | ||
"""Move file at relative path src_path in the source repo to dst_path | ||
in the destination repo""" | ||
self._file_map[src_path] = dst_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why no normalization like there is for the initialization of _file_map
?
cmd = [self.GIT_EXE] + git_cmd | ||
return subprocess.check_output(cmd, **kwargs) | ||
|
||
def add_file_move(self, src_path: str, dst_path: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method isn't used anywhere. __init__
should probably call it instead of setting _file_map
on its own.
def __init__(self, source_repo: str, dest_repo: str, | ||
source_branch_name: str, dest_branch_name: str, | ||
file_map: Dict[str, str], | ||
dest_base_branch_name: Optional[str] = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to document the parameters. It's not obvious what you can put in file_map
, or the difference between dest_branch_name
and dest_base_branch_name
.
tools/bin/mbedtls-move-to-framework
Outdated
def _direct_children(self, dir_path: str) -> List[str]: | ||
"""Get the direct children of a subdirectory (at least the ones | ||
that matter to git)""" | ||
leaf_files = self.run_git(['ls-files', dir_path]).split() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we ever have a file name containing a space, this will split on the space. (Control characters in file names would also not work properly.) The robust way would be
leaf_files = self.run_git(['ls-files', dir_path]).split() | |
leaf_files = self.run_git(['ls-files', '-z', '--', dir_path]).rstrip('\0').split('\0') |
This applies to the other uses of ls-files
.
It would be good to make an auxiliary method that calls ls-files
and returns a list.
tools/bin/mbedtls-move-to-framework
Outdated
leaf_files = self.run_git(['ls-files', dir_path]).split() | ||
path_len = len(self._path_parts(dir_path)) + 1 | ||
direct_children = set([ | ||
os.path.join( | ||
*self._path_parts(f)[:path_len] | ||
) | ||
for f in leaf_files | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be equivalent but simpler: run git -C dir_path ls-files
and strip anything after a slash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though looking at how this method is used, it seems that you're stripping the paths down to base names here, only to reconstruct the full path later in _expand_dir_move
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part removes the suffix that git ls-files
outputs. We might get:
foo/bar/a.c
foo/bar/b.c
foo/baz
Which we reduce to just the direct descendants of foo
, so:
foo/bar
foo/baz
We never reconstruct the suffix in _expand_dir_move()
, but we substitute the prefix, e.g. if foo
is being moved to new_foo
:
new_foo/bar
new_foo/baz
This is a part of the transformation required for the --except
option.
I was not aware that passing -C
with a directory had this effect though, that will certainly simplify this.
tools/bin/mbedtls-move-to-framework
Outdated
def main() -> int: | ||
"""Command line entry point.""" | ||
parser = argparse.ArgumentParser() | ||
parser.add_argument('--path', action='append', nargs='?', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than require --path
to be repeated for each path, I think this should be the positional arguments. In particular it would allow calls like mbedtls-move-to-framework … subdir/*.py
.
GIT_EXE = 'git' | ||
FRAMEWORK_DEFAULT_BASE_BRANCH = 'main' | ||
|
||
def __init__(self, source_repo: str, dest_repo: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be consistent when abbreviating “destination”: either dst
or dest
, not both.
parser.add_argument('--dst-base-branch', default=None, required=False, | ||
help='Base branch in the framework repo to make changes from') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't actually need to be a branch name, it can be any revision. In practice I'd want whatever is the current head of the framework submodule, and I think that should be the default, rather than main
.
In fact main
is likely not to work in practice, because it'll typically be some old reference made when the working directory was originally cloned I delete the default branch name in my local clones to avoid keeping that old reference around, so when I didn't pass --dst-branch-name
, I got the error
fatal: 'main' matched multiple (4) remote tracking branches
self.run_git(['checkout', self._framework_base_branch_name]) | ||
|
||
# Create a new branch with checkout -b | ||
self.run_git(['checkout', '-b', self._framework_branch_name]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add checks for common errors before starting to do anything. At least check that the branches don't already exist, which is likely to be the case the second time the same person uses the script.
# Delete all files except the ones we are moving | ||
for f in repo_files: | ||
if os.path.normpath(f) not in files_to_retain: | ||
self.run_git(['rm', f]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BLOCKER: As far as I understand it (more info in Slack thread), this will in particular remove framework
. So from that point, when checking out the original head again, the framework submodule is no longer initialized. Then later the framework directory itself disappears, which I don't understand.
I don't fully understand what's going on, but as far as I can tell, you can't have the destination be the framework
directory of the source, it has to be a separate worktree. This is a very unintuitive limitation that needs to be at least documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now documented and I have added a check so that an error message is printed if you try to move files to a framework submodule within the Mbed TLS checkout that you are moving from.
I'm marking this as priority-high because we need this to move files to the framework with the history preserved. Arguably this should be in the framework repository as an essential maintainer tool (even if it'll only be essential for a few months). |
tools/bin/mbedtls-move-to-framework
Outdated
self.run_git(['mv', f, self._file_map[f]]) | ||
|
||
# Commit the result | ||
commit_message = "Move some files to framework repository" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like the ability to specify a commit message. I can do it afterwards by rebasing, and that's not really a problem in the source branch, but in the destination branch, the commit message is behind a merge and git is reluctant to edit history beyond a merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, if I want to edit the commit message, I need to redo the conflict resolution which is implemented here as resolve_deletion_conflicts
. That's not easy!
I've now used the script successfully. As far as I can't tell it just can't work if the destination is the So this script should come at least with a warning, if not enforcement, that both the source and destination should be independent worktrees, freshly checked out, that are not used for doing anything else. Preferably, the script should create these worktrees itself (as |
I've not yet used this script myself, but looking at its result there seems to be extra commits that I wouldn't expect to be present in the history, see Mbed-TLS/mbedtls-framework#55 (comment) |
Agreed. I've created Mbed-TLS/mbedtls-framework#57 and added it to the first framework epic. (I don't care much about where the script lives, as long as it meets quality requirements.) |
We're preserving the commit history, not just the patch history. So each import branch must have the full history of the project up to the last commit that modifies one of the imported files. The first time we did this in a pull request, GitHub showed basically all the mbedtls history as part of the pull request, because these were all new commits in the There are tools that can rewrite a git branch to only retain commits that modify specific files, and construct a commit history with the same messages and the same patches for this subset of files. But you lose commit IDs (obviously), renamed files, content moved between files, etc. And these tools are harder to use. So I think we've made the right choice. |
Ah OK, I had misunderstood what the script does.
Yeah, that's what I thought the script was doing, and indeed now that you mention it that wouldn't be the right thing to do. Thanks for clearing that up. |
This allows the user to write a message that they prefer. While we're at it, we can supply a helpful template so that it's a bit clearer what each commit message is for. Signed-off-by: David Horstmann <[email protected]>
Signed-off-by: David Horstmann <[email protected]>
5c5b953
to
7847e77
Compare
Signed-off-by: David Horstmann <[email protected]>
Instead of passing a dictionary file map when creating a RepoFileMover object, remove this from the constructor. The correct way to add a file move is via the add_file_move() method (as this properly encapsulates the file map). Signed-off-by: David Horstmann <[email protected]>
Change _direct_children() to return only the child directory names (sans prefix), which can be done easily with: git -C dir ls-files Also reconstruct paths more explicitly in _expand_dir_move(). Signed-off-by: David Horstmann <[email protected]>
Replace some statements with equivalent, simpler alternatives. Signed-off-by: David Horstmann <[email protected]>
Update the description to be more accurate. In particular, document the limitation that a separate checkout is required. Signed-off-by: David Horstmann <[email protected]>
If the user tries to move files to the internal framework checkout, print an error and exit. Signed-off-by: David Horstmann <[email protected]>
Set nargs=1 for each of these arguments. Since this will return a single-element list containing the argument (not the argument itself), change the action to 'extend', which will concatenate lists rather than appending individual values to a list. Signed-off-by: David Horstmann <[email protected]>
@mpg I believe I have addressed the main blocker and a few other improvements, so this should now be in a reasonable state to be handed over to someone else for followup. |
@gilles-peskine-arm Unless you disagree, I'd like to merge this script mostly as it stands now: I think your main comment has been addressed, and in practice we (at least Elena, Valerio and I) have been using this script a few times now without trouble. Of course that doesn't mean we won't improve it later on if usability or robustness issues start being a problem. But this can be done as a different PR (probably by a different author now that David have moved to other priorities). Could you give this PR a second review with that in mind? |
The script now refuses to run if the destination is a submodule of the source, rather than failing mysteriously, so my blocker comment is resolved.
I'm sorry, I can't promise a review right now. On the other hand, I can see that my one blocker comment has been addressed: the script won't fail in a mysterious way if you try to use it in what is to me the most natural scenario, namely moving files to a submodule of the current checkout. So I've dismissed my review. Since you seem to be happy with it, I'll go ahead and merge. |
Well, not quite: I don't feel comfortable merging it without doing at least a cursory review. I'll try to do that later, but if someone else feels comfortable approving this PR, please go ahead. |
Pushing a small script that moves files to the
framework
repo from Mbed TLS, while retaining history.This script works but may not be in a fully-polished state.