Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'verdi node repo dump' command. #3623

Merged
merged 22 commits into from
Dec 19, 2019

Conversation

greschd
Copy link
Member

@greschd greschd commented Dec 9, 2019

Fixes #1121.

The command takes a node pk / uuid, zero or more file names, and an output directory as arguments. By default, it will create a sub-directory with the node uuid in the output directory, and copy the files with the given names from the repository to that directory. If no file names are given, all objects in the repository are copied.

The command has two optional flags:

  • --no-uuid: copies the files directly into the output directory, without
    uuid prefix
  • --force: Copy files even if they already exist at the target destination

Changed from the discussion with @ltalirz: For specifying the source file names, I am now not using a separate flag, since that mimics how regular cp works. For example:

verdi node repo cp 2323 _scheduler-stderr.txt _scheduler-stdout.txt output_dir 

Click help:

Usage: verdi node repo cp [OPTIONS] NODE [SRC_FILES]... OUTPUT_DIRECTORY

  Copy the repository files of a node to an output directory.

Options:
  --no-uuid    Do not prefix the output paths with the node uuid.
  -f, --force  Do not ask for confirmation.
  -h, --help   Show this message and exit.

The command takes a node pk / uuid, zero or more file names, and an
ouput directory as arguments. By default, it will create a sub-directory
with the node uuid in the output directory, and copy the files with
the given names from the repository to that directory. If no file names
are given, all objects in the repository are copied.
The command has two optional flags:
 --no-uuid: copies the files directly into the output directory, without
            uuid prefix
 --force: Copy files even if they already exist at the target destination
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this misses proper dealing with folders

@greschd greschd changed the title Add 'verdi node repo cp' command. [WIP] Add 'verdi node repo cp' command. Dec 9, 2019
The command is renamed to 'verdi node repo dump'. The option
of specifying specific file names to be copied is removed. The
output paths no longer prefixed with a UUID. When a file or directory
exists already, the command prompts to either abort, skip the current
file / directory, overwrite the file / directory, or (in the case
of directories only) merge the contents. A 'force' option can be
given to always overwrite files / directories.
Tests are not yet updated to the changed interface, since we want
to get feedback on that first.
@greschd
Copy link
Member Author

greschd commented Dec 10, 2019

Following the feedback session, made the following changes:

  • Changed the name of the command to verdi node repo dump.
  • The option to pass specific file names is removed
  • The UUID prefix is removed (might be added later as an option, if need arises)
  • When a file or directory exists already, prompt the user with the following options:
    • [a] abort the command
    • [s] skip the current file / directory
    • [m] (for directories only) merge contents
    • [r] replace content - in the case of directories removing the existing content
  • A --force option can be passed to always replace.

Current click help:

Usage: verdi node repo dump [OPTIONS] NODE OUTPUT_DIRECTORY

  Copy the repository files of a node to an output directory.

Options:
  -f, --force  Overwrite existing directories and files without prompting for
               confirmation.
  -h, --help   Show this message and exit.

@greschd
Copy link
Member Author

greschd commented Dec 10, 2019

Mentioning @giovannipizzi @ltalirz for feedback.

@giovannipizzi
Copy link
Member

The described behaviour looks fine to me thanks!

Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @greschd
I tried it out locally - some suggestions on the wording to make the prompts less long

aiida/cmdline/commands/cmd_node.py Outdated Show resolved Hide resolved
aiida/cmdline/commands/cmd_node.py Outdated Show resolved Hide resolved
aiida/cmdline/commands/cmd_node.py Outdated Show resolved Hide resolved
aiida/cmdline/commands/cmd_node.py Outdated Show resolved Hide resolved
@greschd
Copy link
Member Author

greschd commented Dec 10, 2019

Since there was quite some discussion, maybe we should show the updated interface again?

@greschd greschd changed the title [WIP] Add 'verdi node repo cp' command. [WIP] Add 'verdi node repo dump' command. Dec 10, 2019
Apply suggestions for simplified wording of the prompts that
ask if a file / directory should be replaced / merged / skipped,
or the operation should be aborted.

Co-Authored-By: Leopold Talirz <[email protected]>
@ltalirz
Copy link
Member

ltalirz commented Dec 10, 2019

Sure, we can do it briefly this evening. I think the others will approve.

Dominik Gresch and others added 5 commits December 10, 2019 22:27
The command takes a node pk / uuid, zero or more file names, and an
ouput directory as arguments. By default, it will create a sub-directory
with the node uuid in the output directory, and copy the files with
the given names from the repository to that directory. If no file names
are given, all objects in the repository are copied.
The command has two optional flags:
 --no-uuid: copies the files directly into the output directory, without
            uuid prefix
 --force: Copy files even if they already exist at the target destination
The command is renamed to 'verdi node repo dump'. The option
of specifying specific file names to be copied is removed. The
output paths no longer prefixed with a UUID. When a file or directory
exists already, the command prompts to either abort, skip the current
file / directory, overwrite the file / directory, or (in the case
of directories only) merge the contents. A 'force' option can be
given to always overwrite files / directories.
Tests are not yet updated to the changed interface, since we want
to get feedback on that first.
Apply suggestions for simplified wording of the prompts that
ask if a file / directory should be replaced / merged / skipped,
or the operation should be aborted.

Co-Authored-By: Leopold Talirz <[email protected]>
@greschd greschd changed the title [WIP] Add 'verdi node repo dump' command. Add 'verdi node repo dump' command. Dec 10, 2019
Dominik Gresch added 3 commits December 11, 2019 17:45
The code was currently broken because not all flags were changed
from 'o' to 'r' (in the rename from 'overwrite' to 'replace').
Adds a test which runs with '--force' where a single file is
replaced.
When passing a pathlib.Path to 'shutil', it needs to be explicitly
cast to 'str'. The compatibility of 'os' and 'shutil' with pathlib
objects was added only in Python 3.6.
@ltalirz ltalirz self-requested a review December 12, 2019 10:02
ltalirz
ltalirz previously approved these changes Dec 12, 2019
Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot @greschd
to me this looks ready to be merged

@greschd
Copy link
Member Author

greschd commented Dec 17, 2019

@giovannipizzi, do you want to have a final look?

@giovannipizzi
Copy link
Member

Thanks @greschd ! @sphuber was a bit doubtful on whether to keep the complex logic for merging, or just limit the possibility to create a new folder, specified on the cmd line by the user. Did you already discuss with him?

@greschd
Copy link
Member Author

greschd commented Dec 17, 2019

I did not, but I would be ok with removing the "merge" option and only leave abort / skip / overwrite.

@sphuber
Copy link
Contributor

sphuber commented Dec 17, 2019

Sorry, I did not take this up yet with @greschd . Apologies for coming with this after you have already implemented it but I was wondering if it is not a simpler/safer to make the target directory into which the files should be dumped a required argument and require that it cannot exist. If the goal of the command is to be as simple as possible and just dump the entire content of a node's repo this seems in line with that goal. I think for the same reason we decided not to provide flags to include/exclude certain paths. @ltalirz pointed out that it prevents one from dumping in the current directory, but my argument would be that doing mkdir some_folder just before is a lot easier then potentially responding to a bunch of prompts. I am afraid that the current solution provides unnecessary functionality that introduces quite some code complexity which is difficult to fully test and may provide surprising behavior. @giovannipizzi agreed with my suggestion to simplify the code/interface when we discussed this on Friday @fiesch. I realize that you have already invested quite some time in writing the code and it would be unfortunate if we decide not to include it. Let me know what you think about simplifying the command

@greschd
Copy link
Member Author

greschd commented Dec 17, 2019

[...] I was wondering if it is not a simpler/safer to make the target directory into which the files should be dumped a required argument and require that it cannot exist.

So the proposed syntax would be verdi node repo dump <ID> <PATH> where PATH does not exist, but its parent does, correct? Or would you also auto-create the parents?

[...] that doing mkdir some_folder

which means the user wouldn't have to mkdir the folder, right?

I realize that you have already invested quite some time in writing the code and it would be unfortunate if we decide not to include it. Let me know what you think about simplifying the command

No worries, I'm not particularly attached to my code - that also doesn't make a particularly strong technical argument 😄 I'm fine with making this as simple as possible.

@sphuber
Copy link
Contributor

sphuber commented Dec 17, 2019

I think it is fine to automatically create a nested folder, as long as the leaf folder does not already exist. This is just to prevent copying files by accident into an already existing folder. So interface:

verdi node repo dump <ID> <PATH>

with <ID> and <PATH> both required and <PATH> can be a relative nested path where the leaf folder should not exist but any sub paths can exist and if not will be created on the fly. The contents of the node repository are then copied wholesale into the leaf folder of <PATH> (omitting the still present but soon to be removed path or raw_input sub paths in the node repo).

But before we go ahead, lets get an OK on this proposal from @giovannipizzi and @ltalirz and yourself :)

@greschd
Copy link
Member Author

greschd commented Dec 17, 2019

Good for me. The path and raw_input are omitted by virtue of going through the list_objects interface, I think.

@giovannipizzi
Copy link
Member

I would not have created intermediate paths, but I do not see a major problem with this, so I'm ok with Sebastiaan's proposal.
Just to be clear, I agree that the simplest is to stop if the provided path already exists.

@greschd
Copy link
Member Author

greschd commented Dec 17, 2019

Just to be clear, I agree that the simplest is to stop if the provided path already exists.

Yes, that's the intended behavior. In principle this could be handled by click, by setting type=click.Path(exists=False, file_okay=False, dir_okay=False). However, the resulting error message

Usage: verdi node repo dump [OPTIONS] NODE OUTPUT_DIRECTORY
Try "verdi node repo dump -h" for help.

Error: Invalid value for "OUTPUT_DIRECTORY": Path "." is a directory.

is not very clear IMO, so I'll instead handle it directly and write

Error: Invalid value for "OUTPUT_DIRECTORY": Path "." exists.
Aborted!

Instead of prompting when a file / directory exists, we now only
allow specifying an OUTPUT_DIRECTORY that does not exist, and
simply abort the command if it does.
@greschd greschd requested a review from ltalirz December 18, 2019 18:12
ltalirz
ltalirz previously approved these changes Dec 19, 2019
Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @greschd , looks good to me!

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor final changes and then its good to go, I promise :)

)
@with_dbenv()
def repo_dump(node, output_directory):
"""Copy the repository files of a node to an output directory."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write a note saying that the output directory should not exist

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I did notice though that this doesn't appear anywhere in the documentation, only the top-level command help is shown. That should probably be improved, we could show the help for all command groups.

try:
output_directory.mkdir(parents=True, exist_ok=False)
except FileExistsError:
click.echo('Error: Invalid value for "OUTPUT_DIRECTORY": Path "{}" exists.'.format(output_directory))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please use aiida.cmdline.utils.echo.echo_critical here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

for file in node.list_objects(key=key):
# Not using os.path.join here, because this is the "path"
# in the AiiDA node, not an actual OS - level path.
file_key = file.name if not key else key + '/' + file.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point and I now start to wonder whether we actually handle this correctly and consistently everywhere in the repositiory interface. Anyhow, this is good for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's only going to be a problem if os.sep is a backslash.. not the trickiest part for Windows support, I suppose 😄

Add a note that the output directory should not exist to the doc-
string. When it does, abort with the 'echo_critical' provided by
AiiDA instead of the click default.
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @greschd and especially for your patience

@sphuber sphuber merged commit d3e05c3 into aiidateam:develop Dec 19, 2019
@greschd greschd deleted the add_node_repo_cp_command branch December 19, 2019 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add 'verdi node copyfiles' command
4 participants