-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adopt a consistent behavior when adding targets and paths #1008
Adopt a consistent behavior when adding targets and paths #1008
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a ton for this beautiful PR, @sechkova. 💯 for clean and well-documented code and great commit-discipline.
I have left a few very minor inline comments and would like to share and discuss some high-level thoughts about the _check_relpath
function. Note that there is no need to fix inline comments before we have agreed on desired behavior.
Let's first agree on what we want here:
- Remove file existence checks in
add_target*
andadd_paths
methods (write
/writeall
should deal with that). - Make sure that the paths that end-up in roledb have forward slash path separators and don't start with a leading forward slash. Although the spec only recommends this (see spec#67) python-tuf seems to enforce this (see various
path.replace('\\', '/')
).
While you already fixed goal 1. (hooray ✅ ), I see a few minor issue in regards to goal 2.
isabs
,split(os.path)
andnormpath
all behave differently on Unix and Windows, so the warning/erroring conditions and normalization behavior vary depending on the combination of used platform and passed path.- I wonder if the mixed behavior of warning/erroring/normalizing won't confuse the caller.
Let's back up a little, I think there are two ways to achieve what we want (part 2 above).
- expect the caller to pass the right path (
/
as separator, no leading/
), no matter the platform, and raise an error if they don't:- Pro: clear behavior (no hidden normalizing magic)
- Con: inconvenient for the caller, e.g. if they want to pass the return value of tuf's
get_filepaths_in_directory
(returns abs paths), or if they are on windows and use (listdir
orwalk
).
- take anything and normalize, i.e. convert separators to
/
and remove leading/
.- Pro: convenient for the caller
- Con: unclear behavior (too much magic)
I strongly lean towards solution 1 (least surprise) plus providing normalizing utility functions for the caller's convenience (can be in a separate PR). What do others think?
Btw. in the future we might be even more sophisticated and have separate checker functions for target path, i.e. path-relative-scheme-less-URL, and path pattern, i.e. path-relative-scheme-less-URL but with glob pattern symbols. (see spec#67).
tuf/repository_tool.py
Outdated
tuf.formats.RELPATH_SCHEMA.check_match(pathname) | ||
|
||
if os.path.isabs(pathname): | ||
raise tuf.exceptions.InvalidNameError(repr(pathname) + ' contains a leading' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: Could you please wrap before "leading", so that we don't exceed 80 characters per line. (note to myself: we need to fix the linter configuration!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9eef0bd
tuf/repository_tool.py
Outdated
separator. | ||
|
||
<Side Effects> | ||
Normalizes pathname. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: I wouldn't call this a side effect, as it doesn't change any state outside of this function (as e.g. I/O or modifying some globals would do). I know that other function docstrings in tuf sometimes get this wrong. Please remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9eef0bd
tuf/repository_tool.py
Outdated
@@ -2772,6 +2755,61 @@ def _locate_and_update_target_in_bin(self, target_filepath, method_name): | |||
' in any of the bins.') | |||
|
|||
|
|||
def _check_relpath(self, pathname): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: If we do normalize paths we should call this function something like _normalize_path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9eef0bd
tuf/repository_tool.py
Outdated
' a valid file in the repository\'s targets' | ||
' directory: ' + repr(self._targets_directory)) | ||
logger.debug('Replacing target: ' + repr(relative_path)) | ||
roleinfo['paths'].update({relative_path: custom}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the if/else is only needed for the debug message and we can put roleinfo['paths'].update({relative_path: custom})
below it (only once).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already fixed by merging #1007
tuf/repository_tool.py
Outdated
roleinfo['paths'].update({relative_path: custom}) | ||
|
||
tuf.roledb.update_roleinfo(self._rolename, roleinfo, | ||
repository_name=self._repository_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: Please use double indentation (4 spaces) for line continuation (here and elsewhere too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already fixed by merging #1007
tuf/repository_tool.py
Outdated
Normalizes pathname. | ||
|
||
<Returns> | ||
The normazlied 'pathname'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/normazlied/normalized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9eef0bd
tuf/repository_tool.py
Outdated
logger.warning(repr(path) + ' is not located in the repository\'s' | ||
' targets directory: ' + repr(self._targets_directory)) | ||
# Check if the delegated paths or glob patterns are relative and | ||
# normalize them. Paths' existense on the file system is not verified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/existense/existence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9eef0bd
I'd love to hear what others think about the two approaches I suggested above...
See #1008 (review) for more details. (ping @mnm678, @SantiagoTorres, @trishankatdatadog) |
Approach #2 seems friendlier, although some debug/info/warning notices to the logger might help users debug unexpected issues. |
I think #1 would be easier to implement on a wide range of systems. It adds a bit of work for the caller, but eliminates confusing errors when the path doesn't match a known format. |
Thanks for your 4 cents, @mnm678 and @trishankatdatadog. I think Marina and I outvote you in the favor of a strict approach, Trishank. :P I really feel like we should have a function that accepts exactly and only what the specification recommends. Nonetheless, I do share the friendliness concern, which I think we can accommodate by providing normalization tooling, such as |
I'm in favour of the strict approach with supporting helper functions; so that we follow the principle of least surprise, match the specification and support client developers (with the helper functions). |
Thanks a lot everyone for the helpful comments and suggestions! Related to path verification:
In terms of adding normalization functions, how do you feel about:
@lukpueh Parts of the tutorial seem to be suggesting absolute paths, which will be rejected by the new approach. Do you think the normalization tooling + updating the tutorials about its usage is appropriate for a separate PR? |
Yes, I think this is a good start. (Later in a follow-up PR, we should consider to only accept path-relative-scheme-less-URL strings for target paths (see TARGETPATH in spec), and something similar but with shell-style wildcards allowed for path patterns (see PATHPATTERN in spec)).
Yes!
Yes! But we can do this in a follow-up PR.
def get_normalized_target_paths_in_directory(path):
return normalize_file_paths(get_filepaths_in_directory(path)): We can also do that in in a follow-up PR.
Hm. I'd rather keep the tutorial usable and not fail the tests. Maybe we can add a quick intermediate fix that replaces the |
Yes 👀
Agree on keeping the tutorials consistent with the code and the tests working. A hard-coded list will do the job for now. I'll add it in the PR 👍 |
75ee328
to
7617e09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @sechkova, great work.
I've made a few minor inline comments, mostly just clarifying code comments in the tests.
tests/test_repository_tool.py
Outdated
# # Test invalid filepath argument (i.e., non-existent or invalid file.) | ||
# self.assertRaises(securesystemslib.exceptions.Error, | ||
# self.targets_object.remove_target, | ||
# '/non-existent.txt') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to leave this commented out code, could you remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in squashed commit 3a13b73
tests/test_repository_tool.py
Outdated
# Test for delegated targets that do not exist. | ||
# An exception should not be raised for non-existent delegated paths, | ||
# since at this point the file system should not be accessed yet | ||
self.targets_object.delegate(rolename, public_keys, [], threshold, | ||
terminating=False, list_of_targets=['non-existent.txt'], | ||
path_hash_prefixes=path_hash_prefixes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this test still makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the 'non-existent' target checks in all related tests just to make sure that the file system is not accessed at this point and no exception is raised.
tests/test_repository_tool.py
Outdated
self.targets_object.delegate, rolename, public_keys, [], | ||
list_of_targets=['/file1.txt']) | ||
|
||
# A path or target starting with a directory separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't seem right, aren't we testing Windows path separators here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in squashed commit 3a13b73
tests/test_repository_tool.py
Outdated
self.assertRaises(securesystemslib.exceptions.FormatError, | ||
self.targets_object._check_path, 3) | ||
|
||
# Test improperly formatted rolename argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, is this comment correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in squashed commit 3a13b73
tests/test_repository_tool.py
Outdated
self.targets_object._check_path('non-existent.txt') | ||
self.targets_object._check_path('subdir/non-existent') | ||
|
||
# Test improperly formatted rolename argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't seem to match the code, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in squashed commit 3a13b73
tests/test_tutorial.py
Outdated
os.path.abspath(os.path.join('repository', 'targets', 'file1.txt')), | ||
os.path.abspath(os.path.join('repository', 'targets', 'file2.txt')), | ||
os.path.abspath(os.path.join('repository', 'targets', 'file3.txt'))]) | ||
# List of targets is hardcoded since get_filepaths_in_directory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a TODO item to switch to using get_filepaths_in_directory
+ a helper at some point? The comment doesn't make a lot of sense once the diff showing the deleted call to get_filepaths_in_directory
isn't visible. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I tried to improve the comment and keep the TODO idem: 6d8a84d
docs/TUTORIAL.md
Outdated
>>> target4_filepath = os.path.abspath("repository/targets/myproject/file4.txt") | ||
>>> octal_file_permissions = oct(os.stat(target4_filepath).st_mode)[4:] | ||
>>> custom_file_permissions = {'file_permissions': octal_file_permissions} | ||
>>> repository.targets.add_target(target4_filepath, custom_file_permissions) | ||
>>> repository.targets.add_target('myproject/file4.txt', custom_file_permissions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it just me or does this pattern of assigning the full path to a variable and then passing the relative sub-path as a string look a little strange?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it looks awkward but it is also a good example why a normalising path helper is needed :)
I tired to rewrite it in a more reasonable way: 6d8a84d
tests/test_tutorial.py
Outdated
@@ -243,7 +240,7 @@ def test_tutorial(self): | |||
'repository', 'targets', 'myproject', 'file4.txt')) | |||
octal_file_permissions = oct(os.stat(target4_filepath).st_mode)[4:] | |||
custom_file_permissions = {'file_permissions': octal_file_permissions} | |||
repository.targets.add_target(target4_filepath, custom_file_permissions) | |||
repository.targets.add_target('myproject/file4.txt', custom_file_permissions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as (several comments) above, does this pattern look a little strange?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as comment above: 6d8a84d
tuf/repository_tool.py
Outdated
@@ -2526,11 +2523,15 @@ def delegate_hashed_bins(self, list_of_targets, keys_of_hashed_bins, | |||
ordered_roles.append(role) | |||
|
|||
for target_path in list_of_targets: | |||
# Check if the target path is relative and normalize it. File's existence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_check_path()
no longer does any normalising, right? Could you update the comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed during merge in ba8ffd2
7617e09
to
6d8a84d
Compare
Thank you @joshuagl for actually reading all comments sections! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many many thanks for the updates, @sechkova! I have left three minor comments. Please address and we are good to go here. :)
docs/TUTORIAL.md
Outdated
>>> custom_file_permissions = {'file_permissions': octal_file_permissions} | ||
>>> repository.targets.add_target(target4_filepath, custom_file_permissions) | ||
>>> repository.targets.add_target('target4_filepath', custom_file_permissions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the variable target4_filepath
and not the string, akin to test_tutorial.py
If we only had a tutorial test that actually tests the tutorial document (see #775 (review) and #808 (comment)) ... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing, fix is squashed in 87e1e11
tuf/repository_tool.py
Outdated
raise tuf.exceptions.InvalidNameError('Path ' + repr(pathname) | ||
+ ' does not use the forward slash (/) as directory separator.') | ||
|
||
if pathname.startswith(os.sep): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that this would allow a windows user to pass an absolute path that starts with "/", which we also don't want. I suggest to change to if pathname.startswith('/'):
. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment below
tests/test_repository_tool.py
Outdated
# Test invalid pathname - starting with os separator | ||
pathname = os.sep + 'file1.txt' | ||
self.assertRaises(tuf.exceptions.InvalidNameError, | ||
self.targets_object._check_path, pathname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test will raise for different reasons on *nix and windows, as a consequence we don't trigger the starts with "/" condition when testing on windows. (also see comment in _check_path
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, the commit where I added os.sep in the tests is reverted and instead _check_path
uses if pathname.startswith('/')
as suggested, thanks!
Also added an additional test case for paths starting with '\', just in case for the future.
The new commit: e85a3b3
6d8a84d
to
e85a3b3
Compare
Rebased and new changes added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just stumbled across two more things that probably came from a rebase on some other work (plus a a minor docstring request). Would you mind fixing those? :)
tuf/repository_tool.py
Outdated
@@ -1854,6 +1854,8 @@ def add_paths(self, paths, child_rolename): | |||
securesystemslib.exceptions.Error, if 'child_rolename' has not been | |||
delegated yet. | |||
|
|||
tuf.exceptions.InvalidNameError, if 'pathname' does not match pattern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: Would you mind replacing 'pathname'
with what's actually being checked, that is here "any of the passed 'paths'". (same comment applies to other occurrences of this line).
# forward slash as a separator or raise an exception. Paths' existence | ||
# on the file system is not verified. If the path is incorrect, | ||
# the targetfile won't be matched successfully during a client update. | ||
self._check_path(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we don't need the relative_paths
variable anymore.
tuf/repository_tool.py
Outdated
filepath[targets_directory_length + 1:].replace('\\', '/')) | ||
|
||
self._check_path(target) | ||
relative_list_of_targets.append(target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above in add_paths
. Given that we don't do no normalizing anymore, we don't have to copy the target paths to a new list, i.e. relative_list_of_targets
. Please remove.
This method checks if the passed path is relative and returns the normalized path. Checks are performed without accessing the file system. Signed-off-by: Teodora Sechkova <[email protected]>
Adopt the same behavior in all methods of Targets class (add_paths(), delegate()) that add new paths to a role. Signed-off-by: Teodora Sechkova <[email protected]>
Adopt the same path verification strategy in all methods of Targets class that add new target files by utilizing the _check_relpath() method. Signed-off-by: Teodora Sechkova <[email protected]>
Adopt a stict behavor allowing only paths that match the definition of a PATHPATTERN or aTARGETPATH (use the forward slash (/) as directory separator and do not start with a directory separator). Raise an exception otherwise. Rename the method to a more general _check_path(). Signed-off-by: Teodora Sechkova <[email protected]>
Normalization of the target path is no longer necessary during metadata generation, since an exception is raised earlier, on the addition of an incorrect target path or pattern. Signed-off-by: Teodora Sechkova <[email protected]>
Replace the absolute paths returned by get_filepaths_in_directory() in the tutorial with a hard-coded list of relaive filepaths since add_target(s) and delegate() methods raise excception on absolute paths. Remove an obsolete warning about path pattern's location. Signed-off-by: Teodora Sechkova <[email protected]>
- add a test for _check_path() method of Targets class. - update all tests calling _check_path() respectively - update test_tutorial Signed-off-by: Teodora Sechkova <[email protected]>
Test is updated to include checks for incorrect target paths. Signed-off-by: Teodora Sechkova <[email protected]>
Improve the coding style in TUTORIAL in the case where absolute path to a file is needed to perform file system access and at the same time is rejected by Targets methods. Signed-off-by: Teodora Sechkova <[email protected]>
Use a hard-coded unix separator ('/') so that an exception is also raised for paths starting with '/' when executing on Windows systems. Update test_check_path to explicitly test invalid paths starting with Windows style separator. Signed-off-by: Teodora Sechkova <[email protected]>
Remove unnecessary copying of paths to another list in add_targets() and add_paths() methods. Fix incorrect docstring text. Signed-off-by: Teodora Sechkova <[email protected]>
e85a3b3
to
882df8e
Compare
Rebased and fixed in 882df8e :) |
Fixes issue # : #957, #963
Description of the changes being introduced by the pull request:
As described in the issues above, the methods of Targets class use inconsistent checks when adding target files and delegated paths. As a result of this plus the need to avoid access to the file system in certain cases, this pull request suggests:
write()
orwriteall()
Code changes:
_check_relpath()
method to Targets classadd_paths()
,delegate()
) to utilise_check_relpath()
add_target(s)()
,delegate()
,delegate_hashed_bins()
,_locate_and_update_target_in_bin
) to utilise_check_relpath()
Note: Tests, delegation section in tutorial as well as
get_filepaths_in_directory()
helper also need updating but I am sharing the PR sooner as it includes changes needed by PEP 458 implementation. I will be updating them accordingly either in the PR or as a follow on it.Please verify and check that the pull request fulfills the following
requirements: