Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify delegation paths pattern matching wrt path separator #173

Closed
jku opened this issue Jul 12, 2021 · 2 comments · Fixed by #174
Closed

Clarify delegation paths pattern matching wrt path separator #173

jku opened this issue Jul 12, 2021 · 2 comments · Fixed by #174

Comments

@jku
Copy link
Member

jku commented Jul 12, 2021

This is about the pattern matching done on "paths" attribute of targets delegation. The language in the spec seems to contradict the reference implementation but neither is very clear.

I would like to see a mention of path separator "/" in the text: is it special cased WRT pattern matching or not.

Specification:

PATHPATTERN can include shell-style wildcards and supports the Unix filename pattern matching convention. Its format may either indicate a path to a single file, or to multiple paths with the use of shell-style wildcards.

I've been reading this spec for a year now and this has so far seemed clear to me: Unix filename pattern matching convention clearly means a glob where the path separator character never matches the wildcards. this is further emphasized by mention of shell-style wildcards: it's clear that the paths separator never matches in the shell.

Based on the above it seems clear that *.tgz should not match targets/foo.tgz -- this is how shell pattern matching works.

Examples in specification:

For example, the path pattern "targets/*.tgz" would match file paths "targets/foo.tgz" and "targets/bar.tgz", but not "targets/foo.txt". Likewise, path pattern "foo-version-?.tgz" matches "foo-version-2.tgz" and "foo-version-a.tgz", but not "foo-version-alpha.tgz"

The examples do not cover the interesting case: does *.tgz match targets/foo.tgz?

The reference implementation:

This is a simplified version, see https://github.com/theupdateframework/tuf/blob/develop/tuf/client/updater.py#L2868 for real code

if fnmatch.fnmatch("targets/foo.tgz", "*.tgz"):
    # path pattern matches

fnmatch documentation explains that the filename separator ('/' on Unix) is not special to this module. This means that in the reference implementation *.tgz does match targets/foo.tgz -- this is not what I expected from reading the spec.

@jku jku changed the title Clarify delegation paths pattern matching wrt to path separator Clarify delegation paths pattern matching wrt path separator Jul 12, 2021
@mnm678
Copy link
Collaborator

mnm678 commented Jul 12, 2021

My understanding of the spec's intention is that *.tgz should not match targets/foo.tgz. I don't think a wildcard delegation to multiple namespaces would make sense, especially as different namespaces would be controlled by different people/organizations.

Either way, this is a great example that we should add to the spec to make the intended behavior more clear.

@joshuagl
Copy link
Member

Thanks for the detailed issue @jku!

My interpretation of the spec matches both of yours, that this is shell pattern matching and therefore that a PATHPATTERN of *.tgz should not match targets/foo.tgz.

I think this is a reference implementation bug (and that the reference implementation should probably be using glob, not fnmatch).

The Tough developers seem to have made the same interpretation of the spec and are using glob to implement PATHPATTERN matching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants