ngclient: reveal final form (finish API changes) #1580

jku · 2021-09-15T07:53:51Z

There's a bunch of small API changes that I think would still make the API better (more consistent, simpler, safer). The implementations of the individual changes can probably be separate but I think it makes sense to discuss the design as a whole: it'll be a lot easier to discuss when the final form of the API is visible.

simplify the caching support: I've advocated for just dropping the "cache support" in updated_targets() before (since it's not really a specification feature) but... it looks like it can be just simplified
solve potential issues with target paths as local file paths
download_target() should return the local file path, not force client to guess
make the API more obvious and consistent: Updater.updated_targets() return value is not obvious, and using a list is weird
Updater.get_one_valid_targetinfo(): name is long without adding value
Updater.refresh(): there is no reason to force client to explicitly call this

Planned API usage

NOTE: these examples have been updated as the ideas have crystallized: earlier comments may refer to a slightly different example

    updater = Updater(
        metadata_dir,
        "https://example.com/metadata/",
        targets_cache_dir,
        "https://example.com/targets/",
    )
    # Update metadata for a target, then download target into cache if needed
    info = updater.get_targetinfo("file.txt")
    path = updater.find_cached_target(info)
    if path is None:
        path = updater.download_target(info)

    # Alternatively if client knows the local filename it wants to use
    info = updater.get_targetinfo("file.txt")
    if not updater.find_cached_target(info, "./file.txt"):
        updater.download_target(info, "./file.txt")

Things to note there:

refresh() is available but not required: it will be done on first get_targetinfo() if not explicitly called before
the functions for finding a local file and downloading a file work the same way: signatures are almost identical
both functions return a file path: caller does not have to guess where the file is stored locally
the local path argument to both functions can be a filename or a directory. If it is a directory, then Updater chooses a safe filename for target path (similar to current functionality). If the client knows the filename it wants, it can just use that as the argument. This allows using the Updater in a caching manner (using the same cache dir over time), or just to download a file without any cache
no local directories are created by any of the calls: If download_target() chooses the filename, it will encode the target path before using it as a filename.
find_cached_target() name is not great but I want it to be a counterpart to download_target (and I believe it's much better than the ambiguous updated_targets())

See #1604 for more details.

Considered, but not included changes

I considered these as well, but am currently not planning to include them:

Does ngclient: feature request "download target as bytes" #1556 make sense: should there be version of download_target() that returns the target file as bytes for the use case where writing to a real file is not necessary? As far as I can tell these would be entirely new functions though so this could be done at a later time without breaking API, so I'm planning to not do this now.
Should we support pathlib Paths instead of / in addition to str file paths? I think pathlib is complete enough in the Python versions we support so that's no longer a barrier... I'm leaning on "no we should not" as it feels more like a purity test that makes API more complicated to average developer and not a major practical benefit: Sure, filenames are not strings but I think I'm fine limiting Updater to the subset of filenames that are strings.

The text was updated successfully, but these errors were encountered:

jku · 2021-09-15T09:27:16Z

related #1412, #1571

jku · 2021-09-20T07:19:34Z

re-implementing the current directory-creating cache style would be easy for a client that wants to do that (and knows that it is safe to do so).

jku · 2021-09-20T07:23:03Z

Would appreciate comments on this design: @sechkova, @joshuagl maybe?

This includes several things that are related enough that I wanted to cover them in one document: I don't intend to implement all in one PR, just wanted to document the end result I think would make sense.

joshuagl · 2021-09-20T10:35:38Z

download_target() should return the local file path, not force client to guess

💯 this will make the API much nicer in use.

make the API more obvious and consistent: Updater.updated_targets() return value is not obvious, and using a list is weird

💯 the pluralised API has always felt a little off.

Updater.get_one_valid_targetinfo(): name is long without adding value

👍

Updater.refresh(): there is no reason to force client to explicitly call this

I am less certain about this. We should try and get this right for 1.0 so we don't have to break API too soon. Is always refreshing going to cause problems anywhere? It might be problematic for an air-gapped client (with local/cached files) that can't reach out to the network to refresh()?

the local path argument to both functions can be a filename or a directory. If it is a directory, then Updater chooses a safe filename for target path (similar to current functionality). If the client knows the filename it wants, it can just use that as the argument. This allows using the Updater in a caching manner (using the same cache dir over time), or just to download a file without any cache

How do we indicate the difference between a directory and a filename? directory paths must end with /? Is this too magical?

no local directories are created by any of the calls: If download_target() chooses the filename, it will encode the target path before using it as a filename.

Sounds reasonable. Will this work with how, for example, pip does caching today?

get_local_target() name is not great but I want it to be a counterpart to download_target and that was the best I could come up with (and I believe it's much better than the ambiguous updated_targets())

Other options: get_cached_target(), find_cached_target(), find_local_target()

Signatures

This is what I think the changes will look like in function signatures
    # local_path can be dir or file path
    # return value is file path or None
    @staticmethod
    def get_local_target(
        targetinfo: TargetFile, 
        local_path: str
    ) -> Optional[str]

    # local_path can be dir or file path (if it's a path, then a safe file name is generated)
    # return value is a file path
    def download_target(
        self,
        targetinfo: TargetFile,
        local_path: str,
        target_base_url: Optional[str] = None,
    ) -> str
Considered, but not included changes

I considered these as well, but am currently not planning to include them:

Does ngclient: feature request "download target as bytes" #1556 make sense: should there be version of download_target() that returns the target file as bytes for the use case where writing to a real file is not necessary? As far as I can tell these would be entirely new functions though so this could be done at a later time without breaking API, so I'm planning to not do this now.

metadata.py is already too large, let's avoid adding these until we have a use-case.

it feels more like a purity test that makes API more complicated to average developer and not a major practical benefit

😆 yes!

jku · 2021-09-20T11:18:53Z

Updater.refresh(): there is no reason to force client to explicitly call this

I am less certain about this. We should try and get this right for 1.0 so we don't have to break API too soon. Is always refreshing going to cause problems anywhere? It might be problematic for an air-gapped client (with local/cached files) that can't reach out to the network to refresh()?

Doing the actual refresh has always been required so that's not changing with my plan... not being able to connect to remote always means failure (in this plan and current implementations). The question of "should we be able to do something with local metadata without checking remote metadata" is a reasonable one. Examples:

if I just ran the updater, then fix a typo in the target name and run again should it be possible for the client to not check root/timestamp?
(your example) should it be possible to check if a local target is valid according to local metadata, without updating newest timestamp?

These sounds like post 1.0 material to me but I'll write some ideas on implementation:

I think the first case is an optimization that might not be worth it... but a possible implementation could maybe be a Updater configuration where client could define cache_validity = 15 # seconds and for that time from last reference time Updater would then break specification and not update root/timestamp (and then would typically not need to update snapshot or targets or any delegated targets it happens to have locally). This would require the client or Updater to store the reference time and would require the Updater to believe the stored reference time later.

The second case I guess could be done with similar configuration -- but a practical limit with this will be the timestamp expiry: I don't think we want to add code for allowing expired metadata in some cases. But imagining an update system designed for this (with long enough expiries), I guess the same approach would work?

the local path argument to both functions can be a filename or a directory. If it is a directory, then Updater chooses a safe filename for target path (similar to current functionality). If the client knows the filename it wants, it can just use that as the argument. This allows using the Updater in a caching manner (using the same cache dir over time), or just to download a file without any cache

How do we indicate the difference between a directory and a filename? directory paths must end with /? Is this too magical?

my prototype code uses os.path.isdir(path_arg).

no local directories are created by any of the calls: If download_target() chooses the filename, it will encode the target path before using it as a filename.

Sounds reasonable. Will this work with how, for example, pip does caching today?

pips actual cache is the http cache (and we planned to let that be the case for TUF downloads as well). For the project index download case, the file would be temporary and filename does not matter so I suppose pip will let Updater choose the filename. For the real target files, I think pip will want to build the local filename itself (but it will want to use get_local_target() as in some cases the file might already exist).

joshuagl · 2021-09-20T12:48:22Z

Thanks for working through these with me Jussi.

if I just ran the updater, then fix a typo in the target name and run again should it be possible for the client to not check root/timestamp?

(your example) should it be possible to check if a local target is valid according to local metadata, without updating newest timestamp?

These sounds like post 1.0 material to me but I'll write some ideas on implementation:

I agree that this is post 1.0 material. I just want to keep it in mind so we are able to satisfy the requirements.

The second case I guess could be done with similar configuration -- but a practical limit with this will be the timestamp expiry: I don't think we want to add code for allowing expired metadata in some cases. But imagining an update system designed for this (with long enough expiries), I guess the same approach would work?

Agree, we do not want code to allow for expired metadata in some cases. What I think is the sensible approach for air-gaps is for the client to be able to use local valid metadata but not choke if fetching remote metadata fails. If expires has passed, the client needs new metadata.

Looking over ngclient interfaces, this might be an additional method (added post 1.0) to explicitly load local metadata and a config optional + checking in appropriate places (i.e. if config.offline: return in refresh()) to ignore refresh calls?

the local path argument to both functions can be a filename or a directory. If it is a directory, then Updater chooses a safe filename for target path (similar to current functionality). If the client knows the filename it wants, it can just use that as the argument. This allows using the Updater in a caching manner (using the same cache dir over time), or just to download a file without any cache

How do we indicate the difference between a directory and a filename? directory paths must end with /? Is this too magical?

my prototype code uses os.path.isdir(path_arg).

and caller owns creating directories, makes sense. I'm still a bit concerned about how intuitive this method will feel to users. Are we OK with requiring users to ensure directories exist and possibly doing unexpected things when they do not?

For example, the following to calls to download_target() behave the same if /tmp/badger is an existing directory, but what if it isn't?

updater.download_target(tgtinfo, "/tmp/badger/")
updater.download_target(tgtinfo, "/tmp/badger")

no local directories are created by any of the calls: If download_target() chooses the filename, it will encode the target path before using it as a filename.

Sounds reasonable. Will this work with how, for example, pip does caching today?

pips actual cache is the http cache (and we planned to let that be the case for TUF downloads as well). For the project index download case, the file would be temporary and filename does not matter so I suppose pip will let Updater choose the filename. For the real target files, I think pip will want to build the local filename itself (but it will want to use get_local_target() as in some cases the file might already exist).

👍

jku · 2021-09-20T12:49:46Z

get_local_target() name is not great but I want it to be a counterpart to download_target and that was the best I could come up with (and I believe it's much better than the ambiguous updated_targets())

Other options: get_cached_target(), find_cached_target(), find_local_target()

Hmm, yes, maybe 'cache' is a better term than 'local' (when contrasted with download_target())...

jku · 2021-09-21T15:59:10Z

What I think is the sensible approach for air-gaps is for the client to be able to use local valid metadata but not choke if fetching remote metadata fails. If expires has passed, the client needs new metadata.

Looking over ngclient interfaces, this might be an additional method (added post 1.0) to explicitly load local metadata and a config optional + checking in appropriate places (i.e. if config.offline: return in refresh()) to ignore refresh calls?

refresh() is currently responsible for loading local metadata and that probably still makes sense... my hunch is that supporting a local-only use case would not be terribly complex, but would be complex enough that implementation should happen when we know that a real use case exists.

How do we indicate the difference between a directory and a filename? directory paths must end with /? Is this too magical?

my prototype code uses os.path.isdir(path_arg).

and caller owns creating directories, makes sense. I'm still a bit concerned about how intuitive this method will feel to users. Are we OK with requiring users to ensure directories exist and possibly doing unexpected things when they do not?

requiring directories to exist seems fine to me but...

For example, the following to calls to download_target() behave the same if /tmp/badger is an existing directory, but what if it isn't?

This is a reasonable criticism. My prototype code would indeed try to write to a file (first one would fail and second would succeed...). The only alternatives I can think of are adding duplicate functions for the two types of arguments (this seems unappealing) and adding separate arguments for filename and directory which looks a bit unique but could work:

# separate arguments for directory (required, must exist) and filename (optional)
path = updater.download_target(tgtinfo, "/tmp")
path = updater.download_target(tgtinfo, "/tmp", filename="badger")

# or optional arguments, either dir or filepath must be set
path = updater.download_target(tgtinfo, dir="/tmp")
path = updater.download_target(tgtinfo, filepath="/tmp/badger")

I'm not super into either one of these.

joshuagl · 2021-09-22T09:03:35Z

This is a reasonable criticism. My prototype code would indeed try to write to a file (first one would fail and second would succeed...). The only alternatives I can think of are adding duplicate functions for the two types of arguments (this seems unappealing) and adding separate arguments for filename and directory which looks a bit unique but could work:
# separate arguments for directory (required, must exist) and filename (optional)
path = updater.download_target(tgtinfo, "/tmp")
path = updater.download_target(tgtinfo, "/tmp", filename="badger")

# or optional arguments, either dir or filepath must be set
path = updater.download_target(tgtinfo, dir="/tmp")
path = updater.download_target(tgtinfo, filepath="/tmp/badger")
I'm not super into either one of these.

Of these options I prefer the latter, users either tell us where to store a file we name or precisely where to write the file. Separate arguments makes it clear they are different use-cases. Mutually exclusive optional arguments might be a little confusing, but feels safer for users than a single argument.

@sechkova any thoughts on the proposed API here?

sechkova · 2021-09-23T10:51:20Z

From what I understand, the intention is that if download_target gets a directory and it does not exist, it raises an exception but if it gets a file path then it encodes it as the filename?

I guess mutual exclusive if I have to choose from the options above, otherwise the result from this misunderstanding of the API:

path = updater.download_target(tgtinfo, dir="/tmp", filename"/tmp/badger")

(if /tmp exists) may come as a surpeise ( /tmp/%2Ftmp%2Fbadger)

Other options: get_cached_target(), find_cached_target(), find_local_target()

The find* names make it clearer to me why we use info and a name to return a path or None.

jku · 2021-09-23T16:29:42Z

From what I understand, the intention is that if download_target gets a directory and it does not exist, it raises an exception but if it gets a file path then it encodes it as the filename?

can you explain which variant exactly you mean? I can't quite follow the question

To summarize there have been three options so far I think:

# original: single argument, will be understood as directory if the directory already exists
# case 1: argument is directory, target is saved in  "/tmp/<encoded_filename>"
path = updater.download_target(tgtinfo, "/tmp")
# case 2: argument is not directory, target is saved in  "/tmp/filename"
path = updater.download_target(tgtinfo, "/tmp/filename")

# variant 1: separate arguments for directory (required, dir must exist) and filename (optional)
# case 1: target is saved in  "/tmp/<encoded_filename>"
path = updater.download_target(tgtinfo, "/tmp")
# case 2: target is saved in  "/tmp/badger"
path = updater.download_target(tgtinfo, "/tmp", filename="badger")

# variant2: optional arguments, one of dir or filepath must be set
# case 1: target is saved in  "/tmp/<encoded_filename>"
path = updater.download_target(tgtinfo, dir="/tmp")
# case 1: target is saved in  "/tmp/badger"
path = updater.download_target(tgtinfo, filepath="/tmp/badger")

Variant 1 feels ugly and unusual. I can live with variant 2 or original.

joshuagl · 2021-09-23T16:50:12Z

My preference is variant 2, due to the concern I raised with the original proposal.

sechkova · 2021-09-24T09:05:12Z

can you explain which variant exactly you mean? I can't quite follow the question

Sorry, I've misunderstood partially. Let's go for variant 2 then.

Doing so is not always safe and has various other issues (like target paths "a/../b" and "b" ending up as the same local path). Instead URL-encode the target path to make it a plain filename. This removes any opportunity for path trickery and removes the need to create the required sub directories (which we were not doing currently, leading to failed downloads). URL-encoding encodes much more than we really need but doing so should not hurt: the important thing is that it encodes all path separators. Return the actual filepath as return value. I would like to modify the destination argument to be either a directory or a filename so caller could decide the filename if they want to. But I won't do it now because updated_targets() (the caching mechanism) relies on filenames being chosen by TUF. The plan is to make it possible for caller to choose the filename though. This is part of bigger plan in theupdateframework#1580 Fixes theupdateframework#1571 Signed-off-by: Jussi Kukkonen <[email protected]>

Doing so is not always safe and has various other issues (like target paths "a/../b" and "b" ending up as the same local path). Instead URL-encode the target path to make it a plain filename. This removes any opportunity for path trickery and removes the need to create the required sub directories (which we were not doing currently, leading to failed downloads). URL-encoding encodes much more than we really need but doing so should not hurt: the important thing is that it encodes all path separators. Return the actual filepath as return value. I would like to modify the arguments so caller could decide the filename if they want to. But I won't do it now because updated_targets() (the caching mechanism) relies on filenames being chosen by TUF. The plan is to make it possible for caller to choose the filename though. This is clearly a "filesystem API break" for anyone depending on the actual target file names, and does not make sense if we do not plan to go forward with other updated_targets()/download_target() changes listed in theupdateframework#1580. This is part of bigger plan in theupdateframework#1580 Fixes theupdateframework#1571 Signed-off-by: Jussi Kukkonen <[email protected]>

jku · 2021-09-27T13:06:16Z

Status report for this:

tests: Add target support to RepositorySimulator #1587 is merged
ngclient: Don't use target path as local path #1592 is currently in review

Other than those my WIP branch contains

a few test cleanups
change of updated_targets -> get_cached_target (including the directory and filepath arguments)
get_one_valid_targetinfo rename
implicit refresh

I think those will be ~two more PRs -- I will make them when current open one merges, but if you want to see the end result, see https://github.com/jku/python-tuf/commits/ngclient-final-form

jku · 2021-10-07T07:05:33Z

Curently active PR for this is #1604 : Since there's a lot of target dir / filepath discussion here already, I'll mention that has a design that improves on the ones discussed here

After this the only thing left is the implicit refresh which is quite trivial.

Doing so is not always safe and has various other issues (like target paths "a/../b" and "b" ending up as the same local path). Instead URL-encode the target path to make it a plain filename. This removes any opportunity for path trickery and removes the need to create the required sub directories (which we were not doing currently, leading to failed downloads). URL-encoding encodes much more than we really need but doing so should not hurt: the important thing is that it encodes all path separators. Return the actual filepath as return value. I would like to modify the arguments so caller could decide the filename if they want to. But I won't do it now because updated_targets() (the caching mechanism) relies on filenames being chosen by TUF. The plan is to make it possible for caller to choose the filename though. This is clearly a "filesystem API break" for anyone depending on the actual target file names, and does not make sense if we do not plan to go forward with other updated_targets()/download_target() changes listed in theupdateframework#1580. This is part of bigger plan in theupdateframework#1580 Fixes theupdateframework#1571 Signed-off-by: Jussi Kukkonen <[email protected]>

sechkova · 2021-11-22T16:20:42Z

First I think all points from this issue are done and it can be closed.
But just before that I wanted to suggest adding a user-friendly error message when refresh is called a second time during the lifetime of an Updater object. This is already documented but the error you get is "RuntimeError: Cannot update timestamp after snapshot". I suggest we add something more appropriate like "refresh()" can be done only once during the lifetime of an Updater".

MVrachev · 2021-11-22T16:49:20Z

Whoever closes that issue please have a look at #1135 as well as I think that's the only issue that stops it from being closed.

jku · 2021-11-30T08:41:07Z

closing: client looks complete to me

jku added the ngclient label Sep 15, 2021

sechkova added the backlog Issues to address with priority for current development goals label Sep 15, 2021

sechkova assigned jku Sep 15, 2021

sechkova added this to the Sprint 8 milestone Sep 15, 2021

jku mentioned this issue Sep 17, 2021

Design client library #1135

Closed

jku mentioned this issue Sep 27, 2021

ngclient: Don't use target path as local path #1592

Merged

3 tasks

jku mentioned this issue Sep 28, 2021

ngclient updater: API functions argument validation #1594

Closed

4 tasks

sechkova modified the milestones: Sprint 8, Sprint 9 Sep 29, 2021

sechkova mentioned this issue Sep 29, 2021

ngclient: experiment with separating local target caching from updater #1412

Closed

joshuagl mentioned this issue Sep 30, 2021

Create migration guide prior to 1.0 #1597

Closed

jku mentioned this issue Oct 8, 2021

Ngclient api polish #1604

Merged

3 tasks

jku modified the milestones: Sprint 9, Sprint 10 Oct 13, 2021

jku mentioned this issue Oct 29, 2021

ngclient: Make refresh() optional #1650

Closed

jku closed this as completed Nov 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ngclient: reveal final form (finish API changes) #1580

ngclient: reveal final form (finish API changes) #1580

jku commented Sep 15, 2021 •

edited

Loading

jku commented Sep 15, 2021

jku commented Sep 20, 2021

jku commented Sep 20, 2021

joshuagl commented Sep 20, 2021

Signatures

Considered, but not included changes

jku commented Sep 20, 2021 •

edited

Loading

joshuagl commented Sep 20, 2021

jku commented Sep 20, 2021

jku commented Sep 21, 2021

joshuagl commented Sep 22, 2021

sechkova commented Sep 23, 2021

jku commented Sep 23, 2021

joshuagl commented Sep 23, 2021

sechkova commented Sep 24, 2021

jku commented Sep 27, 2021

jku commented Oct 7, 2021 •

edited

Loading

sechkova commented Nov 22, 2021

MVrachev commented Nov 22, 2021

jku commented Nov 30, 2021

ngclient: reveal final form (finish API changes) #1580

ngclient: reveal final form (finish API changes) #1580

Comments

jku commented Sep 15, 2021 • edited Loading

Planned API usage

Considered, but not included changes

jku commented Sep 15, 2021

jku commented Sep 20, 2021

jku commented Sep 20, 2021

joshuagl commented Sep 20, 2021

Signatures

Considered, but not included changes

jku commented Sep 20, 2021 • edited Loading

joshuagl commented Sep 20, 2021

jku commented Sep 20, 2021

jku commented Sep 21, 2021

joshuagl commented Sep 22, 2021

sechkova commented Sep 23, 2021

jku commented Sep 23, 2021

joshuagl commented Sep 23, 2021

sechkova commented Sep 24, 2021

jku commented Sep 27, 2021

jku commented Oct 7, 2021 • edited Loading

sechkova commented Nov 22, 2021

MVrachev commented Nov 22, 2021

jku commented Nov 30, 2021

jku commented Sep 15, 2021 •

edited

Loading

jku commented Sep 20, 2021 •

edited

Loading

jku commented Oct 7, 2021 •

edited

Loading