-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tuf: improve TUF client concurrency and caching #1953
Conversation
Signed-off-by: Asra Ali <[email protected]>
Signed-off-by: Asra Ali <[email protected]> update Signed-off-by: Asra Ali <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #1953 +/- ##
==========================================
+ Coverage 34.00% 34.69% +0.69%
==========================================
Files 153 153
Lines 9981 10076 +95
==========================================
+ Hits 3394 3496 +102
Misses 6208 6208
+ Partials 379 372 -7
Continue to review full report at Codecov.
|
Should we drop WIP from the title? |
I just realized one more clean-up I can do to this PR more clean, and will drop it then! I'll ping back for a review |
Cool! Nice job here :) |
Co-authored-by: Ville Aikas <[email protected]> Signed-off-by: Asra Ali <[email protected]>
cc @dlorenc @vaikas @haydentherapper @znewman01 Ready for review! Addressed some of the clean-up issues, expect more to come, including (listed in #1935):
|
func GetRekorPubs(ctx context.Context) (map[string]RekorPubKey, error) { | ||
// SIGSTORE_TRUST_REKOR_API_PUBLIC_KEY - If specified, fetches the Rekor public | ||
// key from the Rekor server using the provided rekorClient. | ||
// TODO: Rename SIGSTORE_TRUST_REKOR_API_PUBLIC_KEY to be test-only or remove. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd say remove entirely - Ultimately this will get used outside of testing, and at least with the other env var, the file is provided out of band rather than fetching the key directly from the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought here is that being able to fetch the public key from Rekor is part of the Rekor API, and whether this flag is here or not is only one part of getting used 'outside of testing'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it’s fine to provide an API to fetch the public key, but there needs to be an explicit strategy around root of trust management. Ideally it’d always be TUF for Sigstore, but consumers may choose to TOFU - but they must persist the key, verification logic can’t rely on fetching the key each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree it should be removed -- but I'd like to do this in another PR because the current webhook and tests rely on it and I need to dig in separately into that.
pkg/cosign/tuf/client.go
Outdated
// singletonTUF holds a single instance of TUF that will get reused on | ||
// subsequent invocations of initializeTUF | ||
singletonTUF *TUF | ||
singletonTUFMu sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use a Once to control access instead of a mutex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably could. I just wasn't sure if we'd be swapping the TUF object on update, or just modify parts of it on update. I think we might just be able to modify the in-mem map during updates, but as I said wasn't sure what the level of modification was right so started with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are modifying parts of it on update -- Once is feasible, I can work with reseting it during tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I dug in while I was attempting this change: we don't end up updating the TUF client unless we re-initialize. Since this PR uses the singleton/mutex we don't actually support long-running processes. I'm going to change to Once
since it's behavior will be the same as a mutex, and call out a follow-up to support long-running process updates in the Cleanup issue (in practice this isn't that big of an issue right now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think Once is cleaner since it explicitly shows that the singleton should be initialized only once.
pkg/cosign/tuf/client.go
Outdated
@@ -105,6 +108,12 @@ type remoteCache struct { | |||
Mirror string `json:"mirror"` | |||
} | |||
|
|||
func resetForTests() { | |||
singletonTUFMu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be wrapped in a mutex? If tests aren't run in parallel the lock isn't needed. If tests are being run in parallel, then wouldn't deleting the singleton be an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since some of the tests are expecting a 'fresh' TUF setup, I added this to make sure we don't get any errors from modifying it without a lock in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests within the tuf pkg will be sequential, and we need the reset (but as you said, don't need the lock).
There is 1 test outside the TUF package that uses a custom TUF root. I'm concerned about that, that may be run in parallel. I will be refactoring that anyway to use a custom CheckOpts
for the Rekor key rather than a custom TUF set up.
In the interim, I need to think how to lock that test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK resolution: I added a TODO for that test that uses a custom root outside this package, expect a PR this afternoon addressing that.
return t.getRootStatus() | ||
} | ||
|
||
// Close closes the local TUF store. Should only be called once per client. | ||
func (t *TUF) Close() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, this is no longer needed because we only access the TUF DB to write the in-memory metadata on initialization if not present, or read it if present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that's right! For longer running processes (like the webhook) there should be a way to refresh the data, but @asraa has some ideas there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For longer running processes we are also refreshing now -- this is handled in updateClient
: we Update our local in-memory metadata, then syncMetadata to the DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused why we need the mutex. Is it because there's a concern that initializeTUF
will called multiple times by other processes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't -- I switched over to Once
initialization, and reset the sync.Once
for tests.
The only time it will be called multiple times potentially in parallel is because of the one test outside the TUF pkg that may run in parallel with the TUF tests.
} | ||
// Sync the in-memory local store to the on-disk cache. | ||
tufDB := filepath.FromSlash(filepath.Join(rootCacheDir(), "tuf.db")) | ||
diskLocal, err := tuf_leveldbstore.FileLocalStore(tufDB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this API expected to be used, or are we reaching into private APIs? I just wanna make sure this is maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, FileLocalStore
is meant to be used!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @asraa for getting this wrangled into shape and @haydentherapper for the reviews!.
return t.getRootStatus() | ||
} | ||
|
||
// Close closes the local TUF store. Should only be called once per client. | ||
func (t *TUF) Close() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that's right! For longer running processes (like the webhook) there should be a way to refresh the data, but @asraa has some ideas there.
pkg/cosign/tuf/client.go
Outdated
// singletonTUF holds a single instance of TUF that will get reused on | ||
// subsequent invocations of initializeTUF | ||
singletonTUF *TUF | ||
singletonTUFMu sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably could. I just wasn't sure if we'd be swapping the TUF object on update, or just modify parts of it on update. I think we might just be able to modify the in-mem map during updates, but as I said wasn't sure what the level of modification was right so started with this.
pkg/cosign/tuf/client.go
Outdated
@@ -105,6 +108,12 @@ type remoteCache struct { | |||
Mirror string `json:"mirror"` | |||
} | |||
|
|||
func resetForTests() { | |||
singletonTUFMu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since some of the tests are expecting a 'fresh' TUF setup, I added this to make sure we don't get any errors from modifying it without a lock in place.
Signed-off-by: Asra Ali <[email protected]>
singletonTUF *TUF | ||
singletonTUFMu sync.Mutex | ||
singletonTUF *TUF | ||
singletonTUFOnce = new(sync.Once) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I actually didn't know that go had allocation via new
.
I don't mind this approach, but I found that someone else had the same problem and made sync but with Reset - https://github.com/matryer/resync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i saw this as well and ideally wanted to avoid a totally new dependency just for testing -- I'm not sure if there's a big downside to this approach though.
if ok && !isExpiredTimestamp(trustedTimestamp) && !forceUpdate { | ||
// We're golden so stash the TUF object for later use | ||
singletonTUF = t | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this returning nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's returning out of the sync.Once
func, but we still return singletonTUF
from initializeTUF
pkg/cosign/tuf/client.go
Outdated
var err error | ||
t.local, err = newLocalStore() | ||
if err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Darn, is this the consequence of sync.Once? If so, this isn't viable imo, we shouldn't add a panic into the client because if a server starts depending on it, it opens up an opportunity for query of deaths. Is there a way to return an error? If not, what you had before would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on not panicing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do a global variable - https://stackoverflow.com/questions/42069615/too-many-arguments-to-return-error
Just ran into this same issue in fulcioroots
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the global!
Signed-off-by: Asra Ali <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
LGTM, great work on this! |
Summary
GetRekorPubs
.Ticket Link
Partial fix #1935
Release Note