Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config.json and log.json can get out of sync, preventing startup of the credential manager #14

Closed
jchartrand opened this issue Jul 31, 2023 · 7 comments
Assignees

Comments

@jchartrand
Copy link

If I pass a badly formed VC to the allocateStatus method, it fails, but leaves the repos in a state such that when I try to restart, it gives me an error. I think the problem is that when allocating a status, the manager first updates the config, and in particular increments the number of credentials issued and saves this back to Github:

// update status config data
configData.credentialsIssued = credentialsIssued;
configData.latestList = latestList;
await this.updateConfigData(configData);

And then when it goes on to issue the credential, that fails, and so the credential never gets issued and the log doesn't get updated, which leaves the number of credentials issued in the config ('credentialsIssued') one greater than are in the log. Which is what later I think causes the error when restarting, specifically because of this check:

const hasProperLogEntries = credentialIdsUnique.length === credentialsIssued;

Fundamentally I think the problem is that the two operations (updateConfig and updateLog) aren't atomic and so the log and config can get out of whack. I could imagine this causing other problems too.

If the 'credentialsIssued` in the log is only used to know when the current list is full and so that a new list needs to be created, then maybe better to just calculate the # of credentials issued from the log?

@kezike kezike self-assigned this Aug 4, 2023
@kezike
Copy link
Contributor

kezike commented Aug 4, 2023

At the moment, I am leaning toward adding a credential validation check early in allocateStatus. If that fails then neither the config nor the log file would be updated.

With this solution, there is still a valid argument to be made regarding consistency and fault tolerance in the face of network failure. Many of the operations in this library involve updating files both in the same repo and in separate repos (i.e., status credential in the main repo and config or log file in the metadata repo). Considering that this library will likely be used in large scale course distributions, such as Harvard's CS50, we may want to consider a mechanism that implements transaction rollback when one operation of a logical transaction is interrupted.

@dmitrizagidulin your thoughts are welcome here.

@kezike
Copy link
Contributor

kezike commented Aug 7, 2023

I have done a bit more thinking on this over the last couple of days and I think I have a solution that is an extension of the locking/serialization work that I have already done with this library.

The main update would be to check core invariants in the relationship between the three status documents (status credential, log file, config file) prior to releasing the lock in transactional functions. If any of the invariants fail, restore the value of these three documents, release the lock to allow other processes a chance to execute, and try again. The main code delta would involve storing the current values of these documents in memory at the beginning of all transactional functions and restoring/retrying as needed.

@jchartrand @dmitrizagidulin thoughts on this approach?

@jchartrand
Copy link
Author

jchartrand commented Aug 7, 2023 via email

@kayaelle kayaelle moved this to Backlog in DCC Engineering Aug 9, 2023
@kezike
Copy link
Contributor

kezike commented Aug 10, 2023

Resolution: we will apply my proposal from above, plus a modification that addresses @jchartrand's concerns: instead of temporarily saving the repo data locally, we will temporarily save it in the metadata repo, such that a disrupted client service that finds this data on restart understands that it needs to restore the repo data to this previous state. Additionally, we will combine the config and log files to prevent them from getting out of sync, per @jchartrand's recommendation.

@kezike kezike moved this from Backlog to In Progress in DCC Engineering Aug 16, 2023
This was referenced Aug 25, 2023
@kayaelle kayaelle moved this from In Progress to To Do (Current sprint) in DCC Engineering Nov 1, 2023
@alexfigtree alexfigtree moved this from To Do (Current sprint) to Release Ready in DCC Engineering Feb 21, 2024
@alexfigtree
Copy link
Member

Tested.

@alexfigtree
Copy link
Member

Keeping open until after deployment

@alexfigtree alexfigtree reopened this Feb 21, 2024
@alexfigtree alexfigtree moved this from Release Ready to Done (Deployed) in DCC Engineering Jun 21, 2024
@alexfigtree
Copy link
Member

Deployed to both Google Play and App Store (release 2.1.0-build80), closing ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done (Deployed)
Development

No branches or pull requests

3 participants