kvs commit handling should propagate content cache errors back to user #792
Comments
The description of store() above assumes #788 gets merged.
With PR #801 it becomes possible to put a dangling blobref into the KVS. In addition to handling the errors above, the commit logic should ensure that any new references are loaded into the rank 0 cache to verify that they resolve. Alternatively, we might want to add a content store operation that simply checks whether a blobref has been stored without returning its value, since one use case for creating such a reference outside the KVS is to avoid running a large value through the KVS commit logic.
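As a rough illustration of the "check that the blobref resolves" idea, here is a minimal C sketch that probes the content service with a `content.load` RPC before a user-supplied blobref is accepted. Only the `flux_rpc_raw`/`flux_rpc_get_raw` calls are real flux-core API; the payload conventions and the function itself are assumptions for illustration, not the actual commit code.

```c
#include <flux/core.h>
#include <string.h>

/* Probe the content service for 'blobref' before committing a reference to
 * it.  Returns 0 if the blobref resolves, -1 (with errno set by the RPC
 * machinery) if it is dangling or the request fails.  The exact payload
 * convention for content.load is assumed here for illustration. */
static int blobref_exists (flux_t *h, const char *blobref)
{
    flux_future_t *f;
    const void *data;
    int len;
    int rc = -1;

    if (!(f = flux_rpc_raw (h, "content.load", blobref,
                            strlen (blobref) + 1, FLUX_NODEID_ANY, 0)))
        return -1;
    if (flux_rpc_get_raw (f, &data, &len) < 0)
        goto done;          /* e.g. ENOENT: dangling blobref */
    rc = 0;                 /* blobref resolves; the value itself is unused */
done:
    flux_future_destroy (f);
    return rc;
}
```

The obvious drawback is that this pulls the full value back over the RPC, which is exactly why a check-without-return content operation is suggested above.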
Update here given all the recent changes to the KVS:
Still true, but now is in the area of
Synchronous errors are handled now.
I believe these error paths have now been handled properly.
Per discussion in #1227, a good test to add once this is complete: a valref containing multiple blobrefs, one of which is an illegal blobref.
Do we still have any content errors that are not handled properly? I'm going to tentatively close this as it seems like we've handled them all. @chu11 please reopen if I'm wrong.
Yeah, the asynchronous cases still exist. Basically, the replay of a stalled request is triggered only when a value is updated in the local KVS cache. If an error occurs such that the cache is never updated, the replay of the original request is never triggered, so it never completes.
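To make the failure mode concrete, here is an illustrative C sketch of a load continuation that wakes waiters only on the success path, which is the shape of the hang described above. The helper names and `struct cache_entry` are hypothetical stand-ins, not the actual kvs module internals.

```c
#include <flux/core.h>

/* Hypothetical kvs-internal helpers, declared here only so the sketch is
 * self-contained; they stand in for the real cache/waitqueue code. */
struct cache_entry;
void cache_entry_fill (struct cache_entry *entry, const void *data, int len);
void cache_entry_run_waiters (struct cache_entry *entry);

/* Continuation for an asynchronous content.load: waiters are replayed only
 * on the success path, so an RPC error strands them forever. */
static void load_continuation (flux_future_t *f, void *arg)
{
    struct cache_entry *entry = arg;
    const void *data;
    int len;

    if (flux_rpc_get_raw (f, &data, &len) < 0) {
        /* The bug being described: nothing wakes the requests stalled on
         * 'entry', so they never receive a response. */
        flux_future_destroy (f);
        return;
    }
    cache_entry_fill (entry, data, len);   /* update local kvs cache */
    cache_entry_run_waiters (entry);       /* replay stalled requests */
    flux_future_destroy (f);
}
```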
As noted in #1799, this issue is not isolated to "content cache" errors. If a blobref happens to be missing, that is equivalent to a "content cache" error. With the completion of #1696 / #1318, I think an easy way to fix this is via msg aux_set/aux_get when an error occurs in asynchronous communications. I'll need some new function in the cache API to iterate through waitqueue entries and set the msg aux field, but this seems like a not-so-horrible path to finally fix this.
Doh, the above assumed that all waiters had messages, but that's not always the case. What I really need is to set some type of "error callback" on a waitqueue entry.
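For illustration only, a hypothetical shape for that error-callback idea might look like the following. None of these names come from flux-core's waitqueue API; this just sketches the concept of running an error handler for every waiter on an entry when an asynchronous operation fails.

```c
#include <stdlib.h>

/* Hypothetical waiter with an optional error callback (names are not
 * flux-core's; this only sketches the idea). */
typedef void (*wait_cb_f) (void *arg);
typedef void (*wait_error_f) (void *arg, int errnum);

struct waiter {
    wait_cb_f cb;              /* normal replay callback */
    wait_error_f error_cb;     /* run instead of cb when the async fill fails */
    void *arg;
    struct waiter *next;
};

/* On an asynchronous load/store failure, run every waiter's error callback
 * with the errno from the failed operation and empty the queue, so the
 * original requests get an error response instead of hanging. */
static void waitqueue_run_error (struct waiter *head, int errnum)
{
    while (head) {
        struct waiter *next = head->next;
        if (head->error_cb)
            head->error_cb (head->arg, errnum);
        free (head);
        head = next;
    }
}
```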
Allow many errors during an asynchronous store to be returned to the original caller, instead of just hanging. A few catastrophic errors cannot be recovered from. Fixes flux-framework#792
If an error occurs while making an RPC to the content-cache service, it is not handled properly in the following cases:
Further, after an async cache fill or cache flush error occurs, the cache entry can be left dirty or invalid forever.
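For reference, a minimal C sketch of the desired behavior for the flush case is below. The `flux_rpc_get_raw`/`flux_respond`/`flux_respond_error` calls are real flux-core API; the structure, helper, and continuation names are hypothetical and only illustrate propagating the error back to the caller.

```c
#include <errno.h>
#include <stdbool.h>
#include <flux/core.h>

/* Hypothetical kvs-internal pieces, declared only to keep the sketch
 * self-contained. */
struct cache_entry;
void cache_entry_set_dirty (struct cache_entry *entry, bool dirty);

struct flush_ctx {                  /* hypothetical per-flush bookkeeping */
    flux_t *h;
    const flux_msg_t *request;      /* request that triggered the flush */
    struct cache_entry *entry;
};

/* Continuation for the content.store RPC that flushes a dirty cache entry.
 * On error, the errno is propagated to the original requestor instead of
 * leaving it hanging; on success, the entry's dirty bit is cleared. */
static void store_continuation (flux_future_t *f, void *arg)
{
    struct flush_ctx *ctx = arg;
    const void *blobref;            /* stored blobref from the response */
    int len;

    if (flux_rpc_get_raw (f, &blobref, &len) < 0) {
        flux_respond_error (ctx->h, ctx->request, errno, NULL);
    }
    else {
        cache_entry_set_dirty (ctx->entry, false);
        flux_respond (ctx->h, ctx->request, NULL);
    }
    flux_future_destroy (f);
}
```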