broker: content-cache `acct_dirty` can be greater than the number of flushable entries #4472
Comments
This bothered me so much that I kept looking into it, and I figured out the problem. The issue occurred via the regression test for issue #1760, in which the KVS drops its internal cache via dropcache. The broker's content-cache therefore received content store requests for the same data twice, ultimately leading to dirty entries being added to the flush list twice, corrupting the flush list and leaving it in an inconsistent state. It appeared during my work on #4267 because the content backing store wasn't loaded by default.

I'm trying to see if I can reproduce this issue in master, as it is a semi-severe bug that might even cause a segfault or memory corruption. Note that I think the issues I describe above in my first comment are still issues, so I may open another issue for this specific one.
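For illustration, here is a minimal C sketch (hypothetical names and list structure, not the actual flux-core broker code) of the kind of membership guard that keeps a duplicate store request from enqueueing the same dirty entry on the flush list twice:

```c
#include <stdbool.h>

/* Hypothetical cache entry with an intrusive flush-list link and a
 * flag recording whether it is already queued for flushing. */
struct cache_entry {
    struct cache_entry *next;   /* flush-list link */
    bool dirty;
    bool on_flush_list;
};

struct content_cache {
    struct cache_entry *flush_head;  /* dirty entries awaiting a store */
    int acct_dirty;                  /* count of dirty entries         */
};

/* Queue an entry for flushing exactly once: a second store request for
 * the same blob must not corrupt the list or the dirty accounting. */
static void flush_list_append (struct content_cache *cache,
                               struct cache_entry *e)
{
    if (e->on_flush_list)            /* already queued: nothing to do */
        return;
    e->next = cache->flush_head;     /* (prepend for brevity) */
    cache->flush_head = e;
    e->on_flush_list = true;
}
```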
Oof, glad you ran that down!
So sounds like this is still an issue?
Yeah, I split out the list corruption into #4482. What I wrote above is still possible ... and perhaps there are other paths I did not see.
While I'm in the middle of doing so much work on the broker's content-cache, I thought I would try to fix the two potential error paths I noted above.

As I think about it more, issue 2 above shouldn't be a problem if issue 1 is solved: the content-cache should eventually get an ENOSYS indicating the backing module is gone.

As for issue 1, it would be easy to simply put an entry back on the flush list if an error occurs. But not all errors are equal. If the file system is full and the backing module can never write anything to disk, then all we're doing is constantly flushing things to disk that will never succeed, ever increasing the flush list. On the other hand, if we get ENOSYS, we can be confident the backing module is gone for now, and we can just stick the entry back on the flush list for later. An optimal solution would be to mark each cache entry with a timeout of sorts saying "don't try to flush this for a while" if we believe the backing module to be borked. But this is probably more feature / work than is truly necessary at this point in time. So I'm thinking that if an error occurs on the backing module, just sticking the entry back on the flush list is a more than suitable solution.

Edit: oh no, issue 2 has to be handled in a similar way to issue 1, since we can't guarantee the message was received by the backing module. We just have to assume the request failed and put the dirty entry back on the flush list.
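A hedged sketch of that "put it back on the flush list" idea, reusing the hypothetical types from the earlier sketch (again, not the real broker code):

```c
/* Store-completion handler sketch: on any store error (ENOSYS because
 * the backing module unloaded, ENOSPC because the disk is full, ...)
 * the entry is still dirty, so re-queue it rather than dropping it;
 * acct_dirty then stays consistent with the flushable entries. */
static void store_continuation (struct content_cache *cache,
                                struct cache_entry *e,
                                int errnum)
{
    if (errnum != 0) {
        flush_list_append (cache, e);   /* retry on a later flush */
        return;
    }
    e->dirty = false;                   /* store succeeded: entry is clean */
    cache->acct_dirty--;
}
```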
Problem: If an error occurs during a content store, a dirty cache entry can be lost forever, as it is on neither the flush list nor the lru list. As a result, the "dirty count" of entries will be inconsistent with the known dirty entries (i.e. entries on the flush list or in the process of being stored).

Solution: If a content store fails, add the entry to a new flush_errors list so it can be retried at a later time.

Fixes flux-framework#4472
Per some discussion in #4524 on this branch: https://github.com/chu11/flux-core/tree/issue4472_put_back_on_flush_list
@garlick then noted:
This is more complex than perhaps desired, it's not for a scenario that we have ever seen, and such a scenario is likely rare (ENOSPC, i.e. disk full, is the most likely). Since there is no current need for any of the above, the decision is to log at LOG_CRIT instead of LOG_ERR, so the error is more obvious to system administrators that things are bad bad bad.
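For reference, a minimal sketch of what logging a failed store at LOG_CRIT looks like with the `flux_log()` API (the helper and message text are illustrative, not the code from the eventual fix):

```c
#include <flux/core.h>
#include <syslog.h>
#include <string.h>

/* Hypothetical helper: surface a failed content store loudly so a
 * system administrator notices, per the decision above. */
static void log_store_failure (flux_t *h, int errnum)
{
    flux_log (h, LOG_CRIT,
              "content store failed (%s): dirty entry may be lost",
              strerror (errnum));
}
```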
We'll consider this closed with #4526, but the branch listed above is available for reference as a "fuller" solution.
While working on #4267, I noticed that sometimes the `acct_dirty` field can be greater than the length of the `flush` list, even when a backing module is not loaded. A `flux content flush` will therefore hang, because `acct_dirty` will never become zero, so the content flush will never respond to the original request.

I'm not sure how this happened, but here are several possibilities from looking through the code. Some of this is duplicated text from #4267.
One potential path is in `cache_store()` and `cache_store_continuation()`. An entry from the flush list is sent to the content backing module for backing and removed from the flush list. If an error occurs, `cache_store_continuation()` will see the error but not put the entry back on the flush list, and `acct_dirty` will not be decremented.

Another potential (similar) path may be when we unload the backing module, so that backing store requests get lost. `acct_dirty` will stay incremented even though flush entries have been taken off the list.

(Update: now that I think about it, #2 is a strong possibility. I noticed this in the regression tests; once a test fix from #4267 went into effect, content-sqlite was being loaded / reloaded regularly in the regression tests.)