broker: content-cache `acct_dirty` can be greater than the number of flushable entries #4472
Comments
This bothered me so much that I kept looking into it, and I figured out the problem. The issue occurred via the regression test for issue #1760, in which the KVS drops its internal cache via dropcache. The broker's content-cache therefore received content store requests for the same data twice, ultimately leading to dirty entries being added to the flush list twice, corrupting the flush list and leaving it in an inconsistent state. It appeared during my work on #4267 because the content backing store wasn't loaded by default.

I'm trying to see if I can reproduce this issue in master, as it is a semi-severe bug that might even cause a segfault or memory corruption. Note that I think the issues I describe above in my first comment are still issues, so I may open another issue for this specific one.
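For illustration, here is a minimal C sketch (hypothetical names and list structure, not the actual flux-core broker code) of the kind of membership guard that keeps a duplicate store request from enqueueing the same dirty entry on the flush list twice:

```c
#include <stdbool.h>

/* Hypothetical cache entry with an intrusive flush-list link and a
 * flag recording whether it is already queued for flushing. */
struct cache_entry {
    struct cache_entry *next;   /* flush-list link */
    bool dirty;
    bool on_flush_list;
};

struct content_cache {
    struct cache_entry *flush_head;  /* dirty entries awaiting a store */
    int acct_dirty;                  /* count of dirty entries         */
};

/* Queue an entry for flushing exactly once: a second store request for
 * the same blob must not corrupt the list or the dirty accounting. */
static void flush_list_append (struct content_cache *cache,
                               struct cache_entry *e)
{
    if (e->on_flush_list)            /* already queued: nothing to do */
        return;
    e->next = cache->flush_head;     /* (prepend for brevity) */
    cache->flush_head = e;
    e->on_flush_list = true;
}
```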
Oof, glad you ran that down!
So sounds like this is still an issue?
Yeah, I split out the list corruption into #4482. What I wrote above is still possible ... and perhaps there are other paths I did not see.
While I'm in the middle of doing so much work on the broker's content-cache, I thought I would try to fix the two potential error paths I noted above.

As I think about it more, issue 2 above shouldn't be a problem if issue 1 is solved: the content-cache should eventually get an ENOSYS indicating the backing module is gone.

As for issue 1, it would be easy to simply put an entry back on the flush list if an error occurs. But not all errors are equal. If the file system is full and the backing module can never write anything to disk, then all we're doing is constantly flushing things to disk that will never succeed, ever increasing the flush list. On the other hand, if we get ENOSYS, we can be confident the backing module is gone for now, and we can just stick the entry back on the flush list for later. An optimal solution would be to mark each cache entry with a timeout of sorts saying "don't try to flush this for a while" if we believe the backing module to be borked. But this is probably more feature / work than is truly necessary at this point in time. So I'm thinking that if an error occurs on the backing module, just sticking the entry back on the flush list is a more than suitable solution.

Edit: oh no, issue 2 has to be handled in a similar way to issue 1, since we can't guarantee the message was received by the backing module. We just have to assume the request failed and put the dirty entry back on the flush list.
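A hedged sketch of that "put it back on the flush list" idea, reusing the hypothetical types from the earlier sketch (again, not the real broker code):

```c
/* Store-completion handler sketch: on any store error (ENOSYS because
 * the backing module unloaded, ENOSPC because the disk is full, ...)
 * the entry is still dirty, so re-queue it rather than dropping it;
 * acct_dirty then stays consistent with the flushable entries. */
static void store_continuation (struct content_cache *cache,
                                struct cache_entry *e,
                                int errnum)
{
    if (errnum != 0) {
        flush_list_append (cache, e);   /* retry on a later flush */
        return;
    }
    e->dirty = false;                   /* store succeeded: entry is clean */
    cache->acct_dirty--;
}
```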
Problem: If an error occurs during a content store, a dirty cache entry can be lost forever, as it is on neither the flush list nor the lru list. As a result, the "dirty count" of entries will be inconsistent with the known dirty entries (i.e. entries on the flush list or in the process of being stored).

Solution: If a content store fails, add the entry to a new flush_errors list so it can be retried at a later time.

Fixes flux-framework#4472
Per some discussion in #4524 on this branch: https://github.com/chu11/flux-core/tree/issue4472_put_back_on_flush_list
@garlick then noted:
This is more complex than perhaps desired, it's not for a scenario that we have ever seen, and such a scenario is likely rare (ENOSPC, i.e. disk full, is the most likely). Since there is no current need for any of the above, the decision is to log at LOG_CRIT instead of LOG_ERR, so the error is more obvious to system administrators that things are bad bad bad.
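For reference, a minimal sketch of what logging a failed store at LOG_CRIT looks like with the `flux_log()` API (the helper and message text are illustrative, not the code from the eventual fix):

```c
#include <flux/core.h>
#include <syslog.h>
#include <string.h>

/* Hypothetical helper: surface a failed content store loudly so a
 * system administrator notices, per the decision above. */
static void log_store_failure (flux_t *h, int errnum)
{
    flux_log (h, LOG_CRIT,
              "content store failed (%s): dirty entry may be lost",
              strerror (errnum));
}
```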
We'll consider this closed with #4526, but the branch listed above is available for reference as a "fuller" solution.
While working on #4267, I noticed that sometimes the `acct_dirty` field can be greater than the length of the `flush` list, even when a backing module is not loaded. A `flux content flush` will therefore hang, because `acct_dirty` will never become zero, so the content flush will never respond to the original request.

I'm not sure how this happened, but here are several possibilities from looking through the code. Some of this is duplicated text from #4267.
One potential path is in `cache_store()` and `cache_store_continuation()`. An entry from the flush list is sent to the content backing module for backing and removed from the flush list. If an error occurs, `cache_store_continuation()` will see the error but not put the entry back on the flush list, and `acct_dirty` will not be decremented.

Another potential (similar) path may be when we unload the backing module, so that backing store requests get lost. `acct_dirty` will stay incremented even though flush entries have been taken off the list.

(Update: now that I think about it, #2 is a strong possibility. I noticed this in the regression tests; once a test fix from #4267 went into effect, content-sqlite was being loaded / reloaded regularly in the regression tests.)