
broker: content-cache flush list can be corrupted #4482

Closed
chu11 opened this issue Aug 9, 2022 · 0 comments · Fixed by #4484
chu11 commented Aug 9, 2022

While working on #4267 (work that also led to #4472), I saw `flux content flush` hang during some regression tests. While debugging, I eventually realized that `acct_dirty` in the content-cache and the length of the flush list were inconsistent. With `acct_dirty` never reaching zero, `flux content flush` would hang, never receiving a reply.

After some trial and error, I determined that the regression test for issue #1760 was the trigger when the content backing module was not loaded.

In the content-cache, if a duplicate piece of data arrives, a cache entry can be added to the flush list twice, corrupting the list. This happens because `list_add_tail()` does not check whether the node is already on the list.

Duplicate data can be sent by issuing `flux kvs dropcache` between identical KVS writes: after the dropcache, the KVS has no cached data, so it sends "previously sent" data to the content cache again.

While this bug is conceptually possible when a content backing module is loaded, it would be very racy. (Update: duplicate data would have to arrive after a KVS dropcache AND before the content-cache has flushed the entry to the backing store.) It is far easier to reproduce when the content backing module isn't loaded.

Here's a reproducer and some debug output I put into the broker to illustrate the badness (some spacing edits for readability):

cat <<-EOF >t4500.sh
#!/bin/sh -e

flux kvs put issue4500A.a="abcdefghijk"
flux kvs put issue4500A.b="lmnopqrstuv"
flux kvs put issue4500A.c="wxyz0123456"
flux kvs put issue4500A.d="7890ABCDEFG"
flux kvs put issue4500A.e="HIJKLMNOPQR"
flux kvs put issue4500A.f="STUVWXYZ!!!"
flux kvs put issue4500A.g="<<<<<:>>>>>"

flux kvs dropcache

flux kvs put issue4500B.a="abcdefghijk"
flux kvs put issue4500B.b="lmnopqrstuv"
flux kvs put issue4500B.c="wxyz0123456"
flux kvs put issue4500B.d="7890ABCDEFG"
flux kvs put issue4500B.e="HIJKLMNOPQR"
flux kvs put issue4500B.f="STUVWXYZ!!!"
flux kvs put issue4500B.g="<<<<<:>>>>>"

flux module load content-sqlite

flux content flush

EOF

chmod +x t4500.sh

flux start -s 1 \
    -o,--setattr=broker.rc1_path=${SHARNESS_TEST_DIRECTORY}/rc/rc1-issue4500 \
    -o,--setattr=broker.rc3_path=${SHARNESS_TEST_DIRECTORY}/rc/rc3-issue4500 \
    ./t4500.sh

(the rc1/rc3 are identical to rc1-kvs, but don't load content-sqlite)

cache entry on flush list already? no
before add to tail - len flush list 14 - acct_dirty = 15
after add to tail - len flush list 15 - acct_dirty = 15

2022-08-09T23:37:26.461026Z kvs.alert[0]: dropped 15 of 15 cache entries

cache entry on flush list already? no
before add to tail - len flush list 15 - acct_dirty = 16
after add to tail - len flush list 16 - acct_dirty = 16

cache entry on flush list already? yes
before add to tail - len flush list 16 - acct_dirty = 16
after add to tail - len flush list 3 - acct_dirty = 16

cache entry on flush list already? no
before add to tail - len flush list 3 - acct_dirty = 17
after add to tail - len flush list 4 - acct_dirty = 17

cache entry on flush list already? yes
before add to tail - len flush list 4 - acct_dirty = 17
after add to tail - len flush list 5 - acct_dirty = 17

While no segfaults or memory corruption occurred with this reproducer, it's easy to see how they could happen if content cache expiration were also introduced. And needless to say, data that should be sent to the backing module might never be flushed and thus be lost on a restart.

@chu11 chu11 self-assigned this Aug 9, 2022
chu11 added a commit to chu11/flux-core that referenced this issue Aug 10, 2022
Problem: A dirty cache entry has the potential to be added onto the
flush list twice.  This double addition can lead to list corruption.
The observed side effect was a list that was shortened and no longer
accurate with respect to the `acct_dirty` counter.  This could lead to
hangs with content flush, missed flushes to the backing store, and
segfault/memory corruption in the worst case.

Solution: Check if the cache entry is already on the flush list before
adding it.

Fixes flux-framework#4482
chu11 added a commit to chu11/flux-core that referenced this issue Aug 10, 2022
Problem: A dirty cache entry has the potential to be added onto the
flush list twice.  This double addition can lead to list corruption.
The observed side effect was a list that was shortened and no longer
accurate with respect to the `acct_dirty` counter.  This could lead to
hangs with content flush, missed flushes to the backing store, and
segfault/memory corruption in the worst case.

Solution: Remove the cache entry from the flush list before adding it.
The remove is a no-op if it is not already on a list.  Add regression
test.

Fixes flux-framework#4482
chu11 added a commit to chu11/flux-core that referenced this issue Aug 11, 2022
Problem: A dirty cache entry has the potential to be added onto the
content cache's flush list twice.  This double addition can lead to list
corruption.  The observed side effect was a list that was shortened and no
longer accurate with respect to the `acct_dirty` counter.  This could lead
to hangs with content flush, missed flushes to the backing store, and
segfault/memory corruption in the worst case.

Solution: Remove the cache entry from the flush list before adding it.
The remove is a no-op if it is not already on a list.

Fixes flux-framework#4482
@mergify mergify bot closed this as completed in #4484 Aug 11, 2022