
broker: fix content-cache flush list corruption #4484

Merged (2 commits), Aug 11, 2022


chu11 (Member) commented Aug 10, 2022

Problem: A dirty cache entry has the potential to be added onto the flush list twice. This double addition can lead to list corruption.
The observed side effect was a list that was shortened and no longer accurate with respect to the acct_dirty counter. This could lead to hangs in content flush, missed flushes to the backing store, and, in the worst case, segfaults or memory corruption.

Solution: Check if the cache entry is already on the flush list before adding it.

Fixes #4482

chu11 (Member, Author) commented Aug 10, 2022

Nit for debate: is "corruption" the right word to use here for comments / commit messages / descriptions? Let's say we have a list like:

a -> b -> c -> d -> e -> NULL

Let's say c gets appended a second time. I think the result is

a -> b -> c -> NULL

because the code would set e->next = c, c->prev = e, c->next = NULL, but since b->next still points to c, we end up with a shortened list. We've lost the pointers to d and e.

It's not really "corrupted" in the usual sense of the word, but I couldn't think of a better one. Maybe just "damaged"?

chu11 (Member, Author) commented Aug 10, 2022

Hmmm, one builder failed my regression test because of:

  expecting success: run_timeout 120 /usr/src/flux-core-0.42.0-46-g410dedf74/t/issues/t4482-flush-list-corruption.sh
  2022-08-10T20:30:35.659903Z broker.err[0]: rc1.0: /bin/bash: /usr/src/flux-core-0.42.0-46-g410dedf74/_build/sub/t/rc/rc1-issue4482: No such file or directory
  2022-08-10T20:30:35.660029Z broker.err[0]: rc1.0: /usr/src/flux-core-0.42.0-46-g410dedf74/_build/sub/t/rc/rc1-issue4482 Exited (rc=127) 0.0s
  2022-08-10T20:30:35.662138Z broker.err[0]: rc3.0: /bin/bash: /usr/src/flux-core-0.42.0-46-g410dedf74/_build/sub/t/rc/rc3-issue4482: No such file or directory
  2022-08-10T20:30:35.662266Z broker.err[0]: rc3.0: /usr/src/flux-core-0.42.0-46-g410dedf74/_build/sub/t/rc/rc3-issue4482 Exited (rc=127) 0.0s

Not sure why every other builder works. Let me try using FLUX_SOURCE_DIR instead of SHARNESS_TEST_DIRECTORY.

garlick (Member) commented Aug 10, 2022

Well first, excellent job tracking this down, and it seems like the effect is actually pretty insidious.

Here's a thought. In at least one other place in content-cache.c, I see we call

        list_del_from (&cache->lru, &e->list);
        list_add (&cache->lru, &e->list);

Would doing that be sufficient rather than creating a new list_ function?

If we need to add a new function, we may want to add it directly to libccan and submit the change upstream. Or if it's really only useful to us, then possibly add it with a "namespace" other than "list_" so it is evident to casual perusers of our code that it's not part of the original class. But if we can do something simple to use the class as designed without adding anything then maybe it's better to do that.

chu11 (Member, Author) commented Aug 10, 2022

> would doing that be sufficient rather than creating a new list_ function?

Hmmm, I don't think that will work as is, because list_del_from() requires the entry to already be on a list (there's an assert in the ccan code that checks for this). But list_del() + list_add_tail() should be sufficient.

> If we need to add a new function, we may want to add it directly to libccan and submit the change upstream.

I went down this path just because I remembered needing the check in the KVS. But doing something similar to what you suggested should be fine.

chu11 force-pushed the issue4482_flush_list_corruption branch 2 times, most recently from 7d9d663 to f904744 on August 10, 2022 23:29
chu11 (Member, Author) commented Aug 10, 2022

Re-pushed, now doing list_del() and list_add_tail() together instead of the previous solution, so the PR is now just one commit :-)

chu11 force-pushed the issue4482_flush_list_corruption branch from f904744 to b15711f on August 10, 2022 23:30
garlick (Member) left a comment

LGTM. Thanks, simple is good.
Just a suggestion for improving the test.

Comment on lines 23 to 38
flux kvs put issue4482A.a="abcdefghijk"
flux kvs put issue4482A.b="lmnopqrstuv"
flux kvs put issue4482A.c="wxyz0123456"
flux kvs put issue4482A.d="7890ABCDEFG"
flux kvs put issue4482A.e="HIJKLMNOPQR"
flux kvs put issue4482A.f="STUVWXYZ!!!"
flux kvs put issue4482A.g="<<<<<:>>>>>"

flux kvs dropcache

flux kvs put issue4482B.a="abcdefghijk"
flux kvs put issue4482B.b="lmnopqrstuv"
flux kvs put issue4482B.c="wxyz0123456"
flux kvs put issue4482B.d="7890ABCDEFG"
flux kvs put issue4482B.e="HIJKLMNOPQR"
flux kvs put issue4482B.f="STUVWXYZ!!!"
garlick (Member):

I'm not sure this is doing what it looks like at face value, although it still may work. Those short values will be cached inside the directory entry for "issue4482B", so really what you're doing is creating multiple versions of that directory and of the root in the content store.

Suggestion: use flux content store, since the problem this pokes at has nothing to do with the KVS.

chu11 (Member, Author):

Ahhh yeah, you're right, my description is not correct, but I think the effect is identical: the multiple versions of the directory are the "data", not the junk I'm writing.

Let me try with flux content store.

grondo (Contributor) left a comment

Not a full review, but I did want to point something out in how the test is organized.

Also, in practice we usually split additional tests into a separate commit (unless code changes break tests) in keeping with the idea that commits should "do one thing". However, I don't feel strongly about that so this is fine with me.


chmod +x t4482.sh

flux start -s 1 \
grondo (Contributor):

Since the test script is generated in place, it might be easier (and keep all test components together) to generate the custom rc1 and rc3 scripts here as well. I'd also hate to proliferate the one-off rc scripts in t/rc and end up with many rc1-issue* files in there in the future.

grondo (Contributor):

Or, you could do away with the rc scripts and just load and unload the necessary modules directly in the test script, as you are doing with content-sqlite.

garlick (Member):

My vote would be to make the test self-contained as well. It should probably drop the bash -e option and handle errors explicitly then, so that it doesn't bail out leaving modules unloaded for the next test.

chu11 added 2 commits August 10, 2022 21:42
Problem: A dirty cache entry has the potential to be added onto the
content cache's flush list twice.  This double addition can lead to list
corruption.  The observed side effect was a list that was shortened and no
longer accurate with respect to the `acct_dirty` counter.  This could lead
to hangs with content flush, missed flushes to the backing store, and
segfault/memory corruption in the worst case.

Solution: Remove the cache entry from the flush list before adding it.
The remove is a no-op if it is not already on a list.

Fixes flux-framework#4482

Problem: No test covers duplicate content cache entries
being added to the content cache's flush list.

Solution: Add a regression test.
chu11 force-pushed the issue4482_flush_list_corruption branch from b15711f to 8c51d6f on August 11, 2022 05:45
chu11 (Member, Author) commented Aug 11, 2022

Re-pushed: cleaned up the test a ton (it looks far simpler / better now) and split it off into its own commit.

codecov bot commented Aug 11, 2022

Codecov Report

Merging #4484 (8c51d6f) into master (8e53bf5) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4484      +/-   ##
==========================================
- Coverage   83.37%   83.37%   -0.01%     
==========================================
  Files         401      401              
  Lines       67527    67529       +2     
==========================================
- Hits        56303    56301       -2     
- Misses      11224    11228       +4     
Impacted Files                       Coverage Δ
src/broker/content-cache.c           85.74% <100.00%> (+0.05%) ⬆️
src/common/libterminus/terminus.c    85.82% <0.00%> (-0.25%) ⬇️
src/shell/output.c                   76.54% <0.00%> (-0.16%) ⬇️
src/cmd/flux-job.c                   87.29% <0.00%> (-0.14%) ⬇️

garlick (Member) left a comment

LGTM!

mergify bot merged commit 56a8c5e into flux-framework:master Aug 11, 2022
chu11 deleted the issue4482_flush_list_corruption branch August 17, 2022 20:29
chu11 added a commit to chu11/flux-core that referenced this pull request Mar 29, 2023
Problem: It would be nice to get a simple answer of what nodes/ranks are
up vs down in the overlay network.

Solution: Support a new flux overlay whatsup subcommand that is modeled
after the whatsup(1) command.

Fixes flux-framework#4484

Successfully merging this pull request may close these issues.

broker: content-cache flush list can be corrupted
3 participants