Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ENTRY_ABORTED assertion in sendClientOldEntry() #1903

Conversation

rousskov
Copy link
Contributor

FATAL: assertion failed: client_side_reply.cc:392:
"!EBIT_TEST(http->storeEntry()->flags, ENTRY_ABORTED)"

The replaced assertion was wrong because a stale entry may be aborted
while we are revalidating it. The exact real-world conditions that
triggered this assertion are unknown, but many conditions lead to
aborted entries. The assertion can be triggered in lab tests using a hit
transaction that collapses on a miss transaction that runs into storage
problems (e.g., a memory cache that ran out of usable space).

Also adjusted cacheHit() to avoid similar problems. We have not
reproduced them, but no code maintains the asserted invariant on the
cacheHit() path either. Moreover, async-called cacheHit() initiates
processExpired() that leads to problematic sendClientOldEntry() call, so
seeing an aborted StoreEntry at cacheHit() time is probably a matter of
sufficient concurrency levels and asynchronous callback delays.

    FATAL: assertion failed: client_side_reply.cc:392:
    "!EBIT_TEST(http->storeEntry()->flags, ENTRY_ABORTED)"

The replaced assertion was wrong because a stale entry may be aborted
while we are revalidating it. The exact real-world conditions that
triggered this assertion are unknown, but many conditions lead to
aborted entries. The assertion can be triggered in lab tests using a hit
transaction that collapses on a miss transaction that runs into storage
problems (e.g., a memory cache that ran out of usable space).

Also adjusted cacheHit() to avoid similar problems. We have not
reproduced them, but no code maintains the asserted invariant on the
cacheHit() path either. Moreover, async-called cacheHit() initiates
processExpired() that leads to problematic sendClientOldEntry() call, so
seeing an aborted StoreEntry at cacheHit() time is probably a matter of
sufficient concurrency levels and asynchronous callback delays.
Copy link
Contributor Author

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I have checked other ENTRY_ABORTED assertions in src/client_side_reply.cc and did not find any additional ones that were similar enough to the two removed ones. Some of the remaining assertions are questionable, but their removal (if they should be removed) requires more work and somewhat different justification.

Copy link
Contributor

@kinkie kinkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rousskov rousskov added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels S-could-use-an-approval An approval may speed this PR merger (but is not required) labels Sep 18, 2024
squid-anubis pushed a commit that referenced this pull request Sep 19, 2024
    FATAL: assertion failed: client_side_reply.cc:392:
    "!EBIT_TEST(http->storeEntry()->flags, ENTRY_ABORTED)"

The replaced assertion was wrong because a stale entry may be aborted
while we are revalidating it. The exact real-world conditions that
triggered this assertion are unknown, but many conditions lead to
aborted entries. The assertion can be triggered in lab tests using a hit
transaction that collapses on a miss transaction that runs into storage
problems (e.g., a memory cache that ran out of usable space).

Also adjusted cacheHit() to avoid similar problems. We have not
reproduced them, but no code maintains the asserted invariant on the
cacheHit() path either. Moreover, async-called cacheHit() initiates
processExpired() that leads to problematic sendClientOldEntry() call, so
seeing an aborted StoreEntry at cacheHit() time is probably a matter of
sufficient concurrency levels and asynchronous callback delays.
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label Sep 19, 2024
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels labels Sep 20, 2024
kinkie pushed a commit to kinkie/squid that referenced this pull request Oct 12, 2024
    FATAL: assertion failed: client_side_reply.cc:392:
    "!EBIT_TEST(http->storeEntry()->flags, ENTRY_ABORTED)"

The replaced assertion was wrong because a stale entry may be aborted
while we are revalidating it. The exact real-world conditions that
triggered this assertion are unknown, but many conditions lead to
aborted entries. The assertion can be triggered in lab tests using a hit
transaction that collapses on a miss transaction that runs into storage
problems (e.g., a memory cache that ran out of usable space).

Also adjusted cacheHit() to avoid similar problems. We have not
reproduced them, but no code maintains the asserted invariant on the
cacheHit() path either. Moreover, async-called cacheHit() initiates
processExpired() that leads to problematic sendClientOldEntry() call, so
seeing an aborted StoreEntry at cacheHit() time is probably a matter of
sufficient concurrency levels and asynchronous callback delays.
@rousskov rousskov removed the S-could-use-an-approval An approval may speed this PR merger (but is not required) label Nov 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-merged https://github.com/measurement-factory/anubis#pull-request-labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants