Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix deadlock on nested DropHelper #15322

Merged
merged 1 commit into from
Nov 20, 2024
Merged

fix deadlock on nested DropHelper #15322

merged 1 commit into from
Nov 20, 2024

Conversation

msmouse
Copy link
Contributor

@msmouse msmouse commented Nov 19, 2024

Description

When droping is slow and DropHelpers are nested, AsyncConcurrentDropper::schedule_drop() can deadlock on self.num_tasks_tracker.inc();

How Has This Been Tested?

added unit test, verified failing before fix

Key Areas to Review

Type of Change

  • Bug fix

Which Components or Systems Does This Change Impact?

  • Validator Node

Copy link

trunk-io bot commented Nov 19, 2024

⏱️ 3h 12m total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
execution-performance / single-node-performance 54m 🟩🟥🟩
rust-smoke-tests 27m 🟩
rust-move-tests 12m 🟩
rust-move-tests 12m 🟩
rust-move-tests 12m 🟩
test-target-determinator 10m 🟩🟩
execution-performance / test-target-determinator 9m 🟩🟩
rust-cargo-deny 9m 🟩🟩🟩🟩🟩
check 8m 🟩🟩
check-dynamic-deps 7m 🟩🟩🟩🟩🟩 (+1 more)
rust-move-tests 6m
rust-move-tests 5m
rust-doc-tests 5m 🟩
rust-doc-tests 5m 🟩
fetch-last-released-docker-image-tag 3m 🟩🟩

settingsfeedbackdocs ⋅ learn more about trunk.io

@msmouse msmouse requested review from zekun000 and a team November 19, 2024 23:21
@ibalajiarun ibalajiarun changed the title fix deadlock on dested DropHelper fix deadlock on nested DropHelper Nov 19, 2024
@msmouse msmouse force-pushed the 1119-alden-fix-nested-drop branch 2 times, most recently from e5f9bc0 to 265c821 Compare November 19, 2024 23:33
@msmouse msmouse enabled auto-merge (squash) November 19, 2024 23:53

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

pub fn wait_for_backlog_drop(&self, no_more_than: usize) {
let _timer = TIMER.timer_with(&[self.name, "wait_for_backlog_drop"]);
self.num_tasks_tracker.wait_for_backlog_drop(no_more_than);
}

fn schedule_drop_impl<V: Send + 'static>(&self, v: V, notif_sender_opt: Option<Sender<()>>) {
if IN_ANY_DROP_POOL.get() {
drop(v);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have to notifier the sender in notif_sender_opt in this case?

@msmouse msmouse disabled auto-merge November 20, 2024 00:29

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on 46443515cbfc0276727a62e6c0b9e359bfe7e42d

two traffics test: inner traffic : committed: 14276.97 txn/s, latency: 2787.40 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3300 ms), latency samples: 5428460
two traffics test : committed: 100.04 txn/s, latency: 1736.95 ms, (p50: 1400 ms, p70: 1400, p90: 1600 ms, p99: 11100 ms), latency samples: 1780
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.000, avg: 1.583", "ConsensusProposalToOrdered: max: 0.325, avg: 0.295", "ConsensusOrderedToCommit: max: 0.372, avg: 0.361", "ConsensusProposalToCommit: max: 0.666, avg: 0.656"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.90s no progress at version 1592894 (avg 0.22s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.84s no progress at version 1592892 (avg 8.61s) [limit 15].
Test Ok

Copy link
Contributor

✅ Forge suite compat success on be3ccc58ef518259d63fab7f3e3613f2a6214268 ==> 46443515cbfc0276727a62e6c0b9e359bfe7e42d

Compatibility test results for be3ccc58ef518259d63fab7f3e3613f2a6214268 ==> 46443515cbfc0276727a62e6c0b9e359bfe7e42d (PR)
1. Check liveness of validators at old version: be3ccc58ef518259d63fab7f3e3613f2a6214268
compatibility::simple-validator-upgrade::liveness-check : committed: 16202.90 txn/s, latency: 2099.73 ms, (p50: 2100 ms, p70: 2200, p90: 2400 ms, p99: 2800 ms), latency samples: 523420
2. Upgrading first Validator to new version: 46443515cbfc0276727a62e6c0b9e359bfe7e42d
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6362.45 txn/s, latency: 4489.45 ms, (p50: 5100 ms, p70: 5400, p90: 5500 ms, p99: 5600 ms), latency samples: 112820
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6461.88 txn/s, latency: 5083.01 ms, (p50: 5500 ms, p70: 5600, p90: 5700 ms, p99: 5700 ms), latency samples: 218880
3. Upgrading rest of first batch to new version: 46443515cbfc0276727a62e6c0b9e359bfe7e42d
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7210.82 txn/s, latency: 3730.72 ms, (p50: 4000 ms, p70: 4700, p90: 5300 ms, p99: 5400 ms), latency samples: 129400
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7835.46 txn/s, latency: 4105.91 ms, (p50: 4300 ms, p70: 4400, p90: 6200 ms, p99: 6400 ms), latency samples: 257680
4. upgrading second batch to new version: 46443515cbfc0276727a62e6c0b9e359bfe7e42d
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 11517.06 txn/s, latency: 2391.91 ms, (p50: 2400 ms, p70: 2600, p90: 3700 ms, p99: 3900 ms), latency samples: 198540
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11433.40 txn/s, latency: 2718.32 ms, (p50: 2400 ms, p70: 2600, p90: 5200 ms, p99: 6400 ms), latency samples: 370280
5. check swarm health
Compatibility test for be3ccc58ef518259d63fab7f3e3613f2a6214268 ==> 46443515cbfc0276727a62e6c0b9e359bfe7e42d passed
Test Ok

Copy link
Contributor

✅ Forge suite framework_upgrade success on be3ccc58ef518259d63fab7f3e3613f2a6214268 ==> 46443515cbfc0276727a62e6c0b9e359bfe7e42d

Compatibility test results for be3ccc58ef518259d63fab7f3e3613f2a6214268 ==> 46443515cbfc0276727a62e6c0b9e359bfe7e42d (PR)
Upgrade the nodes to version: 46443515cbfc0276727a62e6c0b9e359bfe7e42d
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1384.45 txn/s, submitted: 1386.91 txn/s, failed submission: 2.46 txn/s, expired: 2.46 txn/s, latency: 2357.89 ms, (p50: 2100 ms, p70: 2400, p90: 3900 ms, p99: 5400 ms), latency samples: 123760
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1430.17 txn/s, submitted: 1433.81 txn/s, failed submission: 3.64 txn/s, expired: 3.64 txn/s, latency: 2121.03 ms, (p50: 2100 ms, p70: 2400, p90: 3300 ms, p99: 4500 ms), latency samples: 125880
5. check swarm health
Compatibility test for be3ccc58ef518259d63fab7f3e3613f2a6214268 ==> 46443515cbfc0276727a62e6c0b9e359bfe7e42d passed
Upgrade the remaining nodes to version: 46443515cbfc0276727a62e6c0b9e359bfe7e42d
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1439.38 txn/s, submitted: 1441.15 txn/s, failed submission: 1.77 txn/s, expired: 1.77 txn/s, latency: 2218.32 ms, (p50: 2100 ms, p70: 2200, p90: 3600 ms, p99: 5000 ms), latency samples: 129980
Test Ok

@msmouse msmouse merged commit e2af7c4 into main Nov 20, 2024
78 of 79 checks passed
@msmouse msmouse deleted the 1119-alden-fix-nested-drop branch November 20, 2024 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants