Skip to content
This repository has been archived by the owner on Oct 31, 2024. It is now read-only.

fix: concurrency insert between pending and delivered #467

Merged
merged 10 commits into from
Mar 19, 2024

Conversation

Freyskeyd
Copy link
Member

@Freyskeyd Freyskeyd commented Feb 27, 2024

Description

This PR is covering a corner case which involve the delivery of a prev certificate and a child certificate.

The scenario is this one:

  • A prev certificate (C1) has been delivered (by the broadcast) and need to be persisted
    The persist method will hold a lock while performing multiple insert/delete to avoid
    insert race condition.
  • At the same time, another node is sending a certificate (C2) which have C1 as prev_id.
    C2 is looking at the storage to find if the prev_id C1 is delivered but find nothing as
    the persist method is still working at creating the WriteBatch. It led the node to put
    C2 in the precedence_pool waiting for C1 to be delivered while it is in fact already
    delivered.

To avoid that and as a first step, when trying to insert a certificate in the pending pool,
The node will try to acquire a lock guard on the certificate but also on the prev_id.

PR Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added or updated tests that comprehensively prove my change is effective or that my feature works

@Freyskeyd Freyskeyd force-pushed the chore/improve-delivery-time branch 3 times, most recently from 707fe93 to 4217661 Compare February 29, 2024 16:09
Copy link

codecov bot commented Feb 29, 2024

Codecov Report

Attention: Patch coverage is 87.87879% with 12 lines in your changes are missing coverage. Please review.

Project coverage is 71.69%. Comparing base (a5299c8) to head (37d59d6).

Files Patch % Lines
...es/topos-p2p/src/runtime/handle_event/peer_info.rs 55.55% 8 Missing ⚠️
crates/topos-tce-storage/src/fullnode/mod.rs 89.47% 2 Missing ⚠️
crates/topos-p2p/src/behaviour/gossip.rs 50.00% 1 Missing ⚠️
crates/topos-tce/src/app_context/api.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #467      +/-   ##
==========================================
- Coverage   71.87%   71.69%   -0.18%     
==========================================
  Files         225      227       +2     
  Lines       12675    12743      +68     
==========================================
+ Hits         9110     9136      +26     
- Misses       3565     3607      +42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Freyskeyd Freyskeyd force-pushed the fix/concurrency-insert branch 3 times, most recently from 7e0db52 to 0dc637d Compare March 2, 2024 18:35
@Freyskeyd Freyskeyd force-pushed the chore/improve-delivery-time branch from 4217661 to 22cf5ba Compare March 6, 2024 20:56
@Freyskeyd Freyskeyd force-pushed the fix/concurrency-insert branch 2 times, most recently from 93e9db3 to 5b1ad56 Compare March 7, 2024 10:17
@Freyskeyd Freyskeyd force-pushed the chore/improve-delivery-time branch 3 times, most recently from 89f59ca to ae3e1b5 Compare March 12, 2024 14:32
@Freyskeyd Freyskeyd force-pushed the chore/improve-delivery-time branch 3 times, most recently from 5a97b89 to dc67f92 Compare March 12, 2024 16:29
Base automatically changed from chore/improve-delivery-time to main March 12, 2024 18:20
@Freyskeyd Freyskeyd force-pushed the fix/concurrency-insert branch 3 times, most recently from b0166e1 to e47bbfa Compare March 13, 2024 10:23
@Freyskeyd Freyskeyd marked this pull request as ready for review March 13, 2024 10:23
@Freyskeyd Freyskeyd requested a review from a team as a code owner March 13, 2024 10:23
Copy link
Contributor

@dvdplm dvdplm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and thank you for the high quality comments explaining what is going on.

Beyond this PR, I think this need for strong synchronization is unfortunate. It makes the code less clear and introduces delays that could harm performance. Not sure how we could do better but I'll try to give it some thought. I have a gut feeling that treating pending and delivered certificates separately is part of the problem, but I'll have to think more on this.

crates/topos-tce-storage/src/fullnode/mod.rs Outdated Show resolved Hide resolved
crates/topos-tce-storage/src/tests/pending_certificates.rs Outdated Show resolved Hide resolved
crates/topos-tce-storage/src/validator/mod.rs Outdated Show resolved Hide resolved
crates/topos-tce-storage/src/validator/mod.rs Outdated Show resolved Hide resolved
crates/topos-tce-storage/src/validator/mod.rs Outdated Show resolved Hide resolved
@Freyskeyd Freyskeyd force-pushed the fix/concurrency-insert branch 3 times, most recently from a440b18 to 6b299e1 Compare March 18, 2024 17:43
Signed-off-by: Simon Paitrault <[email protected]>
Signed-off-by: Simon Paitrault <[email protected]>
@Freyskeyd Freyskeyd merged commit bd5e3f5 into main Mar 19, 2024
22 checks passed
@Freyskeyd Freyskeyd deleted the fix/concurrency-insert branch March 19, 2024 09:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants