4902 Add mutex for concurrent access in GetLatestLedgerSequence #4903

Merged: 3 commits, Jun 13, 2023

Conversation

urvisavla
Contributor

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Add a mutex to CaptiveStellarCore so that GetLatestLedgerSequence can safely be called concurrently with ingestion. The mutex ensures that reads of c.nextLedger never overlap with writes. This is required because c.nextLedger is read when posting the captive_stellar_core_latest_ledger metric while handleMetaPipeResult may be writing it concurrently.

This issue does not occur in master because master does not post the captive_stellar_core_latest_ledger metric. In master, GetLatestLedgerSequence is still called from outside the backend, in maybeReapLookupTables, but that call runs on the same thread as ingestion, so the mutex is not necessary there.
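
A minimal sketch of the pattern described above, not the code from this PR. Only the names nextLedger and ledgerSequenceLock appear in the PR; the `backend` type, its methods, and the choice of `sync.RWMutex` are assumptions made for illustration. One goroutine stands in for the ingestion writer and another for the metrics reader.

```go
package main

import (
	"fmt"
	"sync"
)

// backend is a hypothetical stand-in for CaptiveStellarCore.
// Using sync.RWMutex here is an assumption; the PR text only says "mutex".
type backend struct {
	ledgerSequenceLock sync.RWMutex
	nextLedger         uint32
}

// advance plays the role of the ingestion-side writer.
func (b *backend) advance() {
	b.ledgerSequenceLock.Lock()
	defer b.ledgerSequenceLock.Unlock()
	b.nextLedger++
}

// latestLedgerSequence plays the role of the metrics-side reader.
// It reports the ledger before nextLedger, or 0 if nothing was read yet.
func (b *backend) latestLedgerSequence() uint32 {
	b.ledgerSequenceLock.RLock()
	defer b.ledgerSequenceLock.RUnlock()
	if b.nextLedger == 0 {
		return 0
	}
	return b.nextLedger - 1
}

func main() {
	b := &backend{nextLedger: 1}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer goroutine
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			b.advance()
		}
	}()
	go func() { // reader goroutine
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = b.latestLedgerSequence()
		}
	}()
	wg.Wait()
	fmt.Println(b.latestLedgerSequence()) // prints 1000
}
```

Running a sketch like this with `go run -race` should report no data race; removing the lock calls should make the race detector flag the concurrent access to nextLedger.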

Why

Fixes the data race reported in #4902.

Known limitations

Locking can affect performance, but we do not expect a noticeable impact in this case.

@urvisavla urvisavla marked this pull request as ready for review June 9, 2023 18:19
Contributor

@tamirms tamirms left a comment

Looks good, but can you fix the lint errors in ingest/ledgerbackend/captive_core_backend_test.go?

@urvisavla urvisavla force-pushed the 4902/fix-race-condition branch 2 times, most recently from a72f84b to 1000601 Compare June 9, 2023 19:50
@urvisavla urvisavla force-pushed the 4902/fix-race-condition branch from 1000601 to 3ed7f10 Compare June 9, 2023 20:06
@urvisavla
Contributor Author

> looks good but can you fix the lint errors in ingest/ledgerbackend/captive_core_backend_test.go ?

Fixed linter errors.

c.prepared = &ran
c.nextLedger = c.roundDownToFirstReplayAfterCheckpointStart(from)
c.lastLedger = &to
c.ledgerSequenceLock.Unlock()
Contributor

I think this should use Go's defer to run the Unlock. That is the idiomatic Go way to get 'finally' semantics, and it ensures the lock is not left held (risking a deadlock) if the function exits early.

Contributor Author

updated.
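
For readers unfamiliar with the suggestion, here is a rough sketch of the defer pattern. It is not the PR's actual change. The `rangeState` type, `setRange`, and `errInvalidRange` are hypothetical names, and the early-return validation is an assumption added to show why defer matters.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// rangeState is a hypothetical type; only the field names nextLedger and
// lastLedger mirror the snippet under review.
type rangeState struct {
	mu         sync.Mutex
	nextLedger uint32
	lastLedger *uint32
}

var errInvalidRange = errors.New("invalid range")

// setRange updates the range under the lock. Because Unlock is deferred,
// the lock is released on every return path, including the early return.
func (s *rangeState) setRange(from, to uint32) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if to < from {
		return errInvalidRange // lock is still released by the deferred Unlock
	}
	s.nextLedger = from
	s.lastLedger = &to
	return nil
}

func main() {
	s := &rangeState{}
	fmt.Println(s.setRange(10, 5)) // prints "invalid range"
	// This second call would block forever if the early return above
	// had left the mutex locked.
	fmt.Println(s.setRange(5, 10)) // prints "<nil>"
	fmt.Println(s.nextLedger)      // prints 5
}
```

With a manual Unlock placed at the end of the function, any return or panic added above it later would leave the mutex held; defer removes that maintenance hazard.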

@urvisavla urvisavla requested a review from sreuland June 9, 2023 22:40
Contributor

@sreuland sreuland left a comment

Looks good, thanks for responding to the suggestion.

@urvisavla urvisavla merged commit e20e8bc into stellar:soroban-xdr-next Jun 13, 2023
tsachiherman added a commit that referenced this pull request Jun 13, 2023
* updated core git ref for tests (#4879)

* update toml to support LIMIT_TX_QUEUE_SOURCE_ACCOUNT (#4882)

* horizon/ingest/processors: SAC storage entry by different key name (#4884)

* services/horizon: Reenable InvokeHostFunction integration tests (#4887)

* horizon: 4446/supress core timeout error (#4894)

* services/horizon: Suppress Core timeout error (#4860)
* services/horizon: Protect 'currentState' variable using Mutex to prevent race condition. (#4889)
* services/horizon: Modify the tests due to changes in the Begin function signature.

* xdr: changes for auth and removal of mutli-invoke (wip)

* services/horizon: Reenable SAC integration tests (#4891)

* services/horizon: optionally add soroban-rpc to integration tests (#4892)

* Update core image

* Update stellar-xdr next commit

* Update horizon ingestion

* update txnbuild

* Updates

* Update fmt

* fix txnbuild tests

* updates

* Update horizon tests

* add ingest test

* Update operation processor test

* Update tests

* Update tests

* Remove sac test changes to resolve merge conflicts for now

* Formatting

* 4902 Add mutex for concurrent access in GetLatestLedgerSequence (#4903)

* Update tests

* Update sac test

* update tests

---------

Co-authored-by: shawn <[email protected]>
Co-authored-by: Tsachi Herman <[email protected]>
Co-authored-by: Alfonso Acosta <[email protected]>
Co-authored-by: urvisavla <[email protected]>
@tsachiherman tsachiherman mentioned this pull request Jun 13, 2023
7 tasks
@urvisavla urvisavla linked an issue Jun 15, 2023 that may be closed by this pull request
tsachiherman added a commit that referenced this pull request Jul 11, 2023
* all: enforce simplified Golang code (#4852)

* Update completed sprint on issue/pr closed (#4857)

* Bump core image to latest stable release v19.10.0

* Add a simple test for asset case sorting in ascii (#4876)

* services/horizon: Suppress Core timeout error (#4860)

Suppress Core timeout error when ingestion state machine is in build state.

* Update CHANGELOG.md for latest release (#4828)

* Bump core image to latest release v19.11.0 (#4885)

* services/horizon: Protect 'currentState' variable using Mutex to prevent race condition. (#4889)

* services/horizon: Update default for  --captive-core-use-db to true (#4877)

* 4856: Update default for  --captive-core-use-db to true
* Update CHANGELOG.md

* xdr: changes for auth and removal of mutli-invoke (wip) (#4900)

* updated core git ref for tests (#4879)

* update toml to support LIMIT_TX_QUEUE_SOURCE_ACCOUNT (#4882)

* horizon/ingest/processors: SAC storage entry by different key name (#4884)

* services/horizon: Reenable InvokeHostFunction integration tests (#4887)

* horizon: 4446/supress core timeout error (#4894)

* services/horizon: Suppress Core timeout error (#4860)
* services/horizon: Protect 'currentState' variable using Mutex to prevent race condition. (#4889)
* services/horizon: Modify the tests due to changes in the Begin function signature.

* xdr: changes for auth and removal of mutli-invoke (wip)

* services/horizon: Reenable SAC integration tests (#4891)

* services/horizon: optionally add soroban-rpc to integration tests (#4892)

* Update core image

* Update stellar-xdr next commit

* Update horizon ingestion

* update txnbuild

* Updates

* Update fmt

* fix txnbuild tests

* updates

* Update horizon tests

* add ingest test

* Update operation processor test

* Update tests

* Update tests

* Remove sac test changes to resolve merge conflicts for now

* Formatting

* 4902 Add mutex for concurrent access in GetLatestLedgerSequence (#4903)

* Update tests

* Update sac test

* update tests

---------

Co-authored-by: shawn <[email protected]>
Co-authored-by: Tsachi Herman <[email protected]>
Co-authored-by: Alfonso Acosta <[email protected]>
Co-authored-by: urvisavla <[email protected]>

* services/horizon: Improve error handling for when stellar-core crashes (#4893)

* Parse LIMIT_TX_QUEUE_SOURCE_ACCOUNT in core config

* updated changelog for 2.26.0 release notes

* Pinning and updates golang and ubuntu images

* xdr: update per 7b403105 (#4923)

* Simplify ScError.Equals

* Add missing case for ScVal.Equals

* Add LedgerEntry.SetContractData and LedgerEntry.SetContractCode methods

* xdr updates

* fix formatting

* update

* update transfer_event_xdr.bin

* update

* update

* update

* bugfix

* update

* update

* update

* handle nil entries

* fix few linting issues

* additional linting

* Fix missing err check

* fix missing nil check

* Minor code checker fix

* Fix minor naming

* fix hard-coded values.

* warn and don't set expiration ledger

* Warn and don't set expirationLedger

* update new xdr

* update per peer review.

* Add effect for bumpFootprintExpirationOp

* Fix govet-caught bug

* update per 7b403105788e33044e089c4c2f957df8ddabaca8

* update fmt

* fix unit test

* Update core image

---------

Co-authored-by: Paul Bellamy <[email protected]>
Co-authored-by: Simon Chow <[email protected]>

* services/horizon: Fix ledger endpoint url in HAL (#4928)

* Don't crash on LedgerCloseMetaV2 ingestion (#4927)

* xdr: update per xdr version e372df9 (#4930)

* update

* update

* update

* bugfix

* update

* update per linter

* update per peer review.

* Update to use 19.11.1-1349.fae91b092.focal~soroban

* Goreplay middleware (#4932)

* tools/goreplay-middleware: Add goreplay middleware
* Fix linter errors

---------
Co-authored-by: Bartek Nowotarski <[email protected]>

* Update to final core preview 10 image

* all: Fix improper use of errors.Wrap (#4926)

* all: Fix improper use of errors.Wrap

`errors.Wrap` method returns nil if the first argument passed is also nil.
If `errors.Wrap` is copied from a condition like `if err != nil` to another
one which  also returns `errors.Wrap` but does not overwrite `err` before
the returned value will always be `nil`.

* Update services/horizon/internal/db2/history/claimable_balances.go

Co-authored-by: George <[email protected]>

---------

Co-authored-by: George <[email protected]>
Co-authored-by: Tsachi Herman <[email protected]>

* fix apt repo reference to focal now (#4929)

* LedgerChangeReader: Support State Expiration & Eviction (#4941)

* Simplify ScError.Equals

* Add missing case for ScVal.Equals

* Add LedgerEntry.SetContractData and LedgerEntry.SetContractCode methods

* update generated xdr to 0f5e556

* Add LedgerCloseMetaV2 support

* Update ledgerTransaction.GetOperationEvents

* Make LedgerChangeReader emit evictions

* Fixing up after merge

* Add test for LedgerChangeReader extensions and evictions

* Fix typo

* Add xdr.LedgerEntryData.ExpirationLedgerSeq helper

* review feedback

* merge latest master to soroban-xdr-next-next (#4937)

* all: enforce simplified Golang code (#4852)

* Update completed sprint on issue/pr closed (#4857)

* Bump core image to latest stable release v19.10.0

* Add a simple test for asset case sorting in ascii (#4876)

* services/horizon: Suppress Core timeout error (#4860)

Suppress Core timeout error when ingestion state machine is in build state.

* Update CHANGELOG.md for latest release (#4828)

* Bump core image to latest release v19.11.0 (#4885)

* services/horizon: Protect 'currentState' variable using Mutex to prevent race condition. (#4889)

* services/horizon: Update default for  --captive-core-use-db to true (#4877)

* 4856: Update default for  --captive-core-use-db to true
* Update CHANGELOG.md

* services/horizon: Improve error handling for when stellar-core crashes (#4893)

* Parse LIMIT_TX_QUEUE_SOURCE_ACCOUNT in core config

* updated changelog for 2.26.0 release notes

* Pinning and updates golang and ubuntu images

* services/horizon: Fix ledger endpoint url in HAL (#4928)

* Goreplay middleware (#4932)

* tools/goreplay-middleware: Add goreplay middleware
* Fix linter errors

---------
Co-authored-by: Bartek Nowotarski <[email protected]>

* all: Fix improper use of errors.Wrap (#4926)

* all: Fix improper use of errors.Wrap

`errors.Wrap` method returns nil if the first argument passed is also nil.
If `errors.Wrap` is copied from a condition like `if err != nil` to another
one which  also returns `errors.Wrap` but does not overwrite `err` before
the returned value will always be `nil`.

* Update services/horizon/internal/db2/history/claimable_balances.go

Co-authored-by: George <[email protected]>

---------

Co-authored-by: George <[email protected]>
Co-authored-by: Tsachi Herman <[email protected]>

* fix apt repo reference to focal now (#4929)

* fixed go fmt on bindata

* fixed merge conflict snippet

* fixed manual merge commit omition

---------

Co-authored-by: Alfonso Acosta <[email protected]>
Co-authored-by: Paul Bellamy <[email protected]>
Co-authored-by: Mehmet <[email protected]>
Co-authored-by: mlo <[email protected]>
Co-authored-by: urvisavla <[email protected]>
Co-authored-by: stellarsaur <[email protected]>
Co-authored-by: Molly Karcher <[email protected]>
Co-authored-by: Bartek Nowotarski <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Tsachi Herman <[email protected]>

* Refactor `xdr.LedgerEntry.LedgerKey` method (#4942)

* Simplify ScError.Equals

* Add missing case for ScVal.Equals

* Add LedgerEntry.SetContractData and LedgerEntry.SetContractCode methods

* update generated xdr to 0f5e556

* Add LedgerCloseMetaV2 support

* Update ledgerTransaction.GetOperationEvents

* Make LedgerChangeReader emit evictions

* Fixing up after merge

* Add test for LedgerChangeReader extensions and evictions

* Fix typo

* Add xdr.LedgerEntryData.ExpirationLedgerSeq helper

* review feedback

* Refactor LedgerEntry.LedgerKey method to avoid panics, and only have a single copy

* Fix govet.sh

* Fixing govet

* Fix bug

* Remove unneeded code bits

* s/marshalling/marshaling

* Review feedback

* integration tests: Add horizon test support for new bump/restore footprint ops (#4944)

* use the new core image for local integration tests

* Add bump/restoreFootprint ops to the horizon reingestion integration tests

* services/horizon: Remove command line flag --remote-captive-core-url (#4940)

* txnbuild: Make bump and restore footprint soroban operations (#4946)

* integration tests: fix integration tests for preview 10 (#4938)

this pr fixes the invokehostfunciton_test only, sac_test and contract event tests will be addressed in separate, follow-onw pr.

* fix sac tests for preview 10 data model (#4951)

* fix merge issue.

---------

Co-authored-by: Alfonso Acosta <[email protected]>
Co-authored-by: Paul Bellamy <[email protected]>
Co-authored-by: Mehmet <[email protected]>
Co-authored-by: mlo <[email protected]>
Co-authored-by: urvisavla <[email protected]>
Co-authored-by: stellarsaur <[email protected]>
Co-authored-by: chowbao <[email protected]>
Co-authored-by: shawn <[email protected]>
Co-authored-by: Molly Karcher <[email protected]>
Co-authored-by: Shawn Reuland <[email protected]>
Co-authored-by: Simon Chow <[email protected]>
Co-authored-by: Bartek Nowotarski <[email protected]>
Co-authored-by: George <[email protected]>
Development

Successfully merging this pull request may close these issues.

services/horizon: race condition found in integration tests
3 participants