
Fix Parallel Indexer initialization issue [DPP-542] #10889

Merged: 1 commit merged into main from dpp-542-fix-initialization-freeze-in-indexer on Sep 15, 2021

nmarton-da (Contributor)

RCA: if an error occurs during parallel indexer initialization, a promise is never completed, so the initialization future never completes either.
Expected: the failure should be propagated, and the recovering indexer should retry 10 seconds later.
Actual: the failure is not propagated; a zombie future freezes initialization, preventing retries.
Impact: this is a corner case; if indexer initialization runs into no problems, the issue does not surface.
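The expected behavior can be illustrated with a minimal Python sketch, where an `asyncio` future stands in for the indexer's initialization future. This is not the indexer's code: `run_with_recovery`, `initialize` and `max_attempts` are hypothetical names, and only the 10-second retry delay comes from the description above. A retry loop can reschedule initialization only if a failed attempt actually fails its future; a future that is never completed blocks the `await` forever.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_with_recovery(
    initialize: Callable[[], Awaitable[str]],
    retry_delay: float = 10.0,  # the PR expects a retry 10 seconds later
    max_attempts: int = 3,      # hypothetical bound, for this sketch only
) -> str:
    """Hypothetical recovery loop around an initialization step.

    A retry is scheduled only when the awaited initialization *fails*.
    If the underlying promise is never completed, `await initialize()`
    neither returns nor raises, so the loop is stuck and never retries.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await initialize()
        except Exception as error:  # failure propagated: a retry is possible
            last_error = error
            if attempt < max_attempts:
                await asyncio.sleep(retry_delay)
    raise RuntimeError(
        f"initialization failed after {max_attempts} attempts"
    ) from last_error


async def demo() -> tuple:
    calls = 0

    async def flaky_initialize() -> str:
        # Fails twice (e.g. the database is not reachable yet), then succeeds.
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("database not ready")
        return "indexer ready"

    result = await run_with_recovery(flaky_initialize, retry_delay=0.01)
    return result, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('indexer ready', 3)
```

Wrapping such a loop in `asyncio.wait_for` around an initializer whose future is never completed only ends in a timeout, which mirrors the zombie-future freeze.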

  • Extracts the critical logic into the helper function initializeHandle
  • Adds unit tests for initializeHandle
  • Fixes the issue by completing the promise in all cases
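The fix pattern can be sketched in Python, with `concurrent.futures.Future` standing in for a Scala `Promise`. `initialize_handle` and `initialize_handle_buggy` are hypothetical analogues of `initializeHandle`; its real signature and body are not shown in this PR. The sketch only illustrates the principle stated above: every exit path, success or failure, completes the promise.

```python
from concurrent.futures import Future
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical sketch, not the indexer's actual code.


def initialize_handle_buggy(promise: Future, init_step: Callable[[], T]) -> None:
    # Bug pattern: the promise is completed only on the happy path.
    # If init_step raises, the exception escapes this function, but
    # `promise` stays pending forever, and so does everyone waiting on it.
    promise.set_result(init_step())


def initialize_handle(promise: Future, init_step: Callable[[], T]) -> None:
    # Fixed pattern: complete the promise in all cases.
    try:
        result = init_step()
    except Exception as error:
        promise.set_exception(error)  # waiters now observe the failure
    else:
        promise.set_result(result)


def failing_init() -> str:
    raise RuntimeError("could not connect to the database")


if __name__ == "__main__":
    fixed = Future()
    initialize_handle(fixed, failing_init)
    print(fixed.done())             # True
    print(type(fixed.exception()))  # <class 'RuntimeError'>

    buggy = Future()
    try:
        initialize_handle_buggy(buggy, failing_init)
    except RuntimeError:
        pass  # the error surfaces here...
    print(buggy.done())             # False: ...but the future never completes
```

Completing the promise with the failure inside the helper, instead of letting the exception escape, is what lets a waiting future fail and the recovery logic schedule a retry.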

CHANGELOG_BEGIN
CHANGELOG_END

Pull Request Checklist

  • Read and understand the contribution guidelines
  • Include appropriate tests
  • Set a descriptive title and thorough description
  • Add a reference to the issue this PR will solve, if appropriate
  • Include changelog additions in one or more commit message bodies between the CHANGELOG_BEGIN and CHANGELOG_END tags
  • For a normal production system change, include the purpose of the change in the description

NOTE: for security reasons, CI is not run automatically on pull requests
from non-members. A reviewer has to comment with /AzurePipelines run to
trigger the build.

nmarton-da requested review from meiersi-da and a team as code owners September 14, 2021 20:54
mziolekda (Contributor) left a comment:

Thanks, looks good

nmarton-da force-pushed the dpp-542-fix-initialization-freeze-in-indexer branch from 8681736 to 4cfbc8b September 15, 2021 09:39
mergify bot merged commit 0c32e3b into main Sep 15, 2021
mergify bot deleted the dpp-542-fix-initialization-freeze-in-indexer branch September 15, 2021 11:12
fabiotudone-da pushed a commit that referenced this pull request Sep 15, 2021
akrmn added a commit that referenced this pull request Sep 15, 2021
Manual release process. @akrmn is in charge of this release.

Commit log:
```
b5648c0 Make `CommandTracker` distinguish submissions of the same command using `submissionId` [KVL-1104] (#10868)
b4328b3 ledger-api-test-tool - Add conformance test for parallel command deduplication using CommandSubmissionService [KVL-1099] (#10869)
0c32e3b Fix Parallel Indexer initialization issue [DPP-542] (#10889)
b3e4975 Chore slow migration error removal (#10886)
e4cce53 Create a new grpc exception for each duplicate result [KVL-1099] (#10887)
a939594 Sandbox on H2 - performance improvements for the append-only schema [DPP-600] (#10888)
9a1a101 Increase timeout for heavy tests in ParticipantPruningIT (#10894)
9093c6c Improve wording for the active contracts service description (#10880)
c12f546 Document #10780 (#10781)
5814f6a update NOTICES file (#10893)
38227a8 [Ledger API error codes] ErrorCode enrichments [DPP-591] (#10874)
e7c443a enable json index for all fields that are queried with JSON_EXISTS (#10658)
6c1c02a document complete authorized auth0 setup (#10881)
e4230dc Do not drop generated `submissionId`s in `GrpcCommandService` [KVL-1104] (#10882)
b4750a4 trigger reach auth on internal network (#10844)
7908083 add Auth0 support to create-daml-app (#10673)
b86490c Add @adriaanm-da to the release rotation (#10872)
9e918c3 Update trigger-service docs to use --dar option in the corresponding example (#10877)
49a9556 [docs] Fix minor typo in doc of exerciseByKey in TS. (#10863)
f7c07ea interfaces: scala protobuf encoder (#10878)
be4e064 Ledger API Test Tool: support `--additional` tests [KVL-1100] (#10829)
97e14de [Ledger API error codes] ErrorCode interfaces and generator [DPP-591] (#10836)
6dcdaa4 [DPP-589] Add CLI flag to select minimum enabled TLS version (#10854)
1fc58d9 Navigator customviews highlight and choices button, apply custom theme on the login screen (#10859)
6faddc9 Update Daml Documentation to reflect command deduplication related changes [KVL-1094] (#10852)
7c29eee Cleanup normalize from svalue (#10873)
053f22a Convert SValue to Value, and normalize, in a single code pass. (#10828)
37a1cb2 compatibility-tests - Exclude CommandDeduplicationIT from running for existing 1.17 snapshots (#10866)
dfae9f6 Command deduplication - better support for different deduplication modes in conformance tests [KVL-1099] (#10864)
6f151e2 save kibana exports (#10861)
99f0362 [JSON-API] drop package token doc changes (#10865)
b50bb8e Populate `definite_answer` in `ApiException` [KVL-1004] (#10832)
a471225 LF: Add missing collision check for type synonyms (#10841)
1e1c452 LF: drop ad-hoc FrontStack builders (#10839)
8f5b4fa interfaces: protobuf encoder haskell side (#10850)
63f6678 ParticipantPruningIT divulgence test fixes to avoid flakiness on canton (#10860)
8a9d19a Command deduplication - KV conformance test for usage of max deduplication duration [KVL-1098] (#10846)
24fff88 LF: Simplify TransactionBuilder (#10753)
9a4c9df Implement LF desugaring of interface definitions (#10834)
2aaf601 interfaces: protobuf decoder haskell side (#10849)
6dc769b interfaces: lf typechecker implementation (#10843)
d9178d2 Clarify version usage in test tool exclusion docs (#10858)
c113954 Clarify docs for test tool exclusions (#10855)
8c9edd8 es cluster tweaks (#10853)
842c5b1 Drop early access notice from profiler docs (#10856)
7c47aca Improvements to wording in ledger-api protobuf docs (#10851)
cff0358 ledger-api: Remove unimplemented fields [KVL-1094] (#10822)
dcec6ea kvutils: Populate `definite_answer` in rejections [KVL-1004] (#10801)
1c4f173 Command deduplication - kvutils - Always use max deduplication duration as deduplication period [KVL-1098] (#10824)
567fe43 tweak trigger-service docs (#10845)
fb5ab5d setvar doesn't like new lines in assignment, refactor (#10842)
7225c04 [docs] Replace AdoptOpenJDK suggestion by Adoptium (#10837)
6a9c8a6 release 1.17.0-snapshot.20210910.7786.0.976ca400 (#10838)
6ed2124 LF: clean up useless version tests. (#10833)
85f6f36 Modify the name of the secrets-url CLI flag to tls-secrets-url [DPP-604] (#10840)
d809fd9 [JSON-API] surrogate template id cache (#10806)
```
Changelog:
```

- [Sandbox] Added a CLI parameter for configuring the number of connections in the database connection pool used for serving ledger API requests
- [Docs] Improved description of the purpose and usage of the active contracts service
- [Docs/JSON API] documented 256B limitation of Oracle query store
- The Trigger Service can now accept separate `--auth-internal` and
  `--auth-external` CLI arguments, where `--auth-internal` is the
  address used by the Trigger Service to reach the Auth Middleware
  directly, and `--auth-external` is the address the Trigger Service uses
  in generated URLs sent back to the client. The `--auth` option remains
  and keeps working as before, setting both internal and external
  addresses to the same given value.
- The `create-daml-app` template now includes support for a third
  authentication scheme (in addition to the existing "dev mode" and Daml
  Hub support): Auth0.
- Sandbox: Add CLI flag to select minimum enabled TLS version for ledger API server.
- [Navigator] The currently selected custom view is now highlighted on the sidebar
- kvutils - committer side deduplication always uses max_deduplication_duration + min_skew as a deduplication period for all the requests.
- Modify the name of the secrets-url CLI flag to tls-secrets-url.
```

changelog_begin
changelog_end
fabiotudone-da added a commit that referenced this pull request Oct 5, 2021
…[KVL-1057] (#10901)

* Rename Completion.deduplication_time to deduplication_duration

CHANGELOG_BEGIN
CHANGELOG_END

* Breaking protobuf change: regenerate `buf` image

Breaking-Proto: true

* Create a new grpc exception for each duplicate result [KVL-1099] (#10887)

* Create a new grpc exception for each duplicate result

The metadata in the exception is not thread-safe: when it is converted into server headers in netty.Utils.convertServerHeaders, discardAll is called, which mutates the metadata. Because the same exception was reused for all duplicate results, the metadata got corrupted.

CHANGELOG_BEGIN

CHANGELOG_END

* Do not call duplicate command exception twice

* Chore slow migration error removal (#10886)

* Avoid slow-progress timeout StoppedProgressing to avoid flakiness

CHANGELOG_BEGIN
CHANGELOG_END

* Make schema migration retries and backoff configurable

* Review feedback - use RetryStrategy.constant instead

* Remove unused tailrec import

* Simplifications and runF by Marton

* Rename config options from retry to attempt and default to 30 attempts

* Fix Parallel Indexer initialization issue [DPP-542] (#10889)

* ledger-api-test-tool - Add conformance test for parallel command deduplication using CommandSubmissionService [KVL-1099] (#10869)

* Extract deduplication "features" into a configuration to be used around the tests.
Better naming for assertions that support sync and async deduplication

CHANGELOG_BEGIN

CHANGELOG_END

* Fix broken test and use consistency for tests

* ledger-api-test-tool - Add conformance test for parallel command deduplication

CHANGELOG_BEGIN
CHANGELOG_END

* Add import for 2.12 compat

* Add silencer plugin

* Split parallel command deduplication scenario into its own test suite

* Add the parallel command deduplication test to append only ledgers

* Run parallel command deduplication tests for append only ledgers

* Apply suggestions from code review

Co-authored-by: fabiotudone-da <[email protected]>

* Code review renames

* Add compat import

* Run the test concurrently

* Further rename completions' `deduplication_time` to `*_duration` in participant's codebase

CHANGELOG_BEGIN
CHANGELOG_END

* Re-compute sha256s of migrations

* Fix expected error message

* Improve consistency of "deduplication time" vs. "deduplication duration" references

* Revert Oracle schema whitespace change

Co-authored-by: Gerolf Seitz <[email protected]>
Co-authored-by: nicu-da <[email protected]>
Co-authored-by: Oliver Seeliger <[email protected]>
Co-authored-by: Marton Nagy <[email protected]>