[DocDB] Skewed Bootstrap timings after applying cdc_write_post_apply_metadata and cdc_immediate_transaction_cleanup gflags #21741
Labels
2024.1 Backport Required
2024.1.1_blocker
area/cdcsdk
CDC SDK
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/high
High Priority
Comments
shamanthchandra-yb added the area/docdb (YugabyteDB core features), priority/high (High Priority), and status/awaiting-triage (Issue awaiting triage) labels on Mar 29, 2024
es1024 added a commit that referenced this issue on Jun 20, 2024

…et bootstrap

Summary: When CDC streams are lagging, there may be a large number of intent SST files whose contents have all been applied already, but must be maintained for CDC purposes. We work around the performance implications of having a large number of these files by filtering out SST files by min running hybrid time (D33131 / 97536b4), but this approach does not work as is for bootstrap, since min running hybrid time is not currently determined until bootstrap has finished.

This change adds the saving of min running hybrid time periodically with retryable requests state, and then loads this min running hybrid time into the transaction participant early in bootstrap, to allow the SST file filter used in D33131 / 97536b4 to be used at bootstrap time as well.

To avoid reintroducing the issue introduced by D34389 / 2458c08, this diff also removes the requirement that min running hybrid time must not be set before bootstrap, by moving the requirement to `transactions_loaded_`.

**Upgrade/Rollback safety:** This change is not guarded by a gflag or autoflag. If the newly added min running hybrid time field is missing (upgrade), we do not apply a filter (the current behavior), and the presence of the optional protobuf field when downgrading is entirely ignored (the old behavior is to unconditionally not apply a filter). There are no correctness issues involved with either applying or not applying the filter, as it is entirely a performance optimization.

Jira: DB-10615
Test Plan: Jenkins
Reviewers: yyan, qhu
Reviewed By: yyan, qhu
Subscribers: rthallam, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D35639
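The filtering idea in this commit message can be illustrated with a minimal sketch. This is not the YugabyteDB implementation: `IntentSstFile`, `max_hybrid_time`, and `files_to_scan` are hypothetical names, and the assumption that each file records the largest hybrid time of its records is made purely for illustration. It only shows the behavior described above: when a saved min running hybrid time is available, files that hold nothing at or after it are skipped, and when the field is missing (the upgrade case), no filter is applied.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class IntentSstFile:
    """Hypothetical stand-in for the metadata of one intents SST file."""
    name: str
    # Assumption: the largest hybrid time of any record stored in the file.
    max_hybrid_time: int


def files_to_scan(files: Iterable[IntentSstFile],
                  min_running_ht: Optional[int]) -> List[IntentSstFile]:
    """Return the files a bootstrap-time scan would still need to read."""
    if min_running_ht is None:
        # Field absent (for example, state written before the upgrade):
        # apply no filter, which matches the pre-change behavior.
        return list(files)
    # Skip files whose records are all older than the min running hybrid time.
    return [f for f in files if f.max_hybrid_time >= min_running_ht]


files = [
    IntentSstFile("000010.sst", 100),
    IntentSstFile("000011.sst", 250),
    IntentSstFile("000012.sst", 400),
]
with_filter = files_to_scan(files, min_running_ht=250)
without_filter = files_to_scan(files, min_running_ht=None)
```

Because the filter only narrows which files are read, applying it or skipping it changes the amount of work and not the result, which is consistent with the commit's statement that it is purely a performance optimization.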
karthik-ramanathan-3006 pushed a commit to karthik-ramanathan-3006/yugabyte-db that referenced this issue on Jun 24, 2024

…ing tablet bootstrap

Summary: When CDC streams are lagging, there may be a large number of intent SST files whose contents have all been applied already, but must be maintained for CDC purposes. We work around the performance implications of having a large number of these files by filtering out SST files by min running hybrid time (D33131 / 97536b4), but this approach does not work as is for bootstrap, since min running hybrid time is not currently determined until bootstrap has finished.

This change adds the saving of min running hybrid time periodically with retryable requests state, and then loads this min running hybrid time into the transaction participant early in bootstrap, to allow the SST file filter used in D33131 / 97536b4 to be used at bootstrap time as well.

To avoid reintroducing the issue introduced by D34389 / 2458c08, this diff also removes the requirement that min running hybrid time must not be set before bootstrap, by moving the requirement to `transactions_loaded_`.

**Upgrade/Rollback safety:** This change is not guarded by a gflag or autoflag. If the newly added min running hybrid time field is missing (upgrade), we do not apply a filter (the current behavior), and the presence of the optional protobuf field when downgrading is entirely ignored (the old behavior is to unconditionally not apply a filter). There are no correctness issues involved with either applying or not applying the filter, as it is entirely a performance optimization.

Jira: DB-10615
Test Plan: Jenkins
Reviewers: yyan, qhu
Reviewed By: yyan, qhu
Subscribers: rthallam, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D35639
es1024 added a commit that referenced this issue on Jun 27, 2024

… time during tablet bootstrap

Summary: Original commit: 8b23a4e / D35639

When CDC streams are lagging, there may be a large number of intent SST files whose contents have all been applied already, but must be maintained for CDC purposes. We work around the performance implications of having a large number of these files by filtering out SST files by min running hybrid time (D33131 / 97536b4), but this approach does not work as is for bootstrap, since min running hybrid time is not currently determined until bootstrap has finished.

This change adds the saving of min running hybrid time periodically with retryable requests state, and then loads this min running hybrid time into the transaction participant early in bootstrap, to allow the SST file filter used in D33131 / 97536b4 to be used at bootstrap time as well.

To avoid reintroducing the issue introduced by D34389 / 2458c08, this diff also removes the requirement that min running hybrid time must not be set before bootstrap, by moving the requirement to `transactions_loaded_`.

**Upgrade/Rollback safety:** This change is not guarded by a gflag or autoflag. If the newly added min running hybrid time field is missing (upgrade), we do not apply a filter (the current behavior), and the presence of the optional protobuf field when downgrading is entirely ignored (the old behavior is to unconditionally not apply a filter). There are no correctness issues involved with either applying or not applying the filter, as it is entirely a performance optimization.

Jira: DB-10615
Test Plan: Jenkins
Reviewers: yyan, qhu
Reviewed By: yyan
Subscribers: ybase, rthallam
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D36102
es1024 added a commit that referenced this issue on Jun 29, 2024

…nts by min running hybrid time during tablet bootstrap"

Summary: This reverts commit 717fbc5, which caused CDC tests to start failing, in order to unblock 2024.1 branch.

Test Plan: Jenkins: urgent, rebase: 2024.1
Reviewers: rthallam
Reviewed By: rthallam
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D36255
es1024 added a commit that referenced this issue on Jun 29, 2024

…tents by min running hybrid time during tablet bootstrap"

Summary: This reverts commit 717fbc5, which caused CDC tests to start failing, in order to unblock 2024.1.1 branch.

Test Plan: Jenkins: urgent, rebase: 2024.1.1
Reviewers: rthallam
Reviewed By: rthallam
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D36261
Jira Link: DB-10615
Description
This issue was observed in a manual LRU. The Slack thread can be found in the JIRA description.
This is a CDC LRU, on which CDC was only recently enabled.
March 26, 2024:
March 27, 2024:
`cdc_write_post_apply_metadata` -> true, `cdc_immediate_transaction_cleanup` -> true. Bootstrap time on each node was ~7 minutes, 40 seconds.
March 28, 2024:
`Cannot proceed, the client to [172.151.17.61:7100, 172.151.26.208:7100, 172.151.31.137:7100] has already been closed.` in connector log.
In summary, 2 issues:
Issue Type
kind/bug