refactor: enable time travel by default #18854
Conversation
This PR can only be merged after etcd is deprecated.
Given that SST sync is triggered after barriers are collected, the main issue here is spilled SSTs, which can be produced at any time before barrier collection. In other words, if spilling happens more than 1 hour before the barrier is completed, commit_epoch will fail. Conversely, if the streaming pipeline is stuck because of compaction lag or backpressure and no spilling happens, commit_epoch can still succeed even when barrier latency is high. However, we do see in production that when join amplification is high for some use cases, spilling can happen and barrier latency can occasionally exceed 1 hour. Before this PR, the occasional spilling was not an issue, but after this PR I believe those clusters will enter a recovery loop, which makes me a little concerned. Brainstorm idea: now that we have bidi-streaming RPC for barrier injection/collection and the spilled SST information is maintained in a single place in
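The failure mode above can be illustrated with a minimal Rust sketch of a commit-time age check. It assumes, as the comment implies, that commit_epoch rejects any SST older than the retention window. The SstInfo type and verify_commit function are hypothetical and are not RisingWave's meta-service code. In the example, an SST spilled 90 minutes before barrier collection fails under a 1-hour window and passes under a 3-hour one.

```rust
/// Hypothetical, simplified view of an SST reported to meta in commit_epoch.
#[derive(Debug, Clone)]
struct SstInfo {
    id: u64,
    /// Wall-clock time (ms) at which the SST was written, e.g. when a spill happened.
    created_at_ms: u64,
}

/// Sketch of the timestamp check: every committed SST must be no older than
/// the retention window, otherwise full GC may already have deleted it.
fn verify_commit(ssts: &[SstInfo], now_ms: u64, min_sst_retention_ms: u64) -> Result<(), String> {
    for sst in ssts {
        let age_ms = now_ms.saturating_sub(sst.created_at_ms);
        if age_ms > min_sst_retention_ms {
            return Err(format!(
                "SST {} is {} ms old, exceeding retention of {} ms",
                sst.id, age_ms, min_sst_retention_ms
            ));
        }
    }
    Ok(())
}

fn main() {
    const HOUR_MS: u64 = 3_600_000;
    // A spill happens at t = 0; the barrier is collected 90 minutes later.
    let spilled = SstInfo { id: 1, created_at_ms: 0 };
    let now_ms = 90 * 60 * 1000;

    // With a 1-hour window the commit is rejected...
    assert!(verify_commit(&[spilled.clone()], now_ms, HOUR_MS).is_err());
    // ...while a 3-hour window (10800 s) tolerates the same latency.
    assert!(verify_commit(&[spilled], now_ms, 3 * HOUR_MS).is_ok());
    println!("ok");
}
```

Note that the check only looks at SST age, so a stuck pipeline that produces no spilled SSTs never trips it. This matches the distinction drawn in the comment.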
Sounds good to me. The marked SSTs will be filtered out of both GC and the timestamp verification in commit_epoch. I'll work on it. Until that is ready, what about increasing min_sst_retention_time_sec to 3 hours? Note that it can still be modified on demand for existing clusters when larger barrier latency is expected.
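The proposal above can be sketched minimally, assuming marked SSTs are tracked as a set of object IDs in meta. Under that assumption, full GC skips them and the commit_epoch age check exempts them. The sketch also assumes that full GC deletes an object only when no retained version references it and it is older than min_sst_retention_time_sec. ObjectMeta, gc_candidates, verify_commit, and the referenced/marked sets are hypothetical names for illustration, not RisingWave APIs.

```rust
use std::collections::HashSet;

/// Hypothetical object-store listing entry seen by full GC.
#[derive(Debug, Clone)]
struct ObjectMeta {
    id: u64,
    created_at_ms: u64,
}

/// Full GC candidate selection: an object is deletable only if no retained
/// version references it, it is not marked as a pending spilled SST, and it
/// is older than the retention window.
fn gc_candidates(
    objects: &[ObjectMeta],
    referenced: &HashSet<u64>,
    marked: &HashSet<u64>,
    now_ms: u64,
    min_sst_retention_ms: u64,
) -> Vec<u64> {
    objects
        .iter()
        .filter(|o| !referenced.contains(&o.id))
        .filter(|o| !marked.contains(&o.id))
        .filter(|o| now_ms.saturating_sub(o.created_at_ms) > min_sst_retention_ms)
        .map(|o| o.id)
        .collect()
}

/// Commit-time age check with marked SSTs exempted.
fn verify_commit(
    objects: &[ObjectMeta],
    marked: &HashSet<u64>,
    now_ms: u64,
    min_sst_retention_ms: u64,
) -> bool {
    objects.iter().all(|o| {
        marked.contains(&o.id) || now_ms.saturating_sub(o.created_at_ms) <= min_sst_retention_ms
    })
}

fn main() {
    const HOUR_MS: u64 = 3_600_000;
    let retention_ms = 3 * HOUR_MS;
    let now_ms = 5 * HOUR_MS;
    let objects = vec![
        ObjectMeta { id: 1, created_at_ms: 0 },           // old, unreferenced, unmarked
        ObjectMeta { id: 2, created_at_ms: 0 },           // old, spilled and marked
        ObjectMeta { id: 3, created_at_ms: 4 * HOUR_MS }, // recent
    ];
    let referenced: HashSet<u64> = HashSet::new();
    let marked: HashSet<u64> = [2].into_iter().collect();

    // Only object 1 may be deleted: 2 is marked and 3 is within retention.
    assert_eq!(gc_candidates(&objects, &referenced, &marked, now_ms, retention_ms), vec![1]);
    // A commit carrying the marked spilled SST 2 passes despite its age.
    assert!(verify_commit(&objects[1..], &marked, now_ms, retention_ms));
    println!("ok");
}
```

Consulting one meta-side set from both checks keeps GC and commit verification consistent about which spilled SSTs are still pending.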
SGTM
The time_travel_enabled check will ignore
GitGuardian id | GitGuardian status | Secret | Commit | Filename
---|---|---|---|---
9425213 | Triggered | Generic Password | 6940309 | e2e_test/source/tvf/postgres_query.slt
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR modifies several configurations to enable time travel by default, ensuring that regular batch queries remain functional even if a recent version is missing on the compute node.
- min_sst_retention_time_sec is reduced from 86400 to 10800 (3 hours). Once time travel is enabled, SST deletion depends entirely on full GC. A smaller min_sst_retention_time_sec permits earlier deletion of SSTs. However, it cannot be too small, because it also determines the maximum permitted barrier latency and compaction latency.
- full_gc_interval_sec is reduced from 86400 to 600, so that full GC is triggered more frequently.
- time_travel_retention_ms is increased from 0 to 600000 (10 minutes). Because min_sst_retention_time_sec keeps recent SSTs anyway, time_travel_retention_ms can actually be as large as min_sst_retention_time_sec without introducing object store overhead. However, it will introduce meta store overhead. min_sst_retention_time_sec.
- A new configuration, time_travel_version_cache_capacity, is added.

Checklist
- ./risedev check (or alias, ./risedev c)

Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.