
sql: TTL v1 meta issue #75428

Closed
28 of 30 tasks
otan opened this issue Jan 24, 2022 · 3 comments · Fixed by #77741
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

otan commented Jan 24, 2022

This is the meta issue for TTL v1.

⛔ = blocked
✔️ = in review
♻️ = in progress

Done

Out of scope

  • ttl_expiration_expression (maybe a stretch)

By February 25

  • ✔️ Plug in admission priority for DELETE. (ttljob: add row statistics #76837)
  • ✔️ implement a scan of deleted vs active rows to report as metrics: In addition to the current metrics, we need to write a loop that counts all the expired rows and existing rows in the table. For now, this will be an extra goroutine on the TTL job. This needs to be gated by a table setting and cluster setting. (in review: ttljob: add row statistics #76837)
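
A minimal Go sketch of the row-statistics item above, not the actual ttljob code. The `row` and `rowStats` types, the `countRows` and `maybeStartRowStats` functions, and the two boolean gates are all hypothetical names invented for illustration; the real job would read rows from the table and the gates from the table and cluster settings.

```go
package main

import "fmt"

// row is a hypothetical stand-in for a table row with a TTL expiration,
// stored here as Unix seconds to keep the sketch dependency-free.
type row struct {
	expiresAtUnix int64
}

// rowStats holds the two counts that would be reported as metrics.
type rowStats struct {
	Expired int
	Total   int
}

// countRows walks every row once and counts the rows whose expiration
// is at or before nowUnix, alongside the total number of rows.
func countRows(rows []row, nowUnix int64) rowStats {
	var s rowStats
	for _, r := range rows {
		s.Total++
		if r.expiresAtUnix <= nowUnix {
			s.Expired++
		}
	}
	return s
}

// maybeStartRowStats runs the count on its own goroutine, but only when
// both the (assumed) cluster setting and table setting are enabled.
// It returns nil when the statistics loop is gated off.
func maybeStartRowStats(clusterEnabled, tableEnabled bool, rows []row, nowUnix int64) <-chan rowStats {
	if !clusterEnabled || !tableEnabled {
		return nil
	}
	out := make(chan rowStats, 1) // buffered so the goroutine never blocks
	go func() {
		defer close(out)
		out <- countRows(rows, nowUnix)
	}()
	return out
}

func main() {
	const now = 1_645_747_200 // hypothetical "current time" in Unix seconds
	rows := []row{
		{expiresAtUnix: now - 3600},
		{expiresAtUnix: now + 3600},
		{expiresAtUnix: now - 60},
	}
	stats := maybeStartRowStats(true, true, rows, now)
	if stats == nil {
		fmt.Println("row statistics disabled")
		return
	}
	s := <-stats
	fmt.Printf("expired=%d total=%d\n", s.Expired, s.Total)
}
```

Requiring both gates means a cluster operator can switch the extra scan off globally while individual tables still opt in.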

Stability items

  • ✔️ Add telemetry (ttl: add telemetry #77108)
  • (happy for someone to take this) add validation builtins (see below); incorporate into debug doctor
  • Write some builtins which scan the descriptors and validate that every table with TTL has an associated scheduled job, and which scan row-level TTL jobs and ensure that each maps to a valid table.
  • Write some builtins which can create the scheduled jobs you might expect if they don't exist.
  • Re-do performance testing of TTL
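
The two-way check described for the validation builtins could look roughly like the Go sketch below. It is only an illustration under stated assumptions: `tableDesc`, `ttlSchedule`, and `validateTTLSchedules` are hypothetical names, and the simplified structs stand in for real descriptors and scheduled jobs.

```go
package main

import "fmt"

// tableDesc is a hypothetical, simplified table descriptor.
type tableDesc struct {
	ID         int
	Name       string
	HasTTL     bool
	ScheduleID int // 0 means no schedule is recorded on the descriptor
}

// ttlSchedule is a hypothetical, simplified row-level TTL scheduled job.
type ttlSchedule struct {
	ID      int
	TableID int
}

// validateTTLSchedules checks both directions: every TTL table must point
// at a schedule that targets it, and every schedule must target an
// existing table that has TTL. It returns one message per problem found.
func validateTTLSchedules(tables []tableDesc, schedules []ttlSchedule) []string {
	tablesByID := make(map[int]tableDesc, len(tables))
	for _, t := range tables {
		tablesByID[t.ID] = t
	}
	schedulesByID := make(map[int]ttlSchedule, len(schedules))
	for _, s := range schedules {
		schedulesByID[s.ID] = s
	}

	var problems []string
	// Direction 1: tables with TTL must have a matching schedule.
	for _, t := range tables {
		if !t.HasTTL {
			continue
		}
		s, ok := schedulesByID[t.ScheduleID]
		if !ok {
			problems = append(problems,
				fmt.Sprintf("table %q (id %d) has TTL but no scheduled job", t.Name, t.ID))
		} else if s.TableID != t.ID {
			problems = append(problems,
				fmt.Sprintf("table %q (id %d) points at schedule %d, which targets table %d",
					t.Name, t.ID, s.ID, s.TableID))
		}
	}
	// Direction 2: schedules must map to a valid TTL table.
	for _, s := range schedules {
		t, ok := tablesByID[s.TableID]
		if !ok || !t.HasTTL {
			problems = append(problems,
				fmt.Sprintf("schedule %d targets table %d, which is missing or has no TTL",
					s.ID, s.TableID))
		}
	}
	return problems
}

func main() {
	tables := []tableDesc{
		{ID: 1, Name: "events", HasTTL: true, ScheduleID: 10},
		{ID: 2, Name: "sessions", HasTTL: true}, // schedule missing
	}
	schedules := []ttlSchedule{
		{ID: 10, TableID: 1},
		{ID: 11, TableID: 99}, // orphaned schedule
	}
	for _, p := range validateTTLSchedules(tables, schedules) {
		fmt.Println(p)
	}
}
```

A repair builtin could reuse the first loop and create a schedule wherever it currently records a "no scheduled job" problem.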

Product Questions

  • Figure out how TTL priority configuration should align with SET ADMISSION PRIORITY (i.e. using numbers versus enums)
  • figure out advice for "child" metrics (since we use aggmetric for metrics, child labels are not applied by default... see slack)
  • i'd advocate for calling row-level TTL "preview" / "experimental" for this release. should we?

Epic CRDB-10488

Jira issue: CRDB-12666

@otan otan added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jan 24, 2022
@otan otan self-assigned this Jan 24, 2022
craig bot pushed a commit that referenced this issue Feb 1, 2022
75602: sql: persist TTL metadata to the descriptor r=postamar a=otan

These commits allow the WITH option in a table to symbolize TTL.
Furthermore, these values can be introspected.

Refs: #75428

See individual commits for details.

75688: ptreconcile,server: rework ptreconciler for multi-tenant r=ajwerner,miretskiy a=adityamaru

The ptreconciler is responsible for periodically scanning the
`system.pts_records` table, and checking if there are any stale
records to be released, based on callbacks registered on server
startup. Previously, the ptreconciler used the meta1leaseholder
to ensure that there was only one instance of it running in the
cluster. Additionally, it was reliant on the ptcache to iterate
over the records when checking whether they were stale.

In the multi-tenant version of the protected timestamp subsystem,
the SQL pod running the reconciler cannot use the meta1leaseholder
to determine whether or not it should run the reconciliation loop.
To get around this, we move the `Start` of the ptreconciler to the
Resume hook of the auto span config job. We are guaranteed via the
spanconfig manager, that there will always be at most one instance of
this job in a cluster. Further, this is a forever running job, and so
we can tie the execution of the ptreconciler to the lifetime of the
spanconfig job resumer. Additionally, since we will be doing away
with the ptcache, we switch over to doing a full table scan every time
the reconciliation loop is run. While not ideal, this is not alarming
since we have a conservative limit on the total size of all records
that can be stored in the table, and reconciliation only runs once
every 5mins by default. Additionally, we do not expect many
concurrent BACKUP/CDC jobs to exist in the cluster at a given point in
time.
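
As a rough illustration of one pass of the reconciliation described above (a full scan, with staleness decided by callbacks registered at startup), here is a minimal Go sketch. It is not the ptreconciler implementation: `record`, `staleFunc`, and `reconcileOnce` are hypothetical names, and a slice stands in for the `system.pts_records` table.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a row in system.pts_records.
type record struct {
	ID       string
	MetaType string // identifies which subsystem owns the record
}

// staleFunc reports whether a record is stale and should be released.
// In this sketch, one callback is registered per MetaType.
type staleFunc func(r record) bool

// reconcileOnce performs a single full scan over all records. A record
// is released only when a callback is registered for its MetaType and
// that callback reports it as stale; everything else is kept.
func reconcileOnce(records []record, callbacks map[string]staleFunc) (kept, released []record) {
	for _, r := range records {
		if isStale, ok := callbacks[r.MetaType]; ok && isStale(r) {
			released = append(released, r)
			continue
		}
		kept = append(kept, r)
	}
	return kept, released
}

func main() {
	// Hypothetical callback: treat the record with ID "finished-job" as stale.
	callbacks := map[string]staleFunc{
		"jobs": func(r record) bool { return r.ID == "finished-job" },
	}
	records := []record{
		{ID: "finished-job", MetaType: "jobs"},
		{ID: "running-job", MetaType: "jobs"},
		{ID: "unknown-owner", MetaType: "other"},
	}
	kept, released := reconcileOnce(records, callbacks)
	fmt.Printf("kept=%d released=%d\n", len(kept), len(released))
}
```

In a long-running job, a pass like this would be invoked on a timer for as long as the job's resumer is alive.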

This change also refactors some of the server and tenant code to plumb
a ptreconciler to the ExecutorConfig, for use by the auto span config
job. We move the relevant job+schedule tests into a ccl package to allow
testing from within a secondary tenant.

Informs: #73727

Release note: None

Co-authored-by: Oliver Tan
Co-authored-by: Aditya Maru
@exalate-issue-sync exalate-issue-sync bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Feb 3, 2022
vy-ton commented Feb 22, 2022

> Figure out how TTL priority configuration should align with SET ADMISSION PRIORITY (i.e. using numbers versus enums)

If we think we might need the flexibility/tuning with QoS for TTL, I am ok deferring this alignment until later.

> i'd advocate for calling row-level TTL "preview" / "experimental" for this release. should we?

In docs, we will call it beta support but that shouldn't apply to any exposed settings or code.

vy-ton commented Mar 10, 2022

Should we file issues for out of scope items like ttl_expiration_expression?

otan commented Mar 10, 2022
