storage: GC `system.rangelog` #21260

bdarnell · 2018-01-05T15:38:03Z

We currently allow the system.rangelog table to grow without bound, on the assumption that its rate of growth will be negligible, but this isn't true. On long-lived clusters of many nodes, the rangelog table can grow to a significant fraction of the total database size. We need some sort of garbage collection here (and we should also take the opportunity to evaluate the usefulness of this log and whether we could be writing less data to it).


This shrinks the size of the info field in rangelog entries. gogoproto automatically adds the omitempty json tag for all proto3 fields (that aren't explicitly marked as nullable with a gogoproto tag), but it can't do so if the user specifies a custom jsontag. I don't know for sure why we initially added these, but now I can't remove them without messing up backwards compatibility with the old json field names, so just add the necessary omitempty annotations.

Helps with range log size as related to cockroachdb#21260.

Release note (sql change): Reduced size of entries stored in the system.rangelog table by not storing empty JSON fields.
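For anyone following along, here is a minimal sketch of the effect described above, using only the standard encoding/json package. The struct and field names (Reason, Details, Count) are made up for illustration and are not the real RangeLogEvent_Info fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// infoWithoutOmitEmpty is a hypothetical struct with custom JSON names and
// no omitempty, so zero-valued fields are still written out.
type infoWithoutOmitEmpty struct {
	Reason  string `json:"Reason"`
	Details string `json:"Details"`
	Count   int64  `json:"Count"`
}

// infoWithOmitEmpty keeps the same JSON names but adds omitempty, so
// zero-valued fields are dropped from the encoded output.
type infoWithOmitEmpty struct {
	Reason  string `json:"Reason,omitempty"`
	Details string `json:"Details,omitempty"`
	Count   int64  `json:"Count,omitempty"`
}

// encode marshals v to a JSON string and panics on error (sketch only).
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(infoWithoutOmitEmpty{Reason: "rebalance"}))
	// prints {"Reason":"rebalance","Details":"","Count":0}
	fmt.Println(encode(infoWithOmitEmpty{Reason: "rebalance"}))
	// prints {"Reason":"rebalance"}
}
```

The field names stay the same in both variants, which is why existing readers of the JSON keep working as long as they tolerate missing keys.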

a-robinson · 2018-01-11T19:28:55Z

@BramGruneir, is there any legacy reason that we need to keep printing the RangeLogEvent_Info protos the way that we do? I don't see any existing readers of it other than tests. It would break existing readers, but we could save a ton of space if we printed them in a more compact format.

I'm separately going to shrink down the allocator details, because I'm most familiar with what's safe to change there, but there'll still be a bunch of space used in the first chunk of the info column. For example, this:

"UpdatedDesc":{"range_id":32,"start_key":"uw==","end_key":"u4mAoBdEMtxE94Q=","replicas":[{"node_id":1,"store_id":1,"replica_id":1},{"node_id":5,"store_id":5,"replica_id":2},{"node_id":2,"store_id":2,"replica_id":3},{"node_id":6,"store_id":6,"replica_id":4}],"next_replica_id":5},"AddReplica":{"node_id":6,"store_id":6,"replica_id":4}

Could just as easily be printed using the more compact RangeDescriptor.String() method, making it:

"UpdatedDesc":"r32:/Table/51{-/1/-6910…} [(n1,s1):1, (n5,s5):2, (n2,s2):3, (n6,s6):4, next=5]","AddReplica":"(n6,s6):4"

There's a whole bunch of stuff that we typically don't need to print. Omit all such fields when we can.

Up-replicate details go from:

"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:10, logicalBytes:1.1 KiB, writesPerSecond:1.08, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":204200,\"RangeWritesPerSecond\":321.0712073249542}"

to:

"Details":"{\"Target\":\"s4, converges:0, balance:1, rangeCount:10\"}"}

Down-replicate details go from:

"Details":"{\"Target\":\"s1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:257 MiB, writesPerSecond:1205.47, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":33938441,\"RangeWritesPerSecond\":150.7858082120157}"

to:

"Details":"{\"Target\":\"s2, converges:1, balance:0, rangeCount:20\"}"

Rebalance details go from:

"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:1, balance:1.00(ranges=1, bytes=0.00, writes=0.00), rangeCount:12, logicalBytes:23 MiB, writesPerSecond:254.01, details:(diversity=0.00, preferred=0)\",\"Existing\":\"[\\ns5, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:256 MiB, writesPerSecond:1207.65, details:(diversity=0.00, preferred=0)\\ns1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1458.49, details:(diversity=0.00, preferred=0)\\ns2, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1459.74, details:(diversity=0.00, preferred=0)]\",\"RangeBytes\":33629514,\"RangeWritesPerSecond\":150.8107606862368}"

to:

"Details":"{\"Target\":\"s2, converges:1, balance:1, rangeCount:14\",\"Existing\":\"[\\ns4, converges:1, balance:1, rangeCount:13\\ns3, converges:1, balance:1, rangeCount:13\\ns1, converges:0, balance:-1, rangeCount:20]\"}"}

Touches cockroachdb#21260, but there's still more that can likely be shrunk down, as described on the issue.

Release note (sql change): Reduce size of system.rangelog entries to save disk space.

a-robinson · 2018-01-12T17:21:36Z

I looked into the above questions and found that the RangeLog admin endpoint relies on being able to parse the aforementioned fields, so we'd have to only start writing the more compact versions after all nodes have updated to v2.0 or later, and we'd have to still support parsing the old version. So it'd be a bit of work, perhaps more than it's worth. There aren't any other dependencies in our code base, though, so the only question mark is whether anyone externally is parsing these entries, which seems unlikely.

BramGruneir · 2018-01-15T22:19:51Z

I think the best way of doing this would be with a proper migration. It would be great to consider normalizing the table a bit, or changing the string columns to json and adding an index or two.

For now, it might be a lot easier to just delete all tail entries after some limit to stop the table from getting too big. Even better would be to limit the number of entries per range, but without an index, I worry about how slow that transaction might be.


kaavee315 · 2018-05-21T11:51:49Z

Is someone actively working on this? If not, we were planning to take it up. I have some ideas and questions about it:

I see the various implementations of replicaQueues like GCQueue, raftLogQueue etc., which scan all the replicas and process the required parts. We could implement this as a similar rangeLog queue as part of the store. I think the issue here is that this framework lives at the KV layer and not at the SQL layer, so reading the data and finding what should be deleted might be difficult.
Is there a similar framework for processing data periodically at the SQL layer? I couldn't find any.
What should be the criteria to decide if we want to delete a particular entry? Should it just be a limit per rangeId?

bdarnell · 2018-05-22T18:47:04Z

No one is currently working on this.

I don't think the replica queues are a great fit for this because they work at the wrong level. I think we'd need something new at the SQL level to handle the GC of this data. Ideally it would be flexible enough to handle TTLs for any table instead of something specific to system.rangelog. This is a commonly-requested feature (#20239).

> What should be the criteria to decide if we want to delete a particular entry? Should it just be a limit per rangeId?

Ranges are an implementation detail and should not be used in deciding when to GC a row. Either make it based on the timestamp in the row or the total number of rows in the table. (I'd generally prefer to make it time-based).

tbg · 2018-05-29T11:18:39Z

I don't really see this in the CF&S area until we add row-level TTLs (#20239). The more likely option right now is some periodic job that deletes from the table (via SQL). @jordanlewis feel free to move this elsewhere. I'm leaving in CF&S mostly because I don't know where to put it.

system.rangelog table currently grows unboundedly. The rate of growth is slow (as long as there is no replica rebalancing thrashing), but it can still become a problem in long running clusters. This commit adds cluster settings to specify interval and TTL for rows in system.rangelog. By default, GC of system.rangelog is disabled.

Fixes cockroachdb#21260

Release note: Add configuration to enable GC of system.rangelog

system.rangelog table currently grows unboundedly. The rate of growth is slow (as long as there is no replica rebalancing thrashing), but it can still become a problem in long running clusters. This commit adds cluster settings to specify interval and TTL for rows in system.rangelog. By default, TTL of system.rangelog is set to 30 days.

Fixes cockroachdb#21260

Release note: Add configuration to enable GC of system.rangelog

system.rangelog table currently grows unboundedly. The rate of growth is slow (as long as there is no replica rebalancing thrashing), but it can still become a problem in long running clusters. This commit adds cluster settings to specify interval and TTL for rows in system.rangelog. By default, TTL of system.rangelog is set to 30 days, and that for system.eventlog to 90 days.

Fixes cockroachdb#21260.

Release note (sql change): the range log and system events logs will automatically purge records older than 30 and 90 days, respectively. This can be adjusted via the server.rangelog.ttl and server.eventlog.ttl cluster settings.

30913: server: add a configuration to enable GC of system.rangelog r=tschottdorf a=mvijaykarthik

system.rangelog table currently grows unboundedly. The rate of growth is slow (as long as there is no replica rebalancing thrashing), but it can still become a problem in long running clusters. This commit adds cluster settings to specify interval and TTL for rows in system.rangelog. By default, TTL of system.rangelog is set to 30 days.

Fixes #21260

Release note: Add configuration to enable GC of system.rangelog

31239: sql: attempt to deflake distsql physical planner tests r=tschottdorf a=jordanlewis

Make sure the range cache is populated before verifying things about it. This seems like a hack, but otherwise I think these will just keep flaking. Closes #25808. Closes #31235.

Release note: None

Co-authored-by: Tobias Schottdorf
Co-authored-by: Jordan Lewis


bdarnell added the O-community Originated from the community label Jan 5, 2018

a-robinson mentioned this issue Jan 8, 2018

storage: Omit empty fields from rangelog json #21318

Merged

a-robinson mentioned this issue Jan 11, 2018

storage: Shrink allocator details strings in system.rangelog entries #21410

Merged

a-robinson added this to the 2.1 milestone Feb 26, 2018

kaavee315 mentioned this issue Apr 13, 2018

Roach test for disk space usage #24795

Closed

bdarnell added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 26, 2018

tbg added the A-kv-client Relating to the KV client and the KV interface. label May 29, 2018

tbg added A-sql-execution Relating to SQL execution. and removed A-kv-client Relating to the KV client and the KV interface. labels Jul 19, 2018

tbg assigned jordanlewis Jul 19, 2018

petermattis modified the milestones: 2.1, 2.2 Sep 25, 2018

mvijaykarthik mentioned this issue Oct 3, 2018

server: add a configuration to enable GC of system.rangelog #30913

Merged

petermattis removed this from the 2.2 milestone Oct 5, 2018

craig bot closed this as completed in #30913 Oct 11, 2018

tbg mentioned this issue Oct 12, 2018

backport-2.1: server: add a configuration to enable GC of system.rangelog #31328

Closed
