storage: GC system.rangelog
#21260
Comments
This shrinks the size of the info field in rangelog entries. gogoproto automatically adds the omitempty json tag for all proto3 fields (that aren't explicitly marked as nullable with a gogoproto tag), but it can't do so if the user specifies a custom jsontag. I don't know for sure why we initially added these, but now I can't remove them without messing up backwards compatibility with the old json field names, so just add the necessary omitempty annotations. Helps with range log size as related to cockroachdb#21260. Release note (sql change): Reduced size of entries stored in the system.rangelog table by not storing empty JSON fields.
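The effect of adding omitempty to a custom JSON tag can be seen with plain encoding/json. This is a minimal sketch: the struct and field names below are hypothetical and are not the actual rangelog protobuf messages. It only shows that keeping the field name and appending omitempty drops zero-valued fields from the output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// verboseInfo has custom JSON names but no omitempty, so zero-valued
// fields are always serialized. Hypothetical type, for illustration only.
type verboseInfo struct {
	Reason  string `json:"Reason"`
	Details string `json:"Details"`
	Count   int64  `json:"Count"`
}

// compactInfo keeps the same JSON field names (so previously written
// entries still decode) but adds omitempty to each tag.
type compactInfo struct {
	Reason  string `json:"Reason,omitempty"`
	Details string `json:"Details,omitempty"`
	Count   int64  `json:"Count,omitempty"`
}

// encode marshals v to a JSON string; it panics on error to keep the
// sketch short.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(verboseInfo{Reason: "rebalance"}))
	// {"Reason":"rebalance","Details":"","Count":0}
	fmt.Println(encode(compactInfo{Reason: "rebalance"}))
	// {"Reason":"rebalance"}
}
```

Because the field names are unchanged, JSON written by the old struct should still unmarshal into the new one. Only the encoding of empty values differs.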
@BramGruneir, is there any legacy reason that we need to keep printing the I'm separately going to shrink down the allocator details, because I'm most familiar with what's safe to change there, but there'll still be a bunch of space used in the first chunk of the
Could just as easily be printed using the more compact
There's a whole bunch of stuff that we typically don't need to print. Omit all such fields when we can.

Up-replicate details go from:
"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:10, logicalBytes:1.1 KiB, writesPerSecond:1.08, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":204200,\"RangeWritesPerSecond\":321.0712073249542}"
to:
"Details":"{\"Target\":\"s4, converges:0, balance:1, rangeCount:10\"}"}

Down-replicate details go from:
"Details":"{\"Target\":\"s1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:257 MiB, writesPerSecond:1205.47, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":33938441,\"RangeWritesPerSecond\":150.7858082120157}"
to:
"Details":"{\"Target\":\"s2, converges:1, balance:0, rangeCount:20\"}"

Rebalance details go from:
"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:1, balance:1.00(ranges=1, bytes=0.00, writes=0.00), rangeCount:12, logicalBytes:23 MiB, writesPerSecond:254.01, details:(diversity=0.00, preferred=0)\",\"Existing\":\"[\\ns5, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:256 MiB, writesPerSecond:1207.65, details:(diversity=0.00, preferred=0)\\ns1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1458.49, details:(diversity=0.00, preferred=0)\\ns2, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1459.74, details:(diversity=0.00, preferred=0)]\",\"RangeBytes\":33629514,\"RangeWritesPerSecond\":150.8107606862368}"
to:
"Details":"{\"Target\":\"s2, converges:1, balance:1, rangeCount:14\",\"Existing\":\"[\\ns4, converges:1, balance:1, rangeCount:13\\ns3, converges:1, balance:1, rangeCount:13\\ns1, converges:0, balance:-1, rangeCount:20]\"}"}

Touches cockroachdb#21260, but there's still more that can likely be shrunk down, as described on the issue.

Release note (sql change): Reduce size of system.rangelog entries to save disk space.
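One way to produce the shorter form is a formatter that always prints the few fields that matter and emits the rest only when they deviate from their usual value. The sketch below is an assumption about how that could look, not the allocator's actual code. The candidate type, its field names, and the rule "omit valid when true, omit constraint when zero" are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// candidate is a hypothetical, simplified stand-in for an allocator
// candidate; the real type has more fields.
type candidate struct {
	StoreID         int
	Valid           bool
	ConstraintScore float64
	ConvergesScore  int
	BalanceScore    int
	RangeCount      int
}

// compactString always prints converges, balance and rangeCount, and
// prints valid/constraint only when they differ from their common
// values (true and 0). This omission rule is an assumption.
func (c candidate) compactString() string {
	var b strings.Builder
	fmt.Fprintf(&b, "s%d", c.StoreID)
	if !c.Valid {
		b.WriteString(", valid:false")
	}
	if c.ConstraintScore != 0 {
		fmt.Fprintf(&b, ", constraint:%.2f", c.ConstraintScore)
	}
	fmt.Fprintf(&b, ", converges:%d, balance:%d, rangeCount:%d",
		c.ConvergesScore, c.BalanceScore, c.RangeCount)
	return b.String()
}

func main() {
	c := candidate{StoreID: 4, Valid: true, BalanceScore: 1, RangeCount: 10}
	fmt.Println(c.compactString())
	// s4, converges:0, balance:1, rangeCount:10
}
```

The trade-off is that a reader has to know the defaults to interpret an entry, which seems acceptable for a debugging log that is mostly read by people familiar with the allocator.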
I looked into the above questions and found that the
I think the best way of doing this would be with a proper migration. It would be great to consider normalizing the table a bit, or changing the string columns to JSON and adding an index or two. For now, it might be a lot easier to just delete all tail entries after some limit to stop the table from getting too big. Even better would be to limit the number of entries per range, but without an index, I worry about how slow that transaction might be.
Is someone working on it actively? If not, we were planning to take this up. I have some ideas and questions regarding the same:-
No one is currently working on this. I don't think the replica queues are a great fit for this because they work at the wrong level. I think we'd need something new at the SQL level to handle the GC of this data. Ideally it would be flexible enough to handle TTLs for any table instead of something specific to
Ranges are an implementation detail and should not be used in deciding when to GC a row. Either make it based on the timestamp in the row or the total number of rows in the table. (I'd generally prefer to make it time-based.)
I don't really see this in the CF&S area until we add row-level TTLs (#20239). The more likely option right now is some periodic job that deletes from the table (via SQL). @jordanlewis feel free to move this elsewhere. I'm leaving it in CF&S mostly because I don't know where to put it.
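A time-based periodic job of the kind described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was eventually merged. The rowDeleter interface, the batching strategy, and the in-memory fake are all hypothetical, and the SQL statement in the comment assumes a "timestamp" column on system.rangelog.

```go
package main

import (
	"fmt"
	"time"
)

// rowDeleter is a hypothetical abstraction over the SQL layer. A real
// implementation might run something like
//   DELETE FROM system.rangelog WHERE "timestamp" < $1 LIMIT <limit>
// (statement shape and column name are assumptions in this sketch).
type rowDeleter interface {
	DeleteOlderThan(cutoff time.Time, limit int64) (int64, error)
}

// gcOnce deletes rows older than now-ttl in batches of batchSize, so no
// single transaction grows too large, and returns the total deleted.
func gcOnce(db rowDeleter, now time.Time, ttl time.Duration, batchSize int64) (int64, error) {
	cutoff := now.Add(-ttl)
	var total int64
	for {
		n, err := db.DeleteOlderThan(cutoff, batchSize)
		if err != nil {
			return total, err
		}
		total += n
		if n < batchSize {
			// A short batch means nothing older than cutoff remains.
			return total, nil
		}
	}
}

// fakeDB simulates the table with just a slice of row timestamps.
type fakeDB struct{ rows []time.Time }

func (f *fakeDB) DeleteOlderThan(cutoff time.Time, limit int64) (int64, error) {
	var kept []time.Time
	var deleted int64
	for _, ts := range f.rows {
		if ts.Before(cutoff) && deleted < limit {
			deleted++
			continue
		}
		kept = append(kept, ts)
	}
	f.rows = kept
	return deleted, nil
}

// demo runs one GC pass with a 30-day TTL over rows aged 1 to 60 days.
func demo() (deleted int64, remaining int) {
	now := time.Date(2018, 10, 1, 0, 0, 0, 0, time.UTC)
	day := 24 * time.Hour
	db := &fakeDB{}
	for _, age := range []int{1, 10, 29, 31, 45, 60} {
		db.rows = append(db.rows, now.Add(-time.Duration(age)*day))
	}
	deleted, err := gcOnce(db, now, 30*day, 2)
	if err != nil {
		panic(err)
	}
	return deleted, len(db.rows)
}

func main() {
	deleted, remaining := demo()
	fmt.Println(deleted, remaining) // 3 3
}
```

In a running server, gcOnce would be invoked from a timer loop at some configured interval. Deleting in bounded batches keeps each transaction small, which addresses the earlier concern about slow deletes on a table without a suitable index only partially, since each batch still has to find the old rows.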
system.rangelog table currently grows unboundedly. The rate of growth is slow (as long as there is no replica rebalancing thrashing), but it can still become a problem in long running clusters. This commit adds cluster settings to specify interval and TTL for rows in system.rangelog. By default, GC of system.rangelog is disabled. Fixes cockroachdb#21260 Release note: Add configuration to enable GC of system.rangelog
system.rangelog table currently grows unboundedly. The rate of growth is slow (as long as there is no replica rebalancing thrashing), but it can still become a problem in long running clusters. This commit adds cluster settings to specify interval and TTL for rows in system.rangelog. By default, TTL of system.rangelog is set to 30 days. Fixes cockroachdb#21260 Release note: Add configuration to enable GC of system.rangelog
system.rangelog table currently grows unboundedly. The rate of growth is slow (as long as there is no replica rebalancing thrashing), but it can still become a problem in long running clusters. This commit adds cluster settings to specify interval and TTL for rows in system.rangelog. By default, TTL of system.rangelog is set to 30 days, and that for system.eventlog to 90 days. Fixes cockroachdb#21260. Release note (sql change): the range log and system events logs will automatically purge records older than 30 and 90 days, respectively. This can be adjusted via the server.rangelog.ttl and server.eventlog.ttl cluster settings.
30913: server: add a configuration to enable GC of system.rangelog r=tschottdorf a=mvijaykarthik system.rangelog table currently grows unboundedly. The rate of growth is slow (as long as there is no replica rebalancing thrashing), but it can still become a problem in long running clusters. This commit adds cluster settings to specify interval and TTL for rows in system.rangelog. By default, TTL of system.rangelog is set to 30 days. Fixes #21260 Release note: Add configuration to enable GC of system.rangelog 31239: sql: attempt to deflake distsql physical planner tests r=tschottdorf a=jordanlewis Make sure the range cache is populated before verifying things about it. This seems like a hack, but otherwise I think these will just keep flaking. Closes #25808. Closes #31235. Release note: None Co-authored-by: Tobias Schottdorf <[email protected]> Co-authored-by: Jordan Lewis <[email protected]>
We currently allow the system.rangelog table to grow without bound, on the assumption that its rate of growth will be negligible, but this isn't true. On long-lived clusters of many nodes, the rangelog table can grow to a significant fraction of the total database size. We need some sort of garbage collection here (and we should also take the opportunity to evaluate the usefulness of this log and whether we could be writing less data to it).