storage: use raft log size (in bytes) as metric for truncation #7065
Comments
@petermattis Is there any reason that we need to store the size of the raft log anywhere other than in memory? Servers don't restart very often, so we could just do a range scan over the raft log entries to figure out the size at startup.
My implementation description was following the approach of other raft-log state such as the last index. While it is true that servers don't restart often, when they do we don't want there to be a time consuming scan operation. That said, @bdarnell should also comment on this approach. Avoiding additional persistent state would be nice if it doesn't come with significant downsides.
We're going to need code to do the scan anyways to upgrade old nodes. Might as well have that be the primary mechanism? Though I guess you'd potentially have to scan 64MB of raft log per range which could be a lot.
That's a good point about needing the mechanism for upgrades. My concern is one of scale. If it takes 100ms to scan through the raft log to calculate its size we've added significant startup time to load 1000 ranges. Seems like you can get started on the scanning mechanism as we're going to need it and we can wait for @bdarnell to weigh in on whether we should persistently store the length in bytes.
It's important that we store the last index instead of scanning for it because the range is basically useless if it doesn't know its last index; we must block at startup while doing this scan. We could scan for the range size asynchronously (modulo some tricky synchronization with log entries written during the scan); we just wouldn't be able to GC the log until it had completed. So it might be OK to avoid the persistent state and just scan the logs at startup. On the other hand, consistency of this value is not that important (it's not in the consistency checker's scope) so we could do without the scan even as a one-time upgrade. We could start the counter at zero and GC the log when enough new entries have been written, regardless of how much had been present in the logs from previous runs. On the third hand, one of the reasons we've wanted to avoid scanning the logs is because we let them grow so large. If we're better about log GC, maybe we can afford to just do the scan and block. This makes me nervous, though. It could set us up for trouble when something else prevents log GC. Overall, I think my preference is to store the log size persistently, but we can skip the backfill unless it's easy.
An option which hasn't been discussed is simply not restoring the in-memory
-- Tobias
Yeah, I had that in mind in my second option but didn't spell it out. Since we're already storing things like the last and applied index, though, I think it's better to move towards storing this as well so that all replicas will make the same decisions about GC (instead of that decision varying based on who is leader).
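The zero-initialized counter option from the discussion above can be sketched roughly as follows. This is only an illustration, not CockroachDB code: the type `raftLogSizeTracker`, its methods, and the threshold parameter are hypothetical names invented for this example. The counter lives purely in memory, so after a restart it starts at zero and only counts bytes written since then.

```go
package main

// raftLogSizeTracker is a hypothetical in-memory counter (not a CockroachDB
// type). It is never persisted, so a restarted server begins with sizeBytes
// equal to zero regardless of how much log is already on disk.
type raftLogSizeTracker struct {
	sizeBytes int64 // bytes appended since startup or since the last reset
	threshold int64 // assumed limit after which log GC should be considered
}

// newRaftLogSizeTracker returns a tracker whose counter starts at zero.
func newRaftLogSizeTracker(threshold int64) *raftLogSizeTracker {
	return &raftLogSizeTracker{threshold: threshold}
}

// recordAppend adds the size of newly written entries and reports whether
// enough new bytes have accumulated to trigger log GC.
func (t *raftLogSizeTracker) recordAppend(entryBytes int64) bool {
	t.sizeBytes += entryBytes
	return t.sizeBytes > t.threshold
}

// reset is called after log GC; in this sketch the counter simply restarts
// from zero.
func (t *raftLogSizeTracker) reset() {
	t.sizeBytes = 0
}
```

With a threshold of 100 bytes, `recordAppend(60)` returns false and a following `recordAppend(50)` returns true. The trade-off raised in the thread still applies: each replica counts independently, so replicas may disagree about when to GC unless the value is stored persistently.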
The raft log currently truncates based on the number of entries (max 10000). These entries can be of any size, so it makes more sense to use the byte size of the raft log as the metric for truncation. With this change, the raft log is used only when it is smaller than a snapshot would be. See cockroachdb#7065.
The raft log currently truncates based on the number of entries (max 10000). These entries can be of any size, so it makes more sense to use the byte size of the raft log as the metric for truncation. See cockroachdb#7065.
The raft log currently truncates based on the number of entries (max 10000). These entries can be of any size, so it makes more sense to use the byte size of the raft log as the metric for truncation. If the raft log is larger than 64MB and a node is behind, the log is truncated to the quorum committed index. See cockroachdb#7065.
The raft log currently truncates based on the number of entries (max 10000). These entries can be of any size, so it makes more sense to use the byte size of the raft log as the metric for truncation. If the raft log is larger than 64MB and a node is behind, the log is truncated to the quorum committed index. The in-memory size is the approximate size in bytes of the persisted raft log. On server restart, this value is assumed to be zero to avoid costly scans of the raft log. After the first raft log truncation it will be correct. See cockroachdb#7065.
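The truncation rule in the commit message above (over 64MB with a node behind, truncate to the quorum committed index) might look roughly like the sketch below. The function names and the `maxRaftLogBytes` constant are hypothetical and not taken from the CockroachDB source. The fallback for a log under the limit, which keeps everything the slowest follower still needs, is an assumption of this sketch and is not stated in the commit message.

```go
package main

import "sort"

// maxRaftLogBytes mirrors the 64MB limit mentioned in the commit message.
const maxRaftLogBytes = 64 << 20

// quorumMatchIndex returns the highest log index that a quorum of replicas is
// known to have. matchIndexes holds one entry per replica, leader included.
func quorumMatchIndex(matchIndexes []uint64) uint64 {
	if len(matchIndexes) == 0 {
		return 0
	}
	sorted := append([]uint64(nil), matchIndexes...)
	// Sort descending so that sorted[k-1] is held by at least k replicas.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	quorum := len(sorted)/2 + 1
	return sorted[quorum-1]
}

// truncationIndex picks the index up to which the log may be truncated.
// Assumption: when the log is within the size limit, everything the slowest
// follower still needs is kept.
func truncationIndex(logSizeBytes int64, matchIndexes []uint64) uint64 {
	if len(matchIndexes) == 0 {
		return 0
	}
	slowest := matchIndexes[0]
	for _, m := range matchIndexes[1:] {
		if m < slowest {
			slowest = m
		}
	}
	quorumIdx := quorumMatchIndex(matchIndexes)
	// Oversized log and at least one replica behind the quorum: stop waiting
	// for the straggler, which will have to catch up via a snapshot.
	if logSizeBytes > maxRaftLogBytes && slowest < quorumIdx {
		return quorumIdx
	}
	return slowest
}
```

For match indexes 10, 5 and 3, a 1MB log yields a truncation index of 3, while a 65MB log yields 5, the index that two of the three replicas have reached.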
Raft log truncation currently uses the number of entries in the raft log as the metric for truncation. Using the size in bytes is preferable as it makes the tradeoff between using a snapshot or the raft log more obvious. The implementation should be fairly straightforward as we write raft log entries in Replica.append and delete them in Replica.append and Replica.TruncateLog. The size of the raft log would be stored under a new unreplicated local range key (keys.RaftLogSizeKey(rangeID)) and cached in memory. See #6012.
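A minimal sketch of the size accounting described above, assuming a toy map-backed log rather than the real storage engine. The `sizedLog` type and its methods are hypothetical stand-ins for the bookkeeping that would happen in Replica.append and Replica.TruncateLog, and persisting the value under the proposed key is left out. The point is that an append can both add bytes and remove them, because it may overwrite or discard existing entries.

```go
package main

// sizedLog is a hypothetical toy log that keeps a running byte count.
type sizedLog struct {
	entries   map[uint64][]byte // log index -> entry payload
	lastIndex uint64
	sizeBytes int64 // running total of payload bytes currently stored
}

func newSizedLog() *sizedLog {
	return &sizedLog{entries: make(map[uint64][]byte)}
}

// put writes one entry, subtracting the size of any entry it overwrites.
func (l *sizedLog) put(index uint64, data []byte) {
	if old, ok := l.entries[index]; ok {
		l.sizeBytes -= int64(len(old))
	}
	l.entries[index] = data
	l.sizeBytes += int64(len(data))
}

// del removes one entry, if present, and subtracts its size.
func (l *sizedLog) del(index uint64) {
	if old, ok := l.entries[index]; ok {
		l.sizeBytes -= int64(len(old))
		delete(l.entries, index)
	}
}

// appendEntries writes payloads starting at startIndex and deletes any stale
// entries beyond the new last index, as happens when a log suffix is replaced.
func (l *sizedLog) appendEntries(startIndex uint64, payloads [][]byte) {
	if len(payloads) == 0 {
		return
	}
	for i, p := range payloads {
		l.put(startIndex+uint64(i), p)
	}
	newLast := startIndex + uint64(len(payloads)) - 1
	for idx := newLast + 1; idx <= l.lastIndex; idx++ {
		l.del(idx)
	}
	l.lastIndex = newLast
}

// truncate deletes every entry with an index below firstKept.
func (l *sizedLog) truncate(firstKept uint64) {
	for idx := range l.entries {
		if idx < firstKept {
			l.del(idx) // deleting during range is safe in Go
		}
	}
}
```

Appending payloads of 4, 2 and 6 bytes at indexes 1 to 3 gives a size of 12. Appending a single 1-byte payload at index 2 then overwrites index 2 and drops index 3, leaving 5 bytes, and truncating everything below index 2 leaves 1 byte.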