-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: add changefeed.lagging_ranges metric #109835
changefeedccl: add changefeed.lagging_ranges metric #109835
Conversation
2341592
to
6e29018
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 12 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava)
pkg/ccl/changefeedccl/metrics.go
line 788 at r1 (raw file):
func (s *sliMetrics) getLaggingRangesCallback() func(int64) { // Because this gauge is shared between changefeeds in the same metrics scope, // must instead modify it using `Inc` and `Dec` (as opposed to `Update`) to
nit: something is missing (e.g. care must be take
, or we must instead
)
pkg/ccl/changefeedccl/metrics.go
line 800 at r1 (raw file):
// If 3 ranges catch up, last=10,i=7: X.Dec(10 - 7) = X.Dec(3) // If 4 ranges fall behind, last=7,i=11: X.Dec(7 - 11) = X.Inc(4) // If 1 lagging range is deleted, last=7,i=10: X.Dec(11-10) = X.Dec(1)
very nice explannation.
pkg/ccl/changefeedccl/kvfeed/testing_knobs.go
line 37 at r1 (raw file):
ModifyTimestamps func(*hlc.Timestamp) // RangefeedOptions which will be passed to the rangefeed which lies under // the kvfeed.
nit: can you rephrase "lies" part? just sounds a bit strange. "Which lets kvfeed override regular settings" or some such?
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 97 at r1 (raw file):
// LaggingRangesThreshold is how far behind a range must be from the present to // be considered as 'lagging' behind in metrics var LaggingRangesThreshold = settings.RegisterDurationSetting(
There are 2 things I'm pretty sure about:
- These two settings do not belong here. The configuration option you added can be passed in these thresholds .
- I'm not sure these should be settings in the first place. Perhaps these should be changefeed options with some default values assigned to them?
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 288 at r1 (raw file):
func (ds *DistSender) monitorLaggingRanges( ctx context.Context, rr *rangeFeedRegistry, updateLaggingRanges func(int64), ) error {
There are few things that I think should be changed.
First, I think the responsibility of managing the frequency of this function, plus determination
if/how this function should be called (i.e. use goroutine on a timer, or not) belongs with the caller. I just don't
think we should be adding additional goroutines to rangefeed itself. That can be easily done by changefeed
(kvfeed).
This will allow you to completely remove the settings from distsender (which I mention above).
The second change I would like to see is that, I think ultimately the entire logic of the counting case also belongs with changefeed. Your code to calculate things is fine, of course. It's just this code should not live here. It's too changefeed specific. Instead, I think you can easily use the refactoring you already did (introduction of ForEachPartialRangefeed
method), and do something like this:
type ForEachRangeFn func(fn ActiveRangeFeedIterFn) error
// Change WithLaggingRangesUpdate configuration option to be something like:
func WithRangeObserver(observer func(ForEachRangeFn) RangeFeedOption {
return optionFunc(func(c *rangeFeedConfig) {
c.rangeObserver = observer
})
}
// Then, once you have setup rangefeed registry (rr):
if cfg.rangeObserver != nil {
cfg.rangeObserver(rr.ForEachPartialRangefeed)
}
That's it -- rangefeed library will invoke a function that will be passed an observer function that the caller
may invoke when it wants to examine the state of all ranges that are part of the rangfeed.
The majority of this function simply moves to changefeed code, which I think is much better.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 306 at r1 (raw file):
// we consider the time the partial rangefeed was created to be its starting time. ts := hlc.Timestamp{WallTime: feed.CreatedTime.UnixNano()} if !feed.Resolved.EqOrdering(hlc.Timestamp{}) {
if feed.Resolved.IsSet() ...
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 399 at r1 (raw file):
// ForEachPartialRangefeed invokes provided function for each partial rangefeed. Use manageIterationErrs // if the fn uses iterutil.StopIteration to stop iteration.
this manageIterationErrs bit is might strange. Why wouldn't we want to use stopIteration unconditionally?
6e29018
to
3b592af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy)
pkg/ccl/changefeedccl/metrics.go
line 800 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
very nice explannation.
Done.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 97 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
There are 2 things I'm pretty sure about:
- These two settings do not belong here. The configuration option you added can be passed in these thresholds .
- I'm not sure these should be settings in the first place. Perhaps these should be changefeed options with some default values assigned to them?
Done.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 288 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
There are few things that I think should be changed.
First, I think the responsibility of managing the frequency of this function, plus determination
if/how this function should be called (i.e. use goroutine on a timer, or not) belongs with the caller. I just don't
think we should be adding additional goroutines to rangefeed itself. That can be easily done by changefeed
(kvfeed).This will allow you to completely remove the settings from distsender (which I mention above).
The second change I would like to see is that, I think ultimately the entire logic of the counting case also belongs with changefeed. Your code to calculate things is fine, of course. It's just this code should not live here. It's too changefeed specific. Instead, I think you can easily use the refactoring you already did (introduction of
ForEachPartialRangefeed
method), and do something like this:type ForEachRangeFn func(fn ActiveRangeFeedIterFn) error // Change WithLaggingRangesUpdate configuration option to be something like: func WithRangeObserver(observer func(ForEachRangeFn) RangeFeedOption { return optionFunc(func(c *rangeFeedConfig) { c.rangeObserver = observer }) } // Then, once you have setup rangefeed registry (rr): if cfg.rangeObserver != nil { cfg.rangeObserver(rr.ForEachPartialRangefeed) }
That's it -- rangefeed library will invoke a function that will be passed an observer function that the caller
may invoke when it wants to examine the state of all ranges that are part of the rangfeed.The majority of this function simply moves to changefeed code, which I think is much better.
Ack. I agree. I just pushed an update with these changes.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 306 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
if feed.Resolved.IsSet() ...
Done.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 399 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
this manageIterationErrs bit is might strange. Why wouldn't we want to use stopIteration unconditionally?
I got rid of it. It was a bit odd because it's called by another iterator, but I think it's okay to not manage iteration errors by default.
3b592af
to
be0e279
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 12 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava)
pkg/ccl/changefeedccl/changefeed_processors.go
line 468 at r2 (raw file):
updateLaggingRanges := sliMetrics.getLaggingRangesCallback() return func(fn kvcoord.ForEachRangeFn) { go func() {
I think it would be much nicer if kvFeed created this observer instead of processor.
First, kvfeed deals with rangefeeds -- so, it's a natural place for that logic.
Secondly, you don't need to worry about your stray go function not noticing things
like context cancellation.
Finally, you definitely want this observer to run with the same goroutines started
by kvfeed because you want this observer to terminate as soon as the rangefeed itself terminates.
pkg/ccl/changefeedccl/changefeed_processors.go
line 478 at r2 (raw file):
select { case <-ctx.Done(): log.Infof(ctx, "changefeed lagging ranges observer shutting down: %s", ctx.Err())
not sure this (and the above) log lines are needed; if you insist, perhaps if log.V(1)
pkg/ccl/changefeedccl/changefeed_processors.go
line 483 at r2 (raw file):
count := int64(0) thresholdTS := timeutil.Now().Add(-1 * changefeedbase.LaggingRangesThreshold.Get(&settings.SV)) i := 0
is this used?
pkg/ccl/changefeedccl/changefeedbase/settings.go
line 314 at r2 (raw file):
// LaggingRangesCheckFrequency is the frequency at which a changefeed will // check for ranges which have fallen behind. var LaggingRangesCheckFrequency = settings.RegisterDurationSetting(
should these be changefeed options instead of settings?
Not much of a change, but options will let you configure more sensitive changefeed with
lower thresholds.
pkg/kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go
line 100 at r2 (raw file):
if cfg.rangeObserver != nil { cfg.rangeObserver(rr.ForEachPartialRangefeed) }
not sure we need to do this here... Why not call this right after rr created (in dist_sender_rangefeed)?
Assuming you do this in kvfeed, we have "g" (ctxgroup) that runs rangefeed and schema feed. So, your option
could be
WithRangeFeedObserver(func(observe kvcoord.ForEachRangeFn) {
g.GoCtx(func(ctx context.Context) {
return laggingRangesObserver(....., fn);
}
)
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 92 at r2 (raw file):
} // ForEachRangeFn is used to iterate o
nit: needs better comment.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed_test.go
line 852 at r2 (raw file):
for { select { case <-ctx2.Done():
this is a busy loop; while I understand it's a test, it probably makes it a lot more aggressive.
Just have a simple timer (200ms?)
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed_test.go
line 863 at r2 (raw file):
}) observedRangesMu.Unlock() require.NoError(t, err)
few things: first, it's not great to panic (or use require) on anything other than the test thread itself (it could lead to deadlocks.
Second: not a fan of stray go func
. I think under race this may even show up as a leaked goroutine since you're not waiting for the goroutine to finish -- you just cancel context; and if leakchecker runs before that, you'll get an error.
I think you should just use context group.
g := ctxgroup.WithContext(ctx2)
defer func() {
cancel()
_ = g.Wait() // Ignore result since it's going to be cancel; you just want to make sure you wait for the goroutine to exit prior to returning
}()
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed_test.go
line 888 at r2 (raw file):
observedRangesMu.Lock() defer observedRangesMu.Unlock() for r := range expectedRanges {
perhaps use reflect.DeepEqual(a, b)
?
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed_test.go
line 910 at r2 (raw file):
makeSpan("1/{4-5}"): {}, makeSpan("{1/5-2}"): {}, }
nice test
d04c370
to
b0725c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy)
pkg/ccl/changefeedccl/changefeed_processors.go
line 468 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
I think it would be much nicer if kvFeed created this observer instead of processor.
First, kvfeed deals with rangefeeds -- so, it's a natural place for that logic.
Secondly, you don't need to worry about your stray go function not noticing things
like context cancellation.
Finally, you definitely want this observer to run with the same goroutines started
by kvfeed because you want this observer to terminate as soon as the rangefeed itself terminates.
Done. Now we create it in the same ctx group.
pkg/ccl/changefeedccl/changefeed_processors.go
line 478 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
not sure this (and the above) log lines are needed; if you insist, perhaps
if log.V(1)
I got rid of it.
pkg/ccl/changefeedccl/changefeed_processors.go
line 483 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
is this used?
Nope. Removed.
pkg/ccl/changefeedccl/changefeedbase/settings.go
line 314 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
should these be changefeed options instead of settings?
Not much of a change, but options will let you configure more sensitive changefeed with
lower thresholds.
Done.
pkg/kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go
line 100 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
not sure we need to do this here... Why not call this right after rr created (in dist_sender_rangefeed)?
Assuming you do this in kvfeed, we have "g" (ctxgroup) that runs rangefeed and schema feed. So, your option
could beWithRangeFeedObserver(func(observe kvcoord.ForEachRangeFn) { g.GoCtx(func(ctx context.Context) { return laggingRangesObserver(....., fn); } )
Done.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed_test.go
line 852 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
this is a busy loop; while I understand it's a test, it probably makes it a lot more aggressive.
Just have a simple timer (200ms?)
Ack. Done.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed_test.go
line 863 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
few things: first, it's not great to panic (or use require) on anything other than the test thread itself (it could lead to deadlocks.
Second: not a fan of stray
go func
. I think under race this may even show up as a leaked goroutine since you're not waiting for the goroutine to finish -- you just cancel context; and if leakchecker runs before that, you'll get an error.I think you should just use context group.
g := ctxgroup.WithContext(ctx2) defer func() { cancel() _ = g.Wait() // Ignore result since it's going to be cancel; you just want to make sure you wait for the goroutine to exit prior to returning }()
Done. I assert that the error is "context cancelled" now. The require
happens in a t.Run
goroutine which I believe is safe.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed_test.go
line 888 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
perhaps use
reflect.DeepEqual(a, b)
?
Done.
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed_test.go
line 910 at r2 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
nice test
Ty!
b0725c6
to
57d7e0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that ranges undergoing an initial scan for longer than the threshold are considered to be
lagging. Starting a changefeed with an initial scan on a large table will likely increment the metric
for each range in the table. However, as ranges complete the initial scan, the number of ranges will
decrease.
@miretskiy Do you think it would be worth having a separate metric which ignores initial scans? I can imagine an operator not wanting this noise.
Alternatively, they could start the changefeed under one metrics label and change the label when it completes, but restarting a big changefeed takes time and there still might be some noise if they try that.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, maybe it's not worth it. We have the checkpoint_progress sli metric which ignores initial scans. 2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 10 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava)
pkg/ccl/changefeedccl/kvfeed/kv_feed.go
line 38 at r3 (raw file):
) type LaggingRangesPollerCfg struct {
comment on the exported struct (or just get rid of the struct).
pkg/ccl/changefeedccl/kvfeed/kv_feed.go
line 44 at r3 (raw file):
// LaggingRangesPollingInterval is how often the kv feed will poll for // lagging ranges and update lagging ranges metrics. LaggingRangesPollingInterval time.Duration
nit: if you keep the struct, it's probably not necessary to prefix member names with laggingrangespolling...
pkg/ccl/changefeedccl/kvfeed/kv_feed.go
line 126 at r3 (raw file):
sc, pff, bf, cfg.UseMux, cfg.Targets, cfg.Knobs) f.onBackfillCallback = cfg.OnBackfillCallback f.rangeObserver = makeLaggingRangesObserver(g, cfg.LaggingRangesPollerCfg)
nit: perhaps rename startLaggingRangesObserver
?
pkg/ccl/changefeedccl/kvfeed/kv_feed.go
line 185 at r3 (raw file):
case <-ctx.Done(): return ctx.Err() case <-time.After(cfg.LaggingRangesPollingInterval):
I prefer not to use time.After (time.After for example does not notice ctx cancellation until time expires); consider creating timer instead.
pkg/kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go
line 96 at r3 (raw file):
if cfg.knobs.metrics != nil { m.metrics = cfg.knobs.metrics }
nit: looks like this file only has white space changes -- consider reverting.
This change adds the `changefeed.lagging_ranges` metric which can be used to track ranges which are behind. This metric is calculated based on a new changefeed option `lagging_ranges_threshold` which is the amount of time that a range checkpoint needs to be in the past to be considered lagging. This defaults to 3 minutes. This change also adds the changefeed option `lagging_ranges_polling_interval` which is the polling rate at which a rangefeed will poll for lagging ranges and update the metric. This defaults to 1 minute. Sometimes a range may not have any checkpoints for a while because the start time may be far in the past (this causes a catchup scan during which no checkpoints are emitted). In this case, the range is considered to the lagging if the created timestamp of the rangefeed is older than `changefeed.lagging_ranges_threshold`. Note that this means that any changefeed which starts with an initial scan containing a significant amount of data will likely indicate nonzero `changefeed.lagging_ranges` until the initial scan is complete. This is intentional. Release note (ops change): A new metric `changefeed.lagging_ranges` is added to show the number of ranges which are behind in changefeeds. This metric can be used with the `metrics_label` changefeed option. A changefeed option `lagging_ranges_threshold` is added which is the amount of time a range needs to be behind to be considered lagging. By default this is 3 minutes. There is also a new option `lagging_ranges_polling_interval` which controls how often the lagging ranges calculation is done. This setting defaults to polling every 1 minute. Note that polling adds latency to the metric being updated. For example, if a range falls behind by 3 minutes, the metric may not update until an additional minute afterwards. Also note that ranges undergoing an initial scan for longer than the threshold are considered to be lagging. Starting a changefeed with an initial scan on a large table will likely increment the metric for each range in the table. However, as ranges complete the initial scan, the number of ranges will decrease. Epic: None
57d7e0a
to
b5ec7df
Compare
bors r=miretskiy |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from b5ec7df to blathers/backport-release-22.2-109835: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. error creating merge commit from b5ec7df to blathers/backport-release-23.1-109835: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This change is a backport of the commit from cockroachdb#109835. The original chance uses changefeed options to configure lagging ranges metrics. Because changefeed options require version gates, they cannot be backported. This change instead uses cluster settings. This change adds the `changefeed.lagging_ranges` metric which can be used to track ranges which are behind. This metric is calculated based on a new changefeed option `lagging_ranges_threshold` which is the amount of time that a range checkpoint needs to be in the past to be considered lagging. This defaults to 3 minutes. This change also adds the changefeed option `lagging_ranges_polling_interval` which is the polling rate at which a rangefeed will poll for lagging ranges and update the metric. This defaults to 1 minute. Sometimes a range may not have any checkpoints for a while because the start time may be far in the past (this causes a catchup scan during which no checkpoints are emitted). In this case, the range is considered to the lagging if the created timestamp of the rangefeed is older than `changefeed.lagging_ranges_threshold`. Note that this means that any changefeed which starts with an initial scan containing a significant amount of data will likely indicate nonzero `changefeed.lagging_ranges` until the initial scan is complete. This is intentional. Release note (ops change): A new metric `changefeed.lagging_ranges` is added to show the number of ranges which are behind in changefeeds. This metric can be used with the `metrics_label` changefeed option. A new cluser setting `changefeed.lagging_ranges_threshold` is added which is the amount of time a range needs to be behind to be considered lagging. By default this is 3 minutes. There is also a new cluster setting `changefeed.lagging_ranges_polling_interval` which controls how often the lagging ranges calculation is done. This setting defaults to polling every 1 minute. Note that polling adds latency to the metric being updated. For example, if a range falls behind by 3 minutes, the metric may not update until an additional minute afterwards. Also note that ranges undergoing an initial scan for longer than the threshold are considered to be lagging. Starting a changefeed with an initial scan on a large table will likely increment the metric for each range in the table. However, as ranges complete the initial scan, the number of ranges will decrease. Epic: https://cockroachlabs.atlassian.net/browse/CRDB-9181
This change is a backport of the commit from cockroachdb#109835. The original chance uses changefeed options to configure lagging ranges metrics. Because changefeed options require version gates, they cannot be backported. This change instead uses cluster settings. This change adds the `changefeed.lagging_ranges` metric which can be used to track ranges which are behind. This metric is calculated based on a new changefeed option `lagging_ranges_threshold` which is the amount of time that a range checkpoint needs to be in the past to be considered lagging. This defaults to 3 minutes. This change also adds the changefeed option `lagging_ranges_polling_interval` which is the polling rate at which a rangefeed will poll for lagging ranges and update the metric. This defaults to 1 minute. Sometimes a range may not have any checkpoints for a while because the start time may be far in the past (this causes a catchup scan during which no checkpoints are emitted). In this case, the range is considered to the lagging if the created timestamp of the rangefeed is older than `changefeed.lagging_ranges_threshold`. Note that this means that any changefeed which starts with an initial scan containing a significant amount of data will likely indicate nonzero `changefeed.lagging_ranges` until the initial scan is complete. This is intentional. Release note (ops change): A new metric `changefeed.lagging_ranges` is added to show the number of ranges which are behind in changefeeds. This metric can be used with the `metrics_label` changefeed option. A new cluser setting `changefeed.lagging_ranges_threshold` is added which is the amount of time a range needs to be behind to be considered lagging. By default this is 3 minutes. There is also a new cluster setting `changefeed.lagging_ranges_polling_interval` which controls how often the lagging ranges calculation is done. This setting defaults to polling every 1 minute. Note that polling adds latency to the metric being updated. For example, if a range falls behind by 3 minutes, the metric may not update until an additional minute afterwards. Also note that ranges undergoing an initial scan for longer than the threshold are considered to be lagging. Starting a changefeed with an initial scan on a large table will likely increment the metric for each range in the table. However, as ranges complete the initial scan, the number of ranges will decrease. Epic: CRDB-9181
This change adds the
changefeed.lagging_ranges
metric which can be used to trackranges which are behind. This metric is calculated based on a new changefeed option
lagging_ranges_threshold
which is the amount of time that a rangecheckpoint needs to be in the past to be considered lagging. This defaults to 3 minutes.
This change also adds the changefeed option
lagging_ranges_polling_interval
which isthe polling rate at which a rangefeed will poll for lagging ranges and update the metric.
This defaults to 1 minute.
Sometimes a range may not have any checkpoints for a while because the start time
may be far in the past (this causes a catchup scan during which no checkpoints are emitted).
In this case, the range is considered to the lagging if the created timestamp of the
rangefeed is older than
changefeed.lagging_ranges_threshold
. Note that this means thatany changefeed which starts with an initial scan containing a significant amount of data will
likely indicate nonzero
changefeed.lagging_ranges
until the initial scan is complete. Thisis intentional.
Release note (ops change): A new metric
changefeed.lagging_ranges
is added to show the number ofranges which are behind in changefeeds. This metric can be used with the
metrics_label
changefeedoption. A changefeed option
lagging_ranges_threshold
is added which is the amount oftime a range needs to be behind to be considered lagging. By default this is 3 minutes. There is also
a new option
lagging_ranges_polling_interval
which controls how often the lagging rangescalculation is done. This setting defaults to polling every 1 minute.
Note that polling adds latency to the metric being updated. For example, if a range falls behind
by 3 minutes, the metric may not update until an additional minute afterwards.
Also note that ranges undergoing an initial scan for longer than the threshold are considered to be
lagging. Starting a changefeed with an initial scan on a large table will likely increment the metric
for each range in the table. However, as ranges complete the initial scan, the number of ranges will
decrease.
Epic: None