backend: Change default interval for instance stats query #666

skoeva · 2023-08-03T14:29:39Z

Following #663, which added the background job that periodically populates the instance_stats table, we want to ensure that we generate entries for instances that may not check in hourly but are still active. We therefore use a 2-hour default interval for instance check-ins while keeping the hourly cadence of the background job.

pothos

I don't understand why this is needed, what it should prevent, or how.
Won't the added check in its current form always be true?

skoeva · 2023-08-03T15:21:06Z

@pothos I thought about this the wrong way. Instead of passing in a default interval, I think we can calculate the interval from the current time (the goroutine runs hourly) to the latest timestamp in the table. This would cover any case where the server goes down and the query does not run. What do you think?

pothos · 2023-08-04T09:44:36Z

I remember why I wanted 2 hours as the default: not all instances may check in within an hour, and we might not count them then. The minimum interval considered should probably always be 2 hours (or at least 90 minutes).

It does not hurt if the calculation is done once or even multiple times per hour, but the interval to look back should still be 2 hours.

I don't think we need to make the interval adaptive, because when an instance hasn't checked in during the last 2 hours it has probably been deleted.

Summary:

- Change the default interval to be 2 hours and document why
- Let the timer ticker run every hour as this was the desired cadence
- On startup we maybe don't need any special handling, we can always create the entry with the default interval

yolossn · 2023-08-07T10:39:02Z

Hey @pothos. There can be a case where UpdateInstanceStats doesn't run successfully. Subsequent runs will not take into account that the previous run failed, so they will not include the instance data that the failed run should have processed. The proposed solution takes the timestamp of the last successful run and runs the calculation from there, keeping the data in sync.

jepio · 2023-08-07T13:07:13Z

> Hey @pothos. There can be a case when the UpdateInstanceStats didn't run successfully and the subsequent runs of UpdateInstanceStats will not take into consideration that the previous run failed and will not include the instances data that ideally should have been processed by the failed UpdateInstanceStats run. The proposed solution takes the last successful run timestamp and runs the calculation thus keep the data always in sync.

I think @pothos is right. I think it's more important to keep a constant interval than to try to count every instance seen since the last run. That's going to give more reliable results.

Imagine the server is down (or failed) for a day. With your suggestion we would count a whole bunch of transient instances towards the next run, which would look like a spike. But that's only an artifact of the server being down. So we get more reasonable results by doing the "2 hour lookback every hour" thing Kai proposes.

yolossn · 2023-08-07T16:11:32Z

If the server is down, no instance stats are created, so we need not worry about that case. But UpdateInstanceStats can also error out, and in that case the data will not be accurate. If that is acceptable, then the two-hour interval makes sense.

skoeva · 2023-08-07T16:51:19Z

> In case of server being down there will be no instance stats created so we need not worry about it, but there can be a case where the UpdateInstanceStats can error out, in that case the data will not be accurate. If that is acceptable then the two hour interval makes sense.

From what it sounds like, there would be no corruption, and a dip (i.e. to 0) after a server shutdown could be easily explained and would not affect the rest of the stats. Instances that checked in before the shutdown may also no longer be active afterward, so it probably would not make sense to write those entries later.

jepio · 2023-08-09T14:19:08Z

> In case of server being down there will be no instance stats created so we need not worry about it, but there can be a case where the UpdateInstanceStats can error out, in that case the data will not be accurate. If that is acceptable then the two hour interval makes sense.

I'd say that's acceptable.

jepio · 2023-08-09T14:19:24Z

LGTM

pothos reviewed Aug 3, 2023

skoeva changed the title ~~Fix instance stats background job logic~~ backend: Calculate instance stats query interval using latest entry timestamp Aug 3, 2023

skoeva changed the title ~~backend: Calculate instance stats query interval using latest entry timestamp~~ backend: Calculate UpdateInstanceStats interval using latest entry timestamp Aug 3, 2023

skoeva closed this Aug 3, 2023

skoeva deleted the skoeva/bg-job-fix branch August 3, 2023 17:56

skoeva restored the skoeva/bg-job-fix branch August 3, 2023 18:01

skoeva reopened this Aug 3, 2023

skoeva marked this pull request as ready for review August 7, 2023 12:43

skoeva force-pushed the skoeva/bg-job-fix branch from f3edf91 to c52304c Compare August 7, 2023 12:46

skoeva marked this pull request as draft August 7, 2023 13:43

skoeva changed the title ~~backend: Calculate UpdateInstanceStats interval using latest entry timestamp~~ backend: Change default interval for instance stats query Aug 7, 2023

skoeva force-pushed the skoeva/bg-job-fix branch from ebe4ab3 to 2b0f447 Compare August 7, 2023 14:15

skoeva marked this pull request as ready for review August 7, 2023 14:19

skoeva requested a review from yolossn August 7, 2023 16:55

jepio approved these changes Aug 9, 2023

yolossn approved these changes Aug 10, 2023

skoeva merged commit 9b8f17b into main Aug 10, 2023

skoeva mentioned this pull request Aug 22, 2023

Add metrics endpoints #665

Open

pothos deleted the skoeva/bg-job-fix branch August 29, 2023 10:24

github-actions bot mentioned this pull request Nov 10, 2023

Monthly contributions report 2023-07-22 - 2023-08-21 flatcar/Flatcar#1246

Closed
