Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backend: Change default interval for instance stats query #666

Merged
merged 1 commit into from
Aug 10, 2023

Conversation

skoeva
Copy link
Contributor

@skoeva skoeva commented Aug 3, 2023

Following #663 where we add the background job to periodically populate the instance_stats table, we want to ensure that we generate entries for instances that may not check in hourly but are still active. Thus, we consider a 2-hour default interval for instances checking in, maintaining the hourly cadence of the background job as desired.

Copy link
Member

@pothos pothos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is needed and what it should prevent how.
The added check in the current form will always be true, or?

@skoeva
Copy link
Contributor Author

skoeva commented Aug 3, 2023

@pothos I thought about it wrong, I think instead of passing in a default interval, we can calculate the interval from the current time (set by the goroutine to be hourly) to the latest timestamp in the table. This would cover any potential case where the server goes down and the query does not run. What do you think?

@skoeva skoeva changed the title Fix instance stats background job logic backend: Calculate instance stats query interval using latest entry timestamp Aug 3, 2023
@skoeva skoeva changed the title backend: Calculate instance stats query interval using latest entry timestamp backend: Calculate UpdateInstanceStats interval using latest entry timestamp Aug 3, 2023
@skoeva skoeva closed this Aug 3, 2023
@skoeva skoeva deleted the skoeva/bg-job-fix branch August 3, 2023 17:56
@skoeva skoeva restored the skoeva/bg-job-fix branch August 3, 2023 18:01
@skoeva skoeva reopened this Aug 3, 2023
@pothos
Copy link
Member

pothos commented Aug 4, 2023

I remember why I wanted 2 hours as default - not all instances may check in during an hour and we might not count them then. The minimum considered should probably we always be 2 hours (or at least 90 minutes).

It does not hurt if the calculation is done one time or even multiple times per hour but the interval to look back should still be 2 hours then.

I don't think we need to make the interval adaptive because when an instance didn't check in the last 2 hours it is probably deleted.

Summary:

  • Change the default interval to be 2 hours and document why
  • Let the timer ticker run every hour as this was the desired cadence
  • On startup we maybe don't need any special handling, we can always create the entry with the default interval

@yolossn
Copy link
Contributor

yolossn commented Aug 7, 2023

Hey @pothos. There can be a case when the UpdateInstanceStats didn't run successfully and the subsequent runs of UpdateInstanceStats will not take into consideration that the previous run failed and will not include the instances data that ideally should have been processed by the failed UpdateInstanceStats run. The proposed solution takes the last successful run timestamp and runs the calculation thus keep the data always in sync.

@skoeva skoeva marked this pull request as ready for review August 7, 2023 12:43
@jepio
Copy link
Member

jepio commented Aug 7, 2023

Hey @pothos. There can be a case when the UpdateInstanceStats didn't run successfully and the subsequent runs of UpdateInstanceStats will not take into consideration that the previous run failed and will not include the instances data that ideally should have been processed by the failed UpdateInstanceStats run. The proposed solution takes the last successful run timestamp and runs the calculation thus keep the data always in sync.

I think @pothos is right - i think it's more important to keep a constant interval than it is to try to count instances between to the last run. That's going to give more reliable results.

Imagine the server is down (or failed) for a day. With your suggestion we would count a whole bunch of transient instances towards the next run, which would look like a spike. But that's only an artifact of the server being down. So we get more reasonable results by doing the "2 hour lookback every hour" thing Kai proposes.

@skoeva skoeva marked this pull request as draft August 7, 2023 13:43
@skoeva skoeva changed the title backend: Calculate UpdateInstanceStats interval using latest entry timestamp backend: Change default interval for instance stats query Aug 7, 2023
Following #663 where we add the background job to periodically populate the instance_stats table, we want to ensure that we generate entries for instances that may not check in hourly but are still active. Thus, we consider a 2-hour default interval for instances checking in, maintaining the hourly cadence of the background job as desired.
@skoeva skoeva marked this pull request as ready for review August 7, 2023 14:19
@yolossn
Copy link
Contributor

yolossn commented Aug 7, 2023

In case of server being down there will be no instance stats created so we need not worry about it, but there can be a case where the UpdateInstanceStats can error out, in that case the data will not be accurate. If that is acceptable then the two hour interval makes sense.

@skoeva
Copy link
Contributor Author

skoeva commented Aug 7, 2023

In case of server being down there will be no instance stats created so we need not worry about it, but there can be a case where the UpdateInstanceStats can error out, in that case the data will not be accurate. If that is acceptable then the two hour interval makes sense.

From what it sounds like, there would be no corruption, and a dip (i.e. to 0) in the case of a server shutdown could be easily explained and would not have an impact on the rest of the stats. Those instances checked in pre-shutdown may also no longer be active post-shutdown, so it probably would not make sense to write those entries after in that case.

@skoeva skoeva requested a review from yolossn August 7, 2023 16:55
@jepio
Copy link
Member

jepio commented Aug 9, 2023

In case of server being down there will be no instance stats created so we need not worry about it, but there can be a case where the UpdateInstanceStats can error out, in that case the data will not be accurate. If that is acceptable then the two hour interval makes sense.

I'd say that's acceptable.

@jepio
Copy link
Member

jepio commented Aug 9, 2023

LGTM

@skoeva skoeva merged commit 9b8f17b into main Aug 10, 2023
2 checks passed
@skoeva skoeva mentioned this pull request Aug 22, 2023
@pothos pothos deleted the skoeva/bg-job-fix branch August 29, 2023 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants