-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make analytics report update job scheduling more efficient #7576
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SpecLad
force-pushed
the
efficient-analytics-debounce
branch
from
March 8, 2024 15:58
9173a2a
to
0f78ac8
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## develop #7576 +/- ##
===========================================
- Coverage 83.53% 83.44% -0.09%
===========================================
Files 372 373 +1
Lines 39700 39739 +39
Branches 3729 3741 +12
===========================================
Hits 33162 33162
- Misses 6538 6577 +39
|
klakhov
reviewed
Mar 11, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, patch works for me.
Currently, `schedule_analytics_report_autoupdate_job` attempts to debounce job scheduling by examining existing jobs before scheduling a new one. Unfortunately, the `scheduler.get_jobs` function, which it uses for this purpose, scales poorly. Not only does it fetch a list of all scheduled jobs (and not just ones related to the current object), but it then fetches information about every job, one by one. The current logic doesn't even need this information, but RQ Scheduler provides no method to get just the IDs. Replace the current logic with a new lightweight approach that uses a custom Redis key to block scheduling of additional jobs.
SpecLad
force-pushed
the
efficient-analytics-debounce
branch
from
March 11, 2024 15:07
0f78ac8
to
144381c
Compare
klakhov
approved these changes
Mar 11, 2024
Merged
3 tasks
zhiltsov-max
pushed a commit
that referenced
this pull request
Mar 29, 2024
…ts (#7596) <!-- Raise an issue to propose your change (https://github.com/opencv/cvat/issues). It helps to avoid duplication of efforts from multiple independent contributors. Discuss your ideas with maintainers to be sure that changes will be approved and merged. Read the [Contribution guide](https://opencv.github.io/cvat/docs/contributing/). --> <!-- Provide a general summary of your changes in the Title above --> ### Motivation and context <!-- Why is this change required? What problem does it solve? If it fixes an open issue, please link to the issue here. Describe your changes in detail, add screenshots. --> See #7576 for more details. This patch extracts the high-level throttling functionality added in that patch and reuses it for quality reports. Note: in that patch I referred to this functionality as debouncing, but throttling seems like a more accurate description. It would be debouncing if the autoupdate job only ran after no updates occurred for a period, which is not how it actually works. ### How has this been tested? <!-- Please describe in detail how you tested your changes. Include details of your testing environment, and the tests you ran to see how your change affects other areas of the code, etc. --> ### Checklist <!-- Go over all the following points, and put an `x` in all the boxes that apply. If an item isn't applicable for some reason, then ~~explicitly strikethrough~~ the whole line. If you don't do that, GitHub will show incorrect progress for the pull request. If you're unsure about any of these, don't hesitate to ask. We're here to help! --> - [x] I submit my changes into the `develop` branch - [x] I have created a changelog fragment <!-- see top comment in CHANGELOG.md --> - ~~[ ] I have updated the documentation accordingly~~ - ~~[ ] I have added tests to cover my changes~~ - ~~[ ] I have linked related issues (see [GitHub docs]( https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))~~ - ~~[ ] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/opencv/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/opencv/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/opencv/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/opencv/cvat/tree/develop/cvat-ui#versioning))~~ ### License - [x] I submit _my code changes_ under the same [MIT License]( https://github.com/opencv/cvat/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation and context
Currently,
schedule_analytics_report_autoupdate_job
attempts to debounce job scheduling by examining existing jobs before scheduling a new one. Unfortunately, thescheduler.get_jobs
function, which it uses for this purpose, scales poorly. Not only does it fetch IDs of all scheduled jobs (and not just ones related to the current object), but it then fetches information about every job, one by one. The current logic doesn't even need this information, but RQ Scheduler provides no method to get just the IDs.Replace the current logic with a new lightweight approach that uses a custom Redis key to block scheduling of additional jobs.
How has this been tested?
Manual testing.
Checklist
develop
branch[ ] I have updated the documentation accordingly[ ] I have added tests to cover my changes[ ] I have linked related issues (see GitHub docs)[ ] I have increased versions of npm packages if it is necessary(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)
License
Feel free to contact the maintainers if that's a concern.