WIP: Cuebot Scalability Change Proposal #1516

Open
DiegoTavares opened this issue Sep 25, 2024 · 0 comments
Labels
enhancement Improvement to an existing feature

Comments

@DiegoTavares
Collaborator

Incrementally redesign Cuebot's monolith into multiple services

Motivation

  1. Cuebot's current design doesn't scale well horizontally. Although multiple instances of the service can be load-balanced to spread requests from rqd, all instances still rely on a single SQL database that can only scale vertically.
  2. The current design relies heavily on the performance of the DispatchQuery, a costly query whose performance degrades as the frames table grows.
  3. We have received feedback from several studios interested in the project that were hesitant to add a Java-based application to their stack, as Java is not commonly used in the VFX/Animation industry.

Current Design challenges

  • rqd instances connect directly to Cuebot over gRPC, and each connection stays bound until one side restarts, which makes redistributing load without an outage a challenge.
  • The scheduling logic is implemented as a step in the logic that handles rqd reports. This design not only makes the process hard to maintain but also creates a coupling that hurts performance: any step in report handling that takes longer than anticipated slows the rate at which frames are booked.
  • Work is wasted when multiple nodes attempt to book the same layer. Without a global lock mechanism, conflicts are resolved only at the final step of the booking process. That final check prevents a frame from running on multiple hosts, but by then the losing nodes have already spent effort on the earlier booking steps.
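
The last challenge can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Cuebot's actual booking code: `FrameStore`, `waiting_frames`, `try_reserve`, and `book_round` are hypothetical names, and the in-memory dictionary stands in for the frames table. It shows several hosts selecting the same candidate frame and only discovering the conflict at the final reservation step.

```python
from dataclasses import dataclass, field


@dataclass
class FrameStore:
    """Hypothetical stand-in for the frames table; not Cuebot's real schema."""
    owners: dict = field(default_factory=dict)  # frame_id -> host name

    def waiting_frames(self, layer):
        # Read-only candidate selection; stands in for the costly dispatch query.
        return [f for f in layer if self.owners.get(f) is None]

    def try_reserve(self, frame_id, host):
        # Final booking step: succeeds only if the frame is still unowned,
        # similar in spirit to a conditional UPDATE ... WHERE owner IS NULL.
        if self.owners.get(frame_id) is None:
            self.owners[frame_id] = host
            return True
        return False


def book_round(store, layer, hosts):
    """Every host selects a candidate before any host commits."""
    # Phase 1: each host independently does the selection work.
    picks = {host: store.waiting_frames(layer)[0] for host in hosts}
    # Phase 2: conflicts surface only here, at the final step.
    booked, wasted = [], []
    for host, frame in picks.items():
        (booked if store.try_reserve(frame, host) else wasted).append(host)
    return booked, wasted


store = FrameStore()
layer = ["frame-0001", "frame-0002", "frame-0003"]
booked, wasted = book_round(store, layer, ["host-a", "host-b", "host-c"])
print(booked, wasted)  # ['host-a'] ['host-b', 'host-c']
```

In this toy round, three hosts each pay for candidate selection, but only one reservation succeeds. The other two frames in the layer stay unbooked until a later round, which is the kind of wasted effort an earlier coordination point (such as a global lock or a single scheduler) would aim to avoid.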

Constraints

Proposal

DiegoTavares added the enhancement (Improvement to an existing feature) label Sep 25, 2024