This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
Auto flush checkpoint queue if too many are waiting (#279)
Our performance testing found that the checkpoint queue can grow quickly. This can happen during maintenance if there are many entities in the cache and very few cache swap-outs happened in the past hour. When an entity's state is swapped out, we try to save a checkpoint. If we haven't saved a checkpoint for an entity within one hour, we put the checkpoint into a buffer and flush the buffer at the end of maintenance. Since we only flush the first 1000 queued requests to disk, many requests may still wait in the queue until the next flush happens. This is not ideal and can cause memory outages. This PR triggers another flush after the previous flush finishes if there are a lot of queued requests.

This PR also corrects the LimitExceededException thrown when a circuit breaker is open. Previously, we sent a LimitExceededException that stopped the detector immediately, which left no room for the detector to recover. This PR fixes that by setting the LimitExceededException's stop-now flag to false, which gives the detector a few more intervals to recover.
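The auto-flush behavior described above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the plugin's actual code: the class name `CheckpointBuffer`, the `autoFlushThreshold` parameter, and the `writer` callback are all hypothetical, and the real flush presumably runs asynchronously and chains the next flush from a completion callback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical sketch: names and thresholds are illustrative assumptions. */
class CheckpointBuffer {
    private final Queue<String> queue = new ArrayDeque<>();
    private final int batchSize;          // max requests written per flush (1000 in the commit message)
    private final int autoFlushThreshold; // assumed backlog size that triggers another flush
    private final Consumer<List<String>> writer; // stands in for the write to disk

    CheckpointBuffer(int batchSize, int autoFlushThreshold, Consumer<List<String>> writer) {
        this.batchSize = batchSize;
        this.autoFlushThreshold = autoFlushThreshold;
        this.writer = writer;
    }

    /** Queue a checkpoint request for a later flush. */
    void enqueue(String checkpoint) {
        queue.add(checkpoint);
    }

    /** Number of requests still waiting in the queue. */
    int pending() {
        return queue.size();
    }

    /**
     * Write one batch, then keep writing further batches while the backlog
     * remains above the threshold. Returns the number of batches written.
     */
    int flush() {
        int batches = 0;
        do {
            List<String> batch = new ArrayList<>();
            while (batch.size() < batchSize && !queue.isEmpty()) {
                batch.add(queue.poll());
            }
            if (batch.isEmpty()) {
                break; // nothing left to write
            }
            writer.accept(batch);
            batches++;
        } while (queue.size() > autoFlushThreshold);
        return batches;
    }
}
```

With a batch size of 1000 and an assumed threshold of 600, queuing 2500 requests and calling `flush()` once writes two batches and leaves 500 requests for the next scheduled flush. Without the follow-up flush, 1500 requests would remain in memory.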