Issue

After pulling in #7821 into Cortex, we noticed that not all the items were returned to the chunk pool. This caused the `usedTotal` in the pool to keep going up until it reached the `maxTotal` (30 GB). After this, the store gateways are not able to process any requests.

We were able to isolate the root cause to the removal of `defer blockClient.Close()` from `Series()` in `bucket.go`. After putting it back, the issue didn't occur.

We also noticed another issue even after cherry-picking the diff: the `pendingReaders` count for some blocks is not decremented correctly, and the store gateways are not able to sync blocks because of this.
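To make the failure mode concrete, below is a minimal, self-contained sketch (illustrative names, not the actual Thanos/Cortex code) of how a capped byte pool behaves when borrowed slices are never returned: `usedTotal` only grows, and once it reaches `maxTotal` every further get fails, which matches the store gateways no longer being able to serve requests.

```go
package main

import (
	"errors"
	"fmt"
)

// cappedPool mimics the accounting done by the chunk bytes pool: Get adds the
// requested size to usedTotal, Put subtracts it, and Get fails once the cap is hit.
type cappedPool struct {
	usedTotal uint64
	maxTotal  uint64
}

var errPoolExhausted = errors.New("chunk pool exhausted")

func (p *cappedPool) Get(sz int) ([]byte, error) {
	if p.usedTotal+uint64(sz) > p.maxTotal {
		return nil, errPoolExhausted
	}
	p.usedTotal += uint64(sz)
	return make([]byte, sz), nil
}

func (p *cappedPool) Put(b []byte) {
	p.usedTotal -= uint64(len(b))
}

func main() {
	// Scaled down for the demo: the pool in the report had maxTotal = 30 GB.
	p := &cappedPool{maxTotal: 30 << 20}

	// Each "request" borrows 512 KB of chunk bytes but never calls Put, which
	// is what happens when the block client's Close() is skipped: usedTotal
	// climbs monotonically until every further Get fails.
	for i := 0; ; i++ {
		if _, err := p.Get(512 << 10); err != nil {
			fmt.Printf("request %d failed with %q after %d bytes were never returned\n",
				i, err, p.usedTotal)
			return
		}
	}
}
```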
Metrics

In Cortex the chunk pool has metrics to track its usage. See: https://github.com/cortexproject/cortex/blob/c25b18d514a191182a818c8f0c954564cf6ceaf4/pkg/storegateway/chunk_bytes_pool.go#L23
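For context, a rough sketch of what such an instrumented pool can look like is below: it wraps a pool with Prometheus counters for requested and returned bytes so the two can be compared on a dashboard, as in the graphs that follow. The `BytesPool` interface and the metric names here are assumptions for illustration, not the code in the linked Cortex file.

```go
package chunkpool

import "github.com/prometheus/client_golang/prometheus"

// BytesPool is a minimal stand-in for the underlying pool interface.
type BytesPool interface {
	Get(sz int) (*[]byte, error)
	Put(b *[]byte)
}

// instrumentedPool counts the bytes handed out by Get and handed back by Put.
type instrumentedPool struct {
	inner          BytesPool
	requestedBytes prometheus.Counter
	returnedBytes  prometheus.Counter
}

func newInstrumentedPool(inner BytesPool, reg prometheus.Registerer) *instrumentedPool {
	p := &instrumentedPool{
		inner: inner,
		requestedBytes: prometheus.NewCounter(prometheus.CounterOpts{
			Name: "bucket_store_chunk_pool_requested_bytes_total", // illustrative name
			Help: "Total bytes requested (get) from the chunk bytes pool.",
		}),
		returnedBytes: prometheus.NewCounter(prometheus.CounterOpts{
			Name: "bucket_store_chunk_pool_returned_bytes_total", // illustrative name
			Help: "Total bytes returned (put) to the chunk bytes pool.",
		}),
	}
	reg.MustRegister(p.requestedBytes, p.returnedBytes)
	return p
}

func (p *instrumentedPool) Get(sz int) (*[]byte, error) {
	b, err := p.inner.Get(sz)
	if err == nil && b != nil {
		p.requestedBytes.Add(float64(cap(*b)))
	}
	return b, err
}

func (p *instrumentedPool) Put(b *[]byte) {
	if b != nil {
		p.returnedBytes.Add(float64(cap(*b)))
	}
	p.inner.Put(b)
}
```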
The following graphs are for one of the store gateways in the cluster.
Chunk pool `usedTotal` growing

Chunk pool gets - puts
This shows that there are more gets than puts, i.e. not everything taken from the pool is being returned.
Chunk pool growth after making the following change
Making the following change (putting back `defer blockClient.Close()` in `Series()`, sketched below) seems to have fixed the problem.
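For reference, here is a minimal sketch, under assumed names, of the restored pattern: deferring the per-block client's `Close()` inside `Series()` so that every borrowed chunk byte slice goes back to the pool on all return paths, including early error returns. It illustrates the change described above; it is not the exact `bucket.go` code.

```go
package main

import "fmt"

// trackingPool only does the usedTotal accounting needed for the demo.
type trackingPool struct{ usedTotal int }

func (p *trackingPool) Get(sz int) []byte { p.usedTotal += sz; return make([]byte, sz) }
func (p *trackingPool) Put(b []byte)      { p.usedTotal -= len(b) }

// blockClient stands in for the per-block series client: it borrows chunk
// bytes from the pool while serving a request and returns them in Close().
type blockClient struct {
	pool     *trackingPool
	borrowed [][]byte
}

func (c *blockClient) loadChunks(sz int) {
	c.borrowed = append(c.borrowed, c.pool.Get(sz))
}

// Close returns everything the client borrowed back to the pool.
func (c *blockClient) Close() {
	for _, b := range c.borrowed {
		c.pool.Put(b)
	}
	c.borrowed = nil
}

// series stands in for Series() in bucket.go.
func series(pool *trackingPool, failEarly bool) error {
	c := &blockClient{pool: pool}
	defer c.Close() // the restored line: without it, early returns leak the borrowed bytes

	c.loadChunks(1 << 20)
	if failEarly {
		return fmt.Errorf("simulated mid-request failure")
	}
	// ... stream the series to the querier ...
	return nil
}

func main() {
	p := &trackingPool{}
	_ = series(p, true)
	_ = series(p, false)
	fmt.Println("usedTotal after both requests:", p.usedTotal) // 0: everything was returned
}
```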