Fix request cancellation when frontend worker is busy #740

colega · 2022-01-13T12:35:42Z

Describe the bug

When we try to cancel an enqueued request:

Lines 227 to 236 in e2e2e10

    
    select {
    case <-ctx.Done():
        if cancelCh != nil {
            select {
            case cancelCh <- freq.queryID:
                // cancellation sent.
            default:
                // failed to cancel, ignore.
            }
        }

The `default` clause is taken whenever the worker is busy sending or cancelling another request:

http://github.com/grafana/mimir/blob/e2e2e100ec14e7707d1733908e61c83c8b87ef40/pkg/frontend/v2/frontend_scheduler_worker.go#L285-L286

As a result, queriers sometimes keep processing requests even after the client has canceled them.

This is the same bug that was fixed in Loki in grafana/loki#5113.

We should apply a similar fix.

To Reproduce

See test in grafana/loki#5113

Expected behavior

All upstream requests should be canceled when downstream cancels the originating request.

With previous implementation, if worker was busy talking to scheduler, we didn't push the cancellation, keeping that query running. When cancelling a query, all its subqueries are cancelled at the same time, so this was most likely happening all the time (first subquery scheduled on this worker was canceled, the rest were not because worker was busy cancelling the first one).

Also removed the `<-ctx.Done()` escape point when waiting for the enqueueing ACK and modified the enqueueing method to ensure that it always responds something.

Fixes: #740

Inspired by: grafana/loki#5113

Signed-off-by: Oleg Zaytsev <[email protected]>

* Increase scheduler worker cancellation chan cap

  With previous implementation, if worker was busy talking to scheduler, we didn't push the cancellation, keeping that query running. When cancelling a query, all its subqueries are cancelled at the same time, so this was most likely happening all the time (first subquery scheduled on this worker was canceled, the rest were not because worker was busy cancelling the first one).

  Also removed the `<-ctx.Done()` escape point when waiting for the enqueueing ACK and modified the enqueueing method to ensure that it always responds something.

  Fixes: #740

  Inspired by: grafana/loki#5113

  Signed-off-by: Oleg Zaytsev <[email protected]>

* Update CHANGELOG.md

  Signed-off-by: Oleg Zaytsev <[email protected]>

* Remove comment about chan memory usage

  Co-authored-by: Peter Štibraný <[email protected]>

* Update test comment

  Co-authored-by: Peter Štibraný <[email protected]>

* Add resp.Error to the log when response is unknown

  Signed-off-by: Oleg Zaytsev <[email protected]>

* Log the entire uknown response

  Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Peter Štibraný <[email protected]>

colega self-assigned this Jan 13, 2022

This was referenced Jan 13, 2022

Increase scheduler worker cancellation chan cap #741

Merged

Handle cancellations before requests #742

Closed

pstibrany closed this as completed in #741 Jan 14, 2022
