Loki: Fix multitenant querying #7708
Conversation
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0.2%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
@DylanGuedes can you check with #7703 so you don't do things twice? :)
Good call. I left a comment on his PR recommending that we proceed with this PR instead, since I'm adding a test that verifies the fix actually works, and I noticed we also have to adapt things in other places.
Force-pushed from 84d26a0 to ce855f1
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0.2%
+ querier/queryrange 0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
Does it make sense to opt for the lower of multiple limits rather than the higher? The former doesn't introduce unanticipated increases in resource consumption, whereas the latter does, if I understand correctly.
Good question. I think we can find arguments for both, but personally I think going for the higher values is better: it avoids the scenario where a smaller tenant with much lower limits makes queries unusable for the bigger tenants it is queried alongside.
This is a really great find. Thanks for these awesome tests! However, we should always use the most restrictive limit. See #5626 for the original introduction.
@@ -99,17 +99,25 @@ func (ast *astMapperware) Do(ctx context.Context, r queryrangebase.Request) (que
		return ast.next.Do(ctx, r)
	}

	userID, err := tenant.TenantID(ctx)
	// use biggest max query parallelism across all given tenants
Same here. We should take the most restrictive.
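For readers following along, "most restrictive" here simply means taking the minimum across all tenants resolved from the request. A minimal, self-contained sketch of that rule, assuming a made-up per-tenant lookup map rather than Loki's actual limits configuration:

```go
package main

import "fmt"

// maxQueryParallelism is an illustrative per-tenant setting; in Loki this
// comes from the per-tenant limits configuration, not a hard-coded map.
var maxQueryParallelism = map[string]int{
	"tenant-a": 32,
	"tenant-b": 8,
}

// smallestMaxQueryParallelism applies the "most restrictive wins" rule:
// sharding would use the minimum parallelism across all given tenants.
func smallestMaxQueryParallelism(tenantIDs []string) int {
	smallest := maxQueryParallelism[tenantIDs[0]]
	for _, id := range tenantIDs[1:] {
		if p := maxQueryParallelism[id]; p < smallest {
			smallest = p
		}
	}
	return smallest
}

func main() {
	fmt.Println(smallestMaxQueryParallelism([]string{"tenant-a", "tenant-b"})) // 8
}
```

The same shape applies to any per-tenant limit that has to be collapsed to a single value for a federated query.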
"github.com/grafana/loki/integration/cluster" | ||
) | ||
|
||
func TestMultiTenantQuery(t *testing.T) { |
Nice test!
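For context on what the test exercises: a multi-tenant (federated) query is an ordinary query sent with a pipe-separated X-Scope-OrgID header. A rough standalone sketch against a locally running Loki follows; the address, port, and stream selector are placeholders, and the actual test drives an ephemeral cluster through the integration/cluster package instead:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Placeholder endpoint and selector; the integration test spins up its
	// own cluster rather than assuming a Loki instance on localhost:3100.
	q := url.Values{"query": {`{job="fake"}`}}
	req, err := http.NewRequest(http.MethodGet,
		"http://localhost:3100/loki/api/v1/query_range?"+q.Encode(), nil)
	if err != nil {
		panic(err)
	}
	// Two tenant IDs separated by "|" turn this into a multi-tenant query.
	req.Header.Set("X-Scope-OrgID", "tenant-a|tenant-b")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```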
Approved pending additional dialogue about picking the higher vs. lower limit for a multi-tenant query.
Thanks, I hope those will help! Regarding the limits: I'm not strongly attached to keeping the bigger values, but personally I think they work better here. That said, I'll wait a bit longer for others to share opinions before reverting the change.
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0.1%
+ distributor 0%
+ querier 0.1%
+ querier/queryrange 0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
@jeschkies @JordanRushing I pushed a commit using the smallest limits instead, WDYT?
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0.1%
+ querier/queryrange 0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
- ingester -0.1%
+ distributor 0%
+ querier 0.1%
+ querier/queryrange 0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
See my comment. I am happy to take over the PR if you give me write access to your repository.
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
I pushed a commit addressing it, WDYT?
🎉
Hello @DylanGuedes!
Please, if the current pull request addresses a bug fix, label it with the corresponding label.
The backport to release-2.7.x failed. To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-7708-to-release-2.7.x origin/release-2.7.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x ad2260aec2d4d587c50a3dfb68d2c89ed2bf3157
# Push it to GitHub
git push --set-upstream origin backport-7708-to-release-2.7.x
git switch main
# Remove the local backport branch
git branch -D backport-7708-to-release-2.7.x

Then, create a pull request where the base branch is release-2.7.x and the compare/head branch is backport-7708-to-release-2.7.x.
Co-authored-by: Dylan Guedes <[email protected]>
Co-authored-by: Owen Diehl <[email protected]>
What this PR does / why we need it:
We recently broke multitenant querying because of recent changes to how timeouts work across Loki. This PR fixes this by:

- Adapt the timeout wrapper to work with multitenant queries. It will take the shortest timeout across all given tenants (a rough sketch of this rule is shown after this list).
- Adapt the query engine timeout assignment to work with multitenant queries. It will likewise take the shortest timeout across all the tenants.
- Adapt query sharding to use the smallest max query parallelism across the given tenants.
- Add a functional test to ensure multitenant querying behaves as expected.
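A minimal sketch of the shortest-timeout rule described above, assuming a hypothetical per-tenant lookup function rather than Loki's real limits API:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// timeoutFor is a hypothetical per-tenant lookup used only for this sketch;
// Loki resolves per-tenant limits through its own limits configuration.
func shortestQueryTimeout(tenantIDs []string, timeoutFor func(string) time.Duration) time.Duration {
	shortest := timeoutFor(tenantIDs[0])
	for _, id := range tenantIDs[1:] {
		if t := timeoutFor(id); t < shortest {
			shortest = t
		}
	}
	return shortest
}

func main() {
	lookup := func(id string) time.Duration {
		if id == "tenant-b" {
			return 30 * time.Second
		}
		return 5 * time.Minute
	}
	// The timeout wrapper would then bound the request context with the
	// most restrictive (shortest) timeout across the resolved tenants.
	timeout := shortestQueryTimeout([]string{"tenant-a", "tenant-b"}, lookup)
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	fmt.Println(timeout, ctx.Err()) // 30s <nil>
}
```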
Signed-off-by: DylanGuedes [email protected]
Signed-off-by: Mehmet Burak Devecí [email protected]
Which issue(s) this PR fixes:
#7696
Special notes for your reviewer:
The regression was probably introduced by #7555