fix: Search/Query may fail while updating delegator cache #37320

weiliu1031 · 2024-10-31T03:02:41Z

issue: #37115
pr: #37116
Because initializing a query node client is too heavy, we removed updateShardClient from the leader mutex, which introduced many more concurrency corner cases.

This PR delays the query node client's init operation until getClient is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues.
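
The approach described above can be illustrated with a minimal Go sketch. It is not the code from this PR: `queryNodeClient`, `clientCreator`, `shardClient`, `shardClientMgr`, and `updateShardLeaders` are hypothetical names chosen for illustration, and only `getClient` is taken from the description. The sketch assumes the manager registers cheap placeholders while holding its mutex and that the expensive client construction runs only on the first `getClient` call.

```go
package main

import (
	"errors"
	"sync"
)

// queryNodeClient stands in for the real (expensive) query node client. Hypothetical type.
type queryNodeClient struct {
	addr string
}

// clientCreator is a hypothetical factory; a real one would dial the query node.
type clientCreator func(addr string) (*queryNodeClient, error)

// shardClient defers building the underlying client until it is first requested.
type shardClient struct {
	mu      sync.Mutex
	addr    string
	creator clientCreator
	client  *queryNodeClient
}

// getClient builds the client on the first call and reuses it afterwards.
func (s *shardClient) getClient() (*queryNodeClient, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.client == nil {
		c, err := s.creator(s.addr)
		if err != nil {
			// Leave client nil so that a later call can try again.
			return nil, err
		}
		s.client = c
	}
	return s.client, nil
}

// shardClientMgr keeps one shardClient per node ID, guarded by a single mutex.
type shardClientMgr struct {
	mu      sync.Mutex
	clients map[int64]*shardClient
	creator clientCreator
}

func newShardClientMgr(creator clientCreator) *shardClientMgr {
	return &shardClientMgr{clients: make(map[int64]*shardClient), creator: creator}
}

// updateShardLeaders syncs the map with the latest leaders while holding the
// lock. It only creates placeholders, so the critical section stays short.
func (m *shardClientMgr) updateShardLeaders(leaders map[int64]string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for id, addr := range leaders {
		if _, ok := m.clients[id]; !ok {
			m.clients[id] = &shardClient{addr: addr, creator: m.creator}
		}
	}
	for id := range m.clients {
		if _, ok := leaders[id]; !ok {
			delete(m.clients, id)
		}
	}
}

// getClient looks up the placeholder under the manager lock, then performs the
// lazy init outside of it so a slow dial does not block other lookups.
func (m *shardClientMgr) getClient(nodeID int64) (*queryNodeClient, error) {
	m.mu.Lock()
	sc, ok := m.clients[nodeID]
	m.mu.Unlock()
	if !ok {
		return nil, errors.New("no shard client for node")
	}
	return sc.getClient()
}
```

In this sketch, calling `updateShardLeaders` never invokes the creator, so the map update can stay under the lock without the cost that originally motivated moving it out.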

Because initializing a query node client is too heavy, we removed updateShardClient from the leader mutex, which introduced many more concurrency corner cases. This PR delays the query node client's init operation until `getClient` is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues. Signed-off-by: Wei Liu <wei.liu@zilliz.com>

sre-ci-robot · 2024-10-31T03:02:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: weiliu1031
To complete the pull request process, please assign xiaofan-luan after the PR has been reviewed.
You can assign the PR to them by writing /assign @xiaofan-luan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

internal/proxy/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mergify · 2024-10-31T03:09:35Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify · 2024-10-31T03:43:22Z

@weiliu1031 go-sdk check failed, comment rerun go-sdk can trigger the job again.

codecov · 2024-10-31T03:49:44Z

Codecov Report

Attention: Patch coverage is 77.77778% with 10 lines in your changes missing coverage. Please review.

Project coverage is 80.84%. Comparing base (48614f7) to head (3a2f651).
Report is 20 commits behind head on 2.5.

Files with missing lines	Patch %	Lines
internal/proxy/shard_client.go	67.74%	8 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              2.5   #37320      +/-   ##
==========================================
- Coverage   83.15%   80.84%   -2.31%     
==========================================
  Files        1029     1321     +292     
  Lines      157321   183102   +25781     
==========================================
+ Hits       130819   148030   +17211     
- Misses      21332    29865    +8533     
- Partials     5170     5207      +37

Components	Coverage Δ
Client	`∅ <ø> (∅)`
Core	`66.93% <100.00%> (∅)`
Go	`83.11% <77.77%> (-0.07%)`	⬇️

Files with missing lines	Coverage Δ
internal/proxy/lb_policy.go	`97.90% <100.00%> (+0.17%)`	⬆️
internal/proxy/meta_cache.go	`91.36% <100.00%> (ø)`
internal/proxy/shard_client.go	`82.60% <67.74%> (+4.12%)`	⬆️

... and 337 files with indirect coverage changes

weiliu1031 · 2024-10-31T06:19:13Z

/run-cpu-e2e

weiliu1031 · 2024-10-31T06:22:09Z

rerun go-sdk

mergify · 2024-10-31T07:04:58Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-10-31T07:43:23Z

/run-cpu-e2e

Signed-off-by: Wei Liu <wei.liu@zilliz.com>

PR #37116 lets the proxy retry getting the shard leader when an error happens. As a result, a search/query on an unloaded collection keeps retrying until the ctx is done. This PR adds an error type check to skip the retry on ErrCollectionLoaded. Signed-off-by: Wei Liu <wei.liu@zilliz.com>

mergify · 2024-11-01T08:04:36Z

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

weiliu1031 · 2024-11-01T08:26:18Z

/run-cpu-e2e

sre-ci-robot added the size/M label Oct 31, 2024

sre-ci-robot requested review from czs007 and godchen0212 October 31, 2024 03:02

mergify bot added dco-passed kind/bug labels Oct 31, 2024

mergify bot added the ci-passed label Oct 31, 2024

weiliu1031 added 2 commits November 1, 2024 14:51

fix: deadlock if query node crashes during shard client init

f9e4eee

Signed-off-by: Wei Liu <wei.liu@zilliz.com>

mergify bot removed the ci-passed label Nov 1, 2024

mergify bot added the ci-passed label Nov 1, 2024

weiliu1031 closed this Nov 5, 2024
