fix: Search/Query may fail while updating delegator cache #37320

Closed
wants to merge 3 commits into from

weiliu1031
Contributor

issue: #37115
pr: #37116
Because initializing the query node client is too heavy, we removed updateShardClient from the leader mutex, which introduced many more concurrency corner cases.

This PR delays the query node client's initialization until getClient is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues.
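
The idea can be illustrated with a minimal Go sketch. It is not the code in internal/proxy/shard_client.go: the types `queryNodeClient`, `shardClient`, `shardClientMgr`, the `creator` callback, and the function `updateShardLeaders` are hypothetical names chosen for illustration. The sketch assumes that registering a shard leader only records its address (cheap, so it can run under the leader mutex), while the expensive client is created on the first `getClient` call.

```go
package main

import (
	"fmt"
	"sync"
)

// queryNodeClient stands in for the real client, which is expensive to create.
type queryNodeClient struct{ addr string }

// shardClient records only the address up front and creates the client lazily.
type shardClient struct {
	mu      sync.Mutex
	addr    string
	client  *queryNodeClient
	creator func(addr string) (*queryNodeClient, error)
}

// getClient creates the underlying client on first use and reuses it afterwards.
func (s *shardClient) getClient() (*queryNodeClient, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.client == nil {
		c, err := s.creator(s.addr)
		if err != nil {
			return nil, err
		}
		s.client = c
	}
	return s.client, nil
}

// shardClientMgr guards the set of shard clients with a single leader mutex.
type shardClientMgr struct {
	leaderMut sync.RWMutex
	clients   map[int64]*shardClient
	creator   func(addr string) (*queryNodeClient, error)
}

func newShardClientMgr(creator func(addr string) (*queryNodeClient, error)) *shardClientMgr {
	return &shardClientMgr{clients: map[int64]*shardClient{}, creator: creator}
}

// updateShardLeaders runs entirely under the leader mutex. It stays cheap
// because it never creates a client, it only adds or drops entries.
func (m *shardClientMgr) updateShardLeaders(leaders map[int64]string) {
	m.leaderMut.Lock()
	defer m.leaderMut.Unlock()
	for id, addr := range leaders {
		if _, ok := m.clients[id]; !ok {
			m.clients[id] = &shardClient{addr: addr, creator: m.creator}
		}
	}
	for id := range m.clients {
		if _, ok := leaders[id]; !ok {
			delete(m.clients, id)
		}
	}
}

// getClient looks up the entry under the read lock, then initializes lazily.
func (m *shardClientMgr) getClient(nodeID int64) (*queryNodeClient, error) {
	m.leaderMut.RLock()
	sc, ok := m.clients[nodeID]
	m.leaderMut.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no shard client for node %d", nodeID)
	}
	return sc.getClient()
}

func main() {
	calls := 0
	mgr := newShardClientMgr(func(addr string) (*queryNodeClient, error) {
		calls++
		return &queryNodeClient{addr: addr}, nil
	})
	mgr.updateShardLeaders(map[int64]string{1: "qn-1", 2: "qn-2"})
	fmt.Println("clients created after update:", calls)
	if _, err := mgr.getClient(1); err != nil {
		fmt.Println("getClient failed:", err)
		return
	}
	_, _ = mgr.getClient(1)
	fmt.Println("clients created after two getClient calls:", calls)
}
```

In this sketch the update path holds the leader mutex for its whole duration, so readers never observe a half-updated client set, and the creation cost is paid only for nodes that actually receive a request.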

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: weiliu1031
To complete the pull request process, please assign xiaofan-luan after the PR has been reviewed.
You can assign the PR to them by writing /assign @xiaofan-luan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot added the size/M Denotes a PR that changes 30-99 lines. label Oct 31, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/bug Issues or changes related a bug labels Oct 31, 2024
Contributor

mergify bot commented Oct 31, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Oct 31, 2024

@weiliu1031 go-sdk check failed, comment rerun go-sdk can trigger the job again.


codecov bot commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 77.77778% with 10 lines in your changes missing coverage. Please review.

Project coverage is 80.84%. Comparing base (48614f7) to head (3a2f651).
Report is 20 commits behind head on 2.5.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/proxy/shard_client.go | 67.74% | 8 Missing and 2 partials ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##              2.5   #37320      +/-   ##
==========================================
- Coverage   83.15%   80.84%   -2.31%     
==========================================
  Files        1029     1321     +292     
  Lines      157321   183102   +25781     
==========================================
+ Hits       130819   148030   +17211     
- Misses      21332    29865    +8533     
- Partials     5170     5207      +37     
| Components | Coverage Δ | |
|---|---|---|
| Client | ∅ <ø> (∅) | |
| Core | 66.93% <100.00%> (∅) | |
| Go | 83.11% <77.77%> (-0.07%) | ⬇️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| internal/proxy/lb_policy.go | 97.90% <100.00%> (+0.17%) | ⬆️ |
| internal/proxy/meta_cache.go | 91.36% <100.00%> (ø) | |
| internal/proxy/shard_client.go | 82.60% <67.74%> (+4.12%) | ⬆️ |

... and 337 files with indirect coverage changes

@weiliu1031
Contributor Author

/run-cpu-e2e

@weiliu1031
Contributor Author

rerun go-sdk

Contributor

mergify bot commented Oct 31, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

@mergify mergify bot added the ci-passed label Oct 31, 2024
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
pr #37116 lets the proxy retry getting the shard leader when an error
occurs. As a result, a search/query on an unloaded collection keeps
retrying until the ctx is done.

This PR adds an error type check to skip the retry on ErrCollectionLoaded.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
@mergify mergify bot removed the ci-passed label Nov 1, 2024
Contributor

mergify bot commented Nov 1, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

@mergify mergify bot added the ci-passed label Nov 1, 2024
@weiliu1031 weiliu1031 closed this Nov 5, 2024
Labels
ci-passed dco-passed DCO check passed. kind/bug Issues or changes related a bug size/M Denotes a PR that changes 30-99 lines.
2 participants