[Bug]: Search/Query may fail while the delegator cache is being updated #37115
Labels
kind/bug
Issues or changes related to a bug
triage/accepted
Indicates an issue or PR is ready to be actively worked on.
weiliu1031 added the kind/bug and needs-triage labels on Oct 24, 2024
/assign |
yanliang567 added the triage/accepted label and removed the needs-triage label on Oct 25, 2024
sre-ci-robot pushed a commit that referenced this issue on Oct 28, 2024
issue: #37115 pr: #37116 Because initializing the query node client is too heavy, updateShardClient was moved out of the leader mutex, which causes many more concurrency corner cases. This PR delays the query node client's init operation until `getClient` is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues.
Signed-off-by: Wei Liu <[email protected]>
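The lazy initialization described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the Milvus implementation: the `shardClient` and `queryNodeClient` types, the `creator` factory, and the `inits` counter are hypothetical names, and a per-client mutex stands in for the leader mutex mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

// queryNodeClient stands in for the real client, which is expensive to create.
type queryNodeClient struct{ addr string }

// shardClient (hypothetical) defers creating its client until getClient is first called.
type shardClient struct {
	mu      sync.Mutex
	addr    string
	client  *queryNodeClient
	creator func(addr string) (*queryNodeClient, error) // hypothetical factory
	inits   int                                         // number of factory calls, for illustration only
}

// getClient creates the client on first use and returns the cached one afterwards.
// The mutex ensures concurrent callers never trigger two initializations.
func (c *shardClient) getClient() (*queryNodeClient, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.client == nil {
		cli, err := c.creator(c.addr)
		if err != nil {
			return nil, err
		}
		c.client = cli
		c.inits++
	}
	return c.client, nil
}

func main() {
	sc := &shardClient{
		addr: "querynode-1",
		creator: func(addr string) (*queryNodeClient, error) {
			return &queryNodeClient{addr: addr}, nil
		},
	}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = sc.getClient()
		}()
	}
	wg.Wait()
	fmt.Println("factory calls:", sc.inits)
}
```

Registering a `shardClient` is cheap in this sketch because no connection is made until a caller asks for it, which is what would let registration happen while a lock is held.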
This was referenced Oct 31, 2024
sre-ci-robot pushed a commit that referenced this issue on Nov 5, 2024
issue: #37115
Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this issue on Nov 5, 2024
issue: #37115 Because initializing the query node client is too heavy, updateShardClient was moved out of the leader mutex, which causes many more concurrency corner cases. This PR delays the query node client's init operation until `getClient` is called, then uses the leader mutex to protect the shard client update and avoid concurrency issues.
Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this issue on Nov 5, 2024
issue: #37115 PR #37116 lets the proxy retry getting the shard leader if an error happens. As a result, a search/query on an unloaded collection keeps retrying until the ctx is done. This PR adds an error type check to skip the retry on ErrCollectionLoaded.
Signed-off-by: Wei Liu <[email protected]>
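A retry loop with an error type check of this kind might look like the sketch below. It is only an illustration: `getLeaderWithRetry`, `demo`, and the sentinel `errNotLoaded` are hypothetical names that stand in for the proxy's retry logic and for the error type named in the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotLoaded is a hypothetical sentinel standing in for the error type
// named in the commit message.
var errNotLoaded = errors.New("collection not loaded")

// getLeaderWithRetry calls fetch until it succeeds, the context is done,
// or fetch returns an error that retrying cannot fix.
// It also returns the number of attempts made.
func getLeaderWithRetry(ctx context.Context, interval time.Duration, fetch func() (string, error)) (string, int, error) {
	attempts := 0
	for {
		attempts++
		leader, err := fetch()
		if err == nil {
			return leader, attempts, nil
		}
		// Error type check: give up immediately instead of retrying until ctx is done.
		if errors.Is(err, errNotLoaded) {
			return "", attempts, err
		}
		select {
		case <-ctx.Done():
			return "", attempts, ctx.Err()
		case <-time.After(interval):
		}
	}
}

// demo looks up the leader of an unloaded collection and reports the attempt count.
func demo() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_, attempts, err := getLeaderWithRetry(ctx, 10*time.Millisecond, func() (string, error) {
		return "", fmt.Errorf("get shard leader: %w", errNotLoaded)
	})
	return attempts, err
}

func main() {
	attempts, err := demo()
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Without the `errors.Is` check, the same call would keep retrying for the full one-second timeout before returning the context error.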
sre-ci-robot pushed a commit that referenced this issue on Nov 12, 2024
issue: #37115 The old implementation updates the shard cache and the shard client manager at the same time, which causes many corner cases due to unlocked concurrent access. This PR decouples the shard client manager from the shard cache, so only the shard cache is updated when the delegator changes. It also makes sure the shard client manager always returns the right client and creates a new client if none exists. To guard against client leaks, the shard client manager purges clients asynchronously every 10 minutes.
Signed-off-by: Wei Liu <[email protected]>
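The decoupled design described in this commit message can be approximated by a manager that creates clients on demand and drops the ones no longer referenced. The sketch below is an assumption-laden illustration, not the Milvus code: `clientManager`, `nodeClient`, and `purge` are hypothetical names, and the caller is assumed to supply the set of node IDs that the shard cache still references.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeClient stands in for a query node client.
type nodeClient struct {
	nodeID int64
	closed bool
}

// clientManager (hypothetical) owns clients independently of the shard leader cache.
type clientManager struct {
	mu      sync.Mutex
	clients map[int64]*nodeClient
}

func newClientManager() *clientManager {
	return &clientManager{clients: make(map[int64]*nodeClient)}
}

// getClient returns the client for nodeID, creating it if it does not exist yet.
func (m *clientManager) getClient(nodeID int64) *nodeClient {
	m.mu.Lock()
	defer m.mu.Unlock()
	c, ok := m.clients[nodeID]
	if !ok {
		c = &nodeClient{nodeID: nodeID}
		m.clients[nodeID] = c
	}
	return c
}

// purge closes and removes every client whose node ID is not in inUse,
// and returns how many clients were removed.
func (m *clientManager) purge(inUse map[int64]struct{}) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	removed := 0
	for id, c := range m.clients {
		if _, ok := inUse[id]; !ok {
			c.closed = true
			delete(m.clients, id)
			removed++
		}
	}
	return removed
}

func main() {
	m := newClientManager()
	m.getClient(1)
	m.getClient(2)
	// Assume the shard cache now references only node 2.
	removed := m.purge(map[int64]struct{}{2: {}})
	fmt.Println("purged:", removed, "remaining:", len(m.clients))
}
```

In a long-running process, `purge` would be driven from a background goroutine on a timer (every 10 minutes according to the commit message). The timer is left out here to keep the sketch deterministic.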
weiliu1031 added a commit to weiliu1031/milvus that referenced this issue on Nov 17, 2024
issue: milvus-io#37115 The old implementation updates the shard cache and the shard client manager at the same time, which causes many corner cases due to unlocked concurrent access. This PR decouples the shard client manager from the shard cache, so only the shard cache is updated when the delegator changes. It also makes sure the shard client manager always returns the right client and creates a new client if none exists. To guard against client leaks, the shard client manager purges clients asynchronously every 10 minutes.
Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this issue on Nov 25, 2024
issue: #37115 pr: #37371 #37646 #37729 The old implementation updates the shard cache and the shard client manager at the same time, which causes many corner cases due to unlocked concurrent access. This PR decouples the shard client manager from the shard cache, so only the shard cache is updated when the delegator changes. It also makes sure the shard client manager always returns the right client and creates a new client if none exists. To guard against client leaks, the shard client manager purges clients asynchronously every 10 minutes.
Signed-off-by: Wei Liu <[email protected]>
Signed-off-by: Congqi Xia <[email protected]>
Co-authored-by: congqixia <[email protected]>
Is there an existing issue for this?
Environment
Current Behavior
The proxy updates the shard leader cache first, then releases the lock and tries to initialize the shard client. This leaves a window in which a user can get the shard leader from the meta cache but cannot find the shard client in the shard client manager.
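The window described above can be illustrated with a small single-threaded sketch that replays the two steps in order. The `proxyState` type, its two maps, and the channel name are hypothetical stand-ins for the meta cache and the shard client manager. The real race involves concurrent goroutines, which this sketch deliberately avoids so the outcome is deterministic.

```go
package main

import "fmt"

// proxyState (hypothetical) models the two structures that are updated separately.
type proxyState struct {
	leaders map[string]int64   // channel -> leader node ID (shard leader cache)
	clients map[int64]struct{} // node IDs that already have a client (shard client manager)
}

// lookup mimics a search/query: resolve the leader, then find its client.
func (p *proxyState) lookup(channel string) error {
	nodeID, ok := p.leaders[channel]
	if !ok {
		return fmt.Errorf("no shard leader for channel %s", channel)
	}
	if _, ok := p.clients[nodeID]; !ok {
		return fmt.Errorf("leader %d found for channel %s, but its client is missing", nodeID, channel)
	}
	return nil
}

func main() {
	p := &proxyState{leaders: map[string]int64{}, clients: map[int64]struct{}{}}

	// Step 1: the shard leader cache is updated and the lock is released.
	p.leaders["channel-0"] = 7
	fmt.Println("during the window:", p.lookup("channel-0"))

	// Step 2: the shard client is initialized later.
	p.clients[7] = struct{}{}
	fmt.Println("after client init:", p.lookup("channel-0"))
}
```

Any request that arrives between step 1 and step 2 takes the failing path, which matches the search/query failure reported in this issue.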
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response