enhance: Decouple shard client manager from shard cache #37371

Merged
merged 3 commits into from
Nov 12, 2024

Conversation

weiliu1031
Contributor

issue: #37115
The old implementation updates the shard cache and the shard client manager at the same time, which causes many corner cases because the concurrent updates are not protected by a lock.

This PR decouples the shard client manager from the shard cache, so only the shard cache is updated when a delegator changes. The shard client manager always returns the right client and creates a new one if none exists. To guard against client leaks, the shard client manager purges clients asynchronously every 10 minutes.
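
For readers who want a concrete picture of the design described above, here is a minimal sketch of a client manager that is independent of the shard cache. It is not the code from this PR: the names `shardClient`, `clientManager`, `GetClient`, `Purge`, `Len`, and `StartPurgeLoop`, the `inUse` callback, and the example addresses are all hypothetical, and a plain struct stands in for a real query node client.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// shardClient is a hypothetical stand-in for a query node client.
type shardClient struct {
	nodeID int64
	addr   string
	closed bool
}

// Close marks the client as closed; a real client would release its connection.
func (c *shardClient) Close() { c.closed = true }

// clientManager owns clients keyed by node ID and knows nothing about the shard cache.
type clientManager struct {
	mu      sync.Mutex
	clients map[int64]*shardClient
}

func newClientManager() *clientManager {
	return &clientManager{clients: make(map[int64]*shardClient)}
}

// GetClient returns the existing client for nodeID, or creates one if none exists.
func (m *clientManager) GetClient(nodeID int64, addr string) *shardClient {
	m.mu.Lock()
	defer m.mu.Unlock()
	if c, ok := m.clients[nodeID]; ok {
		return c
	}
	c := &shardClient{nodeID: nodeID, addr: addr}
	m.clients[nodeID] = c
	return c
}

// Len reports how many clients are currently held.
func (m *clientManager) Len() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.clients)
}

// Purge closes and removes every client whose node inUse reports as unreferenced.
// It returns the number of clients removed.
func (m *clientManager) Purge(inUse func(nodeID int64) bool) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	removed := 0
	for id, c := range m.clients {
		if !inUse(id) {
			c.Close()
			delete(m.clients, id)
			removed++
		}
	}
	return removed
}

// StartPurgeLoop runs Purge on every tick in a background goroutine until stop is closed.
func (m *clientManager) StartPurgeLoop(interval time.Duration, inUse func(int64) bool, stop <-chan struct{}) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				m.Purge(inUse)
			case <-stop:
				return
			}
		}
	}()
}

func main() {
	m := newClientManager()
	a := m.GetClient(1, "node-1:21123")
	b := m.GetClient(1, "node-1:21123")
	m.GetClient(2, "node-2:21123")
	fmt.Println("same client reused:", a == b)

	// Pretend the shard cache now references only node 1.
	removed := m.Purge(func(id int64) bool { return id == 1 })
	fmt.Println("purged:", removed, "remaining:", m.Len())
}
```

In this sketch a delegator change never touches the manager directly; stale clients are only reclaimed by the purge pass, which a caller could schedule with `StartPurgeLoop(10*time.Minute, ...)` to match the interval mentioned in the description.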

@sre-ci-robot sre-ci-robot added the size/XL Denotes a PR that changes 500-999 lines. label Nov 1, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Nov 1, 2024

codecov bot commented Nov 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.13%. Comparing base (5e90f34) to head (f3b4481).
Report is 12 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (5e90f34) and HEAD (f3b4481).

HEAD has 3 fewer uploads than BASE.

| Flag | BASE (5e90f34) | HEAD (f3b4481) |
|------|----------------|----------------|
|      | 4              | 1              |

@@             Coverage Diff             @@
##           master   #37371       +/-   ##
===========================================
- Coverage   80.58%   68.13%   -12.46%     
===========================================
  Files        1356      290     -1066     
  Lines      190009    25395   -164614     
===========================================
- Hits       153128    17302   -135826     
+ Misses      31475     8093    -23382     
+ Partials     5406        0     -5406     
| Components | Coverage Δ |
|------------|------------|
| Client | ∅ <ø> (∅) |
| Core | 68.13% <ø> (ø) |
| Go | ∅ <ø> (∅) |

see 1066 files with indirect coverage changes

Contributor

mergify bot commented Nov 1, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031 weiliu1031 force-pushed the decouple_shard_client branch from 70bc109 to 5d498f1 Compare November 1, 2024 13:04
Contributor

mergify bot commented Nov 1, 2024

@weiliu1031 go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 1, 2024

@weiliu1031 cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 1, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Nov 1, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Nov 1, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031 weiliu1031 force-pushed the decouple_shard_client branch from 5d498f1 to f976f00 Compare November 5, 2024 00:46
Contributor

mergify bot commented Nov 5, 2024

@weiliu1031 go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 5, 2024

@weiliu1031 cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@weiliu1031
Contributor Author

rerun go-sdk

Contributor

mergify bot commented Nov 5, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031 weiliu1031 force-pushed the decouple_shard_client branch from f976f00 to 86d7446 Compare November 5, 2024 04:18
Contributor

mergify bot commented Nov 5, 2024

@weiliu1031 cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 5, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 5, 2024

@weiliu1031 go-sdk check failed, comment rerun go-sdk can trigger the job again.

@mergify mergify bot added the ci-passed label Nov 5, 2024
internal/proxy/meta_cache.go (outdated, resolved review thread)
@weiliu1031 weiliu1031 force-pushed the decouple_shard_client branch from 86d7446 to 2239ddb Compare November 8, 2024 11:01
@mergify mergify bot removed the ci-passed label Nov 8, 2024
Contributor

mergify bot commented Nov 8, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

The old implementation updates the shard cache and the shard client
manager at the same time, which causes many corner cases because the
concurrent updates are not protected by a lock.

This PR decouples the shard client manager from the shard cache, so only
the shard cache is updated when a delegator changes. The shard client
manager always returns the right client and creates a new one if none
exists. To guard against client leaks, the shard client manager purges
clients asynchronously every 10 minutes.

Signed-off-by: Wei Liu <[email protected]>
@weiliu1031 weiliu1031 force-pushed the decouple_shard_client branch from 2239ddb to 557b161 Compare November 11, 2024 03:15
@mergify mergify bot removed the ci-passed label Nov 11, 2024
Contributor

mergify bot commented Nov 11, 2024

@weiliu1031 go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 11, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Signed-off-by: Wei Liu <[email protected]>
@weiliu1031 weiliu1031 force-pushed the decouple_shard_client branch from 557b161 to ead8b31 Compare November 11, 2024 06:55
@mergify mergify bot added the ci-passed label Nov 11, 2024
Signed-off-by: Wei Liu <[email protected]>
@mergify mergify bot removed the ci-passed label Nov 11, 2024
Contributor

mergify bot commented Nov 11, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@weiliu1031
Contributor Author

/run-cpu-e2e

@mergify mergify bot added the ci-passed label Nov 11, 2024
@czs007
Collaborator

czs007 commented Nov 12, 2024

/approve
/lgtm

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czs007, weiliu1031

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 2a4c00d into milvus-io:master Nov 12, 2024
18 of 20 checks passed
congqixia added a commit to congqixia/milvus that referenced this pull request Nov 13, 2024
sre-ci-robot pushed a commit that referenced this pull request Nov 13, 2024
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Nov 17, 2024

issue: milvus-io#37115
Signed-off-by: Wei Liu <[email protected]>
weiliu1031 pushed a commit to weiliu1031/milvus that referenced this pull request Nov 17, 2024
sre-ci-robot pushed a commit that referenced this pull request Nov 25, 2024

issue: #37115
pr: #37371 #37646 #37729
Signed-off-by: Wei Liu <[email protected]>
Signed-off-by: Congqi Xia <[email protected]>
Co-authored-by: congqixia <[email protected]>
Labels
approved, ci-passed, dco-passed (DCO check passed.), kind/enhancement (Issues or changes related to enhancement), lgtm, size/XL (Denotes a PR that changes 500-999 lines.)

3 participants