[Bug]: Refresh load failed with error call query coordinator LoadCollection: collection not fully loaded in CI case #37166
Comments
/assign @weiliu1031
Failed CI job: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-37148/4/pipeline
This issue currently has a relatively high reproduction probability.
/assign @bigsheeper
Failed CI job: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-37148/7/pipeline
So I am marking it as critical.
RESTful load operations or quick setup
/assign @zhuwenxing
The failed step is
Do you mean that the call to c.load(refresh=True) needs to wait until the previous load has completed?
Yes. The test cases need to be updated to wait for loading to complete, and you can add a timeout to the wait. BTW, there is also an issue on the server side: the import runs for several dozen seconds, and it is a problem that the collection still has not finished loading after that much time. This is related to issue #37395.
issue: #37166
The misuse of timer.Reset caused the dispatcher to fail to send messages to the virtual channel buffer, so the dispatcher kept splitting again and again while holding the dispatcher manager's lock, which blocked channel-watching progress. This PR fixes the misuse of timer.Reset.
Signed-off-by: Wei Liu <[email protected]>
/assign @zhuwenxing
Actually, I don't think adding more concurrency at the proxy would solve the problem. There is no reason any DDL operation should take more than 9s. It is obviously caused by abuse of some DDL requests; for example, ListIndexes blocks everything. We need to fix more bottlenecks and try to batch some of the requests before we add more concurrency.
To support 10000 collections * 1000 partitions
Yes, indeed.
Let's do this. Merging all the coordinators into one is not a bad idea.
Still seeing it at https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-37318/13/pipeline @smellthemoon
Load times are not very stable. Previously, a load timeout of 5s was sufficient, but in subsequent CI runs loads still time out quite often, even with the timeout increased to 10s.
/assign @smellthemoon
/unassign
This looks like an import timeout.
Not sure why I cannot open this link. We have seen some slow DDL issues caused by resource contention; please help take a look @SimFG
May be related to #37851.
/assign @SimFG
When there are many loaded collections, they occupy the target observer scheduler's pool. This prevents collections that are still loading from updating their current target in time, slowing down the load process. This PR adds a separate target dispatcher for loading collections.
issue: milvus-io#37166
Signed-off-by: bigsheeper <[email protected]>
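A minimal Go sketch of the idea behind this change, under stated assumptions: target updates are modeled as plain functions, and collections that are still loading get their own bounded worker pool, so long-running work for already-loaded collections cannot starve them. The `pool` and `dispatcher` types, their fields, and the pool sizes are hypothetical and do not reflect the actual Milvus implementation.

```go
package main

import "sync"

// pool is a fixed number of workers draining a buffered task queue.
type pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

func newPool(workers, queue int) *pool {
	p := &pool{tasks: make(chan func(), queue)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// close stops accepting tasks and waits for queued tasks to finish.
func (p *pool) close() {
	close(p.tasks)
	p.wg.Wait()
}

// dispatcher routes target-update tasks: collections that are still loading
// use a dedicated pool, so they never queue behind loaded collections.
type dispatcher struct {
	loaded  *pool
	loading *pool
}

func newDispatcher() *dispatcher {
	return &dispatcher{loaded: newPool(4, 1024), loading: newPool(2, 1024)}
}

// submit enqueues task on the pool for its collection state;
// it blocks only if that pool's queue is full.
func (d *dispatcher) submit(isLoading bool, task func()) {
	if isLoading {
		d.loading.tasks <- task
	} else {
		d.loaded.tasks <- task
	}
}
```

With a single shared pool, four blocked loaded-collection tasks would occupy every worker and the loading collection's update would wait indefinitely; splitting the pools isolates the two classes of work at the cost of a few extra goroutines.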
Remove unnecessary ListIndex and DescribeCollection RPC calls during loading.
issue: #37166, #37630
pr: #37741
Signed-off-by: bigsheeper <[email protected]>
When there are many loaded collections, they occupy the target observer scheduler's pool. This prevents collections that are still loading from updating their current target in time, slowing down the load process. This PR adds a separate target dispatcher for loading collections.
issue: #37166
pr: #37454
Signed-off-by: bigsheeper <[email protected]>
/assign @zhuwenxing
Is there an existing issue for this?
Environment
Current Behavior
After an import, running a refresh load fails.
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
Failed CI job: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-37148/1/pipeline
Anything else?
No response