
[Bug]: flaky test case test_search_group_size_default in ci #36407

Closed
1 task done
zhuwenxing opened this issue Sep 20, 2024 · 7 comments
Labels: kind/bug (issues or changes related to a bug), triage/accepted (an issue or PR is ready to be actively worked on)


@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

self = <test_search.TestSearchGroupBy object at 0x7fe84181ff70>

    @pytest.mark.tags(CaseLabel.L0)
    def test_search_group_size_default(self):
        """
        target: test search group by
        method: 1. create a collection with 3 different float vectors
                2. build index with 3 different index types and metrics
                2. search on 3 different float vector fields with group by varchar field with group size
                verify results entity = limit * group_size  and group size is full if group_strict_size is True
                verify results group counts = limit if group_strict_size is False
        """
        self._connect()
        dense_types = ["FLOAT16_VECTOR", "FLOAT_VECTOR", "BFLOAT16_VECTOR"]
        dims = [16, 128, 64]
        index_types = ["FLAT", "IVF_SQ8", "HNSW"]
        metrics = ct.float_metrics
        fields = [cf.gen_int64_field(is_primary=True), cf.gen_string_field()]
        for i in range(len(dense_types)):
            fields.append(cf.gen_float_vec_field(name=dense_types[i],
                                                 vector_data_type=dense_types[i], dim=dims[i]))
        schema = cf.gen_collection_schema(fields, auto_id=True)
        collection_w = self.init_collection_wrap(name=prefix, schema=schema)

        # insert with the same values for scalar fields
        nb = 100
        for _ in range(100):
            string_values = pd.Series(data=[str(i) for i in range(nb)], dtype="string")
            data = [string_values]
            for i in range(len(dense_types)):
                data.append(cf.gen_vectors(dim=dims[i], nb=nb, vector_data_type=dense_types[i]))
            collection_w.insert(data)

        collection_w.flush()
        for i in range(len(dense_types)):
            _index_params = {"index_type": index_types[i], "metric_type": metrics[i],
                             "params": cf.get_index_params_params(index_types[i])}
            collection_w.create_index(dense_types[i], _index_params)
        collection_w.load()

        nq = 2
        limit = 50
        group_size = 5
        for j in range(len(dense_types)):
            search_vectors = cf.gen_vectors(nq, dim=dims[j], vector_data_type=dense_types[j])
            search_params = {"params": cf.get_search_params_params(index_types[j])}
            # when group_strict_size=true, it shall return results with entities = limit * group_size
            res1 = collection_w.search(data=search_vectors, anns_field=dense_types[j],
                                       param=search_params, limit=limit, # consistency_level=CONSISTENCY_STRONG,
                                       group_by_field=ct.default_string_field_name,
                                       group_size=group_size, group_strict_size=True,
                                       output_fields=[ct.default_string_field_name])[0]
            for i in range(nq):
                for l in range(limit):
                    group_values = []
                    for k in range(10):
                        group_values.append(res1[i][l].fields.get(ct.default_string_field_name))
                    assert len(set(group_values)) == 1
                assert len(res1[i]) == limit * group_size

            # when group_strict_size=false, it shall return results with group counts = limit
            res1 = collection_w.search(data=search_vectors, anns_field=dense_types[j],
                                       param=search_params, limit=limit, # consistency_level=CONSISTENCY_STRONG,
                                       group_by_field=ct.default_string_field_name,
                                       group_size=group_size, group_strict_size=False,
                                       output_fields=[ct.default_string_field_name])[0]
            for i in range(nq):
                group_values = []
                for l in range(len(res1[i])):
                    group_values.append(res1[i][l].fields.get(ct.default_string_field_name))
                assert len(set(group_values)) == limit

        # hybrid search group by
        req_list = []
        for j in range(len(dense_types)):
            search_params = {
                "data": cf.gen_vectors(nq, dim=dims[j], vector_data_type=dense_types[j]),
                "anns_field": dense_types[j],
                "param": {"params": cf.get_search_params_params(index_types[j])},
                "limit": limit,
                "expr": "int64 > 0"}
            req = AnnSearchRequest(**search_params)
            req_list.append(req)
        # 4. hybrid search group by
        import numpy as np
        rank_scorers = ["max", "avg", "sum"]
        for scorer in rank_scorers:
            res = collection_w.hybrid_search(req_list, WeightedRanker(0.3, 0.3, 0.3), limit=limit,
                                             group_by_field=ct.default_string_field_name,
                                             group_size=group_size, rank_group_scorer=scorer,
                                             output_fields=[ct.default_string_field_name])[0]
            for i in range(nq):
                group_values = []
                for l in range(len(res[i])):
                    group_values.append(res[i][l].fields.get(ct.default_string_field_name))
>               assert len(set(group_values)) == limit
E               AssertionError: assert 36 == 50
E                +  where 36 = len({'11', '13', '17', '18', '19', '2', ...})
E                +    where {'11', '13', '17', '18', '19', '2', ...} = set(['42', '74', '47', '53', '85', '93', ...])
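
The checks described in the test docstring above can be expressed on plain lists of group-by values, which makes the failing condition (36 distinct groups where 50 were expected) easy to reproduce offline. This is only a minimal sketch: `check_group_results` and its parameters are hypothetical names, not part of pymilvus or the Milvus test suite, and the input stands in for the values read from one query's hits.

```python
from collections import Counter


def check_group_results(group_values, limit, group_size, strict):
    """Return a list of problems found in one query's grouped hits.

    group_values: the group-by field value of every returned hit.
    Hypothetical helper for illustration; not a pymilvus API.
    """
    problems = []
    counts = Counter(group_values)

    if strict:
        # Strict mode (per the docstring): entity count = limit * group_size
        # and every group is full.
        if len(group_values) != limit * group_size:
            problems.append(
                f"expected {limit * group_size} entities, got {len(group_values)}"
            )
        not_full = sorted(g for g, n in counts.items() if n != group_size)
        if not_full:
            problems.append(f"groups not full: {not_full}")
    else:
        # Non-strict mode (per the docstring): group count = limit.
        if len(counts) != limit:
            problems.append(f"expected {limit} groups, got {len(counts)}")

    return problems


# Shape of the failure in the log: only 36 distinct groups came back.
observed = [str(i % 36) for i in range(36 * 5)]
print(check_group_results(observed, limit=50, group_size=5, strict=False))
```

Returning a list of messages instead of asserting immediately reports every violated condition for a query at once, which can help when a failure is intermittent.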

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-36381/5/pipeline/94

Anything else?

No response

@zhuwenxing added the kind/bug and needs-triage labels on Sep 20, 2024
@zhuwenxing
Contributor Author

(image attachment)
This case appears to have a relatively high failure rate: it fails on both branches.
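
One way to put a number on the failure rate is to rerun the case repeatedly and count assertion failures. The following is a rough sketch only: `failure_rate` and `test_fn` are hypothetical names, and `test_fn` stands in for any zero-argument callable that runs the case and raises `AssertionError` on failure.

```python
def failure_rate(test_fn, runs=20):
    """Run test_fn `runs` times and return the fraction that failed.

    Only AssertionError counts as a failure; other exceptions propagate
    so that setup errors are not mistaken for flakiness.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except AssertionError:
            failures += 1
    return failures / runs
```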

@yanliang567
Contributor

/assign @MrPresent-Han
/unassign

@yanliang567 added the triage/accepted label and removed the needs-triage label on Sep 22, 2024
@yanliang567
Contributor

Reproduced on master-20240922-bfd68cc0. The same build also reproduces an issue where hybrid search with group-by returns results that share the same group value (no group size set).
(image attachment)
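
The duplicate-group symptom can be detected from the group-by values of one query's hits. The sketch below assumes that each group value should appear at most once when no group size is set (that expectation is an assumption here, not something stated in this thread), and `duplicated_group_values` is a hypothetical helper name.

```python
from collections import Counter


def duplicated_group_values(group_values):
    """Return the sorted group values that occur more than once.

    Assumption: with no group size set, each group should contribute
    at most one hit, so any value listed here indicates the bug.
    """
    counts = Counter(group_values)
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicated_group_values(["1", "2", "1", "3", "3"]))  # → ['1', '3']
```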

MrPresent-Han pushed a commit to MrPresent-Han/milvus that referenced this issue Sep 23, 2024
sre-ci-robot pushed a commit that referenced this issue Sep 23, 2024
related issue: #36407
1. add partial load tests
2. use new test class to share one collection for all grouping search tests

Signed-off-by: yanliang567 <[email protected]>
MrPresent-Han pushed a commit to MrPresent-Han/milvus that referenced this issue Sep 23, 2024
MrPresent-Han pushed a commit to MrPresent-Han/milvus that referenced this issue Sep 24, 2024
sre-ci-robot pushed a commit that referenced this issue Sep 24, 2024
related: #36407

---------

Signed-off-by: MrPresent-Han <[email protected]>
Co-authored-by: MrPresent-Han <[email protected]>
MrPresent-Han pushed a commit to MrPresent-Han/milvus that referenced this issue Sep 29, 2024
@MrPresent-Han
Contributor

MrPresent-Han commented Oct 12, 2024

/unassign

@yanliang567
Contributor

/assign @yanliang567

@yanliang567
Contributor

working on verification

@yanliang567 yanliang567 added this to the 2.5.0 milestone Oct 16, 2024
@yanliang567
Contributor

verified on master-20241015-f3b6792a
