[Bug]: [null & default] bulk insert failed when insert "None" to the int64 field using the parquet file #36252

Closed
1 task done
binbinlv opened this issue Sep 13, 2024 · 3 comments
Labels: kind/bug (issues or changes related to a bug), triage/accepted (an issue or PR is ready to be actively worked on)

binbinlv commented Sep 13, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240913-375cb44b
- Deployment mode (standalone or cluster): both
- MQ type (rocksmq, pulsar or kafka): all
- SDK version (e.g. pymilvus v2.0.0rc2): 2.5.0rc78
- OS (Ubuntu or CentOS):
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Bulk insert fails when the parquet file stores "None" for the int64 field: the column reaches Milvus with Arrow data type 'null', and the import is rejected as a schema mismatch.

[2024-09-13 17:17:39 - DEBUG - ci_test]: (api_request)  : [do_bulk_insert] args: ['bulk_insert_9eDBYu9R', ['parquet-ac595aa7-aa9a-4506-b18c-771ea85f4c10/data-fields-12-rows-2000-dim-128-file-num-1-error-none-1726219058.parquet'], None, None, 'default'], kwargs: {} (api_request.py:62)
[2024-09-13 17:17:39 - DEBUG - ci_test]: (api_response) : 452516079621945355  (api_request.py:37)
[2024-09-13 17:17:40 - INFO - ci_test]: after bulk load, there are 1 working tasks (utility_wrapper.py:34)
[2024-09-13 17:17:40 - INFO - ci_test]: wait bulk load timeout is 300 (utility_wrapper.py:111)
[2024-09-13 17:17:40 - INFO - ci_test]: before waiting, there are 0 pending tasks (utility_wrapper.py:113)
[2024-09-13 17:17:42 - DEBUG - ci_test]: (api_request)  : [get_bulk_insert_state] args: [452516079621945355, 300, 'default'], kwargs: {} (api_request.py:62)
[2024-09-13 17:17:42 - DEBUG - ci_test]: (api_response) : <Bulk insert state:
    - taskID          : 452516079621945355,
    - state           : Failed,
    - row_count       : 0,
    - infos           : {'failed_reason': "schema not equal, err=field 'int_scalar' type mis-match, milvus data type 'Int64', arrow data type get 'null': importing data failed: ......  (api_request.py:37)
[2024-09-13 17:17:42 - INFO - ci_test]: after waiting, there are 0 pending tasks (utility_wrapper.py:148)
[2024-09-13 17:17:42 - INFO - ci_test]: task state distribution: {'success': set(), 'failed': {452516079621945355}, 'in_progress': set()} (utility_wrapper.py:149)
[2024-09-13 17:17:42 - INFO - ci_test]: {452516079621945355: <Bulk insert state:
    - taskID          : 452516079621945355,
    - state           : Failed,
    - row_count       : 0,
    - infos           : {'failed_reason': "schema not equal, err=field 'int_scalar' type mis-match, milvus data type 'Int64', arrow data type get 'null': importing data failed: importing data failed", 'progress_percent': '0'},
    - id_ranges       : [],
    - create_ts       : 2024-09-13 17:17:39
>} (utility_wrapper.py:150)
[2024-09-13 17:17:42 - INFO - ci_test]: wait for bulk load tasks completed failed, cost time: 2.0613701343536377 (utility_wrapper.py:155)
[2024-09-13 17:17:42 - INFO - ci_test]: bulk insert state:False in 3.2008728981018066 with states:{452516079621945355: <Bulk insert state:
    - taskID          : 452516079621945355,
    - state           : Failed,
    - row_count       : 0,
    - infos           : {'failed_reason': "schema not equal, err=field 'int_scalar' type mis-match, milvus data type 'Int64', arrow data type get 'null': importing data failed: importing data failed", 'progress_percent': '0'},
    - id_ranges       : [],
    - create_ts       : 2024-09-13 17:17:39
>} (test_bulk_insert.py:1116)

Expected Behavior

Bulk insert completes successfully.

Steps To Reproduce

  1. Create a collection with nullable=True set on the int64 field.

  2. Generate a dataframe whose int64 field (int_scalar) contains "None":


[2024-09-13 17:17:38 - INFO - ci_test]: df:
     int_scalar  ...                                              $meta
0          None  ...  {"0": 0, "name": "Crystal Burke", "address": "...
1          None  ...  {"1": 1, "name": "Catherine King", "address": ...
2          None  ...  {"2": 2, "name": "Karen Sandoval", "address": ...
3          None  ...  {"3": 3, "name": "Elizabeth Williams", "addres...
4          None  ...  {"4": 4, "name": "Bridget Rivera", "address": ...
...         ...  ...                                                ...
1995       None  ...  {"1995": 1995, "name": "William Whitehead", "a...
1996       None  ...  {"1996": 1996, "name": "Curtis Andersen", "add...
1997       None  ...  {"1997": 1997, "name": "Paul Osborne", "addres...
1998       None  ...  {"1998": 1998, "name": "Jeremy Anderson", "add...
1999       None  ...  {"1999": 1999, "name": "Bailey Acosta", "addre...

[2000 rows x 13 columns] (bulk_insert_data.py:936)
  3. Generate a parquet file from the above dataframe:
df.to_parquet(f"{data_source_new}/{file_name}", engine='pyarrow')
  4. Bulk insert the parquet file.

Milvus Log

https://grafana-4am.zilliz.cc/explore?orgId=1&panes=%7B%22JJZ%22:%7B%22datasource%22:%22vhI6Vw67k%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22default-null-test-bgkie.%2A%5C%22%7D%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22vhI6Vw67k%22%7D%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1

Anything else?

No response

binbinlv added the kind/bug and triage/accepted labels Sep 13, 2024
binbinlv added this to the 2.5.0 milestone Sep 13, 2024
sre-ci-robot pushed a commit that referenced this issue Sep 19, 2024
#36252 
Remove the unnecessary type check. If users write parquet with a null-type
writer, the import should succeed.

Signed-off-by: lixinguo <[email protected]>
Co-authored-by: lixinguo <[email protected]>
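
The effect of the commit can be pictured with a small model. This is a hypothetical Python illustration only, not the actual Milvus code, which is not shown in this issue. The function name, the type mapping, and the `skip_null_check` flag are all assumptions made for the sketch. The strict path reproduces the failure reason from the log, and the relaxed path lets a 'null' Arrow column through.

```python
from typing import Optional

# Illustrative mapping only; the real reader supports many more types.
EXPECTED_ARROW_TYPE = {"Int64": "int64"}


def check_field_type(field_name: str, milvus_type: str, arrow_type: str,
                     skip_null_check: bool) -> Optional[str]:
    """Return an error message on mismatch, or None if the column is accepted.

    Hypothetical model of the check: with skip_null_check=True, a column
    written with Arrow type 'null' is accepted instead of rejected.
    """
    if arrow_type == "null" and skip_null_check:
        return None
    if EXPECTED_ARROW_TYPE.get(milvus_type) != arrow_type:
        return (f"field '{field_name}' type mis-match, milvus data type "
                f"'{milvus_type}', arrow data type get '{arrow_type}'")
    return None


before = check_field_type("int_scalar", "Int64", "null", skip_null_check=False)
after = check_field_type("int_scalar", "Int64", "null", skip_null_check=True)
print(before)
print(after)
```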

smellthemoon commented

PR merged. Could you please help verify? @binbinlv

binbinlv commented

/assign @binbinlv

binbinlv commented

Verified and fixed.
pymilvus: 2.5.0rc80
milvus: master-20240920-6e430bd4

results:

    - taskID          : 452675705375461453,
    - state           : Completed,
    - row_count       : 2000,
    - infos           : {'failed_reason': '', 'progress_percent': '100'},
    - id_ranges       : [],
    - create_ts       : 2024-09-23 19:14:04
