[BUG] Fail to ingest document with nested list into text_embedding processor #1024

Open
yizheliu-amazon opened this issue Dec 17, 2024 · 1 comment · May be fixed by #1040
Assignees
Labels
bug Something isn't working

Comments

yizheliu-amazon commented Dec 17, 2024

What is the bug?

When a document containing a nested list is ingested through the text_embedding processor, ingestion fails with a class_cast_exception.

How can one reproduce the bug?

  1. Create ingest pipeline
PUT /_ingest/pipeline/nlp-ingest-pipeline-v4

{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "H7tN0ZMBSNRyUB1vcEi2",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}
  2. Simulate
POST /_ingest/pipeline/nlp-ingest-pipeline-v4/_simulate

{
    "docs": [
        {
            "_index": "neural-search-index-v2",
            "_id": "1",
            "_source": {
                "category": [
                    {
                        "name": {
                            "en": "this is 1st name"
                        }
                    },
                    {
                        "name": {
                            "en": "this is 2nd name"
                        }
                    }
                ]
            }
        }
    ]
}
  3. Result
{
    "docs": [
        {
            "error": {
                "root_cause": [
                    {
                        "type": "class_cast_exception",
                        "reason": "class java.util.LinkedHashMap cannot be cast to class java.util.List (java.util.LinkedHashMap and java.util.List are in module java.base of loader 'bootstrap')"
                    }
                ],
                "type": "class_cast_exception",
                "reason": "class java.util.LinkedHashMap cannot be cast to class java.util.List (java.util.LinkedHashMap and java.util.List are in module java.base of loader 'bootstrap')"
            }
        }
    ]
}

Exception is thrown at this line

It seems to have been introduced by PR #913.

What is the expected behavior?

The expected result should look like this:

{
    "docs": [
        {
            "doc": {
                "_index": "neural-search-index-v2",
                "_id": "1",
                "_source": {
                    "category": [
                        {
                            "name": {
                                "category_name_vector": [
                                    -0.10758455,
                                    0.07971476,
                                    -0.04948872,
                                    ...
                                ],
                                "en": "this is 1st name"
                            }
                        },
                        {
                            "name": {
                                "category_name_vector": [
                                    -0.034477253,
                                    0.031023245,
                                    0.006734962,
                                    ...
                                ],
                                "en": "this is 2nd name"
                            }
                        }
                    ]
                },
                "_ingest": {
                    "timestamp": "2024-04-10T03:51:53.496385Z"
                }
            }
        }
    ]
}
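
To make the expected behavior concrete, here is a minimal Python sketch of a field_map traversal that applies the same mapping to every element when it meets a list of objects. It is not the plugin's implementation (the plugin is written in Java). The names `apply_field_map` and `fake_embed` are hypothetical, and `fake_embed` only stands in for the real model call.

```python
from typing import Any, Callable, Dict, List

Vector = List[float]


def fake_embed(text: str) -> Vector:
    """Hypothetical stand-in for the model call: a tiny deterministic vector."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def apply_field_map(source: Any, field_map: Dict[str, Any],
                    embed: Callable[[str], Vector] = fake_embed) -> None:
    """Walk `source` along `field_map` and add embeddings in place.

    A string value in `field_map` names the target vector field; a dict value
    means "descend one level". A list in `source` is handled by applying the
    same mapping to each of its elements.
    """
    if isinstance(source, list):
        for item in source:
            apply_field_map(item, field_map, embed)
        return
    if not isinstance(source, dict):
        return  # scalars carry no keys to map
    for key, target in field_map.items():
        if key not in source:
            continue
        value = source[key]
        if isinstance(target, dict):
            # Nested mapping: the value may be an object or a list of objects.
            apply_field_map(value, target, embed)
        elif isinstance(value, str):
            # Leaf mapping: store the vector next to the text field.
            source[target] = embed(value)


# Same document and field_map as in the reproduction steps.
doc = {
    "category": [
        {"name": {"en": "this is 1st name"}},
        {"name": {"en": "this is 2nd name"}},
    ]
}
field_map = {"category": {"name": {"en": "category_name_vector"}}}

apply_field_map(doc, field_map)
print(sorted(doc["category"][0]["name"].keys()))
```

With this traversal, each element of `category` keeps its `en` text and gains a sibling vector field under the mapped name, which matches the shape of the expected result.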

What is your host/environment?

MacOS 13.7.1 (22H221)
Neural Search version: built on top of this commit of the main branch

Do you have any screenshots?

N/A

Do you have any additional context?

N/A

@yizheliu-amazon
Author

@heemin32 Hi Heemin, can you help assign this to me? Thanks.
