Fix bug where ingestion failed for input document containing list of nested objects #1040

Open
wants to merge 1 commit into
base: main
from


yizheliu-amazon

Description

Fixes a bug where ingestion failed for an input document containing a list of nested objects.
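For context, the failing input shape is a field whose value is a list of objects rather than a single object. The following is a minimal sketch, not the code in this PR, of a traversal that treats both shapes uniformly: whenever it meets a List, it applies the same path step to every element. The class name `NestedTextCollector` and the path-as-varargs design are hypothetical and not part of the plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the plugin's code: collect the text found at a path
// where any level may hold a single object (Map) or a list of objects (List).
class NestedTextCollector {

    static List<String> collect(Object node, String... path) {
        List<String> out = new ArrayList<>();
        collectInto(node, path, 0, out);
        return out;
    }

    private static void collectInto(Object node, String[] path, int depth, List<String> out) {
        if (node == null) {
            return;
        }
        if (node instanceof List) {
            // A list of nested objects: apply the same path step to every element.
            for (Object element : (List<?>) node) {
                collectInto(element, path, depth, out);
            }
            return;
        }
        if (depth == path.length) {
            // End of the path: keep the value only if it is text.
            if (node instanceof String) {
                out.add((String) node);
            }
            return;
        }
        if (node instanceof Map) {
            // A single nested object: descend into the next key on the path.
            collectInto(((Map<?, ?>) node).get(path[depth]), path, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of(
            "nested", List.of(
                Map.of("text", "first"),
                Map.of("text", "second")));
        System.out.println(collect(doc, "nested", "text")); // prints [first, second]
    }
}
```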

Related Issues

Resolves #1024

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@heemin32
Collaborator

Can we have an integration test (IT) for this?

int nestedElementIndex
) {
if (processorKey == null || sourceAndMetadataMap == null || sourceValue == null) return;
if (sourceValue instanceof Map) {
Collaborator


Isn't sourceValue always an instance of Map?

Collaborator


Suggested change
- if (sourceValue instanceof Map) {
+ assert sourceValue instanceof Map : "sourceValue should be an instance of Map";
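A note on the suggestion: Java's assert statement separates the condition from the message with a colon (`assert condition : message;`), and assertions are skipped at runtime unless the JVM is started with `-ea` (`-enableassertions`). So the assert documents the invariant but does not enforce it by default. A minimal sketch contrasting an assert with an always-on check; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Objects;

class SourceValueChecks {

    // Documents the invariant; the assert is only evaluated when the JVM runs with -ea.
    @SuppressWarnings("unchecked")
    static Map<String, Object> asMapWithAssert(Object sourceValue) {
        assert sourceValue instanceof Map : "sourceValue should be an instance of Map";
        return (Map<String, Object>) sourceValue;
    }

    // Always enforced, regardless of JVM flags.
    @SuppressWarnings("unchecked")
    static Map<String, Object> asMapChecked(Object sourceValue) {
        Objects.requireNonNull(sourceValue, "sourceValue must not be null");
        if (!(sourceValue instanceof Map)) {
            throw new IllegalArgumentException(
                "sourceValue should be an instance of Map but was " + sourceValue.getClass().getName());
        }
        return (Map<String, Object>) sourceValue;
    }

    public static void main(String[] args) {
        System.out.println(asMapChecked(Map.of("k", "v")).get("k")); // prints v
        try {
            asMapChecked("not a map");
        } catch (IllegalArgumentException e) {
            // prints: sourceValue should be an instance of Map but was java.lang.String
            System.out.println(e.getMessage());
        }
    }
}
```

Without `-ea`, a non-Map passed to `asMapWithAssert` is not caught by the assert; it fails later with a `ClassCastException` at the cast.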

@heemin32 added the backport 2.x label (Label will add auto workflow to backport PR to 2.x branch) on Dec 24, 2024
*/
Map<String, Object> child1Level2 = buildObjMapWithSingleField(CHILD_1_TEXT_FIELD, TEXT_VALUE_1);
Map<String, Object> child1Level1 = buildObjMapWithSingleField(CHILD_FIELD_LEVEL_1, child1Level2);
Map<String, Object> child2Level2 = buildObjMapWithSingleField(CHILD_1_TEXT_FIELD, TEXT_VALUE_1);
Member


Is it critical for the test case to have identical values in both nested fields? In real-life scenarios the values will usually differ. Can we edit this method, or add a new test case with two or more different fields?
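One way to cover this: build the two list elements with different field names and values, so a test can tell whether a result was written back to the wrong element. A minimal sketch; `buildObjMapWithSingleField` below is a local stand-in that assumes the helper wraps one key/value pair in a map, and the field names and values are made up, not the constants used in the PR's tests.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedFixtureSketch {

    // Hypothetical stand-in for the test helper: a mutable map with one entry.
    static Map<String, Object> buildObjMapWithSingleField(String key, Object value) {
        Map<String, Object> map = new HashMap<>();
        map.put(key, value);
        return map;
    }

    // Builds [ {level1: {textA: "hello"}}, {level1: {textB: "world"}} ]:
    // same outer structure, but distinct inner field names and values.
    static List<Map<String, Object>> buildDistinctNestedList() {
        Map<String, Object> child1Level2 = buildObjMapWithSingleField("textA", "hello");
        Map<String, Object> child1Level1 = buildObjMapWithSingleField("level1", child1Level2);
        Map<String, Object> child2Level2 = buildObjMapWithSingleField("textB", "world");
        Map<String, Object> child2Level1 = buildObjMapWithSingleField("level1", child2Level2);
        return List.of(child1Level1, child2Level1);
    }

    public static void main(String[] args) {
        System.out.println(buildDistinctNestedList());
    }
}
```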

List<Map<String, Object>> nestedElementList = (List<Map<String, Object>>) sourceAndMetadataMap.get(processorKey);

IntStream.range(0, nestedElementList.size()).forEach(nestedElementIndex -> {
Map<String, Object> nestedElement = nestedElementList.get(nestedElementIndex);
Member


How about the following version? Getting an element by index can be suboptimal for some List implementations, such as LinkedList:

Iterator<Map<String, Object>> iterator = nestedElementList.iterator();
Stream.iterate(0, i -> i + 1)
    .limit(nestedElementList.size())
    .forEach(index -> {
        Map<String, Object> nestedElement = iterator.next();
        putNLPResultToSingleSourceMapInList(
            entryKey,
            entryValue,
            results,
            indexWrapper,
            nestedElement,
            index
        );
    });
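For background: `LinkedList.get(i)` walks the list from the nearer end, so an index-based loop over a LinkedList is O(n^2) overall, while an iterator visits each node once. The suggested Stream.iterate version gets that benefit; a plain `ListIterator` loop gives the same (element, index) pairs with less machinery. A minimal sketch, where `forEachWithIndex` is a hypothetical helper and `ObjIntConsumer` comes from `java.util.function`.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import java.util.function.ObjIntConsumer;

class IndexedIteration {

    // Visits each element once together with its position; O(n) even for
    // LinkedList, because it never calls list.get(i).
    static <T> void forEachWithIndex(List<T> list, ObjIntConsumer<T> action) {
        ListIterator<T> it = list.listIterator();
        while (it.hasNext()) {
            int index = it.nextIndex(); // position of the element next() will return
            action.accept(it.next(), index);
        }
    }

    public static void main(String[] args) {
        List<String> elements = new LinkedList<>(List.of("a", "b", "c"));
        List<String> visited = new ArrayList<>();
        forEachWithIndex(elements, (element, index) -> visited.add(index + ":" + element));
        System.out.println(visited); // prints [0:a, 1:b, 2:c]
    }
}
```

In the PR's loop, the action could wrap the call to `putNLPResultToSingleSourceMapInList(...)`, passing the element and its index.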

}
} else if (inputNestedMapEntry.getValue() instanceof Map) {
Member


Can you please refactor the logic for each type into a separate method? This should make the code cleaner:

    if (entryValue instanceof List) {
        processListTypeEntry(entryKey, (List<Object>) entryValue, processorKey, 
                           results, indexWrapper, sourceAndMetadataMap);
    } else if (entryValue instanceof Map) {
        processMapTypeEntry(entryKey, entryValue, processorKey, 
                          results, indexWrapper, sourceAndMetadataMap);
    }

Collaborator


I would rather keep the same method name and overload it:

if (entryValue instanceof List) {
  processEntry(entryKey, (List<Object>) entryValue, processorKey, results, indexWrapper, sourceAndMetadataMap);
} else if (entryValue instanceof Map) {
  processEntry(entryKey, (Map<String, Object>) entryValue, processorKey, results, indexWrapper, sourceAndMetadataMap);
}


private void processEntry(..., List<Object> entryValue, ...){...}
private void processEntry(..., Map<String, Object> entryValue, ...){...}
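One detail that makes the overload approach work: Java resolves overloads at compile time from the static type of the argument, so it is the casts to `List<Object>` and `Map<String, Object>` that select which `processEntry` runs; passing the uncast `Object` would match neither overload and fail to compile. A minimal sketch with hypothetical, trimmed signatures (only the key and value) that return a description instead of writing results, so the dispatch is visible.

```java
import java.util.List;
import java.util.Map;

class OverloadDispatch {

    // Hypothetical, trimmed signatures: the real methods take more parameters.
    static String processEntry(String entryKey, List<Object> entryValue) {
        return entryKey + ": list of " + entryValue.size();
    }

    static String processEntry(String entryKey, Map<String, Object> entryValue) {
        return entryKey + ": map with keys " + entryValue.keySet();
    }

    @SuppressWarnings("unchecked")
    static String dispatch(String entryKey, Object entryValue) {
        // The casts decide which overload is chosen: overload resolution
        // uses the static type of the argument, not its runtime type.
        if (entryValue instanceof List) {
            return processEntry(entryKey, (List<Object>) entryValue);
        } else if (entryValue instanceof Map) {
            return processEntry(entryKey, (Map<String, Object>) entryValue);
        }
        return entryKey + ": unchanged";
    }

    public static void main(String[] args) {
        System.out.println(dispatch("nested", List.of(Map.of("text", "a")))); // prints nested: list of 1
        System.out.println(dispatch("nested", Map.of("text", "a")));          // prints nested: map with keys [text]
        System.out.println(dispatch("title", "plain"));                       // prints title: unchanged
    }
}
```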

Labels: backport 2.x (Label will add auto workflow to backport PR to 2.x branch), bug (Something isn't working)
Development

Successfully merging this pull request may close these issues.

[BUG] Fail to ingest document with nested list into text_embedding processor
3 participants