Fix bug where document embedding fails to be generated due to document has dot in field name #1062
Merged
heemin32
merged 2 commits into
opensearch-project:main
from
yizheliu-amazon:issue-1042-fix-dot
Jan 8, 2025
yizheliu-amazon
requested review from
heemin32,
navneet1v,
VijayanB,
vamshin,
jmazanec15,
naveentatikonda,
junqiu-lei,
martin-gaievski,
sean-zheng-amazon,
model-collapse,
zane-neo,
vibrantvarun,
zhichao-aws,
yuye-aws and
minalsha
as code owners
January 6, 2025 18:47
heemin32
added
the
backport 2.x
Label will add auto workflow to backport PR to 2.x branch
label
Jan 7, 2025
heemin32
reviewed
Jan 7, 2025
@@ -80,4 +83,150 @@ public void test_with_different_configurations() throws URISyntaxException, IOEx
}
}

public void test_unflatten_simple_dot_notation() {
Could you make the test naming consistent with others?
testUnflatten_whenSimpleDotNotation_thenSucceed
Sure. Will do.
will-hwang
reviewed
Jan 7, 2025
src/main/java/org/opensearch/neuralsearch/processor/InferenceProcessor.java
src/test/java/org/opensearch/neuralsearch/processor/TextEmbeddingProcessorIT.java
src/test/java/org/opensearch/neuralsearch/util/ProcessorDocumentUtilsTests.java
…t has dot in field name Signed-off-by: Yizhe Liu <[email protected]>
yizheliu-amazon
force-pushed
the
issue-1042-fix-dot
branch
from
January 8, 2025 00:54
58bcad0
to
89aabe5
Compare
Signed-off-by: Yizhe Liu <[email protected]>
heemin32
approved these changes
Jan 8, 2025
junqiu-lei
approved these changes
Jan 8, 2025
will-hwang
approved these changes
Jan 8, 2025
opensearch-trigger-bot bot
pushed a commit
that referenced
this pull request
Jan 8, 2025
…t has dot in field name (#1062) * Fix bug where document embedding fails to be generated due to document has dot in field name Signed-off-by: Yizhe Liu <[email protected]> * Address comments Signed-off-by: Yizhe Liu <[email protected]> --------- Signed-off-by: Yizhe Liu <[email protected]> (cherry picked from commit 5b9f43b)
q-andy
pushed a commit
to q-andy/neural-search
that referenced
this pull request
Jan 8, 2025
…t has dot in field name (opensearch-project#1062) * Fix bug where document embedding fails to be generated due to document has dot in field name Signed-off-by: Yizhe Liu <[email protected]> * Address comments Signed-off-by: Yizhe Liu <[email protected]> --------- Signed-off-by: Yizhe Liu <[email protected]>
5 tasks
yizheliu-amazon
added a commit
to yizheliu-amazon/neural-search
that referenced
this pull request
Jan 10, 2025
…t has dot in field name (opensearch-project#1062) * Fix bug where document embedding fails to be generated due to document has dot in field name Signed-off-by: Yizhe Liu <[email protected]> * Address comments Signed-off-by: Yizhe Liu <[email protected]> --------- Signed-off-by: Yizhe Liu <[email protected]>
junqiu-lei
pushed a commit
that referenced
this pull request
Jan 10, 2025
… due to document has dot in field name (#1076) * Fix bug where document embedding fails to be generated due to document has dot in field name (#1062) * Fix bug where document embedding fails to be generated due to document has dot in field name Signed-off-by: Yizhe Liu <[email protected]> * Address comments Signed-off-by: Yizhe Liu <[email protected]> --------- Signed-off-by: Yizhe Liu <[email protected]> * Clean up unused validateFieldName() and use existing methods for TextEmbeddingProcessorIT (#1074) Signed-off-by: Yizhe Liu <[email protected]> --------- Signed-off-by: Yizhe Liu <[email protected]>
martin-gaievski
pushed a commit
that referenced
this pull request
Jan 10, 2025
…t has dot in field name (#1062) * Fix bug where document embedding fails to be generated due to document has dot in field name Signed-off-by: Yizhe Liu <[email protected]> * Address comments Signed-off-by: Yizhe Liu <[email protected]> --------- Signed-off-by: Yizhe Liu <[email protected]>
Description
Fixes a bug where document embeddings fail to be generated when a document field name contains a dot and therefore does not match the field mapping exactly.
Related Issues
Resolves #1042
Root cause
The issue occurs because fields in the ingested doc are not unboxed/unflattened, which causes a mismatch between the nested fields in the fieldMap config and the schema of the ingested document.
For example, a nested field can be represented as either {"a.b": "c"} or {"a": {"b": "c"}}. If the fieldMap is {"a": {"b": "b_embedding"}} but the ingested document is {"a.b": "c"}, the document schema does not match the fieldMap, so "c" cannot be fetched as the value for inference.
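The mismatch can be reproduced with a minimal sketch. NestedLookupSketch and lookup are hypothetical names used only for illustration, not code from the plugin; the sketch assumes the value is resolved by descending one map level per path segment, which is why the flat form yields nothing.

```java
import java.util.Map;

// Hypothetical helper for illustration only, not the plugin's implementation.
// It resolves a dot-separated path by descending one nested map per segment.
final class NestedLookupSketch {

    static Object lookup(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                // Either the path ran past a leaf value or a segment was missing.
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }
}
```

With this lookup, the path a.b resolves to "c" for {"a": {"b": "c"}}, but returns null for {"a.b": "c"}, because the flat document has no top-level key named a.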
About the change
The solution is to unbox/unflatten fields with dots in the ingested doc.
Before this change, given the fieldMap below:
Simulating ingestion for the doc below generates no
level_3_container.level_3_embedding
for it. After this change, the embedding is generated.
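The unflattening idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the code merged in this PR: UnflattenSketch and unflatten are hypothetical names, lists are left untouched, and when two keys collide on the same path the later entry overwrites the earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the merged implementation.
// Expands keys such as "a.b" into nested maps: {"a.b": "c"} -> {"a": {"b": "c"}}.
final class UnflattenSketch {

    @SuppressWarnings("unchecked")
    static Map<String, Object> unflatten(Map<String, Object> source) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested maps may themselves contain dotted keys.
                value = unflatten((Map<String, Object>) value);
            }
            String[] parts = entry.getKey().split("\\.");
            if (parts.length == 0) {
                throw new IllegalArgumentException("Invalid field name: " + entry.getKey());
            }
            // Walk or create intermediate maps for every segment except the last.
            Map<String, Object> target = result;
            for (int i = 0; i < parts.length - 1; i++) {
                Object child = target.get(parts[i]);
                if (!(child instanceof Map)) {
                    child = new LinkedHashMap<String, Object>();
                    target.put(parts[i], child);
                }
                target = (Map<String, Object>) child;
            }
            // Assumption: on a path collision, the later entry wins.
            target.put(parts[parts.length - 1], value);
        }
        return result;
    }
}
```

Applied to {"a.b": "c"}, the sketch returns {"a": {"b": "c"}}, which then matches a fieldMap of {"a": {"b": "b_embedding"}}. A real implementation would also need to decide how to handle lists of objects and malformed names such as a leading or trailing dot.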
Check List
Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.