Treat . as a nested field in field_map of text embedding processor #488
Signed-off-by: Sanjana679 <[email protected]>
How many nested levels are we going to support? We need to state that if it's more than 2 levels, e.g.
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #488 +/- ##
============================================
- Coverage 84.65% 84.10% -0.55%
+ Complexity 508 498 -10
============================================
Files 40 40
Lines 1505 1497 -8
Branches 234 229 -5
============================================
- Hits 1274 1259 -15
- Misses 128 137 +9
+ Partials 103 101 -2
@@ -154,6 +154,16 @@ Map<String, Object> buildMapWithProcessorKeyAndOriginalValue(IngestDocument inge
for (Map.Entry<String, Object> fieldMapEntry : fieldMap.entrySet()) {
    String originalKey = fieldMapEntry.getKey();
    Object targetKey = fieldMapEntry.getValue();

    int nestedDotIndex = originalKey.indexOf('.');
This is user-provided input. Can we add basic validation, if it's not already done as part of the processor/pipeline definition?
If multiple levels of nested fields are needed, this code may need a rework.
There is some basic validation done in validateEmbeddingConfiguration, which is run on fieldMap in the constructor in this file. However, I will add some extra validation for nested fields.
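To make the "extra validation for nested fields" concrete, here is a minimal sketch of one possible check on dotted field_map keys. It is not the code from this PR: the class name FieldMapKeyValidator and the method validateKeys are hypothetical, and it only assumes that a key such as "a..b", ".a", or "a." should be rejected because one of its path segments is empty.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the neural-search plugin.
final class FieldMapKeyValidator {

    /** Rejects keys that are null, blank, or contain an empty path segment. */
    static void validateKeys(Map<String, Object> fieldMap) {
        for (String key : fieldMap.keySet()) {
            if (key == null || key.isBlank()) {
                throw new IllegalArgumentException("field_map key is null or blank");
            }
            // A limit of -1 keeps trailing empty strings, so "a." is detected too.
            for (String segment : key.split("\\.", -1)) {
                if (segment.isBlank()) {
                    throw new IllegalArgumentException("field_map key [" + key + "] has an empty path segment");
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> fieldMap = new LinkedHashMap<>();
        fieldMap.put("favorites.game", "favorites_game_embedding");
        validateKeys(fieldMap); // passes without throwing
    }
}
```

A check like this would run once at processor construction time, so malformed keys fail fast instead of surfacing during ingestion.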
int nestedDotIndex = originalKey.indexOf('.');
if (nestedDotIndex != -1) {
    Map<String, Object> temp = new LinkedHashMap<>();
Why do we need this map object? Can you please use a more meaningful name for the map variable?
I added a more meaningful name for the map variable.
@Sanjana679 thanks for raising the PR. Can you please add the output for this case?
Also, I don't see any integration tests added. Can we add them?
@Sanjana679 can you also run
@Sanjana679 I can see that the diff contains lines that changed only because of a different formatter you might have locally. Please fix that too and make sure that only the lines that were actually modified appear in the diff.
if (nestedDotIndex != -1) {
    Map<String, Object> newTargetKey = new LinkedHashMap<>();
    newTargetKey.put(originalKey.substring(nestedDotIndex + 1), targetKey);
    targetKey = newTargetKey;

    originalKey = originalKey.substring(0, nestedDotIndex);
}
@Sanjana679 can you please provide details on how we are handling multiple levels of nesting here?
At the moment, I'm not handling multiple levels of nesting, as I initially thought it was only for one level. However, I will work on handling multiple levels of nesting.
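For illustration, one way multiple levels could be handled is to expand each dotted key recursively into nested maps. This is a sketch under stated assumptions, not the approach this PR or #841 actually took: NestedFieldMapExpander, expand, and insert are hypothetical names, and conflicting keys (for example "a" and "a.b" mapped at the same time) are simply rejected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the neural-search plugin.
final class NestedFieldMapExpander {
    private static final char FIELD_SEPARATOR = '.';

    /** Turns {"a.b.c": v} into {"a": {"b": {"c": v}}}, for any number of levels. */
    static Map<String, Object> expand(Map<String, Object> flatFieldMap) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : flatFieldMap.entrySet()) {
            insert(result, entry.getKey(), entry.getValue());
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    private static void insert(Map<String, Object> node, String path, Object value) {
        int dotIndex = path.indexOf(FIELD_SEPARATOR);
        if (dotIndex == -1) {
            // Leaf segment: refuse to silently overwrite an existing entry.
            if (node.containsKey(path)) {
                throw new IllegalArgumentException("conflicting field_map entries at [" + path + "]");
            }
            node.put(path, value);
            return;
        }
        String head = path.substring(0, dotIndex);
        Object child = node.computeIfAbsent(head, k -> new LinkedHashMap<String, Object>());
        if (!(child instanceof Map)) {
            throw new IllegalArgumentException("conflicting field_map entries at [" + head + "]");
        }
        // Recurse on the remainder of the path inside the child map.
        insert((Map<String, Object>) child, path.substring(dotIndex + 1), value);
    }

    public static void main(String[] args) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("a.b.c", "c_embedding");
        flat.put("a.d", "d_embedding");
        System.out.println(expand(flat)); // {a={b={c=c_embedding}, d=d_embedding}}
    }
}
```

Sibling keys that share a prefix, such as "a.b.c" and "a.d", end up under the same parent map, which a one-level split would not give you.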
@Sanjana679 can you also check whether your changes resolve this issue: https://forum.opensearch.org/t/need-neural-search-plugin-to-support-nested-field-type-array-of-objects/16760 ?
At the moment, my changes don't resolve this issue, but if I can resolve it easily I will work on it.
I checked the question; the way the user is using the processor is not correct. Can we check whether it will work if the processor is created as shown below?
Can you please add tests to cover the new logic described here for nested fields ingestion?
super(tag, description);
this.type = type;
if (StringUtils.isBlank(modelId)) throw new IllegalArgumentException("model_id is null or empty, cannot process it");
if (StringUtils.isBlank(modelId))
Why did the formatting change? Did you run ./gradlew :spotlessApply prior?
@@ -154,9 +164,20 @@ Map<String, Object> buildMapWithProcessorKeyAndOriginalValue(IngestDocument inge
for (Map.Entry<String, Object> fieldMapEntry : fieldMap.entrySet()) {
    String originalKey = fieldMapEntry.getKey();
    Object targetKey = fieldMapEntry.getValue();

    int nestedDotIndex = originalKey.indexOf('.');
nit: use a static constant and avoid magic characters in code.
e.g.
private static final char FIELD_SEPARATOR = '.';
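As a rough illustration of this nit, the single-level split from the diff can be written against the suggested constant. This is only a sketch: the wrapper class SingleLevelFieldSplitter and its split method are hypothetical, and the body mirrors the one-level logic quoted in the diff above rather than the final merged code.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wrapper around the one-level split shown in the diff.
final class SingleLevelFieldSplitter {
    private static final char FIELD_SEPARATOR = '.';

    /** Splits "parent.child" into key "parent" and value {"child": targetKey}. */
    static Map.Entry<String, Object> split(String originalKey, Object targetKey) {
        int nestedDotIndex = originalKey.indexOf(FIELD_SEPARATOR);
        if (nestedDotIndex == -1) {
            // No separator: the entry is passed through unchanged.
            return new AbstractMap.SimpleEntry<>(originalKey, targetKey);
        }
        Map<String, Object> nestedTargetKey = new LinkedHashMap<>();
        nestedTargetKey.put(originalKey.substring(nestedDotIndex + 1), targetKey);
        return new AbstractMap.SimpleEntry<>(originalKey.substring(0, nestedDotIndex), nestedTargetKey);
    }

    public static void main(String[] args) {
        Map.Entry<String, Object> entry = split("favorites.game", "game_embedding");
        System.out.println(entry.getKey() + " -> " + entry.getValue()); // favorites -> {game=game_embedding}
    }
}
```

Note that a key such as "a.b.c" comes out as key "a" with value {b.c=...}, because only the first separator is consumed. That is the single-level limitation raised earlier in this review.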
String parentKey,
Object processorKey,
Map<String, Object> sourceAndMetadataMap,
Map<String, Object> treeRes) {
Please remove these indents.
nestedFieldMapEntry.getKey(),
nestedFieldMapEntry.getValue(),
(Map<String, Object>) sourceAndMetadataMap.get(parentKey),
next);
Keep ) consistent
@@ -211,18 +233,21 @@ private void validateEmbeddingFieldsValue(IngestDocument ingestDocument) {
private void validateNestedTypeValue(String sourceKey, Object sourceValue, Supplier<Integer> maxDepthSupplier) {
Can we replace Supplier<Integer> with a simple int?
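A sketch of what the int-based variant might look like, assuming the depth is just incremented on each recursive call. The class NestedDepthValidator, the method validateDepth, and the explicit maxDepth parameter are hypothetical; the real validateNestedTypeValue handles more cases than this.

```java
import java.util.Map;

// Hypothetical stand-in for the depth check, using a plain int counter.
final class NestedDepthValidator {

    /** Call with depth = 1 for a top-level field value. */
    static void validateDepth(String sourceKey, Object sourceValue, int depth, int maxDepth) {
        if (depth > maxDepth) {
            throw new IllegalArgumentException("map type field [" + sourceKey + "] reached max depth limit");
        }
        if (sourceValue instanceof Map) {
            for (Object child : ((Map<?, ?>) sourceValue).values()) {
                if (child != null) {
                    // The depth travels as a value, so no Supplier is needed.
                    validateDepth(sourceKey, child, depth + 1, maxDepth);
                }
            }
        }
    }

    public static void main(String[] args) {
        validateDepth("favorites", Map.of("game", "chess"), 1, 2); // passes without throwing
    }
}
```

Because the counter is an immutable value passed down the call stack, each branch of the recursion sees its own depth and nothing has to be captured in a lambda.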
.stream()
.filter(Objects::nonNull)
.forEach(x -> validateNestedTypeValue(sourceKey, x, () -> maxDepth + 1));
Remove these indents
The same logic was implemented in #841, so we can close this PR.
Description
These changes will allow embeddings to be computed when using a nested source field for a text embedding processor as shown in the configuration below:
Issues Resolved
#110
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.