Refactor Around Mapper and Mapping #1939
Merged: jmazanec15 merged 5 commits into opensearch-project:main from jmazanec15:field-mapper-ref on Aug 10, 2024
Changes from 1 commit
Commits:
0e1cd5b Refactor FieldMapping logic (jmazanec15)
79e0a4b Remove optional from dimension (jmazanec15)
e22fcf8 Change ModelFieldMapper to initialize per method (jmazanec15)
24365ac Add back consumer legacy change (jmazanec15)
8d06041 add bwc test (jmazanec15)
97 changes: 97 additions & 0 deletions
src/main/java/org/opensearch/knn/index/mapper/FlatVectorFieldMapper.java
@@ -0,0 +1,97 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.knn.index.mapper;

import org.apache.lucene.document.FieldType;
import org.opensearch.Version;
import org.opensearch.common.Explicit;
import org.opensearch.knn.index.VectorDataType;

import java.util.Map;
import java.util.Optional;

/**
 * Mapper used when you dont want to build an underlying KNN struct - you just want to
 * store vectors as doc values
 */
public class FlatVectorFieldMapper extends KNNVectorFieldMapper {

    private final PerDimensionValidator perDimensionValidator;

    public static FlatVectorFieldMapper createFieldMapper(
        String fullname,
        String simpleName,
        Map<String, String> metaValue,
        VectorDataType vectorDataType,
        Integer dimension,
        MultiFields multiFields,
        CopyTo copyTo,
        Explicit<Boolean> ignoreMalformed,
        boolean stored,
        boolean hasDocValues,
        Version indexCreatedVersion
    ) {
        final KNNVectorFieldType mappedFieldType = new KNNVectorFieldType(fullname, metaValue, vectorDataType, new KNNMappingConfig() {
            @Override
            public Optional<Integer> getDimension() {
                return Optional.of(dimension);
            }
        });
        return new FlatVectorFieldMapper(
            simpleName,
            mappedFieldType,
            multiFields,
            copyTo,
            ignoreMalformed,
            stored,
            hasDocValues,
            indexCreatedVersion
        );
    }

    private FlatVectorFieldMapper(
        String simpleName,
        KNNVectorFieldType mappedFieldType,
        MultiFields multiFields,
        CopyTo copyTo,
        Explicit<Boolean> ignoreMalformed,
        boolean stored,
        boolean hasDocValues,
        Version indexCreatedVersion
    ) {
        super(simpleName, mappedFieldType, multiFields, copyTo, ignoreMalformed, stored, hasDocValues, indexCreatedVersion, null);
        this.perDimensionValidator = selectPerDimensionValidator(vectorDataType);
        this.fieldType = new FieldType(KNNVectorFieldMapper.Defaults.FIELD_TYPE);
        this.fieldType.freeze();
    }

    private PerDimensionValidator selectPerDimensionValidator(VectorDataType vectorDataType) {
        if (VectorDataType.BINARY == vectorDataType) {
            return PerDimensionValidator.DEFAULT_BIT_VALIDATOR;
        }

        if (VectorDataType.BYTE == vectorDataType) {
            return PerDimensionValidator.DEFAULT_BYTE_VALIDATOR;
        }

        return PerDimensionValidator.DEFAULT_FLOAT_VALIDATOR;
    }

    @Override
    protected VectorValidator getVectorValidator() {
        return VectorValidator.NOOP_VECTOR_VALIDATOR;
    }

    @Override
    protected PerDimensionValidator getPerDimensionValidator() {
        return perDimensionValidator;
    }

    @Override
    protected PerDimensionProcessor getPerDimensionProcessor() {
        return PerDimensionProcessor.NOOP_PROCESSOR;
    }
}
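The per-data-type validator choice made by selectPerDimensionValidator can be looked at in isolation. The following is a minimal, self-contained sketch and not the plugin's code: DataTypeSketch, DimensionCheck, and ValidatorSelectionDemo are hypothetical stand-ins, the concrete range checks are illustrative, and treating BINARY the same as BYTE is an assumption made only for this example.

```java
// Hypothetical stand-in for VectorDataType.
enum DataTypeSketch { FLOAT, BYTE, BINARY }

// Hypothetical stand-in for PerDimensionValidator: checks one vector component.
interface DimensionCheck {
    void validate(float value);
}

final class ValidatorSelectionDemo {
    // Assumption: float components only need to be finite numbers.
    static final DimensionCheck FLOAT_CHECK = v -> {
        if (Float.isNaN(v) || Float.isInfinite(v)) {
            throw new IllegalArgumentException("component must be a finite number");
        }
    };

    // Assumption: byte components must be whole numbers in [-128, 127].
    static final DimensionCheck BYTE_CHECK = v -> {
        if (v % 1 != 0 || v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("component must be a whole number in [-128, 127]");
        }
    };

    // Assumption for this sketch only: binary input is validated like byte input.
    static final DimensionCheck BINARY_CHECK = BYTE_CHECK;

    // Same shape as the mapper's selection: special-case BINARY and BYTE, default to FLOAT.
    static DimensionCheck select(DataTypeSketch type) {
        if (DataTypeSketch.BINARY == type) {
            return BINARY_CHECK;
        }
        if (DataTypeSketch.BYTE == type) {
            return BYTE_CHECK;
        }
        return FLOAT_CHECK;
    }

    public static void main(String[] args) {
        select(DataTypeSketch.FLOAT).validate(300f); // accepted by the float check
        try {
            select(DataTypeSketch.BYTE).validate(300f); // outside the byte range
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Selecting the check once, at construction time, means the per-document parsing path only has to call the stored validator rather than branch on the data type for every component.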
40 changes: 40 additions & 0 deletions
src/main/java/org/opensearch/knn/index/mapper/KNNMappingConfig.java
@@ -0,0 +1,40 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.knn.index.mapper;

import org.opensearch.knn.index.engine.KNNMethodContext;

import java.util.Optional;

/**
 * Class holds information about how the ANN indices are created. The design of this class ensures that we do not
 * accidentally configure an index that has multiple ways it can be created. This class is immutable.
 */
public interface KNNMappingConfig {
    /**
     *
     * @return Optional containing the modelId if created from model, otherwise empty
     */
    default Optional<String> getModelId() {
        return Optional.empty();
    }

    /**
     *
     * @return Optional containing the KNNMethodContext if created from method, otherwise empty
     */
    default Optional<KNNMethodContext> getKnnMethodContext() {
        return Optional.empty();
    }

    /**
     *
     * @return the dimension of the index; for model based indices, it will be null
     */
    default Optional<Integer> getDimension() {
        return Optional.empty();
    }
}
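The interface above relies on default methods that return Optional.empty(), so each mapper overrides only the accessors that apply to it, as FlatVectorFieldMapper does with getDimension. A minimal sketch of that pattern follows, assuming simplified stand-ins: MappingConfigSketch, MappingConfigDemo, fromDimension, fromModel, and describe are hypothetical names for illustration and are not part of the plugin.

```java
import java.util.Optional;

// Hypothetical, reduced stand-in for KNNMappingConfig (method context omitted).
interface MappingConfigSketch {
    default Optional<String> getModelId() {
        return Optional.empty();
    }

    default Optional<Integer> getDimension() {
        return Optional.empty();
    }
}

final class MappingConfigDemo {
    // A config that only knows its dimension, like the anonymous class in the flat mapper.
    static MappingConfigSketch fromDimension(int dimension) {
        return new MappingConfigSketch() {
            @Override
            public Optional<Integer> getDimension() {
                return Optional.of(dimension);
            }
        };
    }

    // A config that only knows its model id; the dimension stays empty.
    static MappingConfigSketch fromModel(String modelId) {
        return new MappingConfigSketch() {
            @Override
            public Optional<String> getModelId() {
                return Optional.of(modelId);
            }
        };
    }

    // Callers branch on which Optional is present instead of checking for null.
    static String describe(MappingConfigSketch config) {
        return config.getModelId()
            .map(id -> "model:" + id)
            .orElseGet(() -> config.getDimension().map(d -> "dimension:" + d).orElse("unconfigured"));
    }

    public static void main(String[] args) {
        System.out.println(describe(fromDimension(128)));
        System.out.println(describe(fromModel("my-model")));
    }
}
```

Because every accessor has an empty default, adding a new way to configure a field means adding one accessor and overriding it in one place, without touching the existing implementations.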
Since we are removing this code and the legacy field mapper, I understand that a new MethodFieldMapper will be used here. But how is this change BWC? If parametersString was not added to a field in the old segments, then changing the mapper will not affect the fieldInfo of those old segments.
But the fieldInfo of old segments will not be reused to create new segments. We will need to use the fieldInfo that is created for new segments. This fieldInfo will be set up using the MethodFieldMapper, which will specify the paramsString. So, I don't see an issue with BWC.
As per my understanding, FieldInfo is at the Lucene index level (aka shard level) and not at the Lucene segment level. Please validate this by doing a local test. If already done, please let me know the steps and I will check on my side too.
This is what I was referring to: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FieldInfo.java#L24-L29. FieldInfo is per-segment. I can look a little bit more into this.
Interesting. But can we do one validation to be 100% sure? fieldInfo gets created per segment when we are searching, but during indexing are there checks that may be triggered when the same field has different values?
For example, we cannot create a field with doc values set to false for one segment and set to true for another. So, similarly, can we check whether, for the same field, we can add one attribute and then add another attribute the next time?
Please ignore if we have already done this check. I just want to ensure that we have handled and thought about all the cases.
@navneet1v
This case fails:
That being said, I think we probably need to leave the code as is in KNN80DocValuesConsumer.
For the field type/mapping, let me check which one is used to initialize the merged segment's fieldInfo. If it's always the latest one, then I think we can leave it as is.
@navneet1v I think this is the key point:
So, what we can conclude is that the field info attributes will be a union of the attributes of the merged fields. So, we just need to keep the legacy checks in KNN80DocValuesConsumer, but we can keep the Mapper layer as is in this PR.
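The conclusion above can be modeled with plain maps. This is a simplified sketch of the reasoning, not Lucene's merge code and not the plugin's consumer: FieldAttributeMergeSketch, the attribute keys "parameters" and "space_type", the "legacy-defaults" fallback, and the rule that the newer value wins on a key conflict are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class FieldAttributeMergeSketch {
    // Hypothetical attribute key written only by newer mappers.
    static final String PARAMETERS = "parameters";

    // Models the "union of attributes" conclusion: the result carries every key
    // from both inputs. Assumption: on a key conflict the newer value wins.
    static Map<String, String> mergeAttributes(Map<String, String> older, Map<String, String> newer) {
        Map<String, String> merged = new HashMap<>(older);
        merged.putAll(newer);
        return merged;
    }

    // Models the legacy check kept in the consumer: a field written before the
    // attribute existed has no "parameters" entry, so a fallback is still needed.
    static String resolveParameters(Map<String, String> attributes) {
        String parameters = attributes.get(PARAMETERS);
        return parameters != null ? parameters : "legacy-defaults";
    }

    public static void main(String[] args) {
        Map<String, String> legacySegment = new HashMap<>();
        legacySegment.put("space_type", "l2");

        Map<String, String> newSegment = new HashMap<>();
        newSegment.put("space_type", "l2");
        newSegment.put(PARAMETERS, "{\"m\":16}");

        // Mixed merge: the union picks up the newer attribute.
        System.out.println(resolveParameters(mergeAttributes(legacySegment, newSegment)));
        // Legacy-only merge: nothing supplies the attribute, so the fallback applies.
        System.out.println(resolveParameters(mergeAttributes(legacySegment, legacySegment)));
    }
}
```

Under these assumptions, the fallback is only exercised when every input lacks the attribute, which is the case a BWC test with purely old-version segments would cover.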
I am aligned on keeping the check in KNN80DocValuesConsumer and removing the checks from the mapper.
Can we add that failure case in bwc test?
Sure @heemin32, added.