Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mode and compression based parameter support #1

Open
wants to merge 4 commits into
base: disk-staging-base
Choose a base branch
from

Conversation

jmazanec15
Copy link
Owner

@jmazanec15 jmazanec15 commented Aug 30, 2024

Description

Iteration 1

This PR is a work in progress, but want to get feedback on overall structure, so sharing now. It does not have tests at the moment, but Im working on adding those.

Specifically, I want to make sure that the structure does not have any fundamental limitations around backwards compatibility and/or functionality. Also, recommendations on naming of classes or patterns or packaging would also be appreciated. Also, any recommendations on overall null handling strategy would also be appreciated. Im going to clean up a lot of things, like exception strings, so dont worry as much about that.

This change adds mode and compression parameters so that we can take user hints about their workload and then dynamically adjust the index. In order to make this change, significant changes around the structuring of index configuration were required.

First, I modified the current logic to follow a more distinct separation between user provided configuration and internal configuration we use to create and search the index. The process of translating from one to other is referred to as "resolving". As part of this, I made KNNMethodContext and MethodCOmponentContext not updateable - so - once built they are supposed to be immutable. As part of this, I had to make some modifications around the streaming before and after 2.17 (there was a bug I fixed here).

Then, for the mapping, in order to prevent mapping merge issues, I created a new class called UserProvidedParameters that stores the parsed parameters from the mapping directly. Then, after parsing in the KNNVectorFieldMapper class, I added logic to go from UserProvidedParameters -> ResolvedRequiredParametersResolvedRequiredParameters. ResolvedRequiredParameters are the parameters that first need to be resolved before passing to the KNNLibrary to resolve. For example, this includes filling in values like vectordatatype and spacetype (if not set) and also selects the KNNEngine based on the mode and compression. After this, the ResolvedRequiredParameters are used to resolve a KNNIndexContext that contains all information (including defaults) that are used to construct the index. For validation, the parsed parameterts are initially validated independently and then validation happens during the recursive resolve process. Finally, after the KNNIndexContext is created, it is used to configure the KnnVectorFieldType. The KnnVectorFieldType is then the main access point else where in the plugin for resolving information about the field. For instance, on search the KnnVectorFieldType is used to check if the input method parameters are valid.

Training follows a similar path as the field mapping.

Importantly, the KNNIndexContext is not persisted. Thus, for backwards compatibility, if we do need to change the configuration, we need to use the version of the created resource. This is available in the ResolvedRequiredParameters.

Internally, in the resolve process, for a given configuration, the defaults are filled in from the top down. For now, I think this will be okay, but may want to consider more of a 2 phase approach in the future. During resolution, the KNNIndexContext continually gets updated. On these lines, I discussed with Tejas and he mentioned that we should potentially switch to passing a builder into this method to ensure it does not get updated after its resolved. I want to get more feedback on this if this should be required.

Along with all of these, there are some changes to make sure that it is manageable.

Iteration 2 (Current)

I changed up the validation logic. I removed UserProvidedParameters, ResolvedRequiredParameters, KNNIndexContext and replaced them with KNNLibraryIndexConfig, KNNLibraryIndex, KNNLibraryIndex.Builder, and OriginalMappingParameters. The KNNLibraryIndex is created via a recursive process where the libraries can update and adjust settings of the KNNLibraryIndex.Builder as needed.

Initial commit for mode and compression based parameter support. Everything
compiles but the tests are not in a good state and will need to be revisited in
a later commit, once the dust settles.

Overall, the change includes a major shift towards resolving user
provided parameters to internal objects. This includes allowing
KNNMethodContext and MethodComponentContext to be null as well as
modifying the library logic into a resolution strategy.

For mode and compression, it adds 2 new parameters for either configuring
an index or configuring a model: mode and compression.

After this, the majority of the code that needs to know about the method
configuration needs to go through the field type. More tidying up to
come in the future.

Signed-off-by: John Mazanec <[email protected]>
@jmazanec15
Copy link
Owner Author

Here are some of the important classes in the different areas:

For resolution

  1. KNNLibrary - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-119d4797a87cfb1655c674dc306642120cee217b18ffc2f6f9ada0868b2950a9
  2. KNNMethod - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-ba298e9c4df4e1949d6c197fb0b7387551d8be09431d325c71f82ac04a7692e7
  3. MethodComponent - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-b216aba36b43b23c721c5151d25d9a89b7d607ad7bd53390278d64f5fee764f8
  4. Parameter - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-563b5a4b7fc99d89143330381516f620e892e50d995d78582ca149f6beac7b04
  5. AbstractKNNLibrary - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-10b712f3d46ddaf97bd627828707cba4ab16f44f575ad37afba0b3de5e2e9746
  6. AbstactKNNMethod - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-3311ba4861f79a86b493ffac9db1045b9b343be00995a3366daa1146d7fd9e6b
  7. AbstractFaissMethod - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-a57560848385e9cd34701f86e98b701ea3225eb66320cd8016ebe0c22ba3819e
  8. QFrameBitEncoder - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-2c1678938545a561cc31906945a6687f4fa6293b03b7459e67111fdf04549476
  9. KNNMethodContext - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-73b059157d0ff8520927509a30a2071a2bae958b1855f929e9d122c40d54794a
  10. MethodComponentContext - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-96d4739c3220d2fc260011aae7e7a6ce927fc04f7e53f0932985b08fd2776d2c
  11. ResolvedRequiredParameters - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-7a01880d970e95e96c6be4d0ea4f8c5db6532e6f2385f8d6ad58809dab133fd3

For mapping

  1. KNNVectorFieldMapper - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-c6d56dded611534c5be8f8afdba6bf320f7892d47465deee44bd6390f2bd1f12
  2. KNNEngineResolver - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-f00247c5db706066b1c07fdb2505a1890221bb08732b40873e067bb3f7cfbaad
  3. MethodFieldMapper - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-b406c9fdd24ad72cabbdbb75c76f41816a1165718c8b94b7351b3c62b63e0ab8
  4. ModelFieldMapper - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-ff51b91046efd3fe776b386cbf5d5a39d3be2d716b5408d669d64e0d5f42d72d
  5. KNNVectorFieldType - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-4d6ecd16c762a05810dfc69bf08a0ff6e77eb3e13efaa30e883627136d1cba38

For usages of configuration

  1. KNNQueryBuilder - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-6d0df17543181185170f65c116cdb3514d910e1557f012b7959611ae92678080
  2. NativeIndexWriter -
  3. BasePerFieldVectorFormat- https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-60cad16fb7fef0302c7ca18486b1be7a366fd53f49511ae4674042c775edd9c8
  4. NativeIndexWriter - https://github.com/jmazanec15/k-NN-1/pull/1/files#diff-75a39a358693703da64ecd93181897b1e20ad25f951f0f3813148c21257a999c

if (vResolved == null) {
vResolved = FAISS_SQ_ENCODER_FP16;
}
if (FAISS_SQ_ENCODER_FP16.equals(vResolved) == false && context.getPerDimensionProcessor() == CLIP_TO_FP16_PROCESSOR) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a redundant check as you are already validating it below for clipping

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this because it will kind of depend in what order the parameters are parsed and figured its better to do it twice.

if (vResolved
&& context.getLibraryParameters() != null
&& context.getLibraryParameters().get(FAISS_SQ_TYPE) != FAISS_SQ_ENCODER_FP16) {
return ValidationUtil.chainValidationErrors(null, "Clip only supported for FP16 encoder. IMPROVE");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see IMPROVE keyword in error messages. But, just adding a comment to remind that we need to use existing constants.

if (LUCENE_SQ_BITS_SUPPORTED.contains(v)) {
return null;
}
return ValidationUtil.chainValidationErrors(null, "Invalid confidence interval. IMPROVE");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid bits

Signed-off-by: John Mazanec <[email protected]>
Copy link

@shatejas shatejas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still working on it. Will be putting out comments iteratively

  • Please reconsider some of classes in the packages. A lot of things seem to be under engine. Would be good to make sub packages or move under index (or under appropriate package)


@AllArgsConstructor
@Getter
public enum CompressionConfig {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compression since these are user requested values and not a configuration

import lombok.AllArgsConstructor;
import lombok.Getter;

@AllArgsConstructor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: @RequiredArgsConstructor to eliminate possibility of default 0 for any new value


public static CompressionConfig fromString(String name) {
if (name == null || name.equals("NA")) {
return NOT_CONFIGURED;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placeholder values are generally not recommended, as they can be misused easily. Any specific reason for adding one? may be consider return optional or null and let client handle the behavior if not found

@AllArgsConstructor
@Getter
public enum WorkloadModeConfig {
NOT_CONFIGURED("NA"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, please reconsider place holder


@AllArgsConstructor
@Getter
public enum WorkloadModeConfig {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't want to see us deviate from whats in the request parameter name. Lets rename this to Mode maybe to avoid any confusion

IN_MEMORY(MODE_IN_MEMORY_NAME),
ON_DISK(MODE_ON_DISK_NAME);

public static final WorkloadModeConfig DEFAULT = IN_MEMORY;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe rethink if this is needed

*/
@AllArgsConstructor
@Getter
public final class UserProvidedParameters {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name seems a bit generic. Consider MappingRequest or IndexMappingRequest

* Class provides the context to build an index for ANN search. All configuration is resolved before c
* construction and
*/
public final class KNNIndexContext {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't seem like the right place. Consider moving it to index.model or index.params

* @param settings Settings for the index; passing null will mean that it is not possible to resolve for the legacy
* @param createdVersion version that this was created for
*/
public ResolvedRequiredParameters(UserProvidedParameters originalParameters, Settings settings, Version createdVersion) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are places in the code where we have to add logic in constructor, seems like we can avoid it here. In general logic in constructor is code smell

Consider a ParameterResolver class which has a function that takes these parameters and returns this object. This will make this pojo light weight. Not super urgent

* Resolved parameters required for constructing a {@link KNNIndexContext}. If any of these parameters can be null,
* then their getters need to be wrapped in an {@link java.util.Optional}
*/
public final class ResolvedRequiredParameters {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider moving this in index.model or index.params

public static final String MODE_PARAMETER = "mode";
public static final String COMPRESSION_PARAMETER = "compression";
public static final String MODE_IN_MEMORY_NAME = "in_memory";
public static final String MODE_ON_DISK_NAME = "on_disk";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to make it as enum?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an enum. I will move out of the constants

// TODO: Is there any way we could avoid resolving it like this?
KNNIndexContext knnIndexContext = ModelUtil.getKnnMethodContextFromModelMetadata(model.getModelID(), model.getModelMetadata());
if (knnIndexContext != null && knnIndexContext.getLibraryParameters().containsKey(VECTOR_DATA_TYPE_FIELD)) {
IndexUtil.updateVectorDataTypeToParameters(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see if we can remove IndexUtil and just call parameter.put(KNNConstants...

IndexUtil.updateVectorDataTypeToParameters(parameters, model.getModelMetadata().getVectorDataType());

// TODO: Is there any way we could avoid resolving it like this?
KNNIndexContext knnIndexContext = ModelUtil.getKnnMethodContextFromModelMetadata(model.getModelID(), model.getModelMetadata());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am uncomfortable when more and more static methods are being introduced.
Can model have a method to generate vector data type?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im going to move this to be generated directly by ModelMetadata on construction.

KNNMethod knnMethod = methods.get(methodName);
ValidationException validationException = knnMethod.resolveKNNIndexContext(knnIndexContext);
if (shouldTrain != knnIndexContext.isTrainingRequired()) {
validationException = ValidationUtil.chainValidationErrors(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather have a builder class instead of static method. ValidationUtil.chainValidationErrors was not intuitive to guess what it does and I had to look inside the method.

builder.addException("error")
builder.addException(null)
builder.build(); // Return null if there is not exception message.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot. Let me change to this.

if (knnMethodContext == null) {
return;
protected String resolveMethod(KNNIndexContext knnIndexContext) {
KNNMethodContext knnMethodContext = knnIndexContext.getResolvedRequiredParameters().getKnnMethodContext().orElse(null);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use orElse(null), I think there is no benefit of using optional.

Optional<KNNMethodContext> knnMethodContext = knnIndexContext.getResolvedRequiredParameters().getKnnMethodContext();
if (knnMethodContext.isPresent() && knnMethodContext.getMethodComponentContext().getName().isPresent()) {
  return knnMethodContext.getMethodComponentContext().getName().get();
}
return doResolveMethod(knnIndexContext);

protected final Map<String, KNNMethod> methods;
@Getter
protected final String version;

@Override
public KNNLibrarySearchContext getKNNLibrarySearchContext(String methodName) {
public ValidationException resolveKNNIndexContext(KNNIndexContext knnIndexContext, boolean shouldTrain) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method have mix of resolving and validation which is confusing.
Hard to understand which value should be pre populated in knnIndexContext and which value will be resolved.

If we want to validate all values and return the entire list of violation, we should separate validation from resolving.

If we can throw exception for whenever we see the violation, maybe, validation could happen while resolving certain value.

If my understanding is correct, value get resolved like UserProvidedParameters -> ResolvedRequiredParameters -> KNNIndexContext.

I think this is more clear if we can code it like

UserProvidedPrameters userProvidedParameters = new UserProvidedParameters();
ResolvedRequiredParameters userProvidedParameters = new ResolvedRequiredParameters(userProvidedParameters);
KNNIndexContext userProvidedParameters = new KNNIndexContext(userProvidedParameters);

return null;
protected abstract String doResolveMethod(KNNIndexContext knnIndexContext);

private String validateSpaceType(KNNIndexContext knnIndexContext) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both validateSpaceType and validateDimension only needs value from KNNIndexContext itself.
Can the validation logic be inside KNNIndexContext? Better if it gets validated during instance build time.

* @param settings Settings for the index; passing null will mean that it is not possible to resolve for the legacy
* @param createdVersion version that this was created for
*/
public ResolvedRequiredParameters(UserProvidedParameters originalParameters, Settings settings, Version createdVersion) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just holding UserProvidedParameters instead of copying to private variable?

this.createdVersion = Objects.requireNonNull(createdVersion, "createdVersion must be set for ResolvedRequiredParameters");
}

public KNNIndexContext resolveKNNIndexContext(boolean shouldTrain) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think creating resolvedRequiredParameters from UserProvidedParameters is okay.
However, when we crate KNNIndexContext, it is not resolving any value it seem.
Then, the KNNIndexContext is pass around multiple methods to resolve some of the value inside it. Can it be like providing everything upfront to KNNIndexContext and let KNNIndexContext to figure out how to resolve remaining values?

Letting all other methods to resolve part of KNNIndexContext makes it hard to understand the resolving logic and flow because it scattered across multiple parts. For example, which value gets resolved first and which value is needed to be resolved first to resolve certain value. Where and when certain value get resolved.

Signed-off-by: John Mazanec <[email protected]>
return knnMethodConfigContext.getDimension();
}
}
() -> KNNVectorFieldType.KNNVectorFieldTypeConfig.builder()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So out of curiosity, is the mapped field type persisted as part of index information? I imaging it would be else the user would have to have a mapping request everytime the the system crashes and comes up

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its persisted via the source of the mapping. What happens is the source gets parsed into a field mapper and field type when an index is created.

@@ -25,6 +25,8 @@ public final class RescoreContext {
@Builder.Default
private float oversampleFactor = DEFAULT_OVERSAMPLE_FACTOR;

public static final RescoreContext DISABLED_RESCORE_CONTEXT = RescoreContext.builder().oversampleFactor(0).build();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oversampling as 0 sounds a bit risky, any bug introduced with code changes might end up having k = 0. Should we just have a default value?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a good point. Let me address this.

@@ -255,7 +258,18 @@ private Map<String, Object> getTemplateParameters(FieldInfo fieldInfo, Model mod
parameters.put(KNNConstants.INDEX_THREAD_QTY, KNNSettings.state().getSettingValue(KNNSettings.KNN_ALGO_PARAM_INDEX_THREAD_QTY));
parameters.put(KNNConstants.MODEL_ID, fieldInfo.attributes().get(MODEL_ID));
parameters.put(KNNConstants.MODEL_BLOB_PARAMETER, model.getModelBlob());
IndexUtil.updateVectorDataTypeToParameters(parameters, model.getModelMetadata().getVectorDataType());

// TODO: Is there any way we could avoid resolving it like this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be imagine this method to be simply like this , Should be consider taking everything required for this method from KnnIndexContext as it suggests to have all context which is required for Indexing including IndexThread Quantity . What we have to do is to have a method which takes feildInfo and gives us IndexCotext required for Indexing? or Even gives us Map Directly .

private final String modelId;
private final String mode;
private final String compressionLevel;
private final KNNMethodContext knnMethodContext;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this Class Suggests , it's User providedParameters , It would be easy to have flatten architecture and the compute whatever we want from flatten like computing MethodContext out of it in this class ,,

import lombok.AllArgsConstructor;
import lombok.Getter;

@AllArgsConstructor
@Getter
public final class UserProvidedParameters {
    private final Integer dimension;
    private final String mode;
    private final String compressionLevel;
    private final String methodName;
    private final String engine;
    private final String spaceType;

    // Flattened individual parameters
    private final Integer efConstruction;
    private final Integer m;

    // Computed field
    private final KNNMethodContext knnMethodContext;

    // Constructor that calculates the KNNMethodContext based on the provided fields
    public UserProvidedParameters(
        Integer dimension,
        String mode,
        String compressionLevel,
        String methodName,
        String engine,
        String spaceType,
        Integer efConstruction,
        Integer m
    ) {
        this.dimension = dimension;
        this.mode = mode;
        this.compressionLevel = compressionLevel;
        this.methodName = methodName;
        this.engine = engine;
        this.spaceType = spaceType;
        this.efConstruction = efConstruction;
        this.m = m;

        // Calculate the KNNMethodContext based on the provided fields
        this.knnMethodContext = new KNNMethodContext(
            this.methodName,
            this.engine,
            this.spaceType,
            Map.of(
                "ef_construction", this.efConstruction,
                "m", this.m
            )
        );
    }
}

@@ -663,26 +594,15 @@ Optional<float[]> getFloatsFromContext(ParseContext context, int dimension) thro

@Override
public ParametrizedFieldMapper.Builder getMergeBuilder() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find any usage of getMergeBuilder , where it is getting used?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in opensearch its an override. handled with mapping updates and also serialization. Basically a mapper is either constructed via parsing or from merging


// Helper class used to validate builder before build is called. Needs to be invoked in 2 places: during
// parsing and during merge.
private static class BuilderValidator {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this class in Mapper folder as package private , just to clean this file.

@@ -193,7 +193,7 @@ private static int getEfConstruction(Settings indexSettings, Version indexVersio
return Integer.parseInt(efConstruction);
}

static KNNMethodContext createKNNMethodContextFromLegacy(Settings indexSettings, Version indexCreatedVersion) {
public static KNNMethodContext createKNNMethodContextFromLegacy(Settings indexSettings, Version indexCreatedVersion) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though we are replacing MethodContext with IndexContext ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see IndexContext is also having engine and SpaceType , Can we completely get rid of MethodContext Class,It's confusing?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to ensure that there was a difference between a top level and a component. I think we could potentially get rid of it but might hold on for now.

}

Optional<KNNIndexContext> getKNNIndexContext() {
KNNVectorFieldTypeConfig knnVectorFieldTypeConfig = cachedKNNVectorFieldTypeConfig.getKnnVectorFieldTypeConfig();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we really don't have to do whole if else , if we can make optional returnbable from cachedKNNVectorFieldTypeConfig

}

@RequiredArgsConstructor
private static class CachedKNNVectorFieldTypeConfig {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class seems reduntant , why can't we have suppiler as part of KNNVectorFieldTypeConfig class it self , and that can as wholesome class for caching and noncaching cases.


import org.opensearch.common.ValidationException;

public final class ValidationUtil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use @UtilClass , which make this class by default final. and we can get rid of Util suffix

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did see that the current status of UtilClass getting out of experimental was negative (https://projectlombok.org/features/experimental/UtilityClass). That being said, might hold off.

PR changes a lot of the resolution logic and does some renaming.

Signed-off-by: John Mazanec <[email protected]>
Copy link

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I don't have much of the comments. Apart from these below:

  1. Quite few classes have been added and logics on the streams are also updated. Please add java doc on these classes around what is the scope of these classes.
  2. Going forward lets try to see how we can avoid such large changes in 1 PR. This is really not scalable and let me think that we are completely rewriting the code.
  3. Packages like engine are way too many classes. Please avoid that.
  4. **Context classes are just growing too many in the code. Are those really context? or just parameters to do something.
  5. May be once this PR is merged, lets have a readme which tells the whole flow of indexing and mappers how different constructs are made like OriginalMappingParameters and when does a specific validator is called, when encoders are called etc.

Having said that, I don't see above as blockers for this code.

Comment on lines +25 to 29
/**
* Vector data type represents the type used to build the library index. If something like binary quantization is
* done, then this will be different from the vector data type the user provides
*/
VectorDataType vectorDataType;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be better to name this as indexVectorDataType

Comment on lines +31 to 32
Map<String, Object> sqEncoderParams = encoderMethodComponentContext.getParameters().orElse(null);
this.initConfidenceInterval(sqEncoderParams);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is optional and we are returning null, does other parts of the code that depends on sqEncoderParams like initConfidenceInterval etc handling the nulls correctly?

Comment on lines +197 to 202
if (knnIndexContext != null) {
QuantizationConfig quantizationConfig = knnIndexContext.getQuantizationConfig();
if (quantizationConfig != null && quantizationConfig != QuantizationConfig.EMPTY) {
this.fieldType.putAttribute(QFRAMEWORK_CONFIG, QuantizationConfigParser.toCsv(quantizationConfig));
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will we ever hit the case in ModelFieldMapper with QF?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants