Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Add schema validation and placeholders to index mappings #3240

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

pyek-bot
Copy link
Contributor

@pyek-bot pyek-bot commented Nov 27, 2024

Description

Added schema validation and placeholders to index mappings

Related Issues

Resolves #2951

Check List

  • New functionality includes testing.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
…ums rather than use their own mappings

Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
@pyek-bot pyek-bot temporarily deployed to ml-commons-cicd-env-require-approval November 27, 2024 01:23 — with GitHub Actions Inactive
@pyek-bot pyek-bot temporarily deployed to ml-commons-cicd-env-require-approval November 27, 2024 01:23 — with GitHub Actions Inactive
@pyek-bot pyek-bot temporarily deployed to ml-commons-cicd-env-require-approval November 27, 2024 03:31 — with GitHub Actions Inactive
@pyek-bot
Copy link
Contributor Author

Please add 2.x backport label

@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval December 4, 2024 03:12 — with GitHub Actions Failure
@pyek-bot pyek-bot temporarily deployed to ml-commons-cicd-env-require-approval December 4, 2024 03:12 — with GitHub Actions Inactive
@pyek-bot pyek-bot requested a review from Zhangxunmt December 4, 2024 20:53
@pyek-bot
Copy link
Contributor Author

pyek-bot commented Dec 6, 2024

Addressed comments, please re-review. Thanks!

@Zhangxunmt @ylwu-amzn

Zhangxunmt
Zhangxunmt previously approved these changes Dec 9, 2024
}

return mapping;
}

/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this seems not a standard java doc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made it into a proper comment for now via latest commit, i will modify to java doc standard with * in a later pr, i don't want to dismiss the previous approval with a new push again, if thats okay

@@ -54,6 +63,8 @@ public class StringUtils {
}
public static final String TO_STRING_FUNCTION_NAME = ".toString()";

private static final ObjectMapper MAPPER = new ObjectMapper();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it thread-safe to define it as singleton ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the instance is not being modified anywhere, only methods are used. It is used similarly in RestActionUtils

Signed-off-by: Pavan Yekbote <[email protected]>
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval December 13, 2024 16:07 — with GitHub Actions Failure
@pyek-bot pyek-bot temporarily deployed to ml-commons-cicd-env-require-approval December 13, 2024 16:07 — with GitHub Actions Inactive
@pyek-bot pyek-bot temporarily deployed to ml-commons-cicd-env-require-approval December 13, 2024 17:39 — with GitHub Actions Inactive
public static String getMappingFromFile(String path) throws IOException {
URL url = IndexUtils.class.getClassLoader().getResource(path);
if (url == null) {
throw new IOException("Resource not found: " + path);
}

String mapping = Resources.toString(url, Charsets.UTF_8).trim();
if (mapping.isEmpty() || !StringUtils.isJson(mapping)) {
throw new IllegalArgumentException("Invalid or non-JSON mapping at: " + path);
if (mapping.isEmpty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have this check here considering we are going to check mapping.isBlank() in the replacePlaceholders method. Seems like redundant.

}

String placeholderMapping = Resources.toString(url, Charsets.UTF_8);
mapping = mapping.replace(placeholder.getKey(), placeholderMapping);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are creating string in every replace. May be we could use a StringBuilder to have in-place replacement?

May be something like this:

public static String replacePlaceholders(String mapping) throws IOException {
    if (mapping == null || mapping.isBlank()) {
        throw new IllegalArgumentException("Mapping cannot be null or empty");
    }

    // Preload resources into memory to avoid redundant I/O
    Map<String, String> loadedPlaceholders = new HashMap<>();
    for (Map.Entry<String, String> placeholder : MAPPING_PLACEHOLDERS.entrySet()) {
        URL url = IndexUtils.class.getClassLoader().getResource(placeholder.getValue());
        if (url == null) {
            throw new IOException("Resource not found: " + placeholder.getValue());
        }
        // Load and cache the content
        loadedPlaceholders.put(placeholder.getKey(), Resources.toString(url, Charsets.UTF_8));
    }

    // Use StringBuilder for efficient in-place replacements
    StringBuilder result = new StringBuilder(mapping);
    for (Map.Entry<String, String> entry : loadedPlaceholders.entrySet()) {
        String placeholder = entry.getKey();
        String replacement = entry.getValue();

        // Replace all occurrences of the placeholder
        int index;
        while ((index = result.indexOf(placeholder)) != -1) {
            result.replace(index, index + placeholder.length(), replacement);
        }
    }

    return result.toString();
}

@@ -336,4 +347,25 @@ public static JsonObject getJsonObjectFromString(String jsonString) {
return JsonParser.parseString(jsonString).getAsJsonObject();
}

public static void validateSchema(String schemaString, String instanceString) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add try catch block to catch any exceptions like:

catch (JsonProcessingException e) {
        throw new IllegalArgumentException("Invalid JSON format: " + e.getMessage(), e);
    } catch (Exception e) {
        throw new OpenSearchParseException("Schema validation failed: " + e.getMessage(), e);
    }


// Validate JSON node against the schema
Set<ValidationMessage> errors = schema.validate(jsonNode);
if (!errors.isEmpty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about this:

if (!errors.isEmpty()) {
            String errorMessage = errors.stream()
                                        .map(ValidationMessage::getMessage)
                                        .collect(Collectors.joining(", "));
            throw new OpenSearchParseException(
                "Validation failed: " + errorMessage + 
                " for instance: " + instanceString + 
                " with schema: " + schemaString
            );
        }

@@ -1,6 +1,6 @@
{
"_meta": {
"schema_version": "1"
"schema_version": 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, I see that JSON files in the repository are named using snake_case. Let's follow the same convention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement] Use index mapping json file
4 participants