Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Add schema validation and placeholders to index mappings #3240

Open
wants to merge 32 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 30 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
18eedf6
feat(index mappings): fetch mappings and version from json file inste…
pyek-bot Oct 23, 2024
7068729
refactor: changing exception being thrown
pyek-bot Oct 23, 2024
abd4b25
chore: remove unused file
pyek-bot Oct 23, 2024
d2f3c0e
chore: fix typo in comment
pyek-bot Oct 23, 2024
fa81560
chore: adding new line at the end of files
pyek-bot Oct 23, 2024
63a6d62
feat: add test cases
pyek-bot Oct 24, 2024
8cbaf01
fix: remove test code
pyek-bot Oct 24, 2024
3423026
fix(test): in main the versions were not updated appropriately
pyek-bot Oct 24, 2024
67a1810
refactor: move mapping templates under common module
pyek-bot Oct 25, 2024
63bc70d
refactor: ensure that conversationindexconstants reference mlindex en…
pyek-bot Oct 29, 2024
88673ed
refactor: update comment
pyek-bot Oct 29, 2024
409fbd9
feat: add enhancements to validate index schema and allow using place…
pyek-bot Oct 30, 2024
9776787
Merge branch 'fetch_index_mappings_from_files' into enhancements_fetc…
pyek-bot Oct 30, 2024
af4d1d0
refactor: modifying comment
pyek-bot Oct 30, 2024
b383b48
test: adding testcase for MLIndex to catch failures before runtime
pyek-bot Oct 30, 2024
56d945b
refactor: rename dir from mappings to index-mappings
pyek-bot Oct 31, 2024
4ea99ea
fix: add null checks
pyek-bot Oct 31, 2024
6bef762
fix conflicts and merge parent
pyek-bot Oct 31, 2024
ba9d232
fix: modify mappin paths for placeholders
pyek-bot Oct 31, 2024
ce72e53
fix: adding dependencies for testing
pyek-bot Nov 2, 2024
e714c35
fix(test): compare json object rather than strings to avoid eol chara…
pyek-bot Nov 5, 2024
d50a9f9
Merge branch 'main' into fetch_index_mappings_from_files
pyek-bot Nov 25, 2024
86d2b76
refactor: combine if statements into single check
pyek-bot Nov 26, 2024
1cd16b2
refactoring: null handling + clean code
pyek-bot Nov 26, 2024
369475d
spotless apply
pyek-bot Nov 26, 2024
313b70d
merge parent and fix conflicts
pyek-bot Nov 26, 2024
d8390ca
tests: adding more UT
pyek-bot Nov 26, 2024
cba2c88
fix: dependencies to handle jarhell
pyek-bot Nov 26, 2024
d7b19ea
merge main and fix conflicts
pyek-bot Nov 27, 2024
2884b50
spotless apply
pyek-bot Nov 27, 2024
3e45f6e
refactor: add header and use single instance of mapper
pyek-bot Dec 4, 2024
399b892
fixed: doc syntax
pyek-bot Dec 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions common/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,9 @@ dependencies {
exclude group: 'com.google.guava', module: 'listenablefuture'
}
compileOnly 'com.jayway.jsonpath:json-path:2.9.0'
compileOnly("com.fasterxml.jackson.core:jackson-annotations:${versions.jackson}")
compileOnly("com.fasterxml.jackson.core:jackson-databind:${versions.jackson_databind}")
compileOnly group: 'com.networknt' , name: 'json-schema-validator', version: '1.4.0'
pyek-bot marked this conversation as resolved.
Show resolved Hide resolved
}

lombok {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@

package org.opensearch.ml.common.utils;

import static org.opensearch.ml.common.utils.StringUtils.validateSchema;

import java.io.IOException;
import java.net.URL;
import java.util.Map;
Expand Down Expand Up @@ -40,20 +42,80 @@ public class IndexUtils {
public static final Map<String, Object> UPDATED_DEFAULT_INDEX_SETTINGS = Map.of("index.auto_expand_replicas", "0-1");
public static final Map<String, Object> UPDATED_ALL_NODES_REPLICA_INDEX_SETTINGS = Map.of("index.auto_expand_replicas", "0-all");

// Schema that validates system index mappings
public static final String MAPPING_SCHEMA_PATH = "index-mappings/schema.json";

// Placeholders to use within the json mapping files
private static final String USER_PLACEHOLDER = "USER_MAPPING_PLACEHOLDER";
private static final String CONNECTOR_PLACEHOLDER = "CONNECTOR_MAPPING_PLACEHOLDER";
public static final Map<String, String> MAPPING_PLACEHOLDERS = Map
.of(USER_PLACEHOLDER, "index-mappings/placeholders/user.json", CONNECTOR_PLACEHOLDER, "index-mappings/placeholders/connector.json");

public static String getMappingFromFile(String path) throws IOException {
URL url = IndexUtils.class.getClassLoader().getResource(path);
if (url == null) {
throw new IOException("Resource not found: " + path);
}

String mapping = Resources.toString(url, Charsets.UTF_8).trim();
if (mapping.isEmpty() || !StringUtils.isJson(mapping)) {
throw new IllegalArgumentException("Invalid or non-JSON mapping at: " + path);
if (mapping.isEmpty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have this check here considering we are going to check mapping.isBlank() in the replacePlaceholders method. Seems like redundant.

throw new IllegalArgumentException("Empty mapping found at: " + path);
}

mapping = replacePlaceholders(mapping);
validateMapping(mapping);

return mapping;
}

public static String replacePlaceholders(String mapping) throws IOException {
if (mapping == null || mapping.isBlank()) {
throw new IllegalArgumentException("Mapping cannot be null or empty");
}

for (Map.Entry<String, String> placeholder : MAPPING_PLACEHOLDERS.entrySet()) {
URL url = IndexUtils.class.getClassLoader().getResource(placeholder.getValue());
if (url == null) {
throw new IOException("Resource not found: " + placeholder.getValue());
}

String placeholderMapping = Resources.toString(url, Charsets.UTF_8);
mapping = mapping.replace(placeholder.getKey(), placeholderMapping);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are creating string in every replace. May be we could use a StringBuilder to have in-place replacement?

May be something like this:

public static String replacePlaceholders(String mapping) throws IOException {
    if (mapping == null || mapping.isBlank()) {
        throw new IllegalArgumentException("Mapping cannot be null or empty");
    }

    // Preload resources into memory to avoid redundant I/O
    Map<String, String> loadedPlaceholders = new HashMap<>();
    for (Map.Entry<String, String> placeholder : MAPPING_PLACEHOLDERS.entrySet()) {
        URL url = IndexUtils.class.getClassLoader().getResource(placeholder.getValue());
        if (url == null) {
            throw new IOException("Resource not found: " + placeholder.getValue());
        }
        // Load and cache the content
        loadedPlaceholders.put(placeholder.getKey(), Resources.toString(url, Charsets.UTF_8));
    }

    // Use StringBuilder for efficient in-place replacements
    StringBuilder result = new StringBuilder(mapping);
    for (Map.Entry<String, String> entry : loadedPlaceholders.entrySet()) {
        String placeholder = entry.getKey();
        String replacement = entry.getValue();

        // Replace all occurrences of the placeholder
        int index;
        while ((index = result.indexOf(placeholder)) != -1) {
            result.replace(index, index + placeholder.length(), replacement);
        }
    }

    return result.toString();
}

}

return mapping;
}

/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this seems not a standard java doc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made it into a proper comment for now via latest commit, i will modify to java doc standard with * in a later pr, i don't want to dismiss the previous approval with a new push again, if thats okay

- Checks if mapping is a valid json
- Validates mapping against a schema found in mappings/schema.json
- Schema validates the following:
- Below fields are present:
- "_meta"
- "_meta.schema_version"
- "properties"
- No additional fields at root level
- No additional fields in "_meta" object
- "properties" is an object type
- "_meta" is an object type
- "_meta_.schema_version" provided type is integer

Note: Validation can be made more strict if a specific schema is defined for each index.
*/
public static void validateMapping(String mapping) throws IOException {
if (mapping.isBlank() || !StringUtils.isJson(mapping)) {
throw new IllegalArgumentException("Invalid or non-JSON mapping found: " + mapping);
}

URL url = IndexUtils.class.getClassLoader().getResource(MAPPING_SCHEMA_PATH);
if (url == null) {
throw new IOException("Resource not found: " + MAPPING_SCHEMA_PATH);
}

String schema = Resources.toString(url, Charsets.UTF_8);
validateSchema(schema, mapping);
}

public static Integer getVersionFromMapping(String mapping) {
if (mapping == null || mapping.isBlank()) {
throw new IllegalArgumentException("Mapping cannot be null or empty");
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,14 @@

package org.opensearch.ml.common.utils;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.AccessController;
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
Expand All @@ -23,13 +25,20 @@
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import org.opensearch.OpenSearchParseException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import com.google.gson.JsonSyntaxException;
import com.jayway.jsonpath.JsonPath;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;

import lombok.extern.log4j.Log4j2;

Expand Down Expand Up @@ -336,4 +345,26 @@ public static JsonObject getJsonObjectFromString(String jsonString) {
return JsonParser.parseString(jsonString).getAsJsonObject();
}

public static void validateSchema(String schemaString, String instanceString) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add try catch block to catch any exceptions like:

catch (JsonProcessingException e) {
        throw new IllegalArgumentException("Invalid JSON format: " + e.getMessage(), e);
    } catch (Exception e) {
        throw new OpenSearchParseException("Schema validation failed: " + e.getMessage(), e);
    }

ObjectMapper mapper = new ObjectMapper();
pyek-bot marked this conversation as resolved.
Show resolved Hide resolved
// parse the schema JSON as string
JsonNode schemaNode = mapper.readTree(schemaString);
JsonSchema schema = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V202012).getSchema(schemaNode);

// JSON data to validate
JsonNode jsonNode = mapper.readTree(instanceString);

// Validate JSON node against the schema
Set<ValidationMessage> errors = schema.validate(jsonNode);
if (!errors.isEmpty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about this:

if (!errors.isEmpty()) {
            String errorMessage = errors.stream()
                                        .map(ValidationMessage::getMessage)
                                        .collect(Collectors.joining(", "));
            throw new OpenSearchParseException(
                "Validation failed: " + errorMessage + 
                " for instance: " + instanceString + 
                " with schema: " + schemaString
            );
        }

throw new OpenSearchParseException(
pyek-bot marked this conversation as resolved.
Show resolved Hide resolved
"Validation failed: "
+ Arrays.toString(errors.toArray(new ValidationMessage[0]))
+ " for instance: "
+ instanceString
+ " with schema: "
+ schemaString
);
}
}
}
74 changes: 2 additions & 72 deletions common/src/main/resources/index-mappings/ml-model.json
Original file line number Diff line number Diff line change
Expand Up @@ -167,77 +167,7 @@
}
}
},
"connector": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"version": {
"type": "keyword"
},
"description": {
"type": "text"
},
"protocol": {
"type": "keyword"
},
"parameters": {
"type": "flat_object"
},
"credential": {
"type": "flat_object"
},
"client_config": {
"type": "flat_object"
},
"actions": {
"type": "flat_object"
}
}
},
"user": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"backend_roles": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"roles": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"custom_attribute_names": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
"connector": CONNECTOR_MAPPING_PLACEHOLDER,
"user": USER_MAPPING_PLACEHOLDER
}
}
39 changes: 1 addition & 38 deletions common/src/main/resources/index-mappings/ml-task.json
Original file line number Diff line number Diff line change
Expand Up @@ -44,43 +44,6 @@
"remote_job": {
"type": "flat_object"
},
"user": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"backend_roles": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"roles": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"custom_attribute_names": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
"user": USER_MAPPING_PLACEHOLDER
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
{
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"version": {
"type": "keyword"
},
"description": {
"type": "text"
},
"protocol": {
"type": "keyword"
},
"parameters": {
"type": "flat_object"
},
"credential": {
"type": "flat_object"
},
"client_config": {
"type": "flat_object"
},
"actions": {
"type": "flat_object"
}
}
}
38 changes: 38 additions & 0 deletions common/src/main/resources/index-mappings/placeholders/user.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
{
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"backend_roles": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"roles": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"custom_attribute_names": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
25 changes: 25 additions & 0 deletions common/src/main/resources/index-mappings/schema.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
{
"type": "object",
"properties": {
"_meta": {
"type": "object",
"properties": {
"schema_version": {
"type": "integer"
}
},
"required": [
"schema_version"
],
"additionalProperties": false
},
"properties": {
"type": "object"
}
},
"required": [
"_meta",
"properties"
],
"additionalProperties": false
}
Loading
Loading