Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject non-string keys #1004

Merged
merged 5 commits into from
Oct 16, 2019
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 22 additions & 8 deletions src/main/java/com/pinterest/secor/util/orc/VectorColumnFiller.java
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,6 @@
import java.nio.charset.StandardCharsets;
import java.sql.Timestamp;
import java.util.List;
import java.util.Set;

/**
*
Expand Down Expand Up @@ -166,13 +165,28 @@ public void convert(JsonElement value, ColumnVector vect, int row) {
}

static class MapColumnConverter implements JsonConverter {
private JsonConverter[] childrenConverters;
private JsonConverter[] childConverters;

public MapColumnConverter(TypeDescription schema) {
List<TypeDescription> kids = schema.getChildren();
childrenConverters = new JsonConverter[kids.size()];
for (int c = 0; c < childrenConverters.length; ++c) {
childrenConverters[c] = createConverter(kids.get(c));
assertKeyType(schema);

List<TypeDescription> childTypes = schema.getChildren();
childConverters = new JsonConverter[childTypes.size()];
for (int c = 0; c < childConverters.length; ++c) {
childConverters[c] = createConverter(childTypes.get(c));
}
}

/**
* Rejects non-string keys. This is a limitation imposed by JSON specifications that only allows strings
* as keys.
*/
private void assertKeyType(TypeDescription schema) {
TypeDescription keyType = schema.getChildren().get(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it guarantee that schema has Children? And what are each children schema representing?

Copy link
Contributor Author

@suminb suminb Oct 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose we have an ORC schema like map<string,int> then the first child is a TypeDescription instance representing string type. Not sure if schema is guaranteed to have at least one child. We might want to test with a malformed ORC schema like map<>. It may or may not reach this point when such schema is given.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure the passed-in 'schema' argument really represents the map schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure the passed-in 'schema' argument really represents the map schema

It certainly wouldn't hurt to do so, but I feel like this is somewhat redundant as it is already ensured by VectorColumnFilter.createConverter(). Its very logic doesn't leave any room for other types to fall into MapColumnConverter.

    public static JsonConverter createConverter(TypeDescription schema) {
        switch (schema.getCategory()) {

        // ...

        case MAP:
            return new MapColumnConverter(schema);
        default:
            throw new IllegalArgumentException("Unhandled type " + schema);
        }
    }

If you still insist doing so, I would suggest to add some common type checking logic in JsonConverter interface or in VectorColumnFiller class so that other types could also benefit from it. Any thoughts on this?

String keyTypeName = keyType.getCategory().getName();
if (!keyTypeName.equals("string")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be case insensitive comparison?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s a good point. Not sure if ORC schema is case sensitive. But it would be a good idea to perform case insensitive comparison here just to be safe.

Copy link
Contributor Author

@suminb suminb Oct 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested what happens when ORC schema is defined with upper-case letters (e.g., map<INT,int>). When reached at MapColumnConverter, the schema shows the schema information as map<int,int>. Something in the middle automatically converts it to lower-case letters. However, I'll stick with a case-insensitive comparison, as you suggested, for extra safety.

throw new IllegalArgumentException(
String.format("Unsupported key type: %s", keyTypeName));
}
}

Expand All @@ -192,8 +206,8 @@ public void convert(JsonElement value, ColumnVector vect, int row) {

int i = 0;
for (String key : obj.keySet()) {
childrenConverters[0].convert(new JsonPrimitive(key), vector.keys, (int) vector.offsets[row] + i);
childrenConverters[1].convert(obj.get(key), vector.values, (int) vector.offsets[row] + i);
childConverters[0].convert(new JsonPrimitive(key), vector.keys, (int) vector.offsets[row] + i);
childConverters[1].convert(obj.get(key), vector.values, (int) vector.offsets[row] + i);
i++;
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -204,4 +204,20 @@ public void testWithLargeKeySet() throws Exception {
}
fileReader.close();
}

@Test(expected = IllegalArgumentException.class)
public void testWithNonStringKeys() throws Exception {
PropertiesConfiguration properties = new PropertiesConfiguration();
properties.setProperty("secor.orc.schema.provider", DEFAULT_ORC_SCHEMA_PROVIDER);
properties.setProperty("secor.orc.message.schema.test-nonstring-keys", "struct<kvs:map<int\\,int>>");

SecorConfig config = new SecorConfig(properties);
JsonORCFileReaderWriterFactory factory = new JsonORCFileReaderWriterFactory(config);
LogFilePath tempLogFilePath = getTempLogFilePath("test-nonstring-keys");

FileWriter fileWriter = factory.BuildFileWriter(tempLogFilePath, codec);
KeyValue written1 = new KeyValue(90001, "{\"kvs\":{1:2,3:4}}".getBytes());
fileWriter.write(written1);
fileWriter.close();
}
}