Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject non-string keys #1004

Merged
merged 5 commits into from
Oct 16, 2019
Merged

Conversation

suminb
Copy link
Contributor

@suminb suminb commented Oct 14, 2019

As far as I understand, JSON only allows string keys. However, the current implementation of MapColumnConverter allows arbitrary types as map keys which potentially cause issues for JsonConverter. This PR adds a safeguard that rejects non-string map keys.

* as keys.
*/
private void assertKeyType(TypeDescription schema) {
TypeDescription keyType = schema.getChildren().get(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it guarantee that schema has Children? And what are each children schema representing?

Copy link
Contributor Author

@suminb suminb Oct 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose we have an ORC schema like map<string,int> then the first child is a TypeDescription instance representing string type. Not sure if schema is guaranteed to have at least one child. We might want to test with a malformed ORC schema like map<>. It may or may not reach this point when such schema is given.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure the passed-in 'schema' argument really represents the map schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure the passed-in 'schema' argument really represents the map schema

It certainly wouldn't hurt to do so, but I feel like this is somewhat redundant as it is already ensured by VectorColumnFilter.createConverter(). Its very logic doesn't leave any room for other types to fall into MapColumnConverter.

    public static JsonConverter createConverter(TypeDescription schema) {
        switch (schema.getCategory()) {

        // ...

        case MAP:
            return new MapColumnConverter(schema);
        default:
            throw new IllegalArgumentException("Unhandled type " + schema);
        }
    }

If you still insist doing so, I would suggest to add some common type checking logic in JsonConverter interface or in VectorColumnFiller class so that other types could also benefit from it. Any thoughts on this?

private void assertKeyType(TypeDescription schema) {
TypeDescription keyType = schema.getChildren().get(0);
String keyTypeName = keyType.getCategory().getName();
if (!keyTypeName.equals("string")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be case insensitive comparison?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s a good point. Not sure if ORC schema is case sensitive. But it would be a good idea to perform case insensitive comparison here just to be safe.

Copy link
Contributor Author

@suminb suminb Oct 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested what happens when ORC schema is defined with upper-case letters (e.g., map<INT,int>). When reached at MapColumnConverter, the schema shows the schema information as map<int,int>. Something in the middle automatically converts it to lower-case letters. However, I'll stick with a case-insensitive comparison, as you suggested, for extra safety.

@suminb
Copy link
Contributor Author

suminb commented Oct 15, 2019

Thanks for the review. I’ll come up with a better way to handle these concerns mentioned above and revise code accordingly.

@HenryCaiHaiying HenryCaiHaiying merged commit 7490d87 into pinterest:master Oct 16, 2019
@HenryCaiHaiying
Copy link
Contributor

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants