feat: Visualize complex Hive column types #1072

Closed
wants to merge 1 commit into from


@baumandm (Contributor) commented Nov 4, 2022

This change makes it easier to understand complex Hive types, including array<>, map<>, struct<>, and uniontype<>. It adds a parser that detects these types and converts them into a representative JSON object, which is then visualized using the react-json-view library added in #991.
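For readers who want a feel for what such a parser involves, here is a minimal sketch, not the code from this PR. The names `HiveJson`, `splitTopLevel`, and `toJson` are hypothetical, and so is the JSON shape chosen for map<> and uniontype<> (the placeholder keys `<key>`, `<value>`, `<oneOf>`). It assumes lowercase type names with no whitespace and returns the raw string whenever it cannot parse a piece.

```typescript
// Hypothetical sketch: convert a Hive type string into a JSON-like value.
type HiveJson = string | HiveJson[] | { [key: string]: HiveJson };

// Split on a separator, ignoring separators nested inside <...> or (...).
function splitTopLevel(s: string, sep: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    if (c === '<' || c === '(') depth++;
    else if (c === '>' || c === ')') depth--;
    else if (c === sep && depth === 0) {
      parts.push(s.slice(start, i));
      start = i + 1;
    }
  }
  parts.push(s.slice(start));
  return parts;
}

function toJson(type: string): HiveJson {
  const open = type.indexOf('<');
  // Primitive types, and anything without a closing bracket, stay as raw text.
  if (open === -1 || type.charAt(type.length - 1) !== '>') return type;
  const kind = type.slice(0, open);
  const inner = type.slice(open + 1, -1);
  const args = splitTopLevel(inner, ',');
  switch (kind) {
    case 'array':
      return [toJson(inner)];
    case 'map':
      if (args.length !== 2) return type;
      // Assumed representation; a map has no natural JSON equivalent.
      return { '<key>': toJson(args[0]), '<value>': toJson(args[1]) };
    case 'uniontype':
      return { '<oneOf>': args.map(toJson) };
    case 'struct': {
      const out: { [key: string]: HiveJson } = {};
      for (const field of args) {
        const colon = field.indexOf(':');
        if (colon === -1) return type; // malformed field, fall back to raw text
        out[field.slice(0, colon)] = toJson(field.slice(colon + 1));
      }
      return out;
    }
    default:
      return type;
  }
}
```

With this sketch, `toJson('struct<date:struct<year:int>,hour:int>')` yields `{ date: { year: 'int' }, hour: 'int' }`, which a JSON viewer can render as a collapsible tree.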

Currently it only works on these Hive types. If it's too specific/niche to merge that's fine, we'll maintain it internally on our side. Just wanted to throw it out there in case it would be useful to anyone else.

[Four screenshots attached to the PR description, taken 2022-11-04]

@jczhong84 (Collaborator) left a comment

  • I'm not sure the JSON format is the best way to visualize complex Hive data types. JSON works well for struct and array types, but I don't think it works for map and union types; people may be confused about what the actual type is. An alternative: how about simply formatting the original type string with indentation, e.g.
struct<
  date:struct<
    year:int,
    month:int,
    day:int>,
  hour:int,
  minute:int,
  second:int,
  timeZoneId:string
>

That way, when people see types like map or uniontype, they can go and look up what they are.
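The indentation idea could be sketched roughly as follows. This is only an illustration: `indentType` and `pad` are hypothetical names, and the sketch puts every closing `>` on its own line, which differs slightly from the hand-formatted example above. It does not need the brackets to be balanced, so it also produces readable output for truncated type strings.

```typescript
// Hypothetical sketch: pretty-print a Hive type string with indentation.
function pad(depth: number, indent: string): string {
  let s = '';
  for (let i = 0; i < depth; i++) s += indent;
  return s;
}

function indentType(type: string, indent: string = '  '): string {
  let out = '';
  let depth = 0;  // nesting level of <...>
  let parens = 0; // nesting level of (...), e.g. decimal(10,2)
  for (let i = 0; i < type.length; i++) {
    const c = type[i];
    if (c === '<') {
      depth++;
      out += '<\n' + pad(depth, indent);
    } else if (c === '>') {
      depth = Math.max(0, depth - 1);
      out += '\n' + pad(depth, indent) + '>';
    } else if (c === ',' && parens === 0) {
      out += ',\n' + pad(depth, indent);
    } else {
      if (c === '(') parens++;
      else if (c === ')') parens--;
      out += c;
    }
  }
  return out;
}
```

One trade-off of this simple version is that short types such as map<int,bigint> are also spread over several lines; keeping short leaf types inline would need an extra pass.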

  • Also, I just realized that some very long type strings may get truncated. The type has a max length of 255 or 256 in our Hive metastore, e.g. map<string,struct<userid:bigint,matchtypeused:struct<value:int>,matchtypetouseridmap:map<int,bigint>,restrictedusereason:array<struct<value:int>>,isactivated:boolean,experimentname:string,experimentgroup:string,isoptin:boolean,useridtoconfidencescoremap:m
    How will you handle this case?
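One possible way to at least detect the truncated case is to check whether the angle brackets balance. The helper below is a hypothetical sketch (`looksTruncated` is not part of this PR) and only a heuristic: it flags strings with unclosed brackets, but it cannot tell whether a primitive name at the top level was cut short.

```typescript
// Hypothetical heuristic: a complex type with unclosed '<' was probably cut off.
function looksTruncated(type: string): boolean {
  let depth = 0;
  for (let i = 0; i < type.length; i++) {
    if (type[i] === '<') depth++;
    else if (type[i] === '>') depth--;
  }
  return depth > 0;
}
```

A UI could use such a check to show a "type may be truncated" hint next to the raw string instead of a misleading partial tree.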

json: any;
}

export const Json: React.FC<IProps & Partial<ReactJsonViewProps>> = ({

How about renaming the component to JsonViewer, to differentiate it from the built-in JSON object?

@@ -57,6 +62,13 @@ export const DataTableColumnCard: React.FunctionComponent<IProps> = ({
{column.comment}
</KeyContentDisplay>
)}
{parsedType !== column.type && (
<>

Why is the <> </> fragment needed? It only has a single child.

timeZoneId: 'string',
}
*/
export function parseType(type: string): any | string {

Will there be cases of:

  • uppercase, e.g. STRUCT
  • a space before the bracket, e.g. struct <

And what happens if the parsing fails?
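Both concerns could be handled in front of whatever parser is used. The sketch below is an assumption, not the PR's behavior: `normalizeHiveType` lowercases only the four complex-type keywords and drops whitespace before `<` (field names are left untouched), and `parseOrFallback` returns the original string if the supplied parser throws. Both names are hypothetical.

```typescript
// Hypothetical sketch: normalize keyword case and spacing before parsing.
function normalizeHiveType(type: string): string {
  return type
    .trim()
    .replace(
      /\b(array|map|struct|uniontype)\s*</gi,
      (_match: string, keyword: string) => keyword.toLowerCase() + '<'
    );
}

// Run any parser on the normalized string; on failure, keep the raw type text.
function parseOrFallback<T>(type: string, parse: (t: string) => T): T | string {
  try {
    return parse(normalizeHiveType(type));
  } catch (e) {
    return type;
  }
}
```

Falling back to the raw string means a caller can compare the result with the input, as the `parsedType !== column.type` check in the diff does, and skip the extra visualization when nothing was parsed.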

@baumandm (Contributor, Author) commented

Thanks for the feedback. I agree JSON isn't the clearest way to visualize some of these types.

We're using another product that takes a nested-column approach for complex types, but it is also confusing for map<> and array<> types. Based on feedback from our users, I think they prefer even a confusing visualization over just the raw text.

The advantages of the JSON approach were that it was relatively easy to implement and that it reused the same JSON viewer as the results, providing some consistency for users. It is also collapsible, which is nice for deeply nested types.

I can experiment with different approaches and share some additional options.

Re: the type length limit, I noticed this as well and was going to bring it up later. The parser degrades as gracefully as it can, but ideally we'd increase the max length. We too have a significant number of columns synced with truncated types.
