View Presto row and array objects clearly in the data grid #7625
@khtruong I have some comments and questions. If you have time, it would be good to add some unit tests to the helper functions as well, to ensure they're doing what we expect.
actual_cols, actual_data, actual_expanded_cols = PrestoEngineSpec.expand_data(
    cols, data)
expected_cols = [
    {'name': 'row_column', 'type': 'ROW'},
I'm confused here. Why do you simplify the type, returning {'name': 'row_column', 'type': 'ROW'} instead of the original {'name': 'row_column', 'type': 'ROW(NESTED_OBJ VARCHAR)'}?
Why can't we simply compute all_columns like this: all_columns = selected_columns + expanded_columns?
For the first question, there are a few reasons (although they may not be great reasons):
- type usually corresponds to class objects representing data types. We often just cast it to a string in our code base.
- The string parser I created a while ago splits the data type string on parentheses and commas, so we'd have to create another helper method to maintain the full data type. (This is possible if we want to.)
For the second question, I pass all_columns to each call to parse_structural_column to maintain order. If I concatenate the two lists selected_columns and expanded_columns, then the nested fields are not displayed right after their parent.
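The ordering concern can be illustrated with a minimal sketch. It does not reproduce parse_structural_column; the helper names concat_columns and interleave_columns, and the dotted child column name, are hypothetical and chosen only for illustration.

```python
def concat_columns(selected, expanded):
    """Naive approach: nested fields land after all selected columns."""
    return selected + expanded


def interleave_columns(selected, children_by_parent):
    """Place each parent's nested fields directly after the parent."""
    ordered = []
    for column in selected:
        ordered.append(column)
        # Hypothetical mapping of parent column -> list of nested field columns.
        ordered.extend(children_by_parent.get(column, []))
    return ordered


selected = ["row_column", "int_column"]
children_by_parent = {"row_column": ["row_column.nested_obj"]}
expanded = [c for children in children_by_parent.values() for c in children]

naive = concat_columns(selected, expanded)
ordered = interleave_columns(selected, children_by_parent)
print(naive)    # ['row_column', 'int_column', 'row_column.nested_obj']
print(ordered)  # ['row_column', 'row_column.nested_obj', 'int_column']
```

With plain concatenation, int_column sits between row_column and its nested field; building the list incrementally keeps the nested field next to its parent.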
Ah, I see. Good point.
For the first question, ideally we'd pass the class objects, converting them to strings whenever needed. I'm OK with using strings, but losing the information here (going from ROW(NESTED_OBJ VARCHAR) to simply ROW) seems off. But maybe we can improve this later?
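One possible shape for such a later improvement, sketched under assumptions: the base type is derived from the full type string while the full string is kept alongside it. The describe_column helper and the full_type key are hypothetical and not part of this PR.

```python
def describe_column(name, full_type):
    """Derive the base type but keep the full type string as well."""
    # Everything before the first "(" is treated as the base type.
    base_type = full_type.split("(", 1)[0].strip().upper()
    return {"name": name, "type": base_type, "full_type": full_type}


row_col = describe_column("row_column", "ROW(NESTED_OBJ VARCHAR)")
plain_col = describe_column("name_column", "VARCHAR")
print(row_col)
print(plain_col)
```

The existing 'type' value stays 'ROW', so code that relies on the simplified type would be unaffected, while the nested definition remains available.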
Sure thing!
array_column,
array_column_hierarchy)
continue
array_data = expanded_array_data[0][array_column]
Have you tested a query that returns no rows? E.g.:
SELECT nested_column FROM table WHERE 1=0
Great test case. I verified that it passes through the function, because the query returns an empty array for both columns and data.
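A minimal sketch of why an empty result can be harmless, assuming the indexing into the first row only happens inside a loop over the columns. The first_array_value helper is hypothetical and much simpler than the PR's code.

```python
def first_array_value(columns, data):
    """Hypothetical helper: read each column's cell from the first row."""
    values = {}
    for column in columns:  # no iterations when columns is empty
        values[column] = data[0][column]
    return values


empty_result = first_array_value([], [])
one_row_result = first_array_value(
    ["array_column"], [{"array_column": [1, 2]}])
print(empty_result)    # {}
print(one_row_result)  # {'array_column': [1, 2]}
```

Under this assumption, data[0] is never evaluated when both lists are empty. A column list paired with empty data would still raise IndexError, so an explicit guard may be worth considering if that combination can occur.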
I added a bunch more unit tests for most of the helper functions. Let me know if you want more.
Codecov Report
@@            Coverage Diff            @@
##           master    #7625    +/-   ##
=========================================
+ Coverage   65.37%   65.58%   +0.2%
=========================================
  Files         435      435
  Lines       21513    21687    +174
  Branches     2378     2384      +6
=========================================
+ Hits        14064    14223    +159
- Misses       7329     7343     +14
- Partials      120      121      +1
Continue to review full report at Codecov.
Thanks for adding all the unit tests, @khtruong!
CATEGORY
Choose one
SUMMARY
Currently, Presto rows and arrays are not displayed clearly in the data grid. This feature separates out nested fields and data values so that structural columns are easier to read. Hopefully we can apply similar enhancements to other data sources in the future.
BEFORE SCREENSHOTS
Here ColumnA is a row column that contains two nested fields: 1) an integer field and 2) a string field. We cannot immediately tell what the nested fields' names are. If this were a more complicated column (e.g. the data type is row(nested_array1 array(row(nested_field int, nested_array2 ...)))), you can imagine how difficult it would be to read the data values.
AFTER SCREENSHOTS
Here we have pulled out the nested fields into their own columns. This is much more readable.
Here ColumnB is an array where data values are pulled out into subsequent rows.
(Note: Ignore the visual enhancements, i.e. the grayed-out columns and data values. We will be adding visual enhancements in a separate PR.)
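The two expansions described above can be sketched as follows. This is an illustration under assumptions, not the PR's expand_data implementation: the helper names, the dotted naming of nested columns, and the choice to blank out other columns on the extra rows are all hypothetical.

```python
def expand_row(record, column, field_names):
    """Copy each nested field of a row value into its own column."""
    expanded = dict(record)
    for field, value in zip(field_names, record[column]):
        expanded[f"{column}.{field}"] = value
    return expanded


def expand_array(record, column):
    """Emit one output row per array element.

    Other columns are kept on the first row and blanked on later rows
    (an assumed display choice for this sketch).
    """
    values = record[column] or [None]
    rows = []
    for index, value in enumerate(values):
        row = {key: (val if index == 0 else "") for key, val in record.items()}
        row[column] = value
        rows.append(row)
    return rows


row_result = expand_row({"ColumnA": [1, "a"]}, "ColumnA",
                        ["int_field", "str_field"])
array_result = expand_array({"id": 7, "ColumnB": [10, 20]}, "ColumnB")
print(row_result)
print(array_result)
```

In this sketch, ColumnA gains the columns ColumnA.int_field and ColumnA.str_field, and ColumnB's two values occupy two consecutive rows.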
TEST PLAN
Tested manually and added unit tests
ADDITIONAL INFORMATION
REVIEWERS
@betodealmeida @DiggidyDave @xtinec @john-bodley @michellethomas @graceguo-supercat @mistercrunch