View Presto row and array objects clearly in the data grid #7625

khtruong · 2019-05-30T21:49:53Z

SUMMARY

The data grid does not currently display Presto rows and arrays clearly. This feature separates nested fields and data values into their own columns and rows so that structural columns are easier to read. In the future, we hope to apply similar enhancements to other data sources.

BEFORE SCREENSHOTS

Here ColumnA is a row column that contains two nested fields: (1) an integer field and (2) a string field. We cannot immediately tell what the nested fields' names are. If this were a more complicated column (e.g. the data type is row(nested_array1 array(row(nested_field int, nested_array2 ...)))), you can imagine how difficult it would be to read the data values.

AFTER SCREENSHOTS

Here we have pulled the nested fields out into their own columns. This is much more readable.

Here ColumnB is an array where data values are pulled out into subsequent rows.

(Note: Ignore the visual enhancements, i.e. the grayed-out columns and data values. We will be adding visual enhancements in a separate PR.)
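The two expansions described above (row fields into their own columns, array values into subsequent rows) can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names, the dotted `parent.field` column naming, and the use of empty strings to pad the extra rows are all assumptions made for the example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def expand_row_column(data: List[Row], column: str, fields: List[str]) -> List[Row]:
    """Copy each nested field of a row column into its own 'parent.field' column.

    Hypothetical helper: the dotted naming scheme is an assumption.
    """
    expanded = []
    for record in data:
        new_record = dict(record)
        # A missing or NULL row value yields None for every nested field.
        values = record.get(column) or [None] * len(fields)
        for field, value in zip(fields, values):
            new_record[f"{column}.{field}"] = value
        expanded.append(new_record)
    return expanded


def expand_array_column(data: List[Row], column: str) -> List[Row]:
    """Spread the elements of an array column over subsequent rows.

    Hypothetical helper: padding the other columns with "" is an assumption.
    """
    expanded = []
    for record in data:
        values = record.get(column) or [None]
        for index, value in enumerate(values):
            if index == 0:
                # The first element stays on the original row.
                new_record = dict(record)
            else:
                # Later elements get their own row; other columns are blank.
                new_record = {key: "" for key in record}
            new_record[column] = value
            expanded.append(new_record)
    return expanded
```

For example, a single row whose ColumnB holds two values becomes two rows, with the second row blank everywhere except ColumnB.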

TEST PLAN

Tested manually and added unit tests

ADDITIONAL INFORMATION

REVIEWERS

@betodealmeida @DiggidyDave @xtinec @john-bodley @michellethomas @graceguo-supercat @mistercrunch

betodealmeida

@khtruong I have some comments and questions. If you have time, it would be good to add some unit tests to the helper functions as well, to ensure they're doing what we expect.

superset/sql_lab.py

betodealmeida · 2019-05-30T22:57:50Z

tests/db_engine_specs_test.py

+        actual_cols, actual_data, actual_expanded_cols = PrestoEngineSpec.expand_data(
+            cols, data)
+        expected_cols = [
+            {'name': 'row_column', 'type': 'ROW'},


I'm confused here. Why do you simplify the type, returning

{'name': 'row_column', 'type': 'ROW'}

instead of the original

{'name': 'row_column', 'type': 'ROW(NESTED_OBJ VARCHAR)'}

?

Why can't we simply compute all_columns like this

all_columns = selected_columns + expanded_columns

?

For the first question, there are a few reasons (although they may not be great reasons):

type usually corresponds to class objects representing data types. We often just cast it to a string in our code base.

The string parser I created a while ago splits the data type string on parentheses and commas, so we'd have to create another helper method to maintain the full data type. (This is possible if we want to.)
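As a rough idea of what such a helper could look like, the sketch below splits a type string on top-level commas only and returns the untouched full type next to the simplified one. It is not the parser from this PR; `split_top_level` and `parse_row_type` are hypothetical names used only for illustration.

```python
from typing import List, Tuple


def split_top_level(type_string: str, delimiter: str = ",") -> List[str]:
    """Split on `delimiter`, ignoring delimiters nested inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for char in type_string:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == delimiter and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def parse_row_type(full_type: str) -> Tuple[str, List[str], str]:
    """Return (base type, nested field definitions, untouched full type)."""
    base, _, rest = full_type.partition("(")
    # Drop the closing parenthesis that matches the one we split on.
    inner = rest[:-1] if rest.endswith(")") else rest
    fields = split_top_level(inner) if inner else []
    return base.strip(), fields, full_type
```

With this shape, a caller could show `ROW` where a short label is wanted and still have `ROW(NESTED_OBJ VARCHAR)` available.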

For the second question, I pass all_columns for each call to parse_structural_column to maintain order. If I concatenate the two lists selected_columns and expanded_columns, then the nested fields are not displayed right after their parent.
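To illustrate the ordering difference, here is a small sketch. `insert_after_parent`, the column names, and the types are made up for the example and are not taken from parse_structural_column itself.

```python
from typing import Dict, List

Column = Dict[str, str]


def insert_after_parent(
    all_columns: List[Column], parent: str, children: List[Column]
) -> List[Column]:
    """Return a new list with `children` placed directly after the `parent` column."""
    result: List[Column] = []
    for column in all_columns:
        result.append(column)
        if column["name"] == parent:
            result.extend(children)
    return result


# Hypothetical sample columns.
selected_columns = [
    {"name": "row_column", "type": "ROW"},
    {"name": "id", "type": "INTEGER"},
]
expanded_columns = [{"name": "row_column.nested_obj", "type": "VARCHAR"}]

# Plain concatenation pushes the nested field to the end, after "id".
concatenated = selected_columns + expanded_columns

# Inserting in place keeps the nested field next to its parent.
ordered = insert_after_parent(selected_columns, "row_column", expanded_columns)
```

Here `concatenated` lists the columns as row_column, id, row_column.nested_obj, while `ordered` lists them as row_column, row_column.nested_obj, id.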

Ah, I see. Good point.

For the first question, ideally we'd pass the class objects, converting them to strings whenever needed. I'm OK with using strings, but losing the information here (going from 'ROW(NESTED_OBJ VARCHAR)' to simply 'ROW') seems off. But maybe we can improve this later?

Sure thing!

superset/db_engine_specs.py

betodealmeida · 2019-05-30T23:25:21Z

superset/db_engine_specs.py

+                                             array_column,
+                                             array_column_hierarchy)
+                    continue
+                array_data = expanded_array_data[0][array_column]


Have you tested a query that returns no rows? E.g.:

SELECT nested_column FROM table WHERE 1=0

?

Great test case. I verified it, and the function handles this case because the query returns empty arrays for both columns and data.
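A minimal sketch of why an empty result does not reach the `[0]` index, assuming (as in the diff above) that the lookup only happens inside a loop over the columns. `first_array_values` is a hypothetical stand-in, not the function in this PR.

```python
from typing import Any, Dict, List


def first_array_values(
    columns: List[Dict[str, str]], data: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Collect the first row's value for every ARRAY column."""
    first_values: Dict[str, Any] = {}
    for column in columns:
        # With an empty column list this body never runs, so data[0]
        # is never evaluated for an empty result.
        if column["type"] == "ARRAY":
            first_values[column["name"]] = data[0][column["name"]]
    return first_values
```

Note that this sketch would still raise an IndexError if columns were present while data was empty, so it only covers the case where both are empty.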

superset/db_engine_specs.py

khtruong · 2019-05-31T17:01:57Z

I added a bunch more unit tests for most of the helper functions. Let me know if you want more.

codecov-io · 2019-05-31T18:10:40Z

Codecov Report

Merging #7625 into master will increase coverage by 0.2%.
The diff coverage is 91.71%.

@@            Coverage Diff            @@
##           master    #7625     +/-   ##
=========================================
+ Coverage   65.37%   65.58%   +0.2%     
=========================================
  Files         435      435             
  Lines       21513    21687    +174     
  Branches     2378     2384      +6     
=========================================
+ Hits        14064    14223    +159     
- Misses       7329     7343     +14     
- Partials      120      121      +1

Impacted Files	Coverage Δ
...ets/src/SqlLab/components/ExploreResultsButton.jsx	`72% <100%> (ø)`	⬆️
superset/sql_lab.py	`76.75% <100%> (+0.38%)`	⬆️
superset/db_engine_specs.py	`67.11% <91.42%> (+4.38%)`	⬆️
...src/components/FilterableTable/FilterableTable.jsx	`89.13% <0%> (-1.89%)`	⬇️
...uperset/assets/src/SqlLab/components/ResultSet.jsx	`79.54% <0%> (-0.46%)`	⬇️
superset/common/query_object.py	`26.31% <0%> (ø)`	⬆️
superset/viz.py	`71.95% <0%> (+0.01%)`	⬆️
superset/utils/core.py	`88.23% <0%> (+0.02%)`	⬆️
superset/config.py	`94.01% <0%> (+0.03%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dbdb6b0...13c3e6e. Read the comment docs.

betodealmeida · 2019-05-31T18:25:03Z

Thanks for adding all the unit tests, @khtruong!

khtruong added 5 commits May 23, 2019 13:01

feat: rough check in for Presto rows and arrays

ca32740

fix: presto arrays

835c1c1

fix: return selected and expanded columns

af73463

fix: add helper methods and unit tests

0e1683c

fix: merge with master

1707219

pull-request-size bot added the size/XL label May 30, 2019

fix: only allow exploration of selected columns

ccaddce

betodealmeida mentioned this pull request May 30, 2019

Show expanded columns in gray in SQL Editor #7627

Merged

12 tasks

betodealmeida reviewed May 30, 2019

fix: address Beto's comments and add more unit tests

13c3e6e

betodealmeida merged commit d296734 into apache:master May 31, 2019

betodealmeida mentioned this pull request Jun 11, 2019

Render columns dynamically on wide tables #7693

Merged

12 tasks

betodealmeida mentioned this pull request Jun 21, 2019

[SIP-23] Move SQL Lab storage out of browser localStorage #7748

Closed

mistercrunch mentioned this pull request Oct 4, 2019

Fix lint in superset/db_engine_spec #8338

Merged

12 tasks

mistercrunch added the 🏷️ bot and 🚢 0.34.0 labels Feb 28, 2024
