View Presto row and array objects clearly in the data grid #7625

Merged (7 commits) on May 31, 2019

khtruong
Contributor

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

The data grid currently does not display Presto rows and arrays clearly. This feature separates out nested fields and data values so that structural columns are easier to read. Hopefully, we can apply similar enhancements to other data sources in the future.

BEFORE SCREENSHOTS

Here ColumnA is a row column that contains two nested fields: an integer field and a string field. The nested fields' names are not immediately apparent. If this were a more complicated column (e.g. the data type is row(nested_array1 array(row(nested_field int, nested_array2 ...)))), you can imagine how difficult it would be to read the data values.

Screen Shot 2019-05-16 at 11 50 16 AM

AFTER SCREENSHOTS

Here we have pulled out the nested fields into their own columns. This is much more readable.

Screen Shot 2019-05-16 at 1 34 36 PM
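
As a rough illustration of the row expansion described above, the sketch below flattens one level of a row value into child columns. It is not the implementation in this PR: the helper name `expand_row`, the dotted `parent.child` naming, the field names, and the sample record are all assumptions made for the example.

```python
from typing import Any, Dict, List


def expand_row(record: Dict[str, Any], column: str, fields: List[str]) -> Dict[str, Any]:
    """Copy `record`, adding one `parent.child` key per nested field.

    Hypothetical helper: assumes the row value arrives as a list whose
    items line up positionally with `fields`.
    """
    expanded = dict(record)
    values = record.get(column) or []
    for position, field in enumerate(fields):
        # Pad with an empty string when the row has fewer values than fields.
        expanded[f"{column}.{field}"] = values[position] if position < len(values) else ""
    return expanded


# Assumed sample: ColumnA holds an integer field and a string field.
record = {"ColumnA": [1, "a"]}
flat = expand_row(record, "ColumnA", ["nested_int", "nested_str"])
```

The parent column is kept alongside its children here; whether to keep, blank, or drop it is a display decision this sketch does not try to settle.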

Here ColumnB is an array where data values are pulled out into subsequent rows.

Screen Shot 2019-05-16 at 4 04 23 PM

(Note: Ignore the visual enhancements, i.e. the grayed-out columns and data values. We will add them in a separate PR.)
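
The array behaviour shown for ColumnB can be sketched in a similar way. This is a simplified illustration under stated assumptions, not the code from this PR: `expand_array` is a hypothetical helper, and blanking the other columns on the extra rows is an assumed display choice.

```python
from typing import Any, Dict, List


def expand_array(records: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Spread each array value in `column` over consecutive output rows.

    Hypothetical helper: the first output row keeps the other columns'
    values; the extra rows leave them blank.
    """
    expanded: List[Dict[str, Any]] = []
    for record in records:
        # Treat a missing or empty array as a single blank value.
        items = record.get(column) or [""]
        for index, item in enumerate(items):
            if index == 0:
                row = dict(record)
            else:
                row = {key: "" for key in record}
            row[column] = item
            expanded.append(row)
    return expanded


# Assumed sample data: two records, the first with a three-item array.
rows = expand_array(
    [{"id": 1, "ColumnB": [10, 20, 30]}, {"id": 2, "ColumnB": [40]}],
    "ColumnB",
)
```

With this input the first record produces three output rows and the second produces one, so a reader scanning down the grid sees each array item on its own line.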

TEST PLAN

Tested manually and added unit tests

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@betodealmeida @DiggidyDave @xtinec @john-bodley @michellethomas @graceguo-supercat @mistercrunch

Member

@betodealmeida betodealmeida left a comment

@khtruong I have some comments and questions. If you have time, it would be good to add some unit tests to the helper functions as well, to ensure they're doing what we expect.

actual_cols, actual_data, actual_expanded_cols = PrestoEngineSpec.expand_data(
    cols, data)
expected_cols = [
    {'name': 'row_column', 'type': 'ROW'},
Member

I'm confused here. Why do you simplify the type, returning

{'name': 'row_column', 'type': 'ROW'}

instead of the original

{'name': 'row_column', 'type': 'ROW(NESTED_OBJ VARCHAR)'}

?

Why can't we simply compute all_columns like this

all_columns = selected_columns + expanded_columns

?

Contributor Author

For the first question, there are a few reasons (although they may not be great reasons):

  1. type usually corresponds to class objects representing data types. We often just cast it to a string in our code base.
  2. The string parser I created a while ago splits the data type string on parentheses and commas, so we'd have to create another helper method to maintain the full data type. (This is possible if we want it.)
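
To illustrate the second point: a parser that keeps the full nested type has to split only on commas that sit outside parentheses. The following is a minimal sketch of that idea, not the parser from this PR; `split_top_level` and the sample type string are hypothetical.

```python
from typing import List


def split_top_level(type_string: str, delimiter: str = ",") -> List[str]:
    """Split on `delimiter` only where parentheses are balanced."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for char in type_string:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == delimiter and depth == 0:
            # A top-level delimiter ends the current field.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


# Assumed sample: nested commas stay inside their ROW(...) definition.
fields = split_top_level("a INTEGER, b ROW(c VARCHAR, d INTEGER), e ARRAY(INTEGER)")
```

Splitting this way leaves each nested definition intact, which is what would be needed to report a type such as 'ROW(NESTED_OBJ VARCHAR)' in full.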

For the second question, I pass all_columns to each call to parse_structural_column to maintain order. If I concatenate the two lists selected_columns and expanded_columns, then the nested fields are not displayed right after their parent.
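
The ordering concern can be shown with a small sketch that compares plain concatenation with inserting children directly after their parent. It does not reproduce parse_structural_column; `insert_after_parent`, the dotted child name, and the extra `id` column are assumptions for illustration.

```python
from typing import Dict, List

Column = Dict[str, str]


def insert_after_parent(all_columns: List[Column], parent: str, children: List[Column]) -> None:
    """Insert `children` directly after the column named `parent`, in place.

    Hypothetical helper: raises StopIteration if `parent` is not present.
    """
    position = next(i for i, col in enumerate(all_columns) if col["name"] == parent)
    all_columns[position + 1:position + 1] = children


selected = [{"name": "row_column", "type": "ROW"}, {"name": "id", "type": "INTEGER"}]
nested = [{"name": "row_column.nested_obj", "type": "VARCHAR"}]

# Plain concatenation puts the nested field after the unrelated "id" column.
concatenated = selected + nested

# Inserting in place keeps the nested field next to its parent.
ordered = list(selected)
insert_after_parent(ordered, "row_column", nested)
```

In `concatenated` the nested field trails the whole selection, while in `ordered` it follows row_column immediately, which matches the display order described in the comment above.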

Member

For the second question, I pass all_columns to each call to parse_structural_column to maintain order. If I concatenate the two lists selected_columns and expanded_columns, then the nested fields are not displayed right after their parent.

Ah, I see. Good point.

For the first question, ideally we'd pass the class objects, converting them to strings whenever needed. I'm OK with using strings, but losing the information here (going from 'ROW(NESTED_OBJ VARCHAR)' to simply 'ROW') seems off. But maybe we can improve this later?

Contributor Author

Sure thing!

array_column,
array_column_hierarchy)
continue
array_data = expanded_array_data[0][array_column]
Member

Have you tested a query that returns no rows? E.g.:

SELECT nested_column FROM table WHERE 1=0

?

Contributor Author

Great test case. I verified that it passes through the function without issue, because the query returns an empty array for both columns and data.

@khtruong
Contributor Author

@khtruong I have some comments and questions. If you have time, it would be good to add some unit tests to the helper functions as well, to ensure they're doing what we expect.

I added a bunch more unit tests for most of the helper functions. Let me know if you want more.

@codecov-io

Codecov Report

Merging #7625 into master will increase coverage by 0.2%.
The diff coverage is 91.71%.

@@            Coverage Diff            @@
##           master    #7625     +/-   ##
=========================================
+ Coverage   65.37%   65.58%   +0.2%     
=========================================
  Files         435      435             
  Lines       21513    21687    +174     
  Branches     2378     2384      +6     
=========================================
+ Hits        14064    14223    +159     
- Misses       7329     7343     +14     
- Partials      120      121      +1

Impacted Files                                          Coverage Δ
...ets/src/SqlLab/components/ExploreResultsButton.jsx   72% <100%> (ø) ⬆️
superset/sql_lab.py                                     76.75% <100%> (+0.38%) ⬆️
superset/db_engine_specs.py                             67.11% <91.42%> (+4.38%) ⬆️
...src/components/FilterableTable/FilterableTable.jsx   89.13% <0%> (-1.89%) ⬇️
...uperset/assets/src/SqlLab/components/ResultSet.jsx   79.54% <0%> (-0.46%) ⬇️
superset/common/query_object.py                         26.31% <0%> (ø) ⬆️
superset/viz.py                                         71.95% <0%> (+0.01%) ⬆️
superset/utils/core.py                                  88.23% <0%> (+0.02%) ⬆️
superset/config.py                                      94.01% <0%> (+0.03%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update dbdb6b0...13c3e6e.


@betodealmeida
Member

Thanks for adding all the unit tests, @khtruong!

@betodealmeida betodealmeida merged commit d296734 into apache:master May 31, 2019
@mistercrunch added the 🏷️ bot and 🚢 0.34.0 labels on Feb 28, 2024
Labels: 🏷️ bot, size/XL, 🚢 0.34.0