[fix] Enforce the QueryResult.df to be a pandas.DataFrame object (Phase I) #8935

john-bodley · 2020-01-08T02:03:14Z

SUMMARY

In the past we have discussed whether an empty pandas.DataFrame object or None should represent no data. Previously the df attribute of the QueryResult class could be either None or a pandas.DataFrame object and was inconsistently defined, i.e., the Druid native connector used the former whereas the SQL connector used the latter. This was problematic as the None type wasn't always handled, e.g. here and here would both throw an exception if df was None.

This PR ensures that the df attribute is always a pandas.DataFrame object which can be empty.

Additionally, I’ve added some basic typing to help enforce a non-optional pandas.DataFrame, and provided a short circuit in the data processing of the NVD3TimeSeriesViz class for when the data frame is empty.
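To illustrate the intent, here is a minimal sketch of a result object whose df is never None. `QueryResultSketch` and `row_count` are hypothetical names used only for illustration; this is not Superset's actual QueryResult class.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class QueryResultSketch:
    """Hypothetical stand-in for QueryResult (assumption, not Superset code)."""

    # Defaults to an empty data frame rather than None.
    df: pd.DataFrame = field(default_factory=pd.DataFrame)
    query: str = ""


def row_count(result: QueryResultSketch) -> int:
    # No `is None` guard is needed: df is always a DataFrame, possibly empty.
    return len(result.df.index)


empty_result = QueryResultSketch()
full_result = QueryResultSketch(df=pd.DataFrame({"a": [1, 2]}))
```

With a non-optional df, callers can check `result.df.empty` when they care about the no-data case instead of guarding against None everywhere.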

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

CI.

ADDITIONAL INFORMATION

REVIEWERS

to: @etr2460 @michellethomas @mistercrunch @serenajiang @villebro @willbarrett

john-bodley · 2020-01-08T02:04:13Z

superset/connectors/druid/models.py

-    def _add_filter_from_pre_query_data(
-        self, df: Optional[pd.DataFrame], dimensions, dim_filter
-    ):
+    def _add_filter_from_pre_query_data(self, df: pd.DataFrame, dimensions, dim_filter):


The return type from client.export_pandas() is never None per here.

john-bodley · 2020-01-08T02:11:33Z

superset/viz.py

@@ -439,11 +439,7 @@ def get_df_payload(self, query_obj=None, **kwargs):
                and self.status != utils.QueryStatus.FAILED
            ):
                try:
-                    cache_value = dict(
-                        dttm=cached_dttm,
-                        df=df if df is not None else None,


df if df is not None else None is merely df.

willbarrett

Looks reasonable to me. I approve of greater consistency in return values.

willbarrett · 2020-01-08T17:16:56Z

superset/viz.py

@@ -439,11 +439,7 @@ def get_df_payload(self, query_obj=None, **kwargs):
                and self.status != utils.QueryStatus.FAILED
            ):
                try:
-                    cache_value = dict(
-                        dttm=cached_dttm,
-                        df=df if df is not None else None,


john-bodley · 2020-01-08T17:29:56Z

superset/connectors/druid/models.py

@@ -1379,7 +1377,7 @@ def query(self, query_obj: Dict) -> QueryResult:

        if df is None or df.size == 0:
            return QueryResult(
-                df=pd.DataFrame([]),
+                df=pd.DataFrame(),


This is equivalent.
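
A quick way to check this locally, as a sketch: both constructors should yield a data frame with no rows and no columns. The variable names are illustrative only.

```python
import pandas as pd

# Both forms should produce an empty frame with zero rows and zero columns.
from_empty_list = pd.DataFrame([])
from_no_args = pd.DataFrame()

for frame in (from_empty_list, from_no_args):
    assert frame.empty
    assert frame.shape == (0, 0)
```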

john-bodley · 2020-01-08T17:32:47Z

superset/viz.py

@@ -1155,6 +1153,9 @@ def process_data(self, df, aggregate=False):
        if fd.get("granularity") == "all":
            raise Exception(_("Pick a time granularity for your time series"))

+        if df.empty:


Short circuiting as pivot_table will throw an exception if the pandas.DataFrame is empty.
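
A minimal sketch of the guard pattern, under assumptions: the function name `pivot_time_series`, the `series` and `value` column names, and returning None for the empty case are all hypothetical, since the hunk above only shows the `if df.empty:` check.

```python
from typing import Optional

import pandas as pd


def pivot_time_series(df: pd.DataFrame) -> Optional[pd.DataFrame]:
    # Short circuit before pivoting: an empty frame has no columns to pivot on.
    if df.empty:
        return None  # assumed return value for the no-data case
    # Hypothetical column names, chosen only for this example.
    return df.pivot_table(index="__timestamp", columns="series", values="value")


sample = pd.DataFrame(
    {
        "__timestamp": [1, 1, 2, 2],
        "series": ["a", "b", "a", "b"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)
pivoted = pivot_time_series(sample)
```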

villebro

LGTM

villebro · 2020-01-08T19:26:22Z

superset/viz.py

@@ -77,6 +77,8 @@
    "size",
 ]

+VizData = Optional[Union[List[Any], Dict[Any, Any]]]


Nice, we should really leverage alias types more.
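
For reference, a small sketch of how such an alias might be used in a signature. `to_viz_data` is a hypothetical helper, not a function from superset/viz.py.

```python
from typing import Any, Dict, List, Optional, Union

# Same alias as in the diff above.
VizData = Optional[Union[List[Any], Dict[Any, Any]]]


def to_viz_data(rows: List[Dict[str, Any]]) -> VizData:
    # Hypothetical helper: return None when there is nothing to visualize.
    if not rows:
        return None
    return rows
```

The alias keeps the signature short and gives the payload shape a single name that can be changed in one place.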

villebro · 2020-01-08T19:27:22Z

superset/viz.py

@@ -439,11 +439,7 @@ def get_df_payload(self, query_obj=None, **kwargs):
                and self.status != utils.QueryStatus.FAILED
            ):
                try:
-                    cache_value = dict(
-                        dttm=cached_dttm,
-                        df=df if df is not None else None,


pull-request-size bot added the size/L label Jan 8, 2020

john-bodley changed the title ~~[fix] Enforce the query result data to be pandas.DataFrame~~ [fix] Enforce the QueryResult.df to be pandas.DataFrame object Jan 8, 2020

[fix] Enforce the query result to contain a data-frame

aaba7da

john-bodley force-pushed the john-bodley--fix-empty-data-frame branch from 732f1d1 to aaba7da Compare January 8, 2020 06:24

john-bodley changed the title ~~[fix] Enforce the QueryResult.df to be pandas.DataFrame object~~ [fix] Enforce the QueryResult.df to be a pandas.DataFrame object Jan 8, 2020

john-bodley marked this pull request as ready for review January 8, 2020 16:21

willbarrett approved these changes Jan 8, 2020

villebro approved these changes Jan 8, 2020

john-bodley merged commit 2d456e8 into apache:master Jan 8, 2020

john-bodley deleted the john-bodley--fix-empty-data-frame branch January 8, 2020 19:50

john-bodley changed the title ~~[fix] Enforce the QueryResult.df to be a pandas.DataFrame object~~ [fix] Enforce the QueryResult.df to be a pandas.DataFrame object (Phase I) Jan 11, 2020

john-bodley mentioned this pull request Jan 11, 2020

[fix] Enforce the QueryResult.df to be a pandas.DataFrame (Phase II) #8948

Merged

12 tasks

mistercrunch added the 🏷️ bot and 🚢 0.36.0 labels Feb 28, 2024
