fix: pandas bug when data is blank on post-processing #20629
superset/charts/post_processing.py
    try:
        df = pd.DataFrame.from_dict(query["data"])
    except ValueError:  # no data error
        return result
I think it would be better to `continue` here and in line 336, since other queries might have data (and if they also don't we'll end up returning `result` unmodified).
What is `query["data"]` in this case? Would it be preferable to first check whether `query["data"]` is valid? Per your unit tests it seems like this might be an empty string (the worst of the worst), and maybe we could/should fix this upstream and have it be `None`, i.e., the following works:
>>> import pandas as pd
>>> pd.DataFrame.from_dict(None)
Empty DataFrame
Columns: []
Index: []
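To make the contrast between the two blank values concrete, here is a minimal sketch. The helper name `frame_or_none` is hypothetical and not part of Superset; it assumes, as the `except ValueError` branch in the diff above suggests, that pandas rejects an empty string while accepting `None`.

```python
import pandas as pd


def frame_or_none(data):
    """Hypothetical helper: build a DataFrame, or return None if pandas rejects the payload."""
    try:
        return pd.DataFrame.from_dict(data)
    except ValueError:
        # An empty string is not list-like, so the DataFrame constructor refuses it.
        return None


# None is accepted and produces an empty frame.
print(frame_or_none(None).empty)
# An empty string is rejected, so the helper falls back to None.
print(frame_or_none("") is None)
# A regular column-oriented dict is converted as usual.
print(frame_or_none({"a": [1, 2]}).shape)
```

Whether the payload should be normalized to `None` upstream or checked at this point is the open question in the thread; the sketch only shows why the two blank values behave differently.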
@betodealmeida if the data is None or '', is there any value in continuing this process rather than returning early? AFAICT we'll continue to get more errors down below as well. Per @john-bodley's point, I can do a nullish check instead of the try/except if we want to be more specific to these errors.
If you `continue` here it would skip to the next step of the `for query in result["queries"]` loop, so it wouldn't get more errors. There could be other queries in `result["queries"]` that have non-blank data.
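The difference between returning early and continuing can be sketched in plain Python. Both function names, the `transform` callback, and the sample payload are assumptions made for illustration, and a simple blank check stands in for whatever condition detects the missing data.

```python
def process_with_return(result, transform):
    """Illustrative only: bail out of the whole function at the first blank query."""
    for query in result["queries"]:
        if not query["data"]:
            return result  # later queries are never processed
        query["data"] = transform(query["data"])
    return result


def process_with_continue(result, transform):
    """Illustrative only: skip blank queries but keep processing the rest."""
    for query in result["queries"]:
        if not query["data"]:
            continue  # move on to the next query in the loop
        query["data"] = transform(query["data"])
    return result


def make_result():
    # First query is blank, second one has data.
    return {"queries": [{"data": ""}, {"data": "a,b"}]}


early = process_with_return(make_result(), str.upper)
skipped = process_with_continue(make_result(), str.upper)

print(early["queries"][1]["data"])    # second query left untouched
print(skipped["queries"][1]["data"])  # second query was transformed
```

If every query is blank, both versions return `result` unmodified, which matches the point made earlier in the review.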
Ok, I see what you're saying. I also added a new test for multiple queries.
Based on @john-bodley's comment, I added a few more tests for when data is `None` and found a case where it errors later in the code, so I put in a nullish check like he suggested instead of the try/except.
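A rough sketch of what a nullish check combined with `continue` could look like is below. It is not the actual Superset code: `apply_post_processing`, the `processor` callback, the `"json"`/`"csv"` format strings, and the `rowcount` key are assumptions chosen to keep the example self-contained.

```python
from io import StringIO

import pandas as pd


def apply_post_processing(result, processor):
    """Hypothetical stand-in for the loop under review."""
    for query in result["queries"]:
        # Nullish check: covers None, "", and empty containers in one test.
        if not query["data"]:
            continue
        if query["result_format"] == "json":
            df = pd.DataFrame.from_dict(query["data"])
        else:  # assumed to be CSV text
            df = pd.read_csv(StringIO(query["data"]))
        processed = processor(df)
        query["rowcount"] = len(processed.index)
    return result


result = {
    "queries": [
        {"result_format": "json", "data": None},
        {"result_format": "csv", "data": ""},
        {"result_format": "json", "data": [{"a": 1}, {"a": 2}]},
        {"result_format": "csv", "data": "a,b\n1,2\n"},
    ]
}
out = apply_post_processing(result, lambda df: df)
# Blank queries are skipped and get no rowcount; the others are processed.
print([q.get("rowcount") for q in out["queries"]])
```

Checking before parsing avoids relying on which exception each parser raises for each kind of blank input, which is the advantage over the try/except discussed above.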
Codecov Report
@@            Coverage Diff             @@
##           master   #20629      +/-   ##
==========================================
- Coverage   66.82%   66.67%   -0.15%
==========================================
  Files        1752     1752
  Lines       65616    65570      -46
  Branches     6938     6938
==========================================
- Hits        43849    43722     -127
- Misses      20007    20088      +81
  Partials     1760     1760
Looks great!
* fix pandas bug when data is blank on post-processing * account for multiple queries when data is blank (cherry picked from commit c2be54c)
* fix pandas bug when data is blank on post-processing * account for multiple queries when data is blank
SUMMARY
There's a bug in post-processing for table and pivot-table charts when the data is empty, for both JSON and CSV formats. We'll now just return the original results instead of trying to apply any post-processing to them.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Currently raises a 500
TESTING INSTRUCTIONS
Create a pivot table chart that returns no results, then try to export the pivoted results as CSV. This would also break alerts/reports when the report is formatted as CSV.
ADDITIONAL INFORMATION