fix: convert <NA> values to None instead of stringifying #22321

Merged: 1 commit, Dec 3, 2022

@eschutho eschutho commented Dec 2, 2022

SUMMARY

With the pyathena pandas driver, null values in integer columns fail to serialize. The failure starts with a type error:

TypeError: Unserializable object <NA> of type <class 'pandas._libs.missing.NAType'>

We catch that error and try to stringify the value, which in turn fails with:

pyarrow.lib.ArrowInvalid: Could not convert <NA> with type NAType: did not recognize Python value type when inferring an Arrow data type

This fix returns None for all pandas <NA> values, so that they are displayed the same way as other null values.
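
A minimal sketch of the idea, not the code merged in this PR: check each scalar with pandas' isna before stringifying, and return None for missing values. The helper name stringify_or_none and the sample column are assumptions made for illustration, and the sketch only handles scalar values.

```python
from typing import Any, List, Optional

import pandas as pd


def stringify_or_none(values: List[Any]) -> List[Optional[str]]:
    """Hypothetical helper: map missing scalars to None, everything else to str."""
    result: List[Optional[str]] = []
    for value in values:
        # For scalars, pd.isna returns True for pd.NA, None and float NaN.
        if pd.isna(value):
            result.append(None)
        else:
            result.append(str(value))
    return result


# A nullable integer column with one missing value (illustrative data only).
column = pd.array([1, None, 3], dtype="Int64")
converted = stringify_or_none(list(column))
print(converted)
```

Checking for missing values first means str() is never called on <NA>, so the literal text "<NA>" cannot end up in the result.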

AFTER SCREENSHOTS OR ANIMATED GIF

[Screenshot: _DEV__Superset]

TESTING INSTRUCTIONS

An Athena database connected through the pyathena pandas driver should be able to fetch integer columns that contain null values.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

codecov bot commented Dec 2, 2022

Codecov Report

Merging #22321 (915884d) into master (3bc0865) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #22321      +/-   ##
==========================================
- Coverage   66.92%   66.90%   -0.02%     
==========================================
  Files        1835     1836       +1     
  Lines       69988    70130     +142     
  Branches     7612     7612              
==========================================
+ Hits        46839    46921      +82     
- Misses      21183    21243      +60     
  Partials     1966     1966              
Flag       Coverage Δ
hive       52.53% <66.66%> (-0.05%) ⬇️
mysql      77.95% <66.66%> (-0.13%) ⬇️
postgres   78.02% <66.66%> (-0.13%) ⬇️
presto     52.42% <66.66%> (-0.05%) ⬇️
python     81.24% <100.00%> (-0.10%) ⬇️
sqlite     76.48% <66.66%> (-0.12%) ⬇️
unit       50.92% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                  Coverage Δ
superset/result_set.py                          97.85% <100.00%> (+0.03%) ⬆️
superset/db_engine_specs/databricks.py          64.96% <0.00%> (-17.97%) ⬇️
superset/utils/pandas_postprocessing/utils.py   95.31% <0.00%> (-1.24%) ⬇️
superset/views/database/views.py                31.27% <0.00%> (-0.09%) ⬇️
superset/models/helpers.py                      38.19% <0.00%> (-0.09%) ⬇️
superset/charts/schemas.py                      99.35% <0.00%> (ø)
superset/connectors/sqla/models.py              89.32% <0.00%> (ø)
superset/utils/pandas_postprocessing/sort.py    100.00% <0.00%> (ø)
superset/db_engine_specs/risingwave.py          100.00% <0.00%> (ø)
superset/charts/data/api.py                     89.87% <0.00%> (+0.06%) ⬆️
... and 2 more

@eschutho eschutho merged commit 1c20206 into apache:master Dec 3, 2022
@mistercrunch mistercrunch added the 🏷️ bot and 🚢 2.1.0 labels and removed the 🚢 2.1.3 label Mar 13, 2024
Labels: 🏷️ bot, size/M, 🚢 2.1.0