fix(csv): Do not coerce persisted data integer columns to float #20760
Codecov Report
@@             Coverage Diff              @@
##           master   #20760       +/-   ##
===========================================
- Coverage   66.35%   54.87%   -11.49%
===========================================
  Files        1754     1754
  Lines       66689    66688        -1
  Branches     7049     7049
===========================================
- Hits        44253    36595     -7658
- Misses      20639    28296     +7657
  Partials     1797     1797
Hi @john-bodley, this fix introduces a new problem when a user exports a CSV file from a cached query. When the DataFrame is created dynamically from cached data, it does not respect the column formats. I'm testing this, and it works well when changing:
to
Thank you
SUMMARY
Regrettably #20151 wasn't sufficient if the result set was stored prior to downloading the CSV file. More specifically, Pandas coerces an integer array containing None to float, likely because of the underlying NumPy coercion. The fix is to explicitly define the dtype using the standard DataFrame constructor.
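The coercion described here can be reproduced outside Superset. The following is a minimal sketch, not the code from this PR: the `rows` payload is a hypothetical stand-in for a persisted result set, and `dtype=object` is an assumption about what "explicitly define the dtype" means in practice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a persisted result set: an integer column with a NULL.
rows = [{"id": 1}, {"id": None}, {"id": 3}]

# Default inference: None becomes NaN, so the whole column is coerced to float64.
coerced = pd.DataFrame(data=rows)

# Explicit dtype (assumed here to be object): the Python ints and None are kept as-is.
explicit = pd.DataFrame(data=rows, dtype=object)

print(coerced["id"].dtype)
print(explicit["id"].dtype)

# The difference is visible in the exported CSV: floats in the first, integers in the second.
coerced_csv = coerced.to_csv(index=False)
explicit_csv = explicit.to_csv(index=False)
print(coerced_csv)
print(explicit_csv)
```

With the default constructor the integers are exported as floats (for example `1.0`), whereas the object-typed frame keeps them as integers and writes the missing value as an empty field.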
Long term we should probably replace quirky Pandas with PyArrow globally.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
CI.
ADDITIONAL INFORMATION