fix: replace NaNs with None in some backends when loading from pandas dataframe #9094

chloeh13q · 2024-05-01T06:51:57Z

Description of changes

Examples were broken on the MySQL and PostgreSQL backends when a numeric column contained null values.

Druid, PySpark, RW don't support examples.

Exasol - did not test
Flink - broken
Impala - did not test
MSSQL - broken (see "bug: examples broken on mssql backend" #9095)
MySQL - fixed
Oracle - did not test
PostgreSQL - fixed
Snowflake - did not test

Issues closed

gforsyth

Thanks for putting this in @chloeh13q ! I'm on board with trying this out -- I think for mysql, where nan isn't allowed at all, this is a good solution.
For postgres, we should be a little more specific in the conversion.

I also think this could work for mssql and a few other backends that currently can't load a large chunk of the examples.
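To make the distinction above concrete, here is a minimal sketch, assuming a plain pandas DataFrame and a hypothetical `float_columns` argument standing in for whatever schema information a backend has. It is an illustration of the idea only, not the code in this PR: with no float columns declared, every NaN becomes None (the MySQL case), while declaring float columns leaves their NaNs untouched (the more specific Postgres case).

```python
import numpy as np
import pandas as pd


def nan_to_none(column: pd.Series) -> pd.Series:
    """Return an object-dtype copy of `column` with missing values as None."""
    values = [None if pd.isna(v) else v for v in column]
    # dtype=object keeps None as None instead of coercing it back to NaN.
    return pd.Series(values, index=column.index, dtype=object)


def prepare_frame(df: pd.DataFrame, float_columns=()) -> pd.DataFrame:
    """Replace NaN with None in every column not listed in `float_columns`.

    `float_columns` is a hypothetical parameter for this sketch: the names of
    columns whose declared type really is floating point, where NaN may stay.
    """
    out = df.copy()
    for name in out.columns:
        if name not in float_columns:
            out[name] = nan_to_none(out[name])
    return out


# "year" is meant to be an integer column; pandas stores it as float64
# because of the missing value.
df = pd.DataFrame({"year": [2007, np.nan], "bill_length_mm": [39.1, np.nan]})

# Whole-frame replacement, for a backend that rejects NaN everywhere.
mysql_ready = prepare_frame(df)

# Targeted replacement, for a backend that accepts NaN in float columns.
postgres_ready = prepare_frame(df, float_columns={"bill_length_mm"})
```

The per-element list comprehension is slower than a vectorized call, but it avoids depending on how a given pandas version treats `None` as a replacement value.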

ibis/backends/postgres/__init__.py

chloeh13q · 2024-05-01T20:00:37Z

> For postgres, we should be a little more specific in the conversion.

Sounds good.

> I also think this could work for mssql and a few other backends that currently can't load a large chunk of the examples.

I think mssql examples are broken for a different reason, at least on my local setup; I filed a ticket #9095 yesterday and I'm getting Incorrect syntax on all of the examples. Do you have other backends in mind? Or did you just mean this in a generic sense?

chloeh13q · 2024-05-02T19:24:20Z

I'm having trouble spinning up some of the backends so I'm not able to test whether the examples work in these backends. But I can confirm that the examples are now fixed in MySQL and postgres with this PR!

gforsyth · 2024-05-03T17:11:56Z

xref #9110

gforsyth

LGTM

cpcloud · 2024-05-06T13:48:58Z

Do we have some tests for this? How do we know this won't regress?

gforsyth · 2024-05-06T17:04:53Z

We should enable some of the examples tests for postgres and mysql (I don't think we need to run all of them)

The palmer penguins data has null-valued integers. When it is used to create a `memtable`, pandas casts the affected column to `float` because it reads the nulls as NaN. We added a fix for this for some backends in ibis-project#9094.
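The cast described above is easy to reproduce with pandas alone. The following is a minimal sketch using made-up values rather than the actual penguins data:

```python
import pandas as pd

# Integers with one missing value: pandas has no NaN for int64,
# so the default inference falls back to float64.
years = pd.Series([2007, None, 2009])

print(years.dtype)           # float64
print(years.isna().tolist())  # [False, True, False]
```

The missing entry is stored as NaN, which is the value that backends such as MySQL then reject on insert.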


chloeh13q changed the title ~~fix: replace NaNs with None in mysql and postgres backends when loading from pandas dataframe~~ fix: replace NaNs with None in some backends when loading from pandas dataframe May 1, 2024

chloeh13q requested a review from gforsyth May 1, 2024 07:24

gforsyth reviewed May 1, 2024

ibis/backends/postgres/__init__.py (outdated)

Chloe He added 3 commits May 2, 2024 11:39

fix: replace NaNs with None in mysql and postgres backends when loading from pandas dataframe

394abaf

fix(postgres): replace NaNs with None only in non-float columns instead of the entire df

0afe3dd

fix(postgres): change a deprecated API

302409b

chloeh13q force-pushed the fix/examples branch from b34714d to 302409b May 2, 2024 18:39

chloeh13q requested a review from gforsyth May 2, 2024 19:22

chloeh13q marked this pull request as ready for review May 2, 2024 19:23

gforsyth approved these changes May 6, 2024

gforsyth merged commit f2a7cd9 into ibis-project:main May 6, 2024
82 checks passed

cpcloud added this to the 9.1 milestone May 6, 2024

cpcloud added the ux (User experience related issues), postgres (The PostgreSQL backend), and mysql (The MySQL backend) labels May 6, 2024

gforsyth mentioned this pull request May 6, 2024

test(examples): add penguins example test #9130

Merged
