-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix BUG: read_sql tries to convert blob/varbinary to string with pyarrow backend #60105
base: main
Are you sure you want to change the base?
Changes from 9 commits
2244869
bd00fc5
6a23f05
2f261c8
8f900b8
536b1ed
8965a1d
a32b4a6
f412c72
a0200d0
10ca030
ba2e82d
8dd45ca
602194a
f8ae285
02514e0
ef5c2ed
da7817a
de5c604
e025d84
3d83bab
6ea5785
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -4358,3 +4358,21 @@ def test_xsqlite_if_exists(sqlite_buildin): | |||||
(5, "E"), | ||||||
] | ||||||
drop_table(table_name, sqlite_buildin) | ||||||
|
||||||
|
||||||
@pytest.mark.parametrize("dtype_backend", ["pyarrow", "numpy_nullable", lib.no_default]) | ||||||
def test_bytes_column(sqlite_buildin, dtype_backend): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Should test this against all databases There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sure thing! I was trying to pass in the cartesian product of the |
||||||
pa = pytest.importorskip("pyarrow") | ||||||
""" | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is well intentioned but can you remove the docstring? We don't use them in tests. Instead, you can just add a comment pointing to the github issue number in the function body |
||||||
Regression test for (#59242) | ||||||
Bytes being returned in a column that could not be converted | ||||||
to a string would raise a UnicodeDecodeError | ||||||
when using dtype_backend='pyarrow' or dtype_backend='numpy_nullable' | ||||||
""" | ||||||
query = """ | ||||||
select cast(x'0123456789abcdef0123456789abcdef' as blob) a | ||||||
""" | ||||||
df = pd.read_sql(query, sqlite_buildin, dtype_backend=dtype_backend) | ||||||
assert df.a.values[0] == b"\x01#Eg\x89\xab\xcd\xef\x01#Eg\x89\xab\xcd\xef" | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you use our built-in test helpers instead? I think you can just do: result = pd.read_sql(...)
expected = pd.DataFrame({"a": ...}, dtype=pd.ArrowDtype(pa.binary()))
tm.assert_frame_equal(result, expected) What data type does this produce currently with the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for sure, changed the testing logic over to using this! for |
||||||
if dtype_backend == "pyarrow": | ||||||
assert df.a.dtype == pd.ArrowDtype(pa.binary()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also found that this was failing for
dtype_backend=numpy_nullable
so implemented a fix and added test cases for each possibledtype_backend