-
Notifications
You must be signed in to change notification settings - Fork 603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support read_parquet for backend with no native support #9744
Changes from 3 commits
ab2ad16
661f50d
e16f1bb
eaec7a2
9106ad8
27d7a08
ac6117f
3ce9674
24530ca
bb238af
12cfc7d
2cf597a
b4cf0ea
2ba5002
6f2c754
24bfe38
6a50c46
4579bff
d1ed444
b01bc6a
e70de2f
413ada7
c3fba44
8b6b3c6
0d55190
fda5493
71ebb8e
2473c02
3ab60a8
59c03e0
c0c1fd1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -421,12 +421,13 @@ def test_register_garbage(con, monkeypatch): | |
("functional_alltypes.parquet", "funk_all"), | ||
], | ||
) | ||
@pytest.mark.notyet( | ||
["flink", "impala", "mssql", "mysql", "postgres", "risingwave", "sqlite", "trino"] | ||
) | ||
@pytest.mark.notyet(["flink"]) | ||
def test_read_parquet(con, tmp_path, data_dir, fname, in_table_name): | ||
pq = pytest.importorskip("pyarrow.parquet") | ||
|
||
if con.name in ["oracle", "exasol"]: | ||
pytest.skip("Skip Exasol and Oracle because of the global pytestmark") | ||
jitingxu1 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
fname = Path(fname) | ||
fname = Path(data_dir) / "parquet" / fname.name | ||
table = pq.read_table(fname) | ||
|
@@ -452,19 +453,7 @@ def ft_data(data_dir): | |
return table.slice(0, nrows) | ||
|
||
|
||
@pytest.mark.notyet( | ||
[ | ||
"flink", | ||
"impala", | ||
"mssql", | ||
"mysql", | ||
"pandas", | ||
"postgres", | ||
"risingwave", | ||
"sqlite", | ||
"trino", | ||
] | ||
) | ||
@pytest.mark.notyet(["flink"]) | ||
def test_read_parquet_glob(con, tmp_path, ft_data): | ||
pq = pytest.importorskip("pyarrow.parquet") | ||
|
||
|
@@ -476,7 +465,11 @@ def test_read_parquet_glob(con, tmp_path, ft_data): | |
for fname in fnames: | ||
pq.write_table(ft_data, tmp_path / fname) | ||
|
||
table = con.read_parquet(tmp_path / f"*.{ext}") | ||
if con.name == "clickhouse": | ||
# clickhouse does not support read directory | ||
table = con.read_parquet(tmp_path / f"*.{ext}") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't seem like the right approach. You're changing what's being tested. Why can't you leave this code unchanged here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the pyarrow We have three kinds of read_parquet:
Maybe we could add something before There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The test is whether the backend can read a glob of parquet files, the answer to that seems to be "no", so it should be marked as |
||
else: | ||
table = con.read_parquet(tmp_path) | ||
|
||
assert table.count().execute() == nrows * ntables | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of BytesIO, I could pass the fsspec object, It could be HTTPFile if we pass an HTTP url. Not sure what is the best way to handle the type of
path
@gforsyth any suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
fsspec
is a good option.