Fix dt regression in empty() #898
@jrbourbeau , I'll merge this when it passes, and that should be enough to make dask CI happy. |
Thanks for fixing this so quickly, @martindurant! Will there be a release out with this patch soon? We use releases in most CI builds (one build uses |
Yes, since the windows-py3.12 wheel failed to build in the last round anyway. |
@jrbourbeau , would you mind running your main-branch CI somewhere to see if the failures go away? |
Locally I'm getting the same error:

____________________________ test_timestamp96 ____________________________

tmpdir = local('/private/var/folders/h0/_w6tz8jd3b9bk6w7d_xpg9t40000gn/T/pytest-of-james/pytest-21/test_timestamp960')

    @FASTPARQUET_MARK
    def test_timestamp96(tmpdir):
        fn = str(tmpdir)
        df = pd.DataFrame({"a": [pd.to_datetime("now", utc=True)]})
        ddf = dd.from_pandas(df, 1)
        ddf.to_parquet(fn, engine="fastparquet", write_index=False, times="int96")
        pf = fastparquet.ParquetFile(fn)
        assert pf._schema[1].type == fastparquet.parquet_thrift.Type.INT96
>       out = dd.read_parquet(fn, engine="fastparquet", index=False).compute()

dask/dataframe/io/tests/test_parquet.py:1883:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask/base.py:342: in compute
    (result,) = compute(self, traverse=False, **kwargs)
dask/base.py:628: in compute
    results = schedule(dsk, keys, **kwargs)
dask/dataframe/io/parquet/core.py:96: in __call__
    return read_parquet_part(
dask/dataframe/io/parquet/core.py:654: in read_parquet_part
    dfs = [
dask/dataframe/io/parquet/core.py:655: in <listcomp>
    func(
dask/dataframe/io/parquet/fastparquet.py:1075: in read_partition
    return cls.pf_to_pandas(
dask/dataframe/io/parquet/fastparquet.py:1115: in pf_to_pandas
    df, views = pf.pre_allocate(size, columns, categories, index)
../../../mambaforge/envs/dask-py310/lib/python3.10/site-packages/fastparquet/api.py:797: in pre_allocate
    df, arrs = _pre_allocate(size, columns, categories, index, cats,
../../../mambaforge/envs/dask-py310/lib/python3.10/site-packages/fastparquet/api.py:1051: in _pre_allocate
    df, views = dataframe.empty(dtypes, size, cols=cols, index_names=index,
../../../mambaforge/envs/dask-py310/lib/python3.10/site-packages/fastparquet/dataframe.py:202: in empty
    values = type(bvalues)._from_sequence(values, copy=False, dtype=bvalues.dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

pandas/_libs/tslibs/tzconversion.pyx:187: ValueError

Note: it looks like the line changed in this PR is similar, but not exactly the same, to the line where the error is being raised. Maybe both lines need the same sort of update. |
What's your pandas version? |
In [1]: import pandas as pd
pd
In [2]: pd.__version__
Out[2]: '1.5.3' |
OK, then I think all the pandas versions I have, both locally and in the tests, are too new... Hold on. |
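For anyone comparing environments while chasing a version-dependent failure like this one, here is a minimal sketch of a version check that a test or code path could branch on. The helper names `release_tuple` and `older_than` are hypothetical, and the `"2.0"` bound is purely illustrative: the thread does not say which pandas releases are affected. `packaging.version.Version` is the more robust choice where that package is available; this version sticks to the standard library.

```python
import re


def release_tuple(version):
    """Return the leading numeric release components of a version string.

    Suffixes such as "rc0" or ".dev0+abc" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release found in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def older_than(version, bound):
    """True if `version` is strictly older than `bound` (release parts only)."""
    a, b = release_tuple(version), release_tuple(bound)
    width = max(len(a), len(b))
    # Pad with zeros so that "2.0" and "2.0.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


# The version reported above, against an illustrative (assumed) bound.
print(older_than("1.5.3", "2.0"))  # True
```

In practice the first argument would be `pd.__version__`, so the same check works in both the older and the newer environments.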
Fixes #897