REF: de-duplicate Block.__init__ #38134

Merged: 16 commits, Mar 9, 2021
Changes from 5 commits
4 changes: 4 additions & 0 deletions doc/source/user_guide/io.rst
@@ -4735,6 +4735,7 @@ Write to a feather file.
Read from a feather file.

.. ipython:: python
:okwarning:

result = pd.read_feather("example.feather")
result
@@ -4818,6 +4819,7 @@ Write to a parquet file.
Read from a parquet file.

.. ipython:: python
:okwarning:

result = pd.read_parquet("example_fp.parquet", engine="fastparquet")
result = pd.read_parquet("example_pa.parquet", engine="pyarrow")
@@ -4827,6 +4829,7 @@ Read from a parquet file.
Read only certain columns of a parquet file.

.. ipython:: python
:okwarning:

result = pd.read_parquet(
"example_fp.parquet",
@@ -4895,6 +4898,7 @@ Partitioning Parquet files
Parquet supports partitioning of data based on the values of one or more columns.

.. ipython:: python
:okwarning:

df = pd.DataFrame({"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]})
df.to_parquet(path="test", engine="pyarrow", partition_cols=["a"], compression=None)
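The only change in this file is the ``:okwarning:`` option added to each ``.. ipython:: python`` directive; the examples themselves are untouched. In the pandas doc build, a warning raised while an ipython block executes fails the build unless the block carries ``:okwarning:``, and the option is added here presumably because these read examples now emit a warning. A minimal sketch of what that looks like when running the same snippet interactively (it assumes ``example.feather`` was written by the earlier ``to_feather`` example in io.rst):

```python
import warnings

import pandas as pd

# Record any warnings the read emits instead of letting them fail a doc build;
# this mirrors what ``:okwarning:`` tolerates in the Sphinx ipython directive.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = pd.read_feather("example.feather")

for w in caught:
    print(w.category.__name__, str(w.message))
```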
7 changes: 7 additions & 0 deletions doc/source/user_guide/scale.rst
@@ -71,6 +71,7 @@ To load the columns we want, we have two options.
Option 1 loads in all the data and then filters to what we need.

.. ipython:: python
:okwarning:

columns = ["id_0", "name_0", "x_0", "y_0"]

@@ -79,6 +80,7 @@ Option 1 loads in all the data and then filters to what we need.
Option 2 only loads the columns we request.

.. ipython:: python
:okwarning:

pd.read_parquet("timeseries_wide.parquet", columns=columns)

@@ -98,6 +100,7 @@ referred to as "low-cardinality" data). By using more efficient data types, you
can store larger datasets in memory.

.. ipython:: python
:okwarning:

ts = pd.read_parquet("timeseries.parquet")
ts
@@ -206,6 +209,7 @@ counts up to this point. As long as each individual file fits in memory, this will
work for arbitrary-sized datasets.

.. ipython:: python
:okwarning:

%%time
files = pathlib.Path("data/timeseries/").glob("ts*.parquet")
@@ -289,6 +293,7 @@ returns a Dask Series with the same dtype and the same name.
To get the actual result you can call ``.compute()``.

.. ipython:: python
:okwarning:

%time ddf["name"].value_counts().compute()

@@ -322,6 +327,7 @@ Dask implements the most used parts of the pandas API. For example, we can do
a familiar groupby aggregation.

.. ipython:: python
:okwarning:

%time ddf.groupby("name")[["x", "y"]].mean().compute().head()

@@ -345,6 +351,7 @@ we need to supply the divisions manually.
Now we can do things like fast random access with ``.loc``.

.. ipython:: python
:okwarning:

ddf.loc["2002-01-01 12:01":"2002-01-01 12:05"].compute()

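The scale.rst hunks above get the same ``:okwarning:`` treatment, with the guide's examples otherwise unchanged. For the chunk-by-chunk pattern the surrounding text describes (process one file at a time so only the running aggregate stays in memory), here is a minimal sketch of the idea, assuming the ``data/timeseries/ts*.parquet`` layout and the ``name`` column used in that guide:

```python
import pathlib

import pandas as pd

# Aggregate value counts file by file; each file fits in memory on its own,
# so this works for datasets far larger than RAM.
files = sorted(pathlib.Path("data/timeseries/").glob("ts*.parquet"))
counts = pd.Series(dtype="int64")
for path in files:
    chunk = pd.read_parquet(path, columns=["name"])
    counts = counts.add(chunk["name"].value_counts(), fill_value=0)

print(counts.astype("int64").nlargest(10))
```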
47 changes: 18 additions & 29 deletions pandas/core/internals/blocks.py
@@ -134,16 +134,20 @@ def __init__(self, values, placement, ndim: int):
1 for SingleBlockManager/Series, 2 for BlockManager/DataFrame
"""
# TODO(EA2D): ndim will be unnecessary with 2D EAs
self.ndim = self._check_ndim(values, ndim)
self.mgr_locs = placement
self.values = self._maybe_coerce_values(values)
self.ndim = self._check_ndim(values, ndim)

if self._validate_ndim and self.ndim and len(self.mgr_locs) != len(self.values):
raise ValueError(
f"Wrong number of items passed {len(self.values)}, "
f"placement implies {len(self.mgr_locs)}"
)

if self.is_extension and self.ndim == 2 and len(self.mgr_locs) != 1:
# TODO(EA2D): check unnecessary with 2D EAs
raise AssertionError("block.size != values.size")

def _maybe_coerce_values(self, values):
"""
Ensure we have correctly-typed values.
@@ -180,7 +184,19 @@ def _check_ndim(self, values, ndim):
ValueError : the number of dimensions does not match
"""
if ndim is None:
ndim = values.ndim
warnings.warn(
"Accepting ndim=None in the Block constructor is deprecated, "
"this will raise in a future version.",
FutureWarning,
stacklevel=3,
)
if self.is_extension:
if len(self.mgr_locs) != 1:
ndim = 1
else:
ndim = 2
else:
ndim = values.ndim

if self._validate_ndim and values.ndim != ndim:
raise ValueError(
@@ -1667,33 +1683,6 @@ class ExtensionBlock(Block):

values: ExtensionArray

def __init__(self, values, placement, ndim: int):
"""
Initialize a non-consolidatable block.

'ndim' may be inferred from 'placement'.

This will continue to call __init__ for the other base
classes mixed in with this Mixin.
"""

# Placement must be converted to BlockPlacement so that we can check
# its length
if not isinstance(placement, libinternals.BlockPlacement):
placement = libinternals.BlockPlacement(placement)

# Maybe infer ndim from placement
if ndim is None:
if len(placement) != 1:
ndim = 1
else:
ndim = 2
super().__init__(values, placement, ndim=ndim)

if self.ndim == 2 and len(self.mgr_locs) != 1:
# TODO(EA2D): check unnecessary with 2D EAs
raise AssertionError("block.size != values.size")

@property
def shape(self):
# TODO(EA2D): override unnecessary with 2D EAs
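For readers following the refactor: the ndim inference and the ``len(mgr_locs) != 1`` check that previously lived in ``ExtensionBlock.__init__`` now sit in ``Block.__init__`` and ``Block._check_ndim``, and passing ``ndim=None`` is deprecated with a ``FutureWarning``. A standalone sketch of the inference rule as written in this diff (illustrative only; ``infer_block_ndim`` is not a pandas function, and real Block construction should go through pandas' own factories):

```python
import warnings

import numpy as np


def infer_block_ndim(values, placement, ndim, is_extension):
    """Mirror of the ndim inference added to Block._check_ndim in this PR.

    values       : the block's array (NumPy array or ExtensionArray)
    placement    : the block's column locations; len() gives the column count
    ndim         : the requested ndim, or None (now deprecated)
    is_extension : True if the block holds a 1-D ExtensionArray
    """
    if ndim is None:
        warnings.warn(
            "Accepting ndim=None in the Block constructor is deprecated, "
            "this will raise in a future version.",
            FutureWarning,
            stacklevel=2,  # the pandas code uses stacklevel=3 for its call chain
        )
        if is_extension:
            # ExtensionArrays are stored 1-D: a single placement entry means
            # the block backs one column of a 2-D manager (DataFrame),
            # otherwise it backs a Series.
            ndim = 2 if len(placement) == 1 else 1
        else:
            ndim = values.ndim
    return ndim


# Example: a 2x3 NumPy block constructed with ndim=None warns and infers ndim=2.
values = np.arange(6).reshape(2, 3)
print(infer_block_ndim(values, placement=[0, 1], ndim=None, is_extension=False))
```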