Merge branch 'main' into main_chelsealin_line
chelsea-lin authored Mar 13, 2024
2 parents 276f228 + b6211ee commit 36bd746
Showing 4 changed files with 178 additions and 17 deletions.
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,36 @@

[1]: https://pypi.org/project/bigframes/#history

## [0.24.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v0.23.0...v0.24.0) (2024-03-12)


### ⚠ BREAKING CHANGES

* `read_parquet` uses a "pandas" engine to parse files by default. Use `engine="bigquery"` for the previous behavior

### Features

* (Series|Dataframe).plot.hist() ([#420](https://github.com/googleapis/python-bigquery-dataframes/issues/420)) ([4aadff4](https://github.com/googleapis/python-bigquery-dataframes/commit/4aadff4db59243b4510a874fef2bdb17402d1674))
* Add detect_anomalies to ml ARIMAPlus and KMeans models ([#426](https://github.com/googleapis/python-bigquery-dataframes/issues/426)) ([6df28ed](https://github.com/googleapis/python-bigquery-dataframes/commit/6df28ed704552ebec7869e1f2034614cb6407098))
* Add engine parameter to `read_parquet` ([#413](https://github.com/googleapis/python-bigquery-dataframes/issues/413)) ([31325a1](https://github.com/googleapis/python-bigquery-dataframes/commit/31325a190320bf01ced53d9f4cdb94462daaa06b))
* Add ml PCA.detect_anomalies method ([#422](https://github.com/googleapis/python-bigquery-dataframes/issues/422)) ([8d82945](https://github.com/googleapis/python-bigquery-dataframes/commit/8d8294544ac7fedaca753c5473e3ca2a27868420))
* Support BYOSA in `remote_function` ([#407](https://github.com/googleapis/python-bigquery-dataframes/issues/407)) ([d92ced2](https://github.com/googleapis/python-bigquery-dataframes/commit/d92ced2adaa30a0405ace9ca6cd70a8e217f13d0))
* Support CMEK for BQ tables ([#403](https://github.com/googleapis/python-bigquery-dataframes/issues/403)) ([9a678e3](https://github.com/googleapis/python-bigquery-dataframes/commit/9a678e35201d935e1d93875429005033cfe7cff6))


### Bug Fixes

* Move `third_party.bigframes_vendored` to `bigframes_vendored` ([#424](https://github.com/googleapis/python-bigquery-dataframes/issues/424)) ([763edeb](https://github.com/googleapis/python-bigquery-dataframes/commit/763edeb4f4e8bc4b8bb05a992dae80c49c245e25))
* Only do row identity based joins when joining by index ([#356](https://github.com/googleapis/python-bigquery-dataframes/issues/356)) ([76b252f](https://github.com/googleapis/python-bigquery-dataframes/commit/76b252f907055d72556e3e95f6cb5ee41de5b1c2))
* Read_pandas inline respects location ([#412](https://github.com/googleapis/python-bigquery-dataframes/issues/412)) ([ae0e3ea](https://github.com/googleapis/python-bigquery-dataframes/commit/ae0e3eaca49171fd449de4d43ddc3e3ce9fdc2ce))


### Documentation

* Add predict sample to samples/snippets/bqml_getting_started_test.py ([#388](https://github.com/googleapis/python-bigquery-dataframes/issues/388)) ([6a3b0cc](https://github.com/googleapis/python-bigquery-dataframes/commit/6a3b0cc7f84120fc5978ce11b6b7c55e89654304))
* Document minimum IAM requirement ([#416](https://github.com/googleapis/python-bigquery-dataframes/issues/416)) ([36173b0](https://github.com/googleapis/python-bigquery-dataframes/commit/36173b0c14747fb52909bbedd93249024bae9ac1))
* Fix the note rendering for DataFrames methods: nlargest, nsmallest ([#417](https://github.com/googleapis/python-bigquery-dataframes/issues/417)) ([38bd2ba](https://github.com/googleapis/python-bigquery-dataframes/commit/38bd2ba21bc1a3222635de22eecd97930bf5b1de))

## [0.23.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v0.22.0...v0.23.0) (2024-03-05)


16 changes: 0 additions & 16 deletions bigframes/dtypes.py
@@ -20,7 +20,6 @@
import typing
from typing import Any, Dict, Iterable, Literal, Tuple, Union

-import bigframes_vendored.google_cloud_bigquery._pandas_helpers as gcb3p_pandas_helpers
import bigframes_vendored.ibis.backends.bigquery.datatypes as third_party_ibis_bqtypes
import bigframes_vendored.ibis.expr.operations as vendored_ibis_ops
import geopandas as gpd # type: ignore
@@ -492,21 +491,6 @@ def cast_ibis_value(
)


-def to_pandas_dtypes_overrides(schema: Iterable[bigquery.SchemaField]) -> Dict:
-    """For each STRUCT field, make sure we specify the full type to use."""
-    # TODO(swast): Also override ARRAY fields.
-    dtypes = {}
-    for field in schema:
-        if field.field_type == "RECORD" and field.mode != "REPEATED":
-            # TODO(swast): We're using a private API here. Would likely be
-            # better if we called `to_arrow()` and converted to a pandas
-            # DataFrame ourselves from that.
-            dtypes[field.name] = pd.ArrowDtype(
-                gcb3p_pandas_helpers.bq_to_arrow_data_type(field)
-            )
-    return dtypes


def is_dtype(scalar: typing.Any, dtype: Dtype) -> bool:
    """Captures whether a scalar can be losslessly represented by a dtype."""
    if scalar is None:
2 changes: 1 addition & 1 deletion bigframes/version.py
@@ -12,4 +12,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-__version__ = "0.23.0"
+__version__ = "0.24.0"
147 changes: 147 additions & 0 deletions scripts/get_code_sample_coverage.py
@@ -0,0 +1,147 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import importlib
import inspect
import sys
from typing import Dict, List

import bigframes
import bigframes.pandas as bpd

PRESENT = "present"
NOT_PRESENT = "not_present"

CLASSES = [
    bpd.DataFrame,
    bpd.Series,
    bpd.Index,
    bigframes.session.Session,
    bigframes.operations.strings.StringMethods,
    bigframes.operations.datetimes.DatetimeMethods,
    bigframes.operations.structs.StructAccessor,
]

ML_MODULE_NAMES = [
    "cluster",
    "compose",
    "decomposition",
    "ensemble",
    "linear_model",
    "metrics",
    "model_selection",
    "pipeline",
    "preprocessing",
    "llm",
    "forecasting",
    "imported",
    "remote",
]

for module_name in ML_MODULE_NAMES:
    module = importlib.import_module(f"bigframes.ml.{module_name}")
    classes_ = [
        class_ for _, class_ in inspect.getmembers(module, predicate=inspect.isclass)
    ]
    CLASSES.extend(classes_)


def get_code_samples_summary() -> Dict[str, Dict[str, List[str]]]:
    """Get Summary of the code samples coverage in BigFrames APIs.
    Returns:
        Summary: A dictionary of the format
            {
                class_1: {
                    "present": [method1, method2, ...],
                    "not_present": [method3, method4, ...]
                },
                class_2: {
                    ...
                }
            }
    """
    summary: Dict[str, Dict[str, List[str]]] = dict()

    for class_ in CLASSES:
        class_key = f"{class_.__module__}.{class_.__name__}"
        summary[class_key] = {PRESENT: [], NOT_PRESENT: []}

        members = inspect.getmembers(class_)

        for name, obj in members:
            # ignore private methods
            if name.startswith("_") and not name.startswith("__"):
                continue

            def predicate(impl):
                return (
                    # This includes class methods like `from_dict`, `from_records`
                    inspect.ismethod(impl)
                    # This includes instance methods like `dropna`, `join`
                    or inspect.isfunction(impl)
                    # This includes properties like `shape`, `values` but not
                    # generic properties like `__weakref__`
                    or (inspect.isdatadescriptor(impl) and not name.startswith("__"))
                )

            if not predicate(obj):
                continue

            # At this point we have a property or a public method
            impl = getattr(class_, name)

            docstr = inspect.getdoc(impl)
            code_samples_present = docstr and "**Examples:**" in docstr
            key = PRESENT if code_samples_present else NOT_PRESENT
            summary[class_key][key].append(name)

    return summary


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Get a summary of code samples coverage in BigFrames APIs."
    )
    parser.add_argument(
        "-d",
        "--details",
        # BooleanOptionalAction (Python 3.9+) consumes no value, so `type` is
        # omitted; passing it is deprecated as of Python 3.12.
        action=argparse.BooleanOptionalAction,
        default=False,
        help="Whether to print APIs with and without code samples.",
    )

    args = parser.parse_args(sys.argv[1:])

    summary = get_code_samples_summary()

    total_with_code_samples = 0
    total = 0
    for class_, class_summary in summary.items():
        apis_with_code_samples = len(class_summary[PRESENT])
        total_with_code_samples += apis_with_code_samples

        apis_total = len(class_summary[PRESENT]) + len(class_summary[NOT_PRESENT])
        total += apis_total

        # Guard against classes with no qualifying members.
        coverage = 100 * apis_with_code_samples / apis_total if apis_total else 0.0
        print(f"{class_}: {coverage:.1f}% ({apis_with_code_samples}/{apis_total})")
        if args.details:
            print(f"===> APIs WITH code samples: {class_summary[PRESENT]}")
            print(f"===> APIs WITHOUT code samples: {class_summary[NOT_PRESENT]}")

    # Guard against an empty summary.
    coverage = 100 * total_with_code_samples / total if total else 0.0
    print(f"Total: {coverage:.1f}% ({total_with_code_samples}/{total})")
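
Note on the classification logic in `get_code_samples_summary`: the sketch below applies the same idea to a toy class, using only the standard library so it can be run without `bigframes` installed. `Toy`, `classify`, and `MARKER` are hypothetical names introduced for this illustration and are not part of the commit. As a simplification, the sketch skips every underscore-prefixed name, whereas the script above skips only single-underscore names and filters dunder data descriptors separately.

```python
import inspect
from typing import Dict, List

# Same marker string the script searches for in docstrings.
MARKER = "**Examples:**"


class Toy:
    """Hypothetical class used only for this illustration."""

    @property
    def shape(self):
        """Return the shape.

        **Examples:**

            >>> Toy().shape
            (0, 0)
        """
        return (0, 0)

    @classmethod
    def from_dict(cls, data):
        """Build a Toy from a dict (deliberately has no example)."""
        return cls()

    def dropna(self):
        """Drop missing values.

        **Examples:**

            >>> toy = Toy().dropna()
        """
        return self

    def _private(self):
        return None


def classify(class_) -> Dict[str, List[str]]:
    """Split public methods and properties by whether their docstring has MARKER."""
    result: Dict[str, List[str]] = {"present": [], "not_present": []}
    for name, obj in inspect.getmembers(class_):
        # Simplification (assumption): skip all underscore-prefixed names.
        if name.startswith("_"):
            continue
        is_api = (
            inspect.ismethod(obj)  # class methods, e.g. from_dict
            or inspect.isfunction(obj)  # instance methods, e.g. dropna
            or inspect.isdatadescriptor(obj)  # properties, e.g. shape
        )
        if not is_api:
            continue
        doc = inspect.getdoc(obj)
        key = "present" if doc and MARKER in doc else "not_present"
        result[key].append(name)
    return result


summary = classify(Toy)
print(summary)
```

`inspect.getmembers` returns members sorted by name, so the lists come out in alphabetical order. Looking a property up on the class (rather than an instance) yields the `property` object itself, which is why `inspect.isdatadescriptor` is the check that catches it.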
