Commit
Merge branch 'main' into tswast-ordering-mode-partial-sample
tswast authored Nov 25, 2024
2 parents a6ab522 + 9015c33 commit c89b938
Showing 119 changed files with 4,252 additions and 724 deletions.
4 changes: 2 additions & 2 deletions .github/.OwlBot.lock.yaml
@@ -13,5 +13,5 @@
# limitations under the License.
docker:
image: gcr.io/cloud-devrel-public-resources/owlbot-python:latest
digest: sha256:5efdf8d38e5a22c1ec9e5541cbdfde56399bdffcb6f531183f84ac66052a8024
# created: 2024-10-23T18:04:53.195998718Z
digest: sha256:2ed982f884312e4883e01b5ab8af8b6935f0216a5a2d82928d273081fc3be562
# created: 2024-11-12T12:09:45.821174897Z
20 changes: 10 additions & 10 deletions .kokoro/docker/docs/requirements.txt
@@ -1,5 +1,5 @@
#
# This file is autogenerated by pip-compile with Python 3.9
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile --allow-unsafe --generate-hashes requirements.in
@@ -8,9 +8,9 @@ argcomplete==3.5.1 \
--hash=sha256:1a1d148bdaa3e3b93454900163403df41448a248af01b6e849edc5ac08e6c363 \
--hash=sha256:eb1ee355aa2557bd3d0145de7b06b2a45b0ce461e1e7813f5d066039ab4177b4
# via nox
colorlog==6.8.2 \
--hash=sha256:3e3e079a41feb5a1b64f978b5ea4f46040a94f11f0e8bbb8261e3dbbeca64d44 \
--hash=sha256:4dcbb62368e2800cb3c5abd348da7e53f6c362dda502ec27c560b2e58a66bd33
colorlog==6.9.0 \
--hash=sha256:5906e71acd67cb07a71e779c47c4bcb45fb8c2993eebe9e5adcd6a6f1b283eff \
--hash=sha256:bfba54a1b93b94f54e1f4fe48395725a3d92fd2a4af702f6bd70946bdc0c6ac2
# via nox
distlib==0.3.9 \
--hash=sha256:47f8c22fd27c27e25a65601af709b38e4f0a45ea4fc2e710f65755fa8caaaf87 \
@@ -24,9 +24,9 @@ nox==2024.10.9 \
--hash=sha256:1d36f309a0a2a853e9bccb76bbef6bb118ba92fa92674d15604ca99adeb29eab \
--hash=sha256:7aa9dc8d1c27e9f45ab046ffd1c3b2c4f7c91755304769df231308849ebded95
# via -r requirements.in
packaging==24.1 \
--hash=sha256:026ed72c8ed3fcce5bf8950572258698927fd1dbda10a5e981cdf0ac37f4f002 \
--hash=sha256:5b8f2217dbdbd2f7f384c41c628544e6d52f2d0f53c6d0c3ea61aa5d1d7ff124
packaging==24.2 \
--hash=sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759 \
--hash=sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f
# via nox
platformdirs==4.3.6 \
--hash=sha256:357fb2acbc885b0419afd3ce3ed34564c13c9b95c89360cd9563f73aa5e2b907 \
@@ -36,7 +36,7 @@ tomli==2.0.2 \
--hash=sha256:2ebe24485c53d303f690b0ec092806a085f07af5a5aa1464f3931eec36caaa38 \
--hash=sha256:d46d457a85337051c36524bc5349dd91b1877838e2979ac5ced3e710ed8a60ed
# via nox
virtualenv==20.26.6 \
--hash=sha256:280aede09a2a5c317e409a00102e7077c6432c5a38f0ef938e643805a7ad2c48 \
--hash=sha256:7345cc5b25405607a624d8418154577459c3e0277f5466dd79c49d5e492995f2
virtualenv==20.27.1 \
--hash=sha256:142c6be10212543b32c6c45d3d3893dff89112cc588b7d0879ae5a1ec03a47ba \
--hash=sha256:f11f1b8a29525562925f745563bfd48b189450f61fb34c4f9cc79dd5aa32a1f4
# via nox
3 changes: 2 additions & 1 deletion .kokoro/test-samples-impl.sh
@@ -33,7 +33,8 @@ export PYTHONUNBUFFERED=1
env | grep KOKORO

# Install nox
python3.9 -m pip install --upgrade --quiet nox
# `virtualenv==20.26.6` is added for Python 3.7 compatibility
python3.9 -m pip install --upgrade --quiet nox virtualenv==20.26.6

# Use secrets accessor service account to get secrets
if [[ -f "${KOKORO_GFILE_DIR}/secrets_viewer_service_account.json" ]]; then
63 changes: 63 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,69 @@

[1]: https://pypi.org/project/bigframes/#history

## [1.27.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.26.0...v1.27.0) (2024-11-16)


### Features

* Add astype(type, errors='null') to cast safely ([#1122](https://github.com/googleapis/python-bigquery-dataframes/issues/1122)) ([b4d17ff](https://github.com/googleapis/python-bigquery-dataframes/commit/b4d17ffdd891da266ad9765a087d3512c0e056fc))


### Bug Fixes

* Dataframe fillna with scalar. ([#1132](https://github.com/googleapis/python-bigquery-dataframes/issues/1132)) ([37f8c32](https://github.com/googleapis/python-bigquery-dataframes/commit/37f8c32a541565208602f3f6ed37dded13e16b9b))
* Exclude index columns from model fitting processes. ([#1138](https://github.com/googleapis/python-bigquery-dataframes/issues/1138)) ([8d4da15](https://github.com/googleapis/python-bigquery-dataframes/commit/8d4da1582a5965e6a1f9732ec0ce592ea47ce5fa))
* Unordered mode too many labels issue. ([#1148](https://github.com/googleapis/python-bigquery-dataframes/issues/1148)) ([7216b21](https://github.com/googleapis/python-bigquery-dataframes/commit/7216b21abd01bc61878bb5686f83ee13ef297912))


### Documentation

* Document groupby.head and groupby.size methods ([#1111](https://github.com/googleapis/python-bigquery-dataframes/issues/1111)) ([a61eb4d](https://github.com/googleapis/python-bigquery-dataframes/commit/a61eb4d6e323e5001715d402e0e67054df6e62af))

## [1.26.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.25.0...v1.26.0) (2024-11-12)


### Features

* Add basic geopandas functionality ([#962](https://github.com/googleapis/python-bigquery-dataframes/issues/962)) ([3759c63](https://github.com/googleapis/python-bigquery-dataframes/commit/3759c6397eaa3c46c4142aa51ca22be3dc8e4971))
* Support `json_extract_string_array` in the `bigquery` module ([#1131](https://github.com/googleapis/python-bigquery-dataframes/issues/1131)) ([4ef8bac](https://github.com/googleapis/python-bigquery-dataframes/commit/4ef8bacdcc5447ba53c0f354526346f4dec7c5a1))


### Bug Fixes

* Fix Series.to_frame generating string label instead of int where name is None ([#1118](https://github.com/googleapis/python-bigquery-dataframes/issues/1118)) ([14e32b5](https://github.com/googleapis/python-bigquery-dataframes/commit/14e32b51c11c1718128f49ef94e754afc0ac0618))
* Update the API documentation with newly added rep ([#1120](https://github.com/googleapis/python-bigquery-dataframes/issues/1120)) ([72c228b](https://github.com/googleapis/python-bigquery-dataframes/commit/72c228b15627e6047d60ae42740563a6dfea73da))


### Performance Improvements

* Reduce CURRENT_TIMESTAMP queries ([#1114](https://github.com/googleapis/python-bigquery-dataframes/issues/1114)) ([32274b1](https://github.com/googleapis/python-bigquery-dataframes/commit/32274b130849b37d7e587643cf7b6d109455ff38))
* Reduce dry runs from read_gbq with table ([#1129](https://github.com/googleapis/python-bigquery-dataframes/issues/1129)) ([f7e4354](https://github.com/googleapis/python-bigquery-dataframes/commit/f7e435488d630cf4cf493c89ecdde94a95a7a0d7))


### Documentation

* Add file for Classification with a Boosted Tree Model and snippet for preparing sample data ([#1135](https://github.com/googleapis/python-bigquery-dataframes/issues/1135)) ([7ac6639](https://github.com/googleapis/python-bigquery-dataframes/commit/7ac6639fb0e8baf5fb3adf5785dffd8cf9b06702))
* Add snippet for Linear Regression tutorial Predict Outcomes section ([#1101](https://github.com/googleapis/python-bigquery-dataframes/issues/1101)) ([108f4a9](https://github.com/googleapis/python-bigquery-dataframes/commit/108f4a98463596d8df6d381b3580eb72eab41b6e))
* Update `DataFrame` docstrings to include the errors section ([#1127](https://github.com/googleapis/python-bigquery-dataframes/issues/1127)) ([a38d4c4](https://github.com/googleapis/python-bigquery-dataframes/commit/a38d4c422b6b312f6a54d7b1dd105a474ec2e91a))
* Update GroupBy docstrings ([#1103](https://github.com/googleapis/python-bigquery-dataframes/issues/1103)) ([9867a78](https://github.com/googleapis/python-bigquery-dataframes/commit/9867a788e7c46bf0850cacbe7cd41a11fea32d6b))
* Update Session docstrings to include exceptions ([#1130](https://github.com/googleapis/python-bigquery-dataframes/issues/1130)) ([a870421](https://github.com/googleapis/python-bigquery-dataframes/commit/a87042158b181dceee31124fe208926a3bb1071f))

## [1.25.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.24.0...v1.25.0) (2024-10-29)


### Features

* Add the `ground_with_google_search` option for GeminiTextGenerator predict ([#1119](https://github.com/googleapis/python-bigquery-dataframes/issues/1119)) ([ca02cd4](https://github.com/googleapis/python-bigquery-dataframes/commit/ca02cd4b87d354c1e01c670cd9d4e36fa74896f5))
* Add warning when user tries to access struct series fields with `__getitem__` ([#1082](https://github.com/googleapis/python-bigquery-dataframes/issues/1082)) ([20e5c58](https://github.com/googleapis/python-bigquery-dataframes/commit/20e5c58868af8b18595d5635cb7722da4f622eb5))
* Allow `fit` to take additional eval data in linear and ensemble models ([#1096](https://github.com/googleapis/python-bigquery-dataframes/issues/1096)) ([254875c](https://github.com/googleapis/python-bigquery-dataframes/commit/254875c25f39df4bc477e1ed7339ecb30b395ab6))
* Support context manager for bigframes session ([#1107](https://github.com/googleapis/python-bigquery-dataframes/issues/1107)) ([5f7b8b1](https://github.com/googleapis/python-bigquery-dataframes/commit/5f7b8b189c093629d176ffc99364767dc766397a))


### Performance Improvements

* Improve series.unique performance and replace drop_duplicates i… ([#1108](https://github.com/googleapis/python-bigquery-dataframes/issues/1108)) ([499f24a](https://github.com/googleapis/python-bigquery-dataframes/commit/499f24a5f22ce484db96eb09cd3a0ce972398d81))

## [1.24.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.23.0...v1.24.0) (2024-10-24)


6 changes: 4 additions & 2 deletions bigframes/_config/bigquery_options.py
@@ -235,8 +235,10 @@ def use_regional_endpoints(self) -> bool:
.. note::
Use of regional endpoints is a feature in Preview and available only
in regions "europe-west3", "europe-west9", "europe-west8",
"me-central2", "us-east4" and "us-west1".
in regions "europe-west3", "europe-west8", "europe-west9",
"me-central2", "us-central1", "us-central2", "us-east1", "us-east4",
"us-east5", "us-east7", "us-south1", "us-west1", "us-west2", "us-west3"
and "us-west4".
.. deprecated:: 0.13.0
Use of locational endpoints is available only in selected projects.
13 changes: 13 additions & 0 deletions bigframes/_config/experiment_options.py
@@ -22,6 +22,7 @@ class ExperimentOptions:

def __init__(self):
self._semantic_operators = False
self._blob = False

@property
def semantic_operators(self) -> bool:
@@ -34,3 +35,15 @@ def semantic_operators(self, value: bool):
"Semantic operators are still experimental, and are subject to change in the future."
)
self._semantic_operators = value

@property
def blob(self) -> bool:
return self._blob

@blob.setter
def blob(self, value: bool):
if value is True:
warnings.warn(
"BigFrames Blob is still experimental. It may not work and is subject to change in the future."
)
self._blob = value
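The new `blob` flag above follows the same opt-in pattern as `semantic_operators`: a boolean property whose setter warns when the experiment is enabled. A minimal standalone sketch of that pattern (plain Python, no BigFrames dependency; the warning text is paraphrased):

```python
import warnings


class ExperimentOptions:
    """Sketch of an experiment-flag holder mirroring the diff above."""

    def __init__(self):
        self._blob = False

    @property
    def blob(self) -> bool:
        return self._blob

    @blob.setter
    def blob(self, value: bool):
        if value is True:
            # Warn once at opt-in time; the flag is still set afterwards.
            warnings.warn(
                "BigFrames Blob is experimental and subject to change."
            )
        self._blob = value


opts = ExperimentOptions()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    opts.blob = True
print(opts.blob, len(caught))  # prints: True 1
```

Warning on the setter (rather than on every use) keeps the noise down while still making the experimental status hard to miss.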
2 changes: 2 additions & 0 deletions bigframes/bigquery/__init__.py
@@ -25,6 +25,7 @@
from bigframes.bigquery._operations.json import (
json_extract,
json_extract_array,
json_extract_string_array,
json_set,
)
from bigframes.bigquery._operations.search import create_vector_index, vector_search
@@ -37,6 +38,7 @@
"json_set",
"json_extract",
"json_extract_array",
"json_extract_string_array",
"approx_top_count",
"struct",
"create_vector_index",
119 changes: 104 additions & 15 deletions bigframes/bigquery/_operations/json.py
@@ -21,14 +21,17 @@

from __future__ import annotations

from typing import Any, Sequence, Tuple
from typing import Any, cast, Optional, Sequence, Tuple, Union

import bigframes.dtypes
import bigframes.operations as ops
import bigframes.series as series

from . import array


def json_set(
series: series.Series,
input: series.Series,
json_path_value_pairs: Sequence[Tuple[str, Any]],
) -> series.Series:
"""Produces a new JSON value within a Series by inserting or replacing values at
@@ -47,7 +50,7 @@ def json_set(
Name: data, dtype: string
Args:
series (bigframes.series.Series):
input (bigframes.series.Series):
The Series containing JSON data (as native JSON objects or JSON-formatted strings).
json_path_value_pairs (Sequence[Tuple[str, Any]]):
Pairs of JSON path and the new value to insert/replace.
@@ -59,6 +62,7 @@
# SQLGlot parser does not support the "create_if_missing => true" syntax, so
# create_if_missing is not currently implemented.

result = input
for json_path_value_pair in json_path_value_pairs:
if len(json_path_value_pair) != 2:
raise ValueError(
@@ -67,14 +71,14 @@
)

json_path, json_value = json_path_value_pair
series = series._apply_binary_op(
result = result._apply_binary_op(
json_value, ops.JSONSet(json_path=json_path), alignment="left"
)
return series
return result
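The refactor above renames the parameter to `input` and accumulates the result instead of rebinding the parameter. The path/value validation and insert-or-replace semantics delegated to BigQuery's `JSON_SET` can be mimicked locally on Python dicts for intuition. `json_set_local` is a hypothetical helper for illustration only; it supports just top-level `'$.key'` paths, unlike the server-side function:

```python
import json


def json_set_local(doc: str, pairs):
    """Insert or replace values in a JSON string for simple '$.key' paths.

    Local sketch of JSON_SET semantics; nested paths and the
    create_if_missing option are out of scope, as noted in the diff.
    """
    obj = json.loads(doc)
    for pair in pairs:
        # Mirror the pairwise validation in json_set above.
        if len(pair) != 2:
            raise ValueError(f"Incorrect format: {pair!r} is not a (path, value) pair")
        path, value = pair
        if not path.startswith("$."):
            raise ValueError(f"Unsupported path for this sketch: {path!r}")
        obj[path[2:]] = value  # insert if missing, replace if present
    return json.dumps(obj)


print(json_set_local('{"a": 1}', [("$.a", 100), ("$.b", "hi")]))
# prints: {"a": 100, "b": "hi"}
```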


def json_extract(
series: series.Series,
input: series.Series,
json_path: str,
) -> series.Series:
"""Extracts a JSON value and converts it to a SQL JSON-formatted `STRING` or `JSON`
@@ -93,24 +97,24 @@
dtype: string
Args:
series (bigframes.series.Series):
input (bigframes.series.Series):
The Series containing JSON data (as native JSON objects or JSON-formatted strings).
json_path (str):
The JSON path identifying the data that you want to obtain from the input.
Returns:
bigframes.series.Series: A new Series with the JSON or JSON-formatted STRING.
"""
return series._apply_unary_op(ops.JSONExtract(json_path=json_path))
return input._apply_unary_op(ops.JSONExtract(json_path=json_path))


def json_extract_array(
series: series.Series,
input: series.Series,
json_path: str = "$",
) -> series.Series:
"""Extracts a JSON array and converts it to a SQL array of JSON-formatted `STRING` or `JSON`
values. This function uses single quotes and brackets to escape invalid JSONPath
characters in JSON keys.
"""Extracts a JSON array and converts it to a SQL array of JSON-formatted
`STRING` or `JSON` values. This function uses single quotes and brackets to
escape invalid JSONPath characters in JSON keys.
**Examples:**
@@ -124,13 +128,98 @@
1 ['4' '5']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": [{"name": "apple"}, {"name": "cherry"}]}',
... '{"fruits": [{"name": "guava"}, {"name": "grapes"}]}'
... ])
>>> bbq.json_extract_array(s, "$.fruits")
0 ['{"name":"apple"}' '{"name":"cherry"}']
1 ['{"name":"guava"}' '{"name":"grapes"}']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": {"color": "red", "names": ["apple","cherry"]}}',
... '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_array(s, "$.fruits.names")
0 ['"apple"' '"cherry"']
1 ['"guava"' '"grapes"']
dtype: list<item: string>[pyarrow]
Args:
series (bigframes.series.Series):
input (bigframes.series.Series):
The Series containing JSON data (as native JSON objects or JSON-formatted strings).
json_path (str):
The JSON path identifying the data that you want to obtain from the input.
Returns:
bigframes.series.Series: A new Series with the JSON or JSON-formatted STRING.
bigframes.series.Series: A new Series with the parsed arrays from the input.
"""
return series._apply_unary_op(ops.JSONExtractArray(json_path=json_path))
return input._apply_unary_op(ops.JSONExtractArray(json_path=json_path))
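The extraction itself happens server-side via BigQuery's `JSON_EXTRACT_ARRAY`. A rough local equivalent for simple dotted paths like the docstring's `"$.fruits.names"`, using only the standard library (`json_extract_array_local` is an illustrative name, not part of bigframes, and it skips the bracket/quote escaping of odd keys that the real function handles):

```python
import json


def json_extract_array_local(doc: str, json_path: str = "$"):
    """Resolve a dotted JSONPath such as '$.fruits.names' against a JSON string."""
    obj = json.loads(doc)
    # Walk each non-empty path segment; "$" alone leaves the document as-is.
    for key in [part for part in json_path.lstrip("$").split(".") if part]:
        obj = obj[key]
    if not isinstance(obj, list):
        raise ValueError(f"{json_path} does not point at an array")
    return obj


print(json_extract_array_local(
    '{"fruits": {"color": "red", "names": ["apple", "cherry"]}}',
    "$.fruits.names",
))  # prints: ['apple', 'cherry']
```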


def json_extract_string_array(
input: series.Series,
json_path: str = "$",
value_dtype: Optional[
Union[bigframes.dtypes.Dtype, bigframes.dtypes.DtypeString]
] = None,
) -> series.Series:
"""Extracts a JSON array and converts it to a SQL array of `STRING` values.
A `value_dtype` can be provided to further coerce the data type of the
values in the array. This function uses single quotes and brackets to escape
invalid JSONPath characters in JSON keys.
**Examples:**
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_string_array(s)
0 ['1' '2' '3']
1 ['4' '5']
dtype: list<item: string>[pyarrow]
>>> bbq.json_extract_string_array(s, value_dtype='Int64')
0 [1 2 3]
1 [4 5]
dtype: list<item: int64>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": {"color": "red", "names": ["apple","cherry"]}}',
... '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_string_array(s, "$.fruits.names")
0 ['apple' 'cherry']
1 ['guava' 'grapes']
dtype: list<item: string>[pyarrow]
Args:
input (bigframes.series.Series):
The Series containing JSON data (as native JSON objects or JSON-formatted strings).
json_path (str):
The JSON path identifying the data that you want to obtain from the input.
value_dtype (dtype, Optional):
The data type to coerce the array values to; it must be a dtype supported by BigQuery DataFrames.
Returns:
bigframes.series.Series: A new Series with the parsed arrays from the input.
"""
array_series = input._apply_unary_op(
ops.JSONExtractStringArray(json_path=json_path)
)
if value_dtype not in [None, bigframes.dtypes.STRING_DTYPE]:
array_items_series = array_series.explode()
if value_dtype == bigframes.dtypes.BOOL_DTYPE:
array_items_series = array_items_series.str.lower() == "true"
else:
array_items_series = array_items_series.astype(value_dtype)
array_series = cast(
series.Series,
array.array_agg(
array_items_series.groupby(level=input.index.names, dropna=False)
),
)
return array_series
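The `value_dtype` branch above explodes the arrays into rows, casts each element (with a special lowercase-compare for booleans, since JSON booleans arrive as text), then regroups by the original index via `array_agg`. The same explode/cast/regroup idea can be sketched over plain Python lists; the dtype names here are illustrative, borrowed from the docstring examples:

```python
def coerce_string_arrays(rows, value_dtype=None):
    """Coerce lists of strings row by row, mirroring json_extract_string_array.

    Sketch only: rows is a list of lists of strings, standing in for the
    exploded Series; regrouping is implicit in the per-row comprehension.
    """
    if value_dtype in (None, "string"):
        return rows  # STRING needs no conversion, as in the diff
    converters = {
        # JSON booleans come through as text, so compare lowercased to "true"
        "boolean": lambda s: s.lower() == "true",
        "Int64": int,
        "Float64": float,
    }
    convert = converters[value_dtype]
    return [[convert(item) for item in row] for row in rows]


print(coerce_string_arrays([["1", "2", "3"], ["4", "5"]], value_dtype="Int64"))
# prints: [[1, 2, 3], [4, 5]]
```

Doing the cast after a single explode keeps the conversion vectorized server-side instead of looping over array elements per row.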