Commit
Merge branch 'main' into tswast-ordering-mode-partial-sample
tswast authored Nov 25, 2024
2 parents a6ab522 + 9015c33 commit c89b938
Showing 119 changed files with 4,252 additions and 724 deletions.
4 changes: 2 additions & 2 deletions .github/.OwlBot.lock.yaml
@@ -13,5 +13,5 @@
# limitations under the License.
docker:
image: gcr.io/cloud-devrel-public-resources/owlbot-python:latest
digest: sha256:5efdf8d38e5a22c1ec9e5541cbdfde56399bdffcb6f531183f84ac66052a8024
# created: 2024-10-23T18:04:53.195998718Z
digest: sha256:2ed982f884312e4883e01b5ab8af8b6935f0216a5a2d82928d273081fc3be562
# created: 2024-11-12T12:09:45.821174897Z
20 changes: 10 additions & 10 deletions .kokoro/docker/docs/requirements.txt
@@ -1,5 +1,5 @@
#
# This file is autogenerated by pip-compile with Python 3.9
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile --allow-unsafe --generate-hashes requirements.in
@@ -8,9 +8,9 @@ argcomplete==3.5.1 \
--hash=sha256:1a1d148bdaa3e3b93454900163403df41448a248af01b6e849edc5ac08e6c363 \
--hash=sha256:eb1ee355aa2557bd3d0145de7b06b2a45b0ce461e1e7813f5d066039ab4177b4
# via nox
colorlog==6.8.2 \
--hash=sha256:3e3e079a41feb5a1b64f978b5ea4f46040a94f11f0e8bbb8261e3dbbeca64d44 \
--hash=sha256:4dcbb62368e2800cb3c5abd348da7e53f6c362dda502ec27c560b2e58a66bd33
colorlog==6.9.0 \
--hash=sha256:5906e71acd67cb07a71e779c47c4bcb45fb8c2993eebe9e5adcd6a6f1b283eff \
--hash=sha256:bfba54a1b93b94f54e1f4fe48395725a3d92fd2a4af702f6bd70946bdc0c6ac2
# via nox
distlib==0.3.9 \
--hash=sha256:47f8c22fd27c27e25a65601af709b38e4f0a45ea4fc2e710f65755fa8caaaf87 \
@@ -24,9 +24,9 @@ nox==2024.10.9 \
--hash=sha256:1d36f309a0a2a853e9bccb76bbef6bb118ba92fa92674d15604ca99adeb29eab \
--hash=sha256:7aa9dc8d1c27e9f45ab046ffd1c3b2c4f7c91755304769df231308849ebded95
# via -r requirements.in
packaging==24.1 \
--hash=sha256:026ed72c8ed3fcce5bf8950572258698927fd1dbda10a5e981cdf0ac37f4f002 \
--hash=sha256:5b8f2217dbdbd2f7f384c41c628544e6d52f2d0f53c6d0c3ea61aa5d1d7ff124
packaging==24.2 \
--hash=sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759 \
--hash=sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f
# via nox
platformdirs==4.3.6 \
--hash=sha256:357fb2acbc885b0419afd3ce3ed34564c13c9b95c89360cd9563f73aa5e2b907 \
@@ -36,7 +36,7 @@ tomli==2.0.2 \
--hash=sha256:2ebe24485c53d303f690b0ec092806a085f07af5a5aa1464f3931eec36caaa38 \
--hash=sha256:d46d457a85337051c36524bc5349dd91b1877838e2979ac5ced3e710ed8a60ed
# via nox
virtualenv==20.26.6 \
--hash=sha256:280aede09a2a5c317e409a00102e7077c6432c5a38f0ef938e643805a7ad2c48 \
--hash=sha256:7345cc5b25405607a624d8418154577459c3e0277f5466dd79c49d5e492995f2
virtualenv==20.27.1 \
--hash=sha256:142c6be10212543b32c6c45d3d3893dff89112cc588b7d0879ae5a1ec03a47ba \
--hash=sha256:f11f1b8a29525562925f745563bfd48b189450f61fb34c4f9cc79dd5aa32a1f4
# via nox
3 changes: 2 additions & 1 deletion .kokoro/test-samples-impl.sh
@@ -33,7 +33,8 @@ export PYTHONUNBUFFERED=1
env | grep KOKORO

# Install nox
python3.9 -m pip install --upgrade --quiet nox
# `virtualenv==20.26.6` is added for Python 3.7 compatibility
python3.9 -m pip install --upgrade --quiet nox virtualenv==20.26.6

# Use secrets accessor service account to get secrets
if [[ -f "${KOKORO_GFILE_DIR}/secrets_viewer_service_account.json" ]]; then
63 changes: 63 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,69 @@

[1]: https://pypi.org/project/bigframes/#history

## [1.27.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.26.0...v1.27.0) (2024-11-16)


### Features

* Add astype(type, errors='null') to cast safely ([#1122](https://github.com/googleapis/python-bigquery-dataframes/issues/1122)) ([b4d17ff](https://github.com/googleapis/python-bigquery-dataframes/commit/b4d17ffdd891da266ad9765a087d3512c0e056fc))


### Bug Fixes

* Dataframe fillna with scalar. ([#1132](https://github.com/googleapis/python-bigquery-dataframes/issues/1132)) ([37f8c32](https://github.com/googleapis/python-bigquery-dataframes/commit/37f8c32a541565208602f3f6ed37dded13e16b9b))
* Exclude index columns from model fitting processes. ([#1138](https://github.com/googleapis/python-bigquery-dataframes/issues/1138)) ([8d4da15](https://github.com/googleapis/python-bigquery-dataframes/commit/8d4da1582a5965e6a1f9732ec0ce592ea47ce5fa))
* Unordered mode too many labels issue. ([#1148](https://github.com/googleapis/python-bigquery-dataframes/issues/1148)) ([7216b21](https://github.com/googleapis/python-bigquery-dataframes/commit/7216b21abd01bc61878bb5686f83ee13ef297912))


### Documentation

* Document groupby.head and groupby.size methods ([#1111](https://github.com/googleapis/python-bigquery-dataframes/issues/1111)) ([a61eb4d](https://github.com/googleapis/python-bigquery-dataframes/commit/a61eb4d6e323e5001715d402e0e67054df6e62af))

## [1.26.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.25.0...v1.26.0) (2024-11-12)


### Features

* Add basic geopandas functionality ([#962](https://github.com/googleapis/python-bigquery-dataframes/issues/962)) ([3759c63](https://github.com/googleapis/python-bigquery-dataframes/commit/3759c6397eaa3c46c4142aa51ca22be3dc8e4971))
* Support `json_extract_string_array` in the `bigquery` module ([#1131](https://github.com/googleapis/python-bigquery-dataframes/issues/1131)) ([4ef8bac](https://github.com/googleapis/python-bigquery-dataframes/commit/4ef8bacdcc5447ba53c0f354526346f4dec7c5a1))


### Bug Fixes

* Fix Series.to_frame generating string label instead of int where name is None ([#1118](https://github.com/googleapis/python-bigquery-dataframes/issues/1118)) ([14e32b5](https://github.com/googleapis/python-bigquery-dataframes/commit/14e32b51c11c1718128f49ef94e754afc0ac0618))
* Update the API documentation with newly added rep ([#1120](https://github.com/googleapis/python-bigquery-dataframes/issues/1120)) ([72c228b](https://github.com/googleapis/python-bigquery-dataframes/commit/72c228b15627e6047d60ae42740563a6dfea73da))


### Performance Improvements

* Reduce CURRENT_TIMESTAMP queries ([#1114](https://github.com/googleapis/python-bigquery-dataframes/issues/1114)) ([32274b1](https://github.com/googleapis/python-bigquery-dataframes/commit/32274b130849b37d7e587643cf7b6d109455ff38))
* Reduce dry runs from read_gbq with table ([#1129](https://github.com/googleapis/python-bigquery-dataframes/issues/1129)) ([f7e4354](https://github.com/googleapis/python-bigquery-dataframes/commit/f7e435488d630cf4cf493c89ecdde94a95a7a0d7))


### Documentation

* Add file for Classification with a Boosted Tree Model and snippet for preparing sample data ([#1135](https://github.com/googleapis/python-bigquery-dataframes/issues/1135)) ([7ac6639](https://github.com/googleapis/python-bigquery-dataframes/commit/7ac6639fb0e8baf5fb3adf5785dffd8cf9b06702))
* Add snippet for Linear Regression tutorial Predict Outcomes section ([#1101](https://github.com/googleapis/python-bigquery-dataframes/issues/1101)) ([108f4a9](https://github.com/googleapis/python-bigquery-dataframes/commit/108f4a98463596d8df6d381b3580eb72eab41b6e))
* Update `DataFrame` docstrings to include the errors section ([#1127](https://github.com/googleapis/python-bigquery-dataframes/issues/1127)) ([a38d4c4](https://github.com/googleapis/python-bigquery-dataframes/commit/a38d4c422b6b312f6a54d7b1dd105a474ec2e91a))
* Update GroupBy docstrings ([#1103](https://github.com/googleapis/python-bigquery-dataframes/issues/1103)) ([9867a78](https://github.com/googleapis/python-bigquery-dataframes/commit/9867a788e7c46bf0850cacbe7cd41a11fea32d6b))
* Update Session docstrings to include exceptions ([#1130](https://github.com/googleapis/python-bigquery-dataframes/issues/1130)) ([a870421](https://github.com/googleapis/python-bigquery-dataframes/commit/a87042158b181dceee31124fe208926a3bb1071f))

## [1.25.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.24.0...v1.25.0) (2024-10-29)


### Features

* Add the `ground_with_google_search` option for GeminiTextGenerator predict ([#1119](https://github.com/googleapis/python-bigquery-dataframes/issues/1119)) ([ca02cd4](https://github.com/googleapis/python-bigquery-dataframes/commit/ca02cd4b87d354c1e01c670cd9d4e36fa74896f5))
* Add warning when user tries to access struct series fields with `__getitem__` ([#1082](https://github.com/googleapis/python-bigquery-dataframes/issues/1082)) ([20e5c58](https://github.com/googleapis/python-bigquery-dataframes/commit/20e5c58868af8b18595d5635cb7722da4f622eb5))
* Allow `fit` to take additional eval data in linear and ensemble models ([#1096](https://github.com/googleapis/python-bigquery-dataframes/issues/1096)) ([254875c](https://github.com/googleapis/python-bigquery-dataframes/commit/254875c25f39df4bc477e1ed7339ecb30b395ab6))
* Support context manager for bigframes session ([#1107](https://github.com/googleapis/python-bigquery-dataframes/issues/1107)) ([5f7b8b1](https://github.com/googleapis/python-bigquery-dataframes/commit/5f7b8b189c093629d176ffc99364767dc766397a))


### Performance Improvements

* Improve series.unique performance and replace drop_duplicates i… ([#1108](https://github.com/googleapis/python-bigquery-dataframes/issues/1108)) ([499f24a](https://github.com/googleapis/python-bigquery-dataframes/commit/499f24a5f22ce484db96eb09cd3a0ce972398d81))

## [1.24.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.23.0...v1.24.0) (2024-10-24)


6 changes: 4 additions & 2 deletions bigframes/_config/bigquery_options.py
@@ -235,8 +235,10 @@ def use_regional_endpoints(self) -> bool:
.. note::
Use of regional endpoints is a feature in Preview and available only
in regions "europe-west3", "europe-west9", "europe-west8",
"me-central2", "us-east4" and "us-west1".
in regions "europe-west3", "europe-west8", "europe-west9",
"me-central2", "us-central1", "us-central2", "us-east1", "us-east4",
"us-east5", "us-east7", "us-south1", "us-west1", "us-west2", "us-west3"
and "us-west4".
.. deprecated:: 0.13.0
Use of locational endpoints is available only in selected projects.
13 changes: 13 additions & 0 deletions bigframes/_config/experiment_options.py
@@ -22,6 +22,7 @@ class ExperimentOptions:

def __init__(self):
self._semantic_operators = False
self._blob = False

@property
def semantic_operators(self) -> bool:
@@ -34,3 +35,15 @@ def semantic_operators(self, value: bool):
"Semantic operators are still experimental, and are subject to change in the future."
)
self._semantic_operators = value

@property
def blob(self) -> bool:
return self._blob

@blob.setter
def blob(self, value: bool):
if value is True:
warnings.warn(
"BigFrames Blob is still experimental. It may not work and is subject to change in the future."
)
self._blob = value
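The new `blob` flag above follows the same opt-in pattern as `semantic_operators`: a boolean property whose setter warns when the experiment is enabled. A minimal standalone sketch of that pattern (plain Python, no BigFrames dependency; the warning text is paraphrased):

```python
import warnings


class ExperimentOptions:
    """Sketch of an experiment-flag holder mirroring the diff above."""

    def __init__(self):
        self._blob = False

    @property
    def blob(self) -> bool:
        return self._blob

    @blob.setter
    def blob(self, value: bool):
        if value is True:
            # Warn once at opt-in time; the flag is still set afterwards.
            warnings.warn(
                "BigFrames Blob is experimental and subject to change."
            )
        self._blob = value


opts = ExperimentOptions()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    opts.blob = True
print(opts.blob, len(caught))  # prints: True 1
```

Warning on the setter (rather than on every use) keeps the noise down while still making the experimental status hard to miss.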
2 changes: 2 additions & 0 deletions bigframes/bigquery/__init__.py
@@ -25,6 +25,7 @@
from bigframes.bigquery._operations.json import (
json_extract,
json_extract_array,
json_extract_string_array,
json_set,
)
from bigframes.bigquery._operations.search import create_vector_index, vector_search
@@ -37,6 +38,7 @@
"json_set",
"json_extract",
"json_extract_array",
"json_extract_string_array",
"approx_top_count",
"struct",
"create_vector_index",
119 changes: 104 additions & 15 deletions bigframes/bigquery/_operations/json.py
@@ -21,14 +21,17 @@

from __future__ import annotations

from typing import Any, Sequence, Tuple
from typing import Any, cast, Optional, Sequence, Tuple, Union

import bigframes.dtypes
import bigframes.operations as ops
import bigframes.series as series

from . import array


def json_set(
series: series.Series,
input: series.Series,
json_path_value_pairs: Sequence[Tuple[str, Any]],
) -> series.Series:
"""Produces a new JSON value within a Series by inserting or replacing values at
@@ -47,7 +50,7 @@ def json_set(
Name: data, dtype: string
Args:
series (bigframes.series.Series):
input (bigframes.series.Series):
The Series containing JSON data (as native JSON objects or JSON-formatted strings).
json_path_value_pairs (Sequence[Tuple[str, Any]]):
Pairs of JSON path and the new value to insert/replace.
@@ -59,6 +62,7 @@
# SQLGlot parser does not support the "create_if_missing => true" syntax, so
# create_if_missing is not currently implemented.

result = input
for json_path_value_pair in json_path_value_pairs:
if len(json_path_value_pair) != 2:
raise ValueError(
@@ -67,14 +71,14 @@
)

json_path, json_value = json_path_value_pair
series = series._apply_binary_op(
result = result._apply_binary_op(
json_value, ops.JSONSet(json_path=json_path), alignment="left"
)
return series
return result
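The refactor above renames the parameter to `input` and accumulates the result instead of rebinding the parameter. The path/value validation and insert-or-replace semantics delegated to BigQuery's `JSON_SET` can be mimicked locally on Python dicts for intuition. `json_set_local` is a hypothetical helper for illustration only; it supports just top-level `'$.key'` paths, unlike the server-side function:

```python
import json


def json_set_local(doc: str, pairs):
    """Insert or replace values in a JSON string for simple '$.key' paths.

    Local sketch of JSON_SET semantics; nested paths and the
    create_if_missing option are out of scope, as noted in the diff.
    """
    obj = json.loads(doc)
    for pair in pairs:
        # Mirror the pairwise validation in json_set above.
        if len(pair) != 2:
            raise ValueError(f"Incorrect format: {pair!r} is not a (path, value) pair")
        path, value = pair
        if not path.startswith("$."):
            raise ValueError(f"Unsupported path for this sketch: {path!r}")
        obj[path[2:]] = value  # insert if missing, replace if present
    return json.dumps(obj)


print(json_set_local('{"a": 1}', [("$.a", 100), ("$.b", "hi")]))
# prints: {"a": 100, "b": "hi"}
```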


def json_extract(
series: series.Series,
input: series.Series,
json_path: str,
) -> series.Series:
"""Extracts a JSON value and converts it to a SQL JSON-formatted `STRING` or `JSON`
@@ -93,24 +97,24 @@
dtype: string
Args:
series (bigframes.series.Series):
input (bigframes.series.Series):
The Series containing JSON data (as native JSON objects or JSON-formatted strings).
json_path (str):
The JSON path identifying the data that you want to obtain from the input.
Returns:
bigframes.series.Series: A new Series with the JSON or JSON-formatted STRING.
"""
return series._apply_unary_op(ops.JSONExtract(json_path=json_path))
return input._apply_unary_op(ops.JSONExtract(json_path=json_path))


def json_extract_array(
series: series.Series,
input: series.Series,
json_path: str = "$",
) -> series.Series:
"""Extracts a JSON array and converts it to a SQL array of JSON-formatted `STRING` or `JSON`
values. This function uses single quotes and brackets to escape invalid JSONPath
characters in JSON keys.
"""Extracts a JSON array and converts it to a SQL array of JSON-formatted
`STRING` or `JSON` values. This function uses single quotes and brackets to
escape invalid JSONPath characters in JSON keys.
**Examples:**
@@ -124,13 +128,98 @@
1 ['4' '5']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": [{"name": "apple"}, {"name": "cherry"}]}',
... '{"fruits": [{"name": "guava"}, {"name": "grapes"}]}'
... ])
>>> bbq.json_extract_array(s, "$.fruits")
0 ['{"name":"apple"}' '{"name":"cherry"}']
1 ['{"name":"guava"}' '{"name":"grapes"}']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": {"color": "red", "names": ["apple","cherry"]}}',
... '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_array(s, "$.fruits.names")
0 ['"apple"' '"cherry"']
1 ['"guava"' '"grapes"']
dtype: list<item: string>[pyarrow]
Args:
series (bigframes.series.Series):
input (bigframes.series.Series):
The Series containing JSON data (as native JSON objects or JSON-formatted strings).
json_path (str):
The JSON path identifying the data that you want to obtain from the input.
Returns:
bigframes.series.Series: A new Series with the JSON or JSON-formatted STRING.
bigframes.series.Series: A new Series with the parsed arrays from the input.
"""
return series._apply_unary_op(ops.JSONExtractArray(json_path=json_path))
return input._apply_unary_op(ops.JSONExtractArray(json_path=json_path))
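The extraction itself happens server-side via BigQuery's `JSON_EXTRACT_ARRAY`. A rough local equivalent for simple dotted paths like the docstring's `"$.fruits.names"`, using only the standard library (`json_extract_array_local` is an illustrative name, not part of bigframes, and it skips the bracket/quote escaping of odd keys that the real function handles):

```python
import json


def json_extract_array_local(doc: str, json_path: str = "$"):
    """Resolve a dotted JSONPath such as '$.fruits.names' against a JSON string."""
    obj = json.loads(doc)
    # Walk each non-empty path segment; "$" alone leaves the document as-is.
    for key in [part for part in json_path.lstrip("$").split(".") if part]:
        obj = obj[key]
    if not isinstance(obj, list):
        raise ValueError(f"{json_path} does not point at an array")
    return obj


print(json_extract_array_local(
    '{"fruits": {"color": "red", "names": ["apple", "cherry"]}}',
    "$.fruits.names",
))  # prints: ['apple', 'cherry']
```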


def json_extract_string_array(
input: series.Series,
json_path: str = "$",
value_dtype: Optional[
Union[bigframes.dtypes.Dtype, bigframes.dtypes.DtypeString]
] = None,
) -> series.Series:
"""Extracts a JSON array and converts it to a SQL array of `STRING` values.
A `value_dtype` can be provided to further coerce the data type of the
values in the array. This function uses single quotes and brackets to escape
invalid JSONPath characters in JSON keys.
**Examples:**
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_string_array(s)
0 ['1' '2' '3']
1 ['4' '5']
dtype: list<item: string>[pyarrow]
>>> bbq.json_extract_string_array(s, value_dtype='Int64')
0 [1 2 3]
1 [4 5]
dtype: list<item: int64>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": {"color": "red", "names": ["apple","cherry"]}}',
... '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_string_array(s, "$.fruits.names")
0 ['apple' 'cherry']
1 ['guava' 'grapes']
dtype: list<item: string>[pyarrow]
Args:
input (bigframes.series.Series):
The Series containing JSON data (as native JSON objects or JSON-formatted strings).
json_path (str):
The JSON path identifying the data that you want to obtain from the input.
value_dtype (dtype, Optional):
The data type to coerce the array values to; it must be a dtype supported by BigQuery DataFrames.
Returns:
bigframes.series.Series: A new Series with the parsed arrays from the input.
"""
array_series = input._apply_unary_op(
ops.JSONExtractStringArray(json_path=json_path)
)
if value_dtype not in [None, bigframes.dtypes.STRING_DTYPE]:
array_items_series = array_series.explode()
if value_dtype == bigframes.dtypes.BOOL_DTYPE:
array_items_series = array_items_series.str.lower() == "true"
else:
array_items_series = array_items_series.astype(value_dtype)
array_series = cast(
series.Series,
array.array_agg(
array_items_series.groupby(level=input.index.names, dropna=False)
),
)
return array_series
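The `value_dtype` branch above explodes the arrays into rows, casts each element (with a special lowercase-compare for booleans, since JSON booleans arrive as text), then regroups by the original index via `array_agg`. The same explode/cast/regroup idea can be sketched over plain Python lists; the dtype names here are illustrative, borrowed from the docstring examples:

```python
def coerce_string_arrays(rows, value_dtype=None):
    """Coerce lists of strings row by row, mirroring json_extract_string_array.

    Sketch only: rows is a list of lists of strings, standing in for the
    exploded Series; regrouping is implicit in the per-row comprehension.
    """
    if value_dtype in (None, "string"):
        return rows  # STRING needs no conversion, as in the diff
    converters = {
        # JSON booleans come through as text, so compare lowercased to "true"
        "boolean": lambda s: s.lower() == "true",
        "Int64": int,
        "Float64": float,
    }
    convert = converters[value_dtype]
    return [[convert(item) for item in row] for row in rows]


print(coerce_string_arrays([["1", "2", "3"], ["4", "5"]], value_dtype="Int64"))
# prints: [[1, 2, 3], [4, 5]]
```

Doing the cast after a single explode keeps the conversion vectorized server-side instead of looping over array elements per row.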