
feat: add support for INTERVAL data type to list_rows #840

Merged · 39 commits · Oct 26, 2021

Changes from 9 commits

Commits
5704f1e
test: refactor `list_rows` tests and add test for scalars
tswast Jul 28, 2021
5eb3794
WIP: INTERVAL support
tswast Jul 28, 2021
9e54277
feat: add support for INTERVAL data type to `list_rows`
tswast Jul 28, 2021
a89c1c1
Merge remote-tracking branch 'upstream/master' into issue826-interval
tswast Jul 29, 2021
60d9ca7
fix relativedelta construction for non-microseconds
tswast Jul 29, 2021
da6ef5b
WIP: support INTERVAL query params
tswast Jul 29, 2021
b73a610
remove dead code
tswast Jul 29, 2021
52f2b7b
INTERVAL not supported in query parameters
tswast Jul 29, 2021
ca8872e
Merge remote-tracking branch 'upstream/master' into issue826-interval
tswast Jul 29, 2021
018a617
Merge branch 'master' into issue826-interval
tswast Aug 3, 2021
2872d85
Merge remote-tracking branch 'upstream/master' into issue826-interval
tswast Aug 3, 2021
31c3f92
revert query parameter changes
tswast Aug 3, 2021
e3a3a6a
Merge remote-tracking branch 'origin/issue826-interval' into issue826…
tswast Aug 3, 2021
5f78311
Merge branch 'master' into issue826-interval
tswast Aug 5, 2021
bb03618
add validation error for interval
tswast Aug 5, 2021
f3711e7
add unit tests for extreme intervals
tswast Aug 5, 2021
68035ba
add dateutil to intersphinx
tswast Aug 11, 2021
9a011e9
Merge remote-tracking branch 'upstream/master' into issue826-interval
tswast Aug 11, 2021
9f6b02d
use dictionary for intersphinx
tswast Aug 11, 2021
7cccbd2
🦉 Updates from OwlBot
gcf-owl-bot[bot] Aug 11, 2021
152e8c2
Merge remote-tracking branch 'upstream/master' into issue826-interval
tswast Aug 16, 2021
f0f3fbd
Merge remote-tracking branch 'origin/issue826-interval' into issue826…
tswast Aug 16, 2021
c4636fa
🦉 Updates from OwlBot
gcf-owl-bot[bot] Aug 16, 2021
18aae17
Merge branch 'master' into issue826-interval
tswast Aug 25, 2021
eccea82
Merge branch 'main' into issue826-interval
tswast Sep 10, 2021
5cdeffb
Merge remote-tracking branch 'upstream/main' into issue826-interval
tswast Sep 30, 2021
0497b19
add test case for trailing .
tswast Sep 30, 2021
92f41b9
Merge remote-tracking branch 'origin/issue826-interval' into issue826…
tswast Sep 30, 2021
0318f54
explicit none
tswast Sep 30, 2021
6b1f238
🦉 Updates from OwlBot
gcf-owl-bot[bot] Sep 30, 2021
54e47f7
truncate nanoseconds
tswast Oct 4, 2021
ced356b
Merge remote-tracking branch 'origin/issue826-interval' into issue826…
tswast Oct 4, 2021
dcc8b57
use \d group for digits
tswast Oct 4, 2021
5c31fe2
Merge branch 'main' into issue826-interval
tswast Oct 5, 2021
2091478
Merge branch 'main' into issue826-interval
tswast Oct 25, 2021
21a2975
Merge branch 'main' into issue826-interval
plamut Oct 26, 2021
87b0d81
Merge remote-tracking branch 'upstream/main' into issue826-interval
tswast Oct 26, 2021
7bd48be
Merge remote-tracking branch 'origin/issue826-interval' into issue826…
tswast Oct 26, 2021
d62b950
use \d for consistency
tswast Oct 26, 2021
1 change: 1 addition & 0 deletions docs/conf.py
@@ -360,6 +360,7 @@
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
"python": ("https://python.readthedocs.org/en/latest/", None),
"dateutil": ("https://dateutil.readthedocs.io/en/latest/", None),
"google-auth": ("https://googleapis.dev/python/google-auth/latest/", None),
"google.api_core": ("https://googleapis.dev/python/google-api-core/latest/", None,),
"grpc": ("https://grpc.github.io/grpc/python/", None),
47 changes: 46 additions & 1 deletion google/cloud/bigquery/_helpers.py
@@ -19,8 +19,9 @@
import decimal
import math
import re
from typing import Union
from typing import Optional, Union

from dateutil import relativedelta
from google.cloud._helpers import UTC
from google.cloud._helpers import _date_from_iso8601_date
from google.cloud._helpers import _datetime_from_microseconds
@@ -42,6 +43,17 @@
re.VERBOSE,
)

# BigQuery sends data in "canonical format"
# https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#interval_type
_INTERVAL_PATTERN = re.compile(
r"(?P<calendar_sign>-?)(?P<years>[0-9]+)-(?P<months>[0-9]+) "
r"(?P<days>-?[0-9]+) "
r"(?P<time_sign>-?)(?P<hours>[0-9]+):(?P<minutes>[0-9]+):"
r"(?P<seconds>[0-9]+)\.?(?P<fraction>[0-9]+)?$"
Contributor:

Is a lone dot without fractional digits for microseconds possible? In other words, do we accept H:M:S. and interpret the missing digits as zero? (which is what Python does: 123. is an acceptable form of 123.0)

If not, it might make more sense to treat the .F part as an atomic optional part, because in the current form, the regex hints that the time part can also be H:M:SF, but of course the seconds part will eat the F part too, since we don't limit the number of matching digits.

Do we know yet what the limits on the number of digits are? I assume the backend does not allow multiple representations of the same interval? (e.g. 125 seconds cannot be expressed as 0:0:125, but only as 0:2:5)
(although relativedelta can handle that just fine)
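
(Aside, for illustration: checking the pattern from this diff directly suggests the answer is yes -- a lone trailing dot matches and the missing fraction is treated as zero microseconds; commit 0497b19 above adds a test case for the trailing dot. A minimal, self-contained check:)

# Illustrative check against the pattern as written in this diff: a lone
# trailing dot is accepted, and the fraction group is simply None.
import re

_INTERVAL_PATTERN = re.compile(
    r"(?P<calendar_sign>-?)(?P<years>[0-9]+)-(?P<months>[0-9]+) "
    r"(?P<days>-?[0-9]+) "
    r"(?P<time_sign>-?)(?P<hours>[0-9]+):(?P<minutes>[0-9]+):"
    r"(?P<seconds>[0-9]+)\.?(?P<fraction>[0-9]+)?$"
)

match = _INTERVAL_PATTERN.match("0-0 0 0:0:0.")
print(match.group("seconds"), match.group("fraction"))  # 0 None -> zero microseconds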

Contributor:

Actually, that regex has a bigger problem: fraction can be present without the dot -- well, not really, but it kind of looks like that at first glance and makes me think too hard :).

I think

r"(?P<seconds>[0-9]+)(?P<fraction>[.][0-9]*)?

would be better.

Or even:

r"(?P<seconds>[0-9]+(?:[.][0-9]*)?)

then something like (ignoring the time sign):

seconds, microseconds = divmod(float(seconds), 1)
microseconds = int(microseconds * 1000000)

which seems simpler to me than the current logic.

Contributor:

Actually, that regex has a bigger problem: fraction can be present without the dot -- well, not really, but it kind of looks like that at first glance and makes me think too hard :)

Indeed, that was the main point; perhaps I expressed it in too convoluted a way, sorry. :)

Contributor (author):

divmod looks elegant, but I found a rounding problem when I attempted to use this implementation. :-( googleapis/python-db-dtypes-pandas#18
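
(A minimal sketch of the failure mode -- the value 1.9 below is an illustration chosen here, not taken from the linked issue: once the whole-seconds part occupies mantissa bits, the fraction that divmod returns is no longer the closest double to the decimal fraction, and int() truncation can lose a microsecond, while the string-based padding stays exact.)

# Illustrative only: parsing "1.9" seconds two ways on IEEE-754 doubles.
seconds_str = "1.9"

# divmod-based (float) approach suggested above:
whole, fraction = divmod(float(seconds_str), 1)
microseconds_float = int(fraction * 1000000)  # 899999 on CPython -- off by one

# string-based approach used in this PR: pad the fraction digits out to six.
microseconds_str = int(seconds_str.split(".")[1].ljust(6, "0"))  # 900000

print(microseconds_float, microseconds_str)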

Contributor (author):

I assume the backend does not allow multiple representations of the same interval?

When sending data, it seemed to allow it. But when getting data from the API, it always normalized it.
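
(To illustrate that normalization -- the query and output below are assumptions for illustration, not taken from this PR:)

# Hypothetical example: the backend accepts a denormalized interval on input
# but returns the normalized canonical form, so 125 seconds comes back as
# 2 minutes 5 seconds ("0-0 0 0:2:5"), which this PR would parse into
# relativedelta(minutes=+2, seconds=+5).
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query("SELECT INTERVAL 125 SECOND AS ival").result()
print(list(rows)[0]["ival"])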

)

# TODO: BigQuery receives data in ISO 8601 duration format

_MIN_BQ_STORAGE_VERSION = packaging.version.Version("2.0.0")
_BQ_STORAGE_OPTIONAL_READ_SESSION_VERSION = packaging.version.Version("2.6.0")

@@ -114,6 +126,38 @@ def _int_from_json(value, field):
return int(value)


def _interval_from_json(
value: Optional[str], field
) -> Optional[relativedelta.relativedelta]:
"""Coerce 'value' to an interval, if set or not nullable."""
if not _not_null(value, field):
return
if value is None:
raise TypeError(f"got {value} for REQUIRED field: {repr(field)}")

parsed = _INTERVAL_PATTERN.match(value)
calendar_sign = -1 if parsed.group("calendar_sign") == "-" else 1
years = calendar_sign * int(parsed.group("years"))
months = calendar_sign * int(parsed.group("months"))
days = int(parsed.group("days"))
time_sign = -1 if parsed.group("time_sign") == "-" else 1
hours = time_sign * int(parsed.group("hours"))
minutes = time_sign * int(parsed.group("minutes"))
seconds = time_sign * int(parsed.group("seconds"))
fraction = parsed.group("fraction")
microseconds = time_sign * int(fraction.ljust(6, "0")) if fraction else 0
Contributor:

I can't tell from https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#interval_type if there are limits on any of these fields. 🤷

Contributor (author):

I agree. The docs don't really explain much about how this data format actually works. From what is documented and trial and error, I think these are the limits:

Min

  • years: -10000
  • months: -11 (I think?)
  • days: -3660000
  • hours: -87840000
  • minutes: -59 (I think?)
  • seconds: -59 (I think?)
  • microseconds: -999999 (I think?)

Max

  • years: 10000
  • months: 11 (I think?)
  • days: 3660000
  • hours: 87840000
  • minutes: 59 (I think?)
  • seconds: 59 (I think?)
  • microseconds: 999999 (I think?)

I don't think we'll need any client-side validation for this, but it does remind me that we should have some unit tests that exercise these limits.
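
(A sketch of what such a limit-exercising test could look like, using the helper and canonical format from this diff; the extreme strings are assumptions based on the limits listed above, not the tests that were actually added in f3711e7:)

# Hypothetical unit-test sketch for the documented extremes. The field is
# NULLABLE so the helper's REQUIRED/None handling is not exercised here.
from dateutil import relativedelta

from google.cloud.bigquery._helpers import _interval_from_json
from google.cloud.bigquery.schema import SchemaField

field = SchemaField("interval_col", "INTERVAL", mode="NULLABLE")

assert _interval_from_json("10000-0 3660000 87840000:0:0", field) == (
    relativedelta.relativedelta(years=10000, days=3660000, hours=87840000)
)
assert _interval_from_json("-10000-0 -3660000 -87840000:0:0", field) == (
    relativedelta.relativedelta(years=-10000, days=-3660000, hours=-87840000)
)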


return relativedelta.relativedelta(
years=years,
months=months,
days=days,
hours=hours,
minutes=minutes,
seconds=seconds,
microseconds=microseconds,
)


def _float_from_json(value, field):
"""Coerce 'value' to a float, if set or not nullable."""
if _not_null(value, field):
@@ -250,6 +294,7 @@ def _record_from_json(value, field):
_CELLDATA_FROM_JSON = {
"INTEGER": _int_from_json,
"INT64": _int_from_json,
"INTERVAL": _interval_from_json,
"FLOAT": _float_from_json,
"FLOAT64": _float_from_json,
"NUMERIC": _decimal_from_json,
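(Stepping back from the diff, a short sketch of the user-visible effect of this file's changes -- the table name is made up:)

# With this change, INTERVAL cells returned by list_rows arrive as
# dateutil relativedelta values.
from google.cloud import bigquery

client = bigquery.Client()
for row in client.list_rows("my-project.my_dataset.scalars"):  # hypothetical table
    print(row["interval_col"])
    # e.g. relativedelta(years=+7, months=+11, days=+9, hours=+4,
    #                    minutes=+15, seconds=+37, microseconds=+123456)
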
25 changes: 13 additions & 12 deletions google/cloud/bigquery/enums.py
@@ -254,28 +254,29 @@ class SqlTypeNames(str, enum.Enum):
DATE = "DATE"
TIME = "TIME"
DATETIME = "DATETIME"
INTERVAL = "INTERVAL" # NOTE: not available in legacy types


class SqlParameterScalarTypes:
"""Supported scalar SQL query parameter types as type objects."""

STRING = ScalarQueryParameterType("STRING")
BOOL = ScalarQueryParameterType("BOOL")
BOOLEAN = ScalarQueryParameterType("BOOL")
BIGDECIMAL = ScalarQueryParameterType("BIGNUMERIC")
BIGNUMERIC = ScalarQueryParameterType("BIGNUMERIC")
BYTES = ScalarQueryParameterType("BYTES")
INTEGER = ScalarQueryParameterType("INT64")
INT64 = ScalarQueryParameterType("INT64")
DATE = ScalarQueryParameterType("DATE")
DATETIME = ScalarQueryParameterType("DATETIME")
DECIMAL = ScalarQueryParameterType("NUMERIC")
FLOAT = ScalarQueryParameterType("FLOAT64")
FLOAT64 = ScalarQueryParameterType("FLOAT64")
NUMERIC = ScalarQueryParameterType("NUMERIC")
BIGNUMERIC = ScalarQueryParameterType("BIGNUMERIC")
DECIMAL = ScalarQueryParameterType("NUMERIC")
BIGDECIMAL = ScalarQueryParameterType("BIGNUMERIC")
BOOLEAN = ScalarQueryParameterType("BOOL")
BOOL = ScalarQueryParameterType("BOOL")
GEOGRAPHY = ScalarQueryParameterType("GEOGRAPHY")
TIMESTAMP = ScalarQueryParameterType("TIMESTAMP")
DATE = ScalarQueryParameterType("DATE")
INT64 = ScalarQueryParameterType("INT64")
INTEGER = ScalarQueryParameterType("INT64")
NUMERIC = ScalarQueryParameterType("NUMERIC")
STRING = ScalarQueryParameterType("STRING")
TIME = ScalarQueryParameterType("TIME")
DATETIME = ScalarQueryParameterType("DATETIME")
TIMESTAMP = ScalarQueryParameterType("TIMESTAMP")
Contributor:

Did you intend to include INTERVAL?

Contributor (author):

I did but then reverted it because the query parameter support is incomplete. I could revert the alphabetization, but I figure we should do that at some point.



class WriteDisposition(object):
43 changes: 35 additions & 8 deletions google/cloud/bigquery/query.py
@@ -16,7 +16,11 @@

from collections import OrderedDict
import copy
from typing import Union
import datetime
import decimal
from typing import Optional, Union

import dateutil.relativedelta

from google.cloud.bigquery.table import _parse_schema_resource
from google.cloud.bigquery._helpers import _rows_from_json
@@ -329,18 +333,41 @@ class ScalarQueryParameter(_AbstractQueryParameter):
Parameter name, used via ``@foo`` syntax. If None, the
parameter can only be addressed via position (``?``).

type_ (str):
Name of parameter type. One of 'STRING', 'INT64',
'FLOAT64', 'NUMERIC', 'BIGNUMERIC', 'BOOL', 'TIMESTAMP', 'DATETIME', or
'DATE'.
type_ (Union[str, google.cloud.bigquery.query.ScalarQueryParameterType]):
Name of parameter type. See
:class:`google.cloud.bigquery.enums.SqlTypeNames` and
:class:`google.cloud.bigquery.enums.SqlParameterScalarTypes` for
supported types.

value (Union[str, int, float, decimal.Decimal, bool, datetime.datetime, datetime.date]):
value (Union[ \
str, int, float, dateutil.relativedelta.relativedelta, \
decimal.Decimal, bool, datetime.datetime, datetime.date \
]):
The scalar parameter value.
"""

def __init__(self, name, type_, value):
def __init__(
self,
name: Optional[str],
type_: Optional[Union[str, ScalarQueryParameterType]],
value: Optional[
Union[
str,
int,
float,
dateutil.relativedelta.relativedelta,
decimal.Decimal,
bool,
datetime.datetime,
datetime.date,
]
],
):
self.name = name
self.type_ = type_
if isinstance(type_, ScalarQueryParameterType):
self.type_ = type_._type
else:
self.type_ = type_
self.value = value

@classmethod
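(A brief sketch of what the widened constructor accepts -- the parameter name and value are made up; both forms should produce the same type string:)

# type_ may now be either a type-name string or a ScalarQueryParameterType;
# the isinstance() branch above unwraps the latter to its type string.
from google.cloud import bigquery
from google.cloud.bigquery.enums import SqlParameterScalarTypes

param_a = bigquery.ScalarQueryParameter("min_count", "INT64", 100)
param_b = bigquery.ScalarQueryParameter("min_count", SqlParameterScalarTypes.INT64, 100)
assert param_a.type_ == param_b.type_ == "INT64"
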
1 change: 1 addition & 0 deletions setup.py
@@ -42,6 +42,7 @@
"google-resumable-media >= 0.6.0, < 3.0dev",
"packaging >= 14.3",
"protobuf >= 3.12.0",
"python-dateutil >= 2.7.0, <3.0dev",
"requests >= 2.18.0, < 3.0.0dev",
]
extras = {
1 change: 1 addition & 0 deletions testing/constraints-3.6.txt
@@ -17,6 +17,7 @@ pandas==0.23.0
proto-plus==1.10.0
protobuf==3.12.0
pyarrow==1.0.0
python-dateutil==2.7.0
requests==2.18.0
six==1.13.0
tqdm==4.7.4
4 changes: 2 additions & 2 deletions tests/data/scalars.jsonl
@@ -1,2 +1,2 @@
{"bool_col": true, "bytes_col": "abcd", "date_col": "2021-07-21", "datetime_col": "2021-07-21 11:39:45", "geography_col": "POINT(-122.0838511 37.3860517)", "int64_col": "123456789", "numeric_col": "1.23456789", "bignumeric_col": "10.111213141516171819", "float64_col": "1.25", "string_col": "Hello, World", "time_col": "11:41:43.07616", "timestamp_col": "2021-07-21T17:43:43.945289Z"}
{"bool_col": null, "bytes_col": null, "date_col": null, "datetime_col": null, "geography_col": null, "int64_col": null, "numeric_col": null, "bignumeric_col": null, "float64_col": null, "string_col": null, "time_col": null, "timestamp_col": null}
{"bool_col": true, "bytes_col": "SGVsbG8sIFdvcmxkIQ==", "date_col": "2021-07-21", "datetime_col": "2021-07-21 11:39:45", "geography_col": "POINT(-122.0838511 37.3860517)", "int64_col": "123456789", "interval_col": "P7Y11M9DT4H15M37.123456S", "numeric_col": "1.23456789", "bignumeric_col": "10.111213141516171819", "float64_col": "1.25", "rowindex": 0, "string_col": "Hello, World!", "time_col": "11:41:43.07616", "timestamp_col": "2021-07-21T17:43:43.945289Z"}
{"bool_col": null, "bytes_col": null, "date_col": null, "datetime_col": null, "geography_col": null, "int64_col": null, "interval_col": null, "numeric_col": null, "bignumeric_col": null, "float64_col": null, "rowindex": 1, "string_col": null, "time_col": null, "timestamp_col": null}
10 changes: 5 additions & 5 deletions tests/data/scalars_extreme.jsonl
@@ -1,5 +1,5 @@
{"bool_col": true, "bytes_col": "DQo=\n", "date_col": "9999-12-31", "datetime_col": "9999-12-31 23:59:59.999999", "geography_col": "POINT(-135.0000 90.0000)", "int64_col": "9223372036854775807", "numeric_col": "9.9999999999999999999999999999999999999E+28", "bignumeric_col": "9.999999999999999999999999999999999999999999999999999999999999999999999999999E+37", "float64_col": "+inf", "string_col": "Hello, World", "time_col": "23:59:59.99999", "timestamp_col": "9999-12-31T23:59:59.999999Z"}
{"bool_col": false, "bytes_col": "8J+Zgw==\n", "date_col": "0001-01-01", "datetime_col": "0001-01-01 00:00:00", "geography_col": "POINT(45.0000 -90.0000)", "int64_col": "-9223372036854775808", "numeric_col": "-9.9999999999999999999999999999999999999E+28", "bignumeric_col": "-9.999999999999999999999999999999999999999999999999999999999999999999999999999E+37", "float64_col": "-inf", "string_col": "Hello, World", "time_col": "00:00:00", "timestamp_col": "0001-01-01T00:00:00.000000Z"}
{"bool_col": true, "bytes_col": "AA==\n", "date_col": "1900-01-01", "datetime_col": "1900-01-01 00:00:00", "geography_col": "POINT(-180.0000 0.0000)", "int64_col": "-1", "numeric_col": "0.000000001", "bignumeric_col": "-0.00000000000000000000000000000000000001", "float64_col": "nan", "string_col": "こんにちは", "time_col": "00:00:00.000001", "timestamp_col": "1900-01-01T00:00:00.000000Z"}
{"bool_col": false, "bytes_col": "", "date_col": "1970-01-01", "datetime_col": "1970-01-01 00:00:00", "geography_col": "POINT(0 0)", "int64_col": "0", "numeric_col": "0.0", "bignumeric_col": "0.0", "float64_col": 0.0, "string_col": "", "time_col": "12:00:00", "timestamp_col": "1970-01-01T00:00:00.000000Z"}
{"bool_col": null, "bytes_col": null, "date_col": null, "datetime_col": null, "geography_col": null, "int64_col": null, "numeric_col": null, "bignumeric_col": null, "float64_col": null, "string_col": null, "time_col": null, "timestamp_col": null}
{"bool_col": true, "bytes_col": "DQo=\n", "date_col": "9999-12-31", "datetime_col": "9999-12-31 23:59:59.999999", "geography_col": "POINT(-135.0000 90.0000)", "int64_col": "9223372036854775807", "interval_col": "P-10000Y0M-3660000DT-87840000H0M0S", "numeric_col": "9.9999999999999999999999999999999999999E+28", "bignumeric_col": "9.999999999999999999999999999999999999999999999999999999999999999999999999999E+37", "float64_col": "+inf", "rowindex": 0, "string_col": "Hello, World", "time_col": "23:59:59.999999", "timestamp_col": "9999-12-31T23:59:59.999999Z"}
{"bool_col": false, "bytes_col": "8J+Zgw==\n", "date_col": "0001-01-01", "datetime_col": "0001-01-01 00:00:00", "geography_col": "POINT(45.0000 -90.0000)", "int64_col": "-9223372036854775808", "interval_col": "P10000Y0M3660000DT87840000H0M0S", "numeric_col": "-9.9999999999999999999999999999999999999E+28", "bignumeric_col": "-9.999999999999999999999999999999999999999999999999999999999999999999999999999E+37", "float64_col": "-inf", "rowindex": 1, "string_col": "Hello, World", "time_col": "00:00:00", "timestamp_col": "0001-01-01T00:00:00.000000Z"}
{"bool_col": true, "bytes_col": "AA==\n", "date_col": "1900-01-01", "datetime_col": "1900-01-01 00:00:00", "geography_col": "POINT(-180.0000 0.0000)", "int64_col": "-1", "interval_col": "P0Y0M0DT0H0M0.000001S", "numeric_col": "0.000000001", "bignumeric_col": "-0.00000000000000000000000000000000000001", "float64_col": "nan", "rowindex": 2, "string_col": "こんにちは", "time_col": "00:00:00.000001", "timestamp_col": "1900-01-01T00:00:00.000000Z"}
{"bool_col": false, "bytes_col": "", "date_col": "1970-01-01", "datetime_col": "1970-01-01 00:00:00", "geography_col": "POINT(0 0)", "int64_col": "0", "interval_col": "P0Y0M0DT0H0M0S", "numeric_col": "0.0", "bignumeric_col": "0.0", "float64_col": 0.0, "rowindex": 3, "string_col": "", "time_col": "12:00:00", "timestamp_col": "1970-01-01T00:00:00.000000Z"}
{"bool_col": null, "bytes_col": null, "date_col": null, "datetime_col": null, "geography_col": null, "int64_col": null, "interval_col": null, "numeric_col": null, "bignumeric_col": null, "float64_col": null, "rowindex": 4, "string_col": null, "time_col": null, "timestamp_col": null}
53 changes: 31 additions & 22 deletions tests/data/scalars_schema.json
@@ -1,33 +1,32 @@
[
{
"mode": "NULLABLE",
"name": "timestamp_col",
"type": "TIMESTAMP"
"name": "bool_col",
"type": "BOOLEAN"
},
{
"mode": "NULLABLE",
"name": "time_col",
"type": "TIME"
"name": "bignumeric_col",
"type": "BIGNUMERIC"
},
{
"mode": "NULLABLE",
"name": "float64_col",
"type": "FLOAT"
"name": "bytes_col",
"type": "BYTES"
},
{
"mode": "NULLABLE",
"name": "datetime_col",
"type": "DATETIME"
"name": "date_col",
"type": "DATE"
},
{
"mode": "NULLABLE",
"name": "bignumeric_col",
"type": "BIGNUMERIC"
"name": "datetime_col", "type": "DATETIME"
},
{
"mode": "NULLABLE",
"name": "numeric_col",
"type": "NUMERIC"
"name": "float64_col",
"type": "FLOAT"
},
{
"mode": "NULLABLE",
@@ -36,27 +35,37 @@
},
{
"mode": "NULLABLE",
"name": "date_col",
"type": "DATE"
"name": "int64_col",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "string_col",
"type": "STRING"
"name": "interval_col",
"type": "INTERVAL"
},
{
"mode": "NULLABLE",
"name": "bool_col",
"type": "BOOLEAN"
"name": "numeric_col",
"type": "NUMERIC"
},
{
"mode": "REQUIRED",
"name": "rowindex",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "bytes_col",
"type": "BYTES"
"name": "string_col",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "int64_col",
"type": "INTEGER"
"name": "time_col",
"type": "TIME"
},
{
"mode": "NULLABLE",
"name": "timestamp_col",
"type": "TIMESTAMP"
}
]
36 changes: 30 additions & 6 deletions tests/system/test_arrow.py
@@ -14,8 +14,14 @@

"""System tests for Arrow connector."""

from typing import Optional

import pytest

from google.cloud import bigquery
from google.cloud.bigquery import enums


pyarrow = pytest.importorskip(
"pyarrow", minversion="3.0.0"
) # Needs decimal256 for BIGNUMERIC columns.
@@ -31,17 +37,35 @@
),
)
def test_list_rows_nullable_scalars_dtypes(
bigquery_client,
scalars_table,
scalars_extreme_table,
max_results,
scalars_table_name,
bigquery_client: bigquery.Client,
scalars_table: str,
scalars_extreme_table: str,
max_results: Optional[int],
scalars_table_name: str,
):
table_id = scalars_table
if scalars_table_name == "scalars_extreme_table":
table_id = scalars_extreme_table

# Avoid INTERVAL columns until they are supported by the BigQuery Storage
# API and pyarrow.
schema = [
bigquery.SchemaField("bool_col", enums.SqlTypeNames.BOOLEAN),
bigquery.SchemaField("bignumeric_col", enums.SqlTypeNames.BIGNUMERIC),
bigquery.SchemaField("bytes_col", enums.SqlTypeNames.BYTES),
bigquery.SchemaField("date_col", enums.SqlTypeNames.DATE),
bigquery.SchemaField("datetime_col", enums.SqlTypeNames.DATETIME),
bigquery.SchemaField("float64_col", enums.SqlTypeNames.FLOAT64),
bigquery.SchemaField("geography_col", enums.SqlTypeNames.GEOGRAPHY),
bigquery.SchemaField("int64_col", enums.SqlTypeNames.INT64),
bigquery.SchemaField("numeric_col", enums.SqlTypeNames.NUMERIC),
bigquery.SchemaField("string_col", enums.SqlTypeNames.STRING),
bigquery.SchemaField("time_col", enums.SqlTypeNames.TIME),
bigquery.SchemaField("timestamp_col", enums.SqlTypeNames.TIMESTAMP),
]

arrow_table = bigquery_client.list_rows(
table_id, max_results=max_results,
table_id, max_results=max_results, selected_fields=schema,
).to_arrow()

schema = arrow_table.schema