feat: read_gbq_table supports LIKE as an operator in filters (#454)

Fixes internal issue 330149095
 🦕
tswast authored Mar 18, 2024
1 parent 718a00c commit d2d425a
Showing 3 changed files with 15 additions and 2 deletions.
1 change: 1 addition & 0 deletions bigframes/session/__init__.py
@@ -318,6 +318,7 @@ def _to_query(
         valid_operators: Mapping[third_party_pandas_gbq.FilterOps, str] = {
             "in": "IN",
             "not in": "NOT IN",
+            "LIKE": "LIKE",
             "==": "=",
             ">": ">",
             "<": "<",
12 changes: 12 additions & 0 deletions tests/system/small/test_session.py
@@ -327,6 +327,18 @@ def test_read_gbq_twice_with_same_timestamp(session, penguins_table_id):
     assert df3 is not None
 
 
+def test_read_gbq_table_clustered_with_filter(session: bigframes.Session):
+    df = session.read_gbq_table(
+        "bigquery-public-data.cloud_storage_geo_index.landsat_index",
+        filters=[[("sensor_id", "LIKE", "OLI%")], [("sensor_id", "LIKE", "%TIRS")]],  # type: ignore
+        columns=["sensor_id"],
+    )
+    sensors = df.groupby(["sensor_id"]).agg("count").to_pandas(ordered=False)
+    assert "OLI" in sensors.index
+    assert "TIRS" in sensors.index
+    assert "OLI_TIRS" in sensors.index
+
+
 def test_read_gbq_wildcard(session: bigframes.Session):
     df = session.read_gbq("bigquery-public-data.noaa_gsod.gsod193*")
     assert df.shape == (348485, 32)
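Note on the new test: the two patterns sit in separate inner lists, so they are OR-ed, which is why `OLI`, `TIRS`, and `OLI_TIRS` are all expected in the result. The sketch below emulates the basic `LIKE` wildcards locally to show that reasoning. It is only an illustration: `like_to_regex` and `matches_any` are hypothetical helpers, not part of bigframes, and backslash escapes in patterns are not handled. The value `"ETM"` is a made-up non-matching example, not a claim about the table's contents.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regex (hypothetical helper).

    Only the two basic wildcards are handled: % (any run of characters)
    and _ (exactly one character). Backslash escapes are ignored.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches_any(value: str, patterns) -> bool:
    """True if value matches at least one LIKE pattern (OR semantics)."""
    return any(like_to_regex(p).fullmatch(value) for p in patterns)


# The two patterns used in the test, combined with OR.
patterns = ["OLI%", "%TIRS"]
for sensor_id in ["OLI", "TIRS", "OLI_TIRS", "ETM"]:
    print(sensor_id, matches_any(sensor_id, patterns))
```

`OLI_TIRS` satisfies both patterns, while `OLI` matches only the prefix pattern and `TIRS` only the suffix pattern.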
4 changes: 2 additions & 2 deletions third_party/bigframes_vendored/pandas/io/gbq.py
@@ -7,7 +7,7 @@

from bigframes import constants

-FilterOps = Literal["in", "not in", "<", "<=", "==", "!=", ">=", ">"]
+FilterOps = Literal["in", "not in", "<", "<=", "==", "!=", ">=", ">", "LIKE"]
FilterType = Tuple[str, FilterOps, Any]
FiltersType = Union[Iterable[FilterType], Iterable[Iterable[FilterType]]]

@@ -112,7 +112,7 @@ def read_gbq(
                 query results.
             filters (Union[Iterable[FilterType], Iterable[Iterable[FilterType]]], default ()): To
                 filter out data. Filter syntax: [[(column, op, val), …],…] where
-                op is [==, >, >=, <, <=, !=, in, not in]. The innermost tuples
+                op is [==, >, >=, <, <=, !=, in, not in, LIKE]. The innermost tuples
                 are transposed into a set of filters applied through an AND
                 operation. The outer Iterable combines these sets of filters
                 through an OR operation. A single Iterable of tuples can also
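Note on the filter semantics described in the docstring: inner tuples are AND-ed and the outer groups are OR-ed. A minimal sketch of how such a structure could be rendered as a SQL `WHERE` expression follows. It is not the actual `_to_query` implementation: `filters_to_where` and `_literal` are hypothetical names, the mappings for `>=`, `<=`, and `!=` are assumed (the diff only shows the first six entries), and the literal quoting is simplistic and not safe for untrusted input.

```python
from typing import Any

# Operator mapping. The entries for "in", "not in", "LIKE", "==", ">" and "<"
# appear in the diff; ">=", "<=" and "!=" are assumed to map to themselves.
VALID_OPERATORS = {
    "in": "IN",
    "not in": "NOT IN",
    "LIKE": "LIKE",
    "==": "=",
    ">": ">",
    "<": "<",
    ">=": ">=",
    "<=": "<=",
    "!=": "!=",
}


def _literal(value: Any) -> str:
    """Render a Python value as a SQL literal (illustrative quoting only)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "\\'") + "'"
    if isinstance(value, (list, tuple)):
        return "(" + ", ".join(_literal(v) for v in value) + ")"
    return repr(value)


def filters_to_where(filters) -> str:
    """Build a WHERE expression: AND inside each group, OR between groups."""
    filters = list(filters)
    if not filters:
        return ""
    # A flat iterable of (column, op, value) tuples is a single AND group.
    if isinstance(filters[0], tuple) and isinstance(filters[0][0], str):
        filters = [filters]
    or_groups = []
    for group in filters:
        and_terms = []
        for column, op, value in group:
            if op not in VALID_OPERATORS:
                raise ValueError(f"Unsupported filter operator: {op!r}")
            and_terms.append(f"`{column}` {VALID_OPERATORS[op]} {_literal(value)}")
        or_groups.append("(" + " AND ".join(and_terms) + ")")
    return " OR ".join(or_groups)


# The filter from the new system test: two OR-ed groups, one LIKE each.
where = filters_to_where(
    [[("sensor_id", "LIKE", "OLI%")], [("sensor_id", "LIKE", "%TIRS")]]
)
print(where)
```

Passing a flat list such as `[("a", "==", 1), ("b", "in", [1, 2])]` yields a single parenthesized AND group under this sketch.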
