Add typed filtering (#3385)
* Add typed filtering

PBENCH-1124

Support type-cast filter expressions in `GET /datasets`. The primary objective
is to support a paginated "Expiring Soon" view in the dashboard, requiring the
ability to look for datasets expiring before a fixed timestamp. Previously,
`GET /datasets?filter` worked by casting all SQL data to "string" and then
comparing against the raw extracted string from the query parameter. Now it's
possible to identify a type as well as additional comparison operators. For
example, `GET /datasets?filter=server.deletion:<2023-05-01:date` will select
all datasets with expiration timestamps earlier than 2023-05-01.

To target this specific capability, the functional tests now override the
default `server.deletion` on some uploads and verify that those datasets are
returned by the filtered query.
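In practice the new query might look like the sketch below, using Python's `requests` library; the server host, API token, and response handling are illustrative assumptions, not part of this commit:

```python
# Sketch only: the host, token, and printed response shape are assumptions.
import requests

SERVER = "https://pbench.example.com/api/v1"  # hypothetical server
TOKEN = "<api-token>"                         # a valid Pbench Server API token

# Select datasets whose server.deletion timestamp is earlier than 2023-05-01,
# asking for that metadata value and paginating 20 results at a time.
response = requests.get(
    f"{SERVER}/datasets",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={
        "filter": "server.deletion:<2023-05-01:date",
        "metadata": "server.deletion",
        "limit": 20,
    },
)
response.raise_for_status()
print(response.json())
```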
dbutenhof authored Apr 24, 2023
1 parent d1977e2 commit 6d9c40f
Showing 6 changed files with 566 additions and 100 deletions.
84 changes: 60 additions & 24 deletions docs/API/V1/list.md
@@ -45,34 +45,66 @@ specified date.

`filter` metadata filtering \
Select datasets matching the metadata expressions specified via `filter`
query parameters. Each expression is the name of a metadata key (for example,
`dataset.name`), followed by a colon (`:`) and the comparison string. The
comparison string may be prefixed with a tilde (`~`) to make it a partial
("contains") comparison instead of an exact match. For example,
`dataset.name:foo` looks for datasets with the name "foo" exactly, whereas
`dataset.name:~foo` looks for datasets with a name containing the substring
"foo".

These may be combined across multiple `filter` query parameters or as
comma-separated lists in a single query parameter. Multiple filter expressions
form an `AND` expression, however consecutive filter expressions can be joined
in an `OR` expression by using the circumflex (`^`) character prior to each.
(The first expression with `^` begins an `OR` list while the first subsequent
expression without `^` ends the `OR` list and is combined with an `AND`.)
query parameters. Each expression has the format `[chain]key:[op]value[:type]`:

* `chain` Prefix an expression with `^` (circumflex) to allow combining a set
of expressions with `OR` rather than the default `AND`.
* `key` The name of a metadata key (for example, `dataset.name`)

* `op` An operator to specify how to compare the key value:

* `=` (Default) Compare for equality
* `~` Compare against a substring
* `>` Greater than
* `<` Less than
* `>=` Greater than or equal to
* `<=` Less than or equal to
* `!=` Not equal

* `value` The value to compare against. This will be interpreted based on the
specified type.

* `type` The string value will be cast to this type. Any value can be cast to
type `str`. General metadata keys (`server`, `global`, `user`, and
`dataset.metalog` namespaces) that have values incompatible with the specified
type will be ignored. If you specify an incompatible type for a primary
`dataset` key, an error is returned, because these column types are defined by
the Pbench schema and no match would be possible (for example,
`dataset.name:2:int` or `dataset.access:2023-05-01:date`).

* `str` (Default) Compare as a string
* `bool` Compare as a boolean
* `int` Compare as an integer
* `date` Compare as a date-time string. ISO-8601 recommended, and UTC is
assumed if no timezone is specified.

For example, `dataset.name:foo` looks for datasets with the name "foo" exactly,
whereas `dataset.name:~foo` looks for datasets with a name containing the
substring "foo".

Multiple expressions may be combined across multiple `filter` query parameters
or as comma-separated lists in a single query parameter. Multiple filter
expressions are combined as an `AND` expression, matching only when all
expressions match. However, any consecutive set of expressions starting with `^`
is collected into an "`OR` list" that is `AND`-ed with the surrounding
terms.

For example,
- `filter=dataset.name:a,server.origin:EC2` returns datasets with a name of
"a" and an origin of "EC2".
- `filter=dataset.name:a,^server.origin:EC2,^dataset.metalog.pbench.script:fio`
returns datasets with a name of "a" that *either* have an origin of "EC2" *or*
were generated by the "pbench-fio" script.

_NOTE_: `filter` expression values, like the `true` in
`GET /api/v1/datasets?filter=server.archiveonly:true`, are always interpreted
as strings, so be careful about the string representation of the value (in this
case, a boolean, which is represented in JSON as `true` or `false`). Beware
especially when attempting to match a JSON document (such as
`dataset.metalog.pbench`).
- `filter=dataset.name:~andy,^server.origin:EC2,^server.origin:RIYA,dataset.access:public`
returns only "public" datasets with a name containing the string "andy" that also
have an origin of either "EC2" or "RIYA". As a SQL query, we might write it
as `dataset.name like "%andy%" and (server.origin = 'EC2' or
server.origin = 'RIYA') and dataset.access = 'public'`.

_NOTE_: `filter` expression term values, like the `true` in
`GET /api/v1/datasets?filter=server.archiveonly:true`, are by default
interpreted as strings, so be careful about the string representation of the
value. In this case, `server.archiveonly` is a boolean, which will be matched
as a string value "true" or "false". You can instead specify the expression
term as `server.archiveonly:t:bool` which will treat the specified match value
as a boolean (`t[rue]` or `y[es]` for true, `f[alse]` or `n[o]` for false) and
match against the boolean metadata value.
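
As an illustration (not part of the API description itself), a client could combine several `filter` terms, including a typed boolean term, using Python's `requests` library; the host and token below are placeholders:

```python
# Illustrative sketch: the host and token are placeholders.
import requests

params = [
    # "public" datasets named like "andy" with an origin of either EC2 or RIYA ...
    ("filter", "dataset.name:~andy,^server.origin:EC2,^server.origin:RIYA,dataset.access:public"),
    # ... that are also not marked archive-only, compared as a boolean.
    ("filter", "server.archiveonly:f:bool"),
]
response = requests.get(
    "https://pbench.example.com/api/v1/datasets",
    headers={"Authorization": "Bearer <api-token>"},
    params=params,  # requests URL-encodes each repeated filter parameter
)
print(response.json())
```

Each `filter` query parameter is `AND`-ed with the others, so the second term further narrows the first.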

`keysummary` boolean \
Instead of displaying a list of selected datasets and metadata, use the set of
@@ -105,6 +137,10 @@ Allows filtering for datasets owned by the authenticated client (if the value
is omitted, e.g., `?mine` or `?mine=true`) or owned by *other* users (e.g.,
`?mine=false`).

`name` string \
Select only datasets with a specified substring in their name. The filter
`?name=fio` is semantically equivalent to `?filter=dataset.name:~fio`.

`offset` integer \
"Paginate" the selected datasets by skipping the first `offset` datasets that
would have been selected by the other query terms. This can be used with
15 changes: 10 additions & 5 deletions jenkins/run-server-func-tests
@@ -21,6 +21,13 @@ elif [[ -n "${1}" ]]; then
exit 2
fi

function dump_journal {
    printf -- "+++ journalctl dump +++\n"
    # Try to capture the functional test container's logs.
    podman exec ${PB_SERVER_CONTAINER_NAME} journalctl
    printf -- "\n--- journalctl dump ---\n\n"
}

function cleanup {
if [[ -n "${cleanup_flag}" ]]; then
# Remove the Pbench Server container and the dependencies pod which we
@@ -59,6 +66,7 @@ until curl -s -o /dev/null ${SERVER_API_ENDPOINTS}; do
if [[ $(date +"%s") -ge ${end_in_epoch_secs} ]]; then
echo "Timed out waiting for the reverse proxy to show up!" >&2
exit_status=1
dump_journal
exit ${exit_status}
fi
sleep 1
@@ -84,11 +92,8 @@ else
fi

if [[ ${exit_status} -ne 0 ]]; then
printf -- "\nFunctional tests exited with code %s\n" ${exit_status}
printf -- "+++ journalctl dump +++\n"
# Try to capture the functional test container's logs.
podman exec ${PB_SERVER_CONTAINER_NAME} journalctl
printf -- "\n--- journalctl dump ---\n\n"
dump_journal
printf -- "\nFunctional tests exited with code %s\n" ${exit_status} >&2
fi

if [[ -z "${cleanup_flag}" ]]; then
12 changes: 12 additions & 0 deletions lib/pbench/client/__init__.py
@@ -489,3 +489,15 @@ def update(
uri_params={"dataset": dataset_id},
params=params,
).json()

def get_settings(self, key: str = "") -> JSONOBJECT:
    """Return requested server setting.

    Args:
        key: A server settings key; if omitted, return all settings

    Returns:
        A JSON document containing the requested key values
    """
    params = {"key": key}
    return self.get(api=API.SERVER_SETTINGS, uri_params=params).json()
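
A hypothetical usage sketch for the new helper; the host, the settings key, and the connect step are illustrative assumptions rather than part of this diff:

```python
# Sketch only: host, settings key, and connection flow are assumptions.
from pbench.client import PbenchServerClient

client = PbenchServerClient("pbench.example.com")
client.connect()  # establish a session with the server

all_settings = client.get_settings()                # every server setting
lifetime = client.get_settings("dataset-lifetime")  # one illustrative key
print(all_settings, lifetime)
```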