Expose aggregate metadata namespace for UI #3345

dbutenhof · 2023-03-14T17:44:46Z

PBENCH-1091

Managing the display of metadata in the dashboard is complicated by the fact that much of the namespace is highly dynamic: the global and user spaces are completely defined by clients, and can be completely different for each dataset, while the dataset.metalog namespace depends on the agent benchmark parameters and postprocessing.

In order to provide a reference point, this PR proposes a way to acquire a JSON document representing the aggregated namespace across a set of selected datasets. It leverages the existing GET /datasets collection query for this purpose, including all the filter keywords: however instead of returning a list of matching datasets, it builds a JSON document showing the nested key namespace.

While the example below is a bit large, the following is real output from a query made during a functional test run. Note that "leaf" nodes in the tree are designated by the value None.

So, for example, the output shows valid metadata keys dataset.access, dataset.id, dataset.metalog, dataset.metalog.iterations/0__linpack-binary=:root:linpack:xlinpack_xeon64.clients, and so forth...

GET /api/v1/datasets?keysummary=true
-----
{
    "dataset": {
        "access": null,
        "id": null,
        "metalog": {
            "iterations/0__linpack-binary=:root:linpack:xlinpack_xeon64": {
                "clients": null,
                "iteration_name": null,
                "iteration_number": null,
                "linpack-binary": null
            },
            "iterations/1": {
                "iteration_name": null,
                "user_script": null
            },
            "iterations/1-forwarding_test.json-5pct_drop": {
                "iteration_name": null,
                "iteration_number": null,
                "max_loss_pct": null,
                "profile": null
            },
            "iterations/1-rw-4KiB": {
                "block_size_kib": null,
                "iteration_name": null,
                "iteration_number": null,
                "test_type": null
            },
            "iterations/1-tcp_stream-64B-1i": {
                "instances": null,
                "iteration_name": null,
                "iteration_number": null,
                "message_size_bytes": null,
                "protocol": null,
                "test_type": null
            },
            "iterations/1__linpack-binary=:root:linpack:bob:xlinpack_xeon64": {
                "clients": null,
                "iteration_name": null,
                "iteration_number": null,
                "linpack-binary": null
            },
            "iterations/2__linpack-binary=:root:linpack:l_mklb_p_2019.6.005:benchmarks_2019:linux:mkl:benchmarks:linpack:xlinpack_xeon64": {
                "clients": null,
                "iteration_name": null,
                "iteration_number": null,
                "linpack-binary": null
            },
            "pbench": {
                "config": null,
                "date": null,
                "iterations": null,
                "name": null,
                "rpm-version": null,
                "script": null,
                "tar-ball-creation-timestamp": null
            },
            "run": {
                "controller": null,
                "end_run": null,
                "prefix": null,
                "raw_size": null,
                "start_run": null,
                "user": null
            },
            "tools": {
                "group": null,
                "hosts": null
            },
            "tools/10.10.10.153": {
                "hostname-s": null,
                "turbostat": null
            },
            "tools/10.10.10.19": {
                "hostname-s": null,
                "turbostat": null
            },
            "tools/ansible-host": {
                "hostname-s": null,
                "iostat": null,
                "pidstat": null,
                "[email protected]": null,
                "[email protected]": null,
                "[email protected]": null,
                "[email protected]": null,
                "[email protected]": null
            },
            "tools/app-node-0.scale-ci.example.com": {
                "hostname-s": null,
                "vmstat": null
            },
            "tools/app-node-1.scale-ci.example.com": {
                "hostname-s": null,
                "vmstat": null
            },
            "tools/ctlrA": {
                "hostname-s": null,
                "turbostat": null
            },
            "tools/dhcp31-44": {
                "hostname-s": null,
                "iostat": null,
                "mpstat": null,
                "perf": null,
                "pidstat": null,
                "proc-interrupts": null,
                "proc-vmstat": null,
                "sar": null,
                "turbostat": null
            },
            "tools/infra-node-0.scale-ci.example.com": {
                "hostname-s": null,
                "vmstat": null
            },
            "tools/infra-node-1.scale-ci.example.com": {
                "hostname-s": null,
                "vmstat": null
            },
            "tools/infra-node-2.scale-ci.example.com": {
                "hostname-s": null,
                "vmstat": null
            },
            "tools/perf124": {
                "hostname-s": null,
                "turbostat": null
            },
            "tools/rhel8-1": {
                "hostname-s": null,
                "[email protected]": null,
                "[email protected]": null,
                "turbostat": null
            }
        },
        "name": null,
        "owner_id": null,
        "resource_id": null,
        "uploaded": null
    },
    "server": {
        "archiveonly": null,
        "deletion": null,
        "origin": null,
        "tarball-path": null
    },
    "user": {
        "pbench": {
            "access": null
        }
    }
}

webbnh · 2023-03-14T20:37:12Z

"iterations/0__linpack-binary=:root:linpack:xlinpack_xeon64": {
                "clients": {},
                "iteration_name": {},
                "iteration_number": {},
                "linpack-binary": {}
            },

My immediate reaction is to ask, how or why is this response better than

"iterations/0__linpack-binary=:root:linpack:xlinpack_xeon64": [
                 "clients",
                 "iteration_name",
                 "iteration_number",
                 "linpack-binary"
             ],

(I.e., why return a mapping of keys to (known-to-be) empty objects when you could just return the list of keys? Presumably, the client side is either going to iterate over the mapping's keys or extract them into a list, anyway.)

dbutenhof · 2023-03-14T20:43:23Z

The point was that this is the same format all the way down instead of having to look at each level to see whether it's a list of leaves or a dict with another level to descend. (Or even worse, a list at each level where some elements of the list are nested dicts...)

I toyed with several other representations, but this was straightforward to create and I liked the consistent format.

webbnh

I've got a pointed question for you, a couple of nits, and some whinging, but other than those these changes look really good.

this is the same format all the way down instead of having to look at each level to see whether it's a list of leaves or a dict with another level to descend.

Yes, having reviewed the code, I can see why you're proposing this output format.

However, my instinct is that it's not quite what we want. While it seems like the accumulated key-space tree which we're returning is a very useful intermediate result, it should be flattened into a list for the client's use. (I think that this can be done nicely with a small amount of recursive code, very like the proposed accumulate().)

lib/pbench/server/api/resources/datasets_list.py

siddardh-ra

Looks good!

dbutenhof

While it seems like the accumulated key-space tree which we're returning is a very useful intermediate result, it should be flattened into a list for the client's use.

I'm thinking in terms of display as a hierarchical collapsing list like the demo we saw, which suggests that if I flattened it she'd just have to parse the key paths to construct the hierarchy.

However, the opinion I'm really looking for here is Varshini's. 😄

lib/pbench/server/api/resources/datasets_list.py

webbnh

While it seems like the accumulated key-space tree which we're returning is a very useful intermediate result, it should be flattened into a list for the client's use.

I'm thinking in terms of display as a hierarchical collapsing list like the demo we saw, which suggests that if I flattened it she'd just have to parse the key paths to construct the hierarchy.

Right. I think that that approach is probably good for navigating a metadata instance and looking at the values, but I'm concerned that it will be cumbersome for users trying to select keys for search/display -- it'll be a lot of clicks to get to deeply-nested names. As an alternative, providing a pull-down with a flat namespace, perhaps with an auto-completed search field (somewhat like the "add reviewer" widget here in GitHub PRs), would be easier and quicker to use (it could also offer shortcuts to commonly-referenced or favorite keys).

the opinion I'm really looking for here is Varshini's. 😄

Quite right. @MVarshini, what are your thoughts?

lib/pbench/server/api/resources/datasets_list.py

PBENCH-1091 Managing the display of metadata in the dashboard is complicated by the fact that much of the namespace is highly dynamic: the `global` and `user` spaces are completely defined by clients, and can be completely different for each dataset, while the `dataset.metalog` namespace depends on the agent benchmark parameters and postprocessing. In order to provide a reference point, this PR proposes a way to acquire a JSON document representing the aggregated namespace across a set of selected datasets. It leverages the existing `GET /datasets` collection query for this purpose, including all the filter keywords: however instead of returning a list of matching datasets, it builds a JSON document showing the nested key namespace. While the example below is a bit large, the following is real output from a query made during a functional test run. Note that "leaf" values in the tree are designated by empty JSON object values. I thought of using `None`, but this was a bit easier and seemed more consistent. (I'd particularly like to see feedback on how useful this format is to the UI Javascript code.) ``` GET /api/v1/datasets?keysummary=true ----- { "dataset": { "access": {}, "id": {}, "metalog": { "iterations/0__linpack-binary=:root:linpack:xlinpack_xeon64": { "clients": {}, "iteration_name": {}, "iteration_number": {}, "linpack-binary": {} }, "iterations/1": { "iteration_name": {}, "user_script": {} }, "iterations/1-forwarding_test.json-5pct_drop": { "iteration_name": {}, "iteration_number": {}, "max_loss_pct": {}, "profile": {} }, "iterations/1-rw-4KiB": { "block_size_kib": {}, "iteration_name": {}, "iteration_number": {}, "test_type": {} }, "iterations/1-tcp_stream-64B-1i": { "instances": {}, "iteration_name": {}, "iteration_number": {}, "message_size_bytes": {}, "protocol": {}, "test_type": {} }, "iterations/1__linpack-binary=:root:linpack:bob:xlinpack_xeon64": { "clients": {}, "iteration_name": {}, "iteration_number": {}, "linpack-binary": {} }, "iterations/2__linpack-binary=:root:linpack:l_mklb_p_2019.6.005:benchmarks_2019:linux:mkl:benchmarks:linpack:xlinpack_xeon64": { "clients": {}, "iteration_name": {}, "iteration_number": {}, "linpack-binary": {} }, "pbench": { "config": {}, "date": {}, "iterations": {}, "name": {}, "rpm-version": {}, "script": {}, "tar-ball-creation-timestamp": {} }, "run": { "controller": {}, "end_run": {}, "prefix": {}, "raw_size": {}, "start_run": {}, "user": {} }, "tools": { "group": {}, "hosts": {} }, "tools/10.10.10.153": { "hostname-s": {}, "turbostat": {} }, "tools/10.10.10.19": { "hostname-s": {}, "turbostat": {} }, "tools/ansible-host": { "hostname-s": {}, "iostat": {}, "pidstat": {}, "[email protected]": {}, "[email protected]": {}, "[email protected]": {}, "[email protected]": {}, "[email protected]": {} }, "tools/app-node-0.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} }, "tools/app-node-1.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} }, "tools/ctlrA": { "hostname-s": {}, "turbostat": {} }, "tools/dhcp31-44": { "hostname-s": {}, "iostat": {}, "mpstat": {}, "perf": {}, "pidstat": {}, "proc-interrupts": {}, "proc-vmstat": {}, "sar": {}, "turbostat": {} }, "tools/infra-node-0.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} }, "tools/infra-node-1.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} }, "tools/infra-node-2.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} }, "tools/perf124": { "hostname-s": {}, "turbostat": {} }, "tools/rhel8-1": { "hostname-s": {}, "[email protected]": {}, "[email protected]": {}, "turbostat": {} } }, "name": {}, "owner_id": {}, "resource_id": {}, "uploaded": {} }, "server": { "archiveonly": {}, "deletion": {}, "index-map": { "container-pbench.v5.result-data-sample.2020-02-28": {}, "container-pbench.v6.run-data.2018-10": {}, "container-pbench.v6.run-data.2020-02": {}, "container-pbench.v6.run-toc.2018-10": {}, "container-pbench.v6.run-toc.2020-02": {} }, "origin": {}, "tarball-path": {} }, "user": { "pbench": { "access": {} } } } ```

Primarily change to report leaf key nodes with None rather than an empty JSON object at Varshini's request. Revise the "left outer join" comment to clarify.

webbnh

I see you just posted a response, so I'll post these and then look at those.

lib/pbench/server/api/resources/datasets_list.py

npalaska

Looks generally good. But I do wonder whether we should always send the metalog information in the key summary though.

npalaska · 2023-03-15T23:12:05Z

lib/pbench/server/api/resources/datasets_list.py

+            # "metalog" is a top-level key in the Metadata schema, but we
+            # report it as a sub-key of "dataset".


Okay, a bit confused, why did you choose to add it as a sub-key of the dataset?

Also, I realized I think we should send the metalog only if asked explicitly, the output can get big quickly for complex and long benchmark runs. Thought?

Okay, a bit confused, why did you choose to add it as a sub-key of the dataset?

Actually I originally wanted to make it a column of Dataset because it really is part of the "identity" of the dataset rather than external tracking context like the server namespace. At the time I added it, I felt that I didn't want to change the DB schema. And at this point I'd feel obligated to write, and somehow test, migration code if I moved it, so I probably won't. 😆

Also, I realized I think we should send the metalog only if asked explicitly, the output can get big quickly for complex and long benchmark runs. Thought?

The metadata.log is an important part of the test run, including the script name, config data, and such. The problems here are (1) the weird key names that the agent postprocessing generates, and (2) the extremely "irregular" key space depending on the benchmark and parameters; but I think it's important and valuable to expose all this for display and filtering.

What I did wrestle with was whether to arbitrarily suppress the server.index-map. While it's not so bad here since we only represent the keys, the value (if someone decided to enable display) can be enormous.

I forgot to drop the `warning` message; instead just set the key to `None`. Although I'd liked the idea of aggregating metadata over the three datasets visible to the `drb` test user, leaving the third (`fio_2`) *without* metadata should increase test coverage by 1/2 line to 100%...

webbnh

Just a couple of pointed questions....

lib/pbench/test/unit/server/test_datasets_list.py

lib/pbench/server/api/resources/datasets_list.py

So, forget I even thought about dealing with the `MetadataError` here. I can't collect "errors" here because `_get_dataset_metadata` actually internally traps any "real" metadata error and returns a `None` value already. To collect a list of keyspace errors I'd need to change `_get_dataset_metadata`, and I'm not going to do that here. The only `MetadataError` that can get through here is actually a bad keyname, and we've already done that validation so we really don't need to worry about it. I'd like to revise this separately because I've realized that while I allow `filter` to reference the weird non-alphanumeric key paths in `metalog`, I didn't do that for the `metadata` parameter... and that'll affect the `_get_dataset_metadata` as well because it independently validates the key paths and wouldn't know about that exception. (Ugh.) Not here. I did however add a test case showing that a keyspace conflict error results in a `None` value.

lib/pbench/test/unit/server/test_datasets_list.py

webbnh

👍

webbnh

👍

dbutenhof added Server Dashboard Of and relating to the Dashboard GUI API Of and relating to application programming interfaces to services and functions labels Mar 14, 2023

dbutenhof requested review from MVarshini, webbnh, npalaska, vishalvvr and siddardh-ra March 14, 2023 17:44

dbutenhof self-assigned this Mar 14, 2023

portante added this to the v0.73 milestone Mar 14, 2023

webbnh previously approved these changes Mar 14, 2023

View reviewed changes

siddardh-ra previously approved these changes Mar 15, 2023

View reviewed changes

dbutenhof commented Mar 15, 2023

View reviewed changes

webbnh reviewed Mar 15, 2023

View reviewed changes

lib/pbench/server/api/resources/datasets_list.py Outdated Show resolved Hide resolved

lib/pbench/server/api/resources/datasets_list.py Outdated Show resolved Hide resolved

lib/pbench/server/api/resources/datasets_list.py Outdated Show resolved Hide resolved

dbutenhof added 2 commits March 15, 2023 11:03

Review comments

ba43de7

Primarily change to report leaf key nodes with None rather than an empty JSON object at Varshini's request. Revise the "left outer join" comment to clarify.

dbutenhof dismissed stale reviews from siddardh-ra and webbnh via ba43de7 March 15, 2023 15:13

dbutenhof force-pushed the keys branch from e608ed5 to ba43de7 Compare March 15, 2023 15:13

dbutenhof requested review from webbnh and siddardh-ra March 15, 2023 15:45

webbnh reviewed Mar 15, 2023

View reviewed changes

lib/pbench/server/api/resources/datasets_list.py Outdated Show resolved Hide resolved

lib/pbench/server/api/resources/datasets_list.py Outdated Show resolved Hide resolved

npalaska previously approved these changes Mar 15, 2023

View reviewed changes

dbutenhof dismissed npalaska’s stale review via 3ead23a March 16, 2023 12:31

Instead of getting blackened, this is flakey

0c135d6

webbnh previously approved these changes Mar 16, 2023

View reviewed changes

lib/pbench/test/unit/server/test_datasets_list.py Outdated Show resolved Hide resolved

lib/pbench/server/api/resources/datasets_list.py Outdated Show resolved Hide resolved

dbutenhof dismissed webbnh’s stale review via 093d72b March 16, 2023 17:47

dbutenhof requested review from webbnh and npalaska March 16, 2023 17:49

webbnh reviewed Mar 16, 2023

View reviewed changes

lib/pbench/test/unit/server/test_datasets_list.py Show resolved Hide resolved

webbnh previously approved these changes Mar 16, 2023

View reviewed changes

Review comment

402e7df

dbutenhof dismissed webbnh’s stale review via 402e7df March 16, 2023 20:04

webbnh approved these changes Mar 16, 2023

View reviewed changes

npalaska approved these changes Mar 16, 2023

View reviewed changes

dbutenhof merged commit 1f5cd39 into distributed-system-analysis:main Mar 17, 2023

dbutenhof deleted the keys branch March 17, 2023 11:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose aggregate metadata namespace for UI #3345

Expose aggregate metadata namespace for UI #3345

dbutenhof commented Mar 14, 2023 •

edited

Loading

webbnh commented Mar 14, 2023

dbutenhof commented Mar 14, 2023

webbnh left a comment

siddardh-ra left a comment

dbutenhof left a comment

webbnh left a comment

webbnh left a comment

npalaska left a comment

npalaska Mar 15, 2023

dbutenhof Mar 16, 2023

webbnh left a comment

webbnh left a comment

webbnh left a comment

		# "metalog" is a top-level key in the Metadata schema, but we
		# report it as a sub-key of "dataset".

Expose aggregate metadata namespace for UI #3345

Expose aggregate metadata namespace for UI #3345

Conversation

dbutenhof commented Mar 14, 2023 • edited Loading

webbnh commented Mar 14, 2023

dbutenhof commented Mar 14, 2023

webbnh left a comment

Choose a reason for hiding this comment

siddardh-ra left a comment

Choose a reason for hiding this comment

dbutenhof left a comment

Choose a reason for hiding this comment

webbnh left a comment

Choose a reason for hiding this comment

webbnh left a comment

Choose a reason for hiding this comment

npalaska left a comment

Choose a reason for hiding this comment

npalaska Mar 15, 2023

Choose a reason for hiding this comment

dbutenhof Mar 16, 2023

Choose a reason for hiding this comment

webbnh left a comment

Choose a reason for hiding this comment

webbnh left a comment

Choose a reason for hiding this comment

webbnh left a comment

Choose a reason for hiding this comment

dbutenhof commented Mar 14, 2023 •

edited

Loading