Expose aggregate metadata namespace for UI #3345
Conversation
My immediate reaction is to ask: how or why is this response format better than just returning the list of keys?
(I.e., why return a mapping of keys to (known-to-be) empty objects when you could just return the list of keys? Presumably, the client side is either going to iterate over the mapping's keys or extract them into a list, anyway.)
The point was that this is the same format all the way down, instead of having to look at each level to see whether it's a list of leaves or a dict with another level to descend. (Or even worse, a list at each level where some elements of the list are nested dicts...) I toyed with several other representations, but this was straightforward to create and I liked the consistent format.
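To make the two shapes being debated concrete, a small illustration (the key names are borrowed from the example output later in this PR; the variable names are just for exposition):

```python
# Proposed response: a mapping at every level, so a client descends the same
# way at every depth; leaves are simply empty mappings.
nested_form = {"dataset": {"access": {}, "metalog": {"pbench": {"name": {}}}}}

# The alternative raised above: return just the key names at each level, which
# is flatter but forces the client to handle a list of leaves at one level and
# a dict (or a mix) at another.
list_form = ["access", "metalog"]
```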
I've got a pointed question for you, a couple of nits, and some whinging, but other than those these changes look really good.
this is the same format all the way down instead of having to look at each level to see whether it's a list of leaves or a dict with another level to descend.
Yes, having reviewed the code, I can see why you're proposing this output format.
However, my instinct is that it's not quite what we want. While it seems like the accumulated key-space tree which we're returning is a very useful intermediate result, it should be flattened into a list for the client's use. (I think that this can be done nicely with a small amount of recursive code, very like the proposed `accumulate()`.)
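As a rough sketch of the kind of recursive flattening being suggested here, assuming the tree shape shown in the example output below (the `flatten` helper is hypothetical, not part of this PR):

```python
from typing import Any


def flatten(tree: dict[str, Any], prefix: str = "") -> list[str]:
    """Flatten a nested key-space tree into a list of dotted key paths."""
    keys: list[str] = []
    for name, value in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict) and value:
            keys.extend(flatten(value, path))  # descend into a branch
        else:
            keys.append(path)  # a leaf: {} (or None in the revised format)
    return keys


# flatten({"dataset": {"access": {}, "metalog": {"pbench": {"name": {}}}}})
# -> ["dataset.access", "dataset.metalog.pbench.name"]
```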
Looks good!
While it seems like the accumulated key-space tree which we're returning is a very useful intermediate result, it should be flattened into a list for the client's use.
I'm thinking in terms of display as a hierarchical collapsing list like the demo we saw, which suggests that if I flattened it she'd just have to parse the key paths to construct the hierarchy.
However, the opinion I'm really looking for here is Varshini's. 😄
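As a rough illustration of that reconstruction (a hypothetical `build_tree` helper, not part of this PR): splitting each path on "." rebuilds the hierarchy, though note that naive splitting would misparse `metalog` key names that themselves contain dots, such as the `tools/app-node-0.scale-ci.example.com` entries in the example output.

```python
from typing import Any


def build_tree(paths: list[str]) -> dict[str, Any]:
    """Rebuild a nested key-space tree from a flat list of dotted key paths."""
    tree: dict[str, Any] = {}
    for path in paths:
        node = tree
        # Caution: metalog key names can contain "." themselves, so a real
        # client would need an unambiguous path encoding, not a plain split.
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


# build_tree(["dataset.access", "dataset.metalog.pbench.name"])
# -> {"dataset": {"access": {}, "metalog": {"pbench": {"name": {}}}}}
```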
While it seems like the accumulated key-space tree which we're returning is a very useful intermediate result, it should be flattened into a list for the client's use.
I'm thinking in terms of display as a hierarchical collapsing list like the demo we saw, which suggests that if I flattened it she'd just have to parse the key paths to construct the hierarchy.
Right. I think that that approach is probably good for navigating a metadata instance and looking at the values, but I'm concerned that it will be cumbersome for users trying to select keys for search/display -- it'll be a lot of clicks to get to deeply-nested names. As an alternative, providing a pull-down with a flat namespace, perhaps with an auto-completed search field (somewhat like the "add reviewer" widget here in GitHub PRs), would be easier and quicker to use (it could also offer shortcuts to commonly-referenced or favorite keys).
the opinion I'm really looking for here is Varshini's. 😄
Quite right. @MVarshini, what are your thoughts?
PBENCH-1091

Managing the display of metadata in the dashboard is complicated by the fact that much of the namespace is highly dynamic: the `global` and `user` spaces are completely defined by clients, and can be completely different for each dataset, while the `dataset.metalog` namespace depends on the agent benchmark parameters and postprocessing.

In order to provide a reference point, this PR proposes a way to acquire a JSON document representing the aggregated namespace across a set of selected datasets. It leverages the existing `GET /datasets` collection query for this purpose, including all the filter keywords; however, instead of returning a list of matching datasets, it builds a JSON document showing the nested key namespace.

While the example below is a bit large, the following is real output from a query made during a functional test run. Note that "leaf" values in the tree are designated by empty JSON object values. I thought of using `None`, but this was a bit easier and seemed more consistent. (I'd particularly like to see feedback on how useful this format is to the UI Javascript code.)

```
GET /api/v1/datasets?keysummary=true
-----
{
  "dataset": {
    "access": {},
    "id": {},
    "metalog": {
      "iterations/0__linpack-binary=:root:linpack:xlinpack_xeon64": { "clients": {}, "iteration_name": {}, "iteration_number": {}, "linpack-binary": {} },
      "iterations/1": { "iteration_name": {}, "user_script": {} },
      "iterations/1-forwarding_test.json-5pct_drop": { "iteration_name": {}, "iteration_number": {}, "max_loss_pct": {}, "profile": {} },
      "iterations/1-rw-4KiB": { "block_size_kib": {}, "iteration_name": {}, "iteration_number": {}, "test_type": {} },
      "iterations/1-tcp_stream-64B-1i": { "instances": {}, "iteration_name": {}, "iteration_number": {}, "message_size_bytes": {}, "protocol": {}, "test_type": {} },
      "iterations/1__linpack-binary=:root:linpack:bob:xlinpack_xeon64": { "clients": {}, "iteration_name": {}, "iteration_number": {}, "linpack-binary": {} },
      "iterations/2__linpack-binary=:root:linpack:l_mklb_p_2019.6.005:benchmarks_2019:linux:mkl:benchmarks:linpack:xlinpack_xeon64": { "clients": {}, "iteration_name": {}, "iteration_number": {}, "linpack-binary": {} },
      "pbench": { "config": {}, "date": {}, "iterations": {}, "name": {}, "rpm-version": {}, "script": {}, "tar-ball-creation-timestamp": {} },
      "run": { "controller": {}, "end_run": {}, "prefix": {}, "raw_size": {}, "start_run": {}, "user": {} },
      "tools": { "group": {}, "hosts": {} },
      "tools/10.10.10.153": { "hostname-s": {}, "turbostat": {} },
      "tools/10.10.10.19": { "hostname-s": {}, "turbostat": {} },
      "tools/ansible-host": { "hostname-s": {}, "iostat": {}, "pidstat": {}, "[email protected]": {}, "[email protected]": {}, "[email protected]": {}, "[email protected]": {}, "[email protected]": {} },
      "tools/app-node-0.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} },
      "tools/app-node-1.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} },
      "tools/ctlrA": { "hostname-s": {}, "turbostat": {} },
      "tools/dhcp31-44": { "hostname-s": {}, "iostat": {}, "mpstat": {}, "perf": {}, "pidstat": {}, "proc-interrupts": {}, "proc-vmstat": {}, "sar": {}, "turbostat": {} },
      "tools/infra-node-0.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} },
      "tools/infra-node-1.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} },
      "tools/infra-node-2.scale-ci.example.com": { "hostname-s": {}, "vmstat": {} },
      "tools/perf124": { "hostname-s": {}, "turbostat": {} },
      "tools/rhel8-1": { "hostname-s": {}, "[email protected]": {}, "[email protected]": {}, "turbostat": {} }
    },
    "name": {},
    "owner_id": {},
    "resource_id": {},
    "uploaded": {}
  },
  "server": {
    "archiveonly": {},
    "deletion": {},
    "index-map": {
      "container-pbench.v5.result-data-sample.2020-02-28": {},
      "container-pbench.v6.run-data.2018-10": {},
      "container-pbench.v6.run-data.2020-02": {},
      "container-pbench.v6.run-toc.2018-10": {},
      "container-pbench.v6.run-toc.2020-02": {}
    },
    "origin": {},
    "tarball-path": {}
  },
  "user": { "pbench": { "access": {} } }
}
```
The primary change is to report leaf key nodes with `None` rather than an empty JSON object, at Varshini's request. This also revises the "left outer join" comment for clarity.
I see you just posted a response, so I'll post these and then look at those.
Looks generally good, but I do wonder whether we should always send the `metalog` information in the key summary.
# "metalog" is a top-level key in the Metadata schema, but we | ||
# report it as a sub-key of "dataset". |
Okay, I'm a bit confused: why did you choose to add it as a sub-key of the dataset?
Also, I think we should send the `metalog` only if asked explicitly; the output can get big quickly for complex and long benchmark runs. Thoughts?
Okay, I'm a bit confused: why did you choose to add it as a sub-key of the dataset?
Actually, I originally wanted to make it a column of `Dataset` because it really is part of the "identity" of the dataset rather than external tracking context like the `server` namespace. At the time I added it, I felt that I didn't want to change the DB schema. And at this point I'd feel obligated to write, and somehow test, migration code if I moved it, so I probably won't. 😆
Also, I think we should send the `metalog` only if asked explicitly; the output can get big quickly for complex and long benchmark runs. Thoughts?
The `metadata.log` is an important part of the test run, including the script name, config data, and such. The problems here are (1) the weird key names that the agent postprocessing generates, and (2) the extremely "irregular" key space depending on the benchmark and parameters; but I think it's important and valuable to expose all this for display and filtering.
What I did wrestle with was whether to arbitrarily suppress `server.index-map`. While it's not so bad here since we only represent the keys, the value (if someone decided to enable display) can be enormous.
Just a couple of pointed questions....
So, forget I even thought about dealing with the `MetadataError` here. I can't collect "errors" here because `_get_dataset_metadata` actually internally traps any "real" metadata error and returns a `None` value already. To collect a list of keyspace errors I'd need to change `_get_dataset_metadata`, and I'm not going to do that here. The only `MetadataError` that can get through here is actually a bad keyname, and we've already done that validation so we really don't need to worry about it.

I'd like to revise this separately because I've realized that while I allow `filter` to reference the weird non-alphanumeric key paths in `metalog`, I didn't do that for the `metadata` parameter... and that'll affect the `_get_dataset_metadata` as well because it independently validates the key paths and wouldn't know about that exception. (Ugh.) Not here.

I did however add a test case showing that a keyspace conflict error results in a `None` value.
👍
👍
PBENCH-1091

Managing the display of metadata in the dashboard is complicated by the fact that much of the namespace is highly dynamic: the `global` and `user` spaces are completely defined by clients, and can be completely different for each dataset, while the `dataset.metalog` namespace depends on the agent benchmark parameters and postprocessing.

In order to provide a reference point, this PR proposes a way to acquire a JSON document representing the aggregated namespace across a set of selected datasets. It leverages the existing `GET /datasets` collection query for this purpose, including all the filter keywords; however, instead of returning a list of matching datasets, it builds a JSON document showing the nested key namespace.

While the example below is a bit large, the following is real output from a query made during a functional test run. Note that "leaf" nodes in the tree are designated by the value `None`. So, for example, the output shows valid metadata keys `dataset.access`, `dataset.id`, `dataset.metalog`, `dataset.metalog.iterations/0__linpack-binary=:root:linpack:xlinpack_xeon64.clients`, and so forth...
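For a quick visual reference, an abbreviated, illustrative rendering of the revised shape (leaf keys carry `None`, i.e. JSON `null`, instead of `{}`), using only the keys named above:

```python
# Abbreviated illustration of the revised key-summary document; only a few of
# the keys from the real output are shown.
key_summary = {
    "dataset": {
        "access": None,
        "id": None,
        "metalog": {
            "iterations/0__linpack-binary=:root:linpack:xlinpack_xeon64": {
                "clients": None,
            },
        },
    },
}

# A leaf can now be distinguished from a branch with a simple identity test:
is_leaf = key_summary["dataset"]["access"] is None  # True
```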