PBENCH-1127 Implementation of Quisby API #3463
Conversation
This is great, Siddardh!
I think we don't want to expose the name "quisby" in our API or messaging. For one thing, we may not always/exclusively use the quisby package... what we're doing here is postprocessing results data for client-side "visualization", and "quisby-ing" data won't mean anything to anyone outside Perf & Scale (and possibly not to everyone within Perf & Scale) so I suggest we call this something more appropriately generic and descriptive like "visualize".
You need to fix the client API enum (and adding a functional test would be a great "bonus"). I also don't like the trailing slash API mapping: I don't think it's appropriate here. I have one other minor style comment. Aside from that, this is looking really good.
This looks generally excellent. However, I have a few concerns (in descending order):
- Are we sure we like that API method path?
- There is a useless assignment in one of the tests which looks like it should instead be an assertion.
- I have a coding suggestion for selecting the Quisby benchmark type.
- A missing `result.csv` file shouldn't be an Internal Server Error.
- There's a `raise` which should have a `from`.
- There's an error message for a missing dataset which should be trimmed or reworked.
And, while you're at it, there are a few other smaller things and nits.
```python
try:
    file = tarball.extract(tarball.tarball_path, f"{name}/result.csv")
except TarballUnpackError as e:
    raise APIInternalError(str(e)) from e
```
I'm not convinced that this should be an Internal Server Error: as things currently stand, the `result.csv` file is created by the Agent, right? So, if it's not there, it's a problem with the dataset, not with the Pbench Server.

I would be inclined to go with a `NOT_FOUND` result (with a suitable error message). (Or, possibly, `UNSUPPORTED_MEDIA_TYPE`.)
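To make that suggestion concrete, here is a minimal sketch of mapping a missing `result.csv` to a client-facing status instead of HTTP 500. The `APIAbort` and `TarballUnpackError` classes and the `extract_result_csv` helper are stand-ins, not the PR's actual types, and the particular status code is just one of the candidates debated in this thread.

```python
from http import HTTPStatus


class TarballUnpackError(Exception):
    """Stand-in for the cache-manager error raised when extraction fails."""


class APIAbort(Exception):
    """Stand-in for the server's client-facing API error type."""

    def __init__(self, status: HTTPStatus, message: str):
        super().__init__(message)
        self.status = status


def extract_result_csv(tarball, name: str):
    """Extract {name}/result.csv, reporting a missing file as a client error.

    Hypothetical alternative to the APIInternalError in the diff above: a
    missing result.csv is treated as a problem with the dataset, so it maps
    to a 4xx status (UNSUPPORTED_MEDIA_TYPE here, but NOT_FOUND and others
    are discussed below) rather than an internal server error.
    """
    try:
        return tarball.extract(tarball.tarball_path, f"{name}/result.csv")
    except TarballUnpackError as e:
        raise APIAbort(
            HTTPStatus.UNSUPPORTED_MEDIA_TYPE,
            f"Dataset {name!r} has no post-processed result.csv file",
        ) from e
```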
Since `NOT_FOUND` technically refers to the primary resource (the dataset), that would be misleading and unhelpful here. I'm not crazy about `UNSUPPORTED_MEDIA_TYPE` either, though it's probably no worse than the other contexts in which we've already used it, and I can't think of anything in the limited HTTP error space that's less inappropriate. 😦
The "medium" (i.e., the result) has to have a result.csv
file produced by the post-processing of a supported benchmark...otherwise, it's...um...unsupported. 😁
But, @dbutenhof, you concur that it should not be an internal error, right?
Given that we've already failed if `dataset.metalog.pbench.script` isn't exactly "uperf", I'd be inclined to say that, by the time we get here, we expect this to be a `pbench-uperf` benchmark wrapper execution which can reasonably be expected to have proper post-processing, and the lack of the proper summary file is definitely unexpected. Maybe that's potentially more of an "agent-side internal error", but I can't see any reasonable way to express that in `HTTPStatus` aside from "internal server error". And, at best, I think we can reasonably say it means the server API was insufficiently paranoid about the dataset format...
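For context, a rough, hypothetical sketch of the guard being described above; the real code reads the benchmark name from `dataset.metalog.pbench.script`, so the metadata access, the `check_benchmark` helper, the `APIAbort` stand-in, and the status code here are all assumptions for illustration.

```python
from http import HTTPStatus


class APIAbort(Exception):
    """Stand-in for the server's client-facing API error type."""

    def __init__(self, status: HTTPStatus, message: str):
        super().__init__(message)
        self.status = status


def check_benchmark(metalog: dict) -> str:
    """Fail early if the dataset's recorded benchmark script isn't "uperf".

    Hypothetical sketch of the guard discussed above; the status code is
    only illustrative.
    """
    script = metalog.get("pbench", {}).get("script")
    if script != "uperf":
        raise APIAbort(
            HTTPStatus.UNSUPPORTED_MEDIA_TYPE,
            f"Unsupported benchmark type {script!r}; only 'uperf' is supported",
        )
    return script
```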
Perhaps I'm confused, but I think we're checking for the `result.csv` file before we're trying to decode the benchmark type. (Do you think we should do this in a different order?)
Re `METHOD_NOT_ALLOWED`, Mozilla explains it as

> indicates that the server knows the request method, but the target resource doesn't support this method.

which sounds pretty close to me (although, technically, we're supposed to respond with a list of methods which are allowed, which sounds like a pain).

> `UNSUPPORTED_MEDIA_TYPE`, but that's really intended to report incompatible request `Content-Type`

Ah, yes, I'd missed the fact that it refers to the request payload. Bummer.

> I'm not entirely happy with `UNPROCESSABLE_ENTITY`

Yeah, the WebDAV ones always look so attractive until you get to the details. 😛

Re `BAD_REQUEST`, Mozilla describes it as

> indicates that the server cannot or will not process the request due to something that is perceived to be a client error

I would suggest that the error is trying to visualize a dataset which has no `results.csv` file in it...the client just cannot do that. 😒

I like `501`/`NOT_IMPLEMENTED` in principle, but, based on what Mozilla says, it doesn't fit:

> 501 is the appropriate response when the server does not recognize the request method and is incapable of supporting it for any resource. The only methods that servers are required to support (and therefore that must not return 501) are GET and HEAD.

That is, we do recognize the method (and, worse, it is a `GET`).

Interestingly, Mozilla goes on to suggest:

> If the server does recognize the method, but intentionally does not support it, the appropriate response is 405 Method Not Allowed.

So, 405 sounds like a reasonable fit (better than 403, which I agree would cover it, but too generically).
But we do support the HTTP `POST` method for this endpoint. That's telling them they shouldn't be using a `POST` and is otherwise completely unhelpful. I think that's an incredibly bad fit. 😦 In particular, clients are encouraged to cache this to know that a subsequent `POST` would be inappropriate. It's not and can't be resource-specific.
> from now on I'm sticking with RFC 9110

Fair enough. But, that reference says that 405 is resource-specific (and therefore, hopefully, the caching is resource-sensitive, as well...).
It would still mean rejecting a `POST` on the `/api/v1/datasets/<id>/visualize` endpoint with a quite specifically defined complaint that `POST` isn't acceptable and we only allow `POST`. And that's the problem; the HTTP errors are a very narrow and specifically defined set of errors, which is absolutely ridiculous.

The most "wiggle room" I see in any of these is `FORBIDDEN` and `UNPROCESSABLE_ENTITY`, and I'm not thrilled about either of those.
This looks good, but there are two small items which should probably be fixed, and there's some follow-up for the method path change.
- I'm happy with the change to the method path; however, it has a bunch of knock-on effects which should be attended to...if Dave is onboard with the change.
- There's a use of `get()` in one of the test assertions which is problematic...but we should probably just drop the assertion altogether.
- There's another "missing assertion".
The rest of my comments are nits and small stuff.
Error codes in general are a problem, and we can potentially decide to come up with and implement a consistent/different scheme later, but I don't think we need to hold up this critical change while we think/argue about it.
```python
try:
    file = tarball.extract(tarball.tarball_path, f"{name}/result.csv")
except TarballUnpackError as e:
    raise APIInternalError(str(e)) from e
```
`METHOD_NOT_ALLOWED` normally refers to `POST` vs `GET`/etc.; I think that would be confusing when what we mean might best be represented as "endpoint not allowed for this resource".

Yes, we've squeezed in `UNSUPPORTED_MEDIA_TYPE`, but that's really intended to report an incompatible request `Content-Type` (I presented XML but the API only supports JSON). I've never been entirely comfortable with this mapping, although at the time I hadn't noticed anything that looked measurably better or less misleading.

I'm not entirely happy with `UNPROCESSABLE_ENTITY`, either, which is oriented towards WebDAV protocol payloads ... and we're not talking about request payloads here... but as it's a niche error a lot less common than `UNSUPPORTED_MEDIA_TYPE`, it'd likely be a safer victim to steal and corrupt to our evil ends ...

Similarly for `BAD_REQUEST` ... this isn't a bad request (there's nothing wrong with the URL, headers, or payload), but the resource isn't appropriate for the endpoint. You'd think this would be a common problem, but I really don't see any HTTP status with a description in that ballpark. It's even worse than trying to describe thread programming errors with those ridiculous UNIX error numbers... 😬

I could actually see `501`/`NOT_IMPLEMENTED` for "benchmark other than uperf", come to think of it, along with the implicit indication that this might change in the future, which is wholly appropriate.

Interestingly, although common usage (including ours) makes it potentially misleading, `403`/`FORBIDDEN` is technically described as "The request contained valid data and was understood by the server, but the server is refusing action." (A failed permission check is just a "for example".) Aside from that common inference, though, this isn't a bad fit at all. And the message explains why we're refusing action... 🤔
There is just one lingering issue (other than the nit that Dave pointed out, which I don't think either of us would necessarily hold the merge for): how to report a missing `results.csv` file. Given that we don't seem to be converging on a solution for that problem, I'm approving the code as is.
Implementation of Quisby API
Integrated the pquisby package into the Pbench Server. The first pass of this API implementation will be used for retrieving quisby data for single-dataset results.

Currently, `pquisby` supports only the `uperf` benchmark; eventually, we will add support for other benchmarks too.

Right now, we fetch the `benchmark_type` from `dataset.metalog.pbench.script`. But if we are running `pbench-user-benchmark`, this method won't work; we need to find a way to capture the desired `benchmark_type` in that case. I will address that in subsequent PRs.

`GET /api/v1/quisby/{dataset}`
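For illustration, a hypothetical client-side call to the endpoint as described. The server URL, API key, and dataset ID are placeholders, the response format is assumed, and per the review discussion above the path may end up as `/api/v1/datasets/<id>/visualize` instead.

```python
import requests

# Placeholders; a real client would use its configured server and credentials.
SERVER = "https://pbench.example.com"
DATASET = "<dataset-resource-id>"
TOKEN = "<api-key>"

# GET /api/v1/quisby/{dataset} as described in this PR; the response is
# presumably the post-processed (visualization-ready) data for the dataset.
response = requests.get(
    f"{SERVER}/api/v1/quisby/{DATASET}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()
print(response.json())
```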