-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add alternative schema for array items in RunListResponse #207
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @uniqueg For putting this together! I think it has succinctly summarized the entirety of our discussion in the other thread in a very clean and concise way.
Overall I am in favour of this change! I addressed my original concerns and solves a few complexities with the breaking changes my PR introduced
end_time: | ||
type: string | ||
description: When the run stopped executing (completed, failed, or cancelled), in ISO 8601 format "%Y-%m-%dT%H:%M:%SZ" | ||
tags: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favour of this , but I would want to check with other implementors of WES (ie @wleepang) to see if this is a non-starter for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of this as well
- $ref: '#/components/schemas/RunStatus' | ||
- type: object | ||
properties: | ||
name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to tags, I am largely in favour of this attribute, but I would probably want to ensure this is accessible by implementors before adding it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would this come from? The current RunRequest
doesn't have an explicit attribute for name
. IIRC, the way we've handled it is with a specific tag
. That said, if name
is important enough to be used by other parts of the API, it might be worth elevating it to a top level property of RunRequest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a property of the Log
model:
Log:
title: Log
type: object
properties:
name:
type: string
description: The task or workflow name
anyOf: | ||
- $ref: '#/components/schemas/RunStatus' | ||
- $ref: '#/components/schemas/RunSummary' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach, since it does not break the API, but strongly suggests to implementors to use the new approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
@uniqueg can you please rebase this on develop |
@wleepang @cjllanwarne I wonder what your thoughts are on this modified PR? |
96cc1b8
to
9e2f986
Compare
required: | ||
- name | ||
- start_time | ||
- end_time | ||
- tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As much as I like the idea of these being all required, I realized that this information is not always going to be available.
name
: this requires parsing of a workflow, which in systems that lazily evaluate submissions like Cromwell will not be immediately available. Also I am not sure what would be the name for a snake make workflowstart_time
andend_time
before the run has started, or has yet to finish these values may not be availabletags
: so long as everyone is happy with an empty map, this is fine to be required."
I am still in favour of extending the RunStatus
object like you are doing here, I think the RunSummary
is a better approch overall. But, Thinking about this more we probably want most of these as optional, simply out of necessity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are probably right.
But then we could set defaults for these values (e.g., empty strings). That way we would perhaps encourage implementers a little more to give their best at producing these values, wherever possible. And on the client side we could rely on these fields being available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I prefer null
to an empty value like ""
(empty maps are fine). Empty strings end up requiring a bunch of edge case handling on the client side, ie in typed systems you now need to ensure that non null values can actually be deserialized as the expected type.
For example, I can represent a RunSummary
in java like the following:
public record RunSummary(String id, String name, Instant start_time, Instant end_time, Map<String,String> tags){
}
Now, if I actually wanted to use this record when receiving responses, I would not be able to directly deserialize the value directly, but would now need to have a "holding" class because ""
is not a valid Date. Trying to deserialize the following using a library like Jackson
would result in a JsonProcessingException
, OR a DateTimeParseException
.
{
"id": "foo",
"name": "bar",
"start_time": "",
"end_time": ""
"tags": {}
}
Whereas the following would work:
{
"id": "foo",
"name": "bar",
"start_time": null,
"end_time": null
"tags": {}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this is sensible and is what I'd expect from JSON. That said, I'd have to see if our implementation can handle this. I know in the past we've had issues handling null
values when the client library uses types that can't be Nullible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think name
requires parsing of a workflow definition. As I mentioned on a separate comment, name
is effectively a property of RunRequest
(possibly defined as a tag
) - that is it is the name of the run and not the name of the workflow. For that I think I'll backtrack a bit and say that tags
is the better place to hold this information and that both a run name and workflow name are a good idea.
For example:
- workflow names can be:
- nf-core-sarek-3.1.2
- gatk4-variant-discovery
- run names can be:
- sample-id-12345
- lims-id-{{uuid}}
Having these as tags presents a way to filter the runs list response along the lines of:
- "how many runs of a specific workflow are there?"
- "how many workflows or runs are processing data for a specific sample?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would already have unique identifiers for runs (the run_id
). Having an optional human-readable alternative to these is useful, I think, but would indeed go well in tags
, as it's really just an alternative.
Workflow names, on the other hand, are extremely useful. I would venture to say that these would be among the features which users filter there reads by most often, if not the most commonly used of all. If I run a lot of analyses, next to inputs, I think that the workflow "name" is really what I would look for first to narrow things down. And with that, I think we could indeed add an optional workflow_name
to RunRequest
(I think TES has something similar) - and then encourage implementers to provide a reasonable default (e.g., repo name, TRS resource name, workflow URL suffix) if a workflow name is not provided by the client. For this, tags
feels a little too optional for me personally.
However, wouldn't strongly object to having both in tags
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If name
is a property of RunSummary
I'd prefer the spect to be clear about what item is being named. If part of RunSummary
, my expectation is that it is the name of the run and not the name of the workflow the run invoked.
With just workflow name, and assuming tags
doesn't provide additional human readable information to help disambiguate, a list response would look like:
[
{
"id": "e81da9ff-f5a8-4f20-9d5f-772efc4a93b4",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"name": "nf-core-rnaseq",
"tags": { ... }
},
{
"id": "1ae61546-7cfa-4959-bf28-247ce1390508",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"name": "nf-core-rnaseq",
"tags": { ... }
},
{
"id": "c9be874b-5b9c-48b3-879e-40986fe9dbf0",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"name": "nf-core-rnaseq",
"tags": { ... }
},
{
"id": "86aaf850-b2c4-4c9d-b898-16ed3ce1daba",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"name": "nf-core-rnaseq",
"tags": { ... }
}
]
This would be challenging for an end-user to find the specific run that say processed sample "LIMSJVLO6YWS". Omitting name
from the top level of RunSummary
and having tags explicitly for workflow_name
and run_name
looks like:
[
{
"id": "e81da9ff-f5a8-4f20-9d5f-772efc4a93b4",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"tags": {
"workflow_name": "nf-core-rnaseq",
"run_name": "LIMS2ZR5ZS29"
}
},
{
"id": "1ae61546-7cfa-4959-bf28-247ce1390508",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"tags": {
"workflow_name": "nf-core-rnaseq",
"run_name": "LIMSXXOFXDMW"
}
},
{
"id": "c9be874b-5b9c-48b3-879e-40986fe9dbf0",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"tags": {
"workflow_name": "nf-core-rnaseq",
"run_name": "LIMSJVLO6YWS"
}
},
{
"id": "86aaf850-b2c4-4c9d-b898-16ed3ce1daba",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"tags": {
"workflow_name": "nf-core-rnaseq",
"run_name": "LIMS42IXEIFO"
}
}
]
IMO this provides a more useful response and maintains flexibility for clients to display or filter information how they need.
A compromise would be to maintain semantic consistency where the name
property in RunSummary is the run_name
and workflow_name
is additional metadata provided in tags
:
[
{
"id": "e81da9ff-f5a8-4f20-9d5f-772efc4a93b4",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"name": "LIMS2ZR5ZS29",
"tags": {
"workflow_name": "nf-core-rnaseq"
}
},
{
"id": "1ae61546-7cfa-4959-bf28-247ce1390508",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"name": "LIMSXXOFXDMW",
"tags": {
"workflow_name": "nf-core-rnaseq"
}
},
{
"id": "c9be874b-5b9c-48b3-879e-40986fe9dbf0",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"name": "LIMSJVLO6YWS",
"tags": {
"workflow_name": "nf-core-rnaseq"
}
},
{
"id": "86aaf850-b2c4-4c9d-b898-16ed3ce1daba",
"start_time": "YYYY-MM-DD HH:MM:SS Z",
"end_time": "YYYY-MM-DD HH:MM:SS Z",
"name": "LIMS42IXEIFO",
"tags": {
"workflow_name": "nf-core-rnaseq"
}
}
]
I think From a client perspective, |
I'm with you. It seems to me like we all agree and it's mostly a semantic argument. The main point is that we want to encourage implementers as much as possible to make this info available, without making their life unduly hard if, for whatever reason, they can't. So let it be in |
@uniqueg once you update this PR to remove name and make fields like start_time and end_time optional, I will go ahead and merge it! |
Done |
Alternative to #172
Closes #165
This PR differs from #172 in the following ways:
RunSummary
is defined, to be used specifically for theruns
array items inRunListResponse
RunSummary
extendsRunStatus
(not modified!) withrequired
additional fieldsRunSummary
orRunStatus
(anyOf
) are acceptable responses forGET /runs
start_time
andend_time
, also workflowname
andtags
are included in the extended response;workflow_url
was not included, as it can point to a local object and its support would therefore require WES to serve objects (up for debate of course, following feat: Add extra fields to RunStatus to provide additional information #172 and Proposal: Add timestamps to RunStatus #165)Taken together, this proposal avoids or partly addresses the following concerns raised against #172:
RunStatus
schema remains untouched, thus retaining the behavior ofGET /runs/{run_id}/status
(which also usesRunStatus
).RunStatus
instead ofRunSummary
(or indeed any other response that has therun_id
andstate
properties) in run list responses for backward compatibility (anyOf
), this PR adds a couple of measures to more strongly encourage the use of the new extended schema. In particular,DEPRECIATION WARNING
is added to the description ofRunListResponse
, informing implementers that the use ofRunStatus
inRunListResponse
will be discontinued in favor ofRunSchema
only.RunSummary
, which - together with the depreciation warning - should tell every implementer clearly where the specs will be going in this respect; indeed, as soon as support forRunStatus
inRunListResponse
is removed (a simple change), any compliant implementation MUST provide these additional fields in their run list responses.It does NOT address the concern that "clients may not agree with selection of fields".