Convert Swagger tag metadata (freeform blurbs) into parameter annotations #665

Closed
eecavanna opened this issue Sep 2, 2024 · 10 comments · Fixed by #804
Assignees: eecavanna
Labels: cleanup 🧹 (Related to cleaning up code, documentation, etc.), documentation (Improvements or additions to documentation)

eecavanna (Collaborator) commented Sep 2, 2024

I volunteered to attempt to transfer the endpoint documentation that is currently implemented as freeform blurbs for the "sections" (i.e. tags) in Swagger UI into parameter-specific annotations on the endpoints themselves.

Today, the freeform blurbs are stored in a list variable named tags_metadata in the file nmdc_runtime/api/main.py:

tags_metadata = [
{
"name": "sites",
"description": (
"""A site corresponds to a physical place that may participate in job execution.
A site may register data objects and capabilities with NMDC. It may claim jobs to execute, and it may
update job operations with execution info.
A site must be able to service requests for any data objects it has registered.
A site may expose a "put object" custom method for authorized users. This method facilitates an
operation to upload an object to the site and have the site register that object with the runtime
system.
"""
),
},
{
"name": "users",
"description": (
"""Endpoints for user identification.
Currently, accounts for use of the runtime API are created manually by system administrators.
"""
),
},
{
"name": "workflows",
"description": (
"""A workflow is a template for creating jobs.
Workflow jobs are typically created by the system via trigger associations between
workflows and object types. A workflow may also require certain capabilities of sites
in order for those sites to claim workflow jobs.
"""
),
},
{
"name": "capabilities",
"description": (
"""A workflow may require an executing site to have particular capabilities.
These capabilities go beyond the simple ability to access the data object resources registered with
the runtime system. Sites register their capabilities, and sites are only able to claim workflow
jobs if they are known to have the capabilities required by the workflow.
"""
),
},
{
"name": "object types",
"description": (
"""An object type is an object annotation that is useful for triggering workflows.
A data object may be annotated with one or more types, which in turn can be associated with
workflows through trigger resources.
The data-object type system may be used to trigger workflow jobs on a subset of data objects when a
new version of a workflow is deployed. This could be done by minting a special object type for the
occasion, annotating the subset of data objects with that type, and registering the association of
object type to workflow via a trigger resource.
"""
),
},
{
"name": "triggers",
"description": (
"""A trigger is an association between a workflow and a data object type.
When a data object is annotated with a type, perhaps shortly after object registration, the NMDC
Runtime will check, via trigger associations, for potential new jobs to create for any workflows.
"""
),
},
{
"name": "jobs",
"description": """A job is a resource that isolates workflow configuration from execution.
Rather than directly creating a workflow operation by supplying a workflow ID along with
configuration, NMDC creates a job that pairs a workflow with configuration. Then, a site can claim a
job ID, allowing the site to execute the intended workflow without additional configuration.
A job can have multiple executions, and a workflow's executions are precisely the executions of all
jobs created for that workflow.
A site that already has a compatible job execution result can preempt the unnecessary creation of a
job by pre-claiming it. This will return like a claim, and now the site can register known data
object inputs for the job without the risk of the runtime system creating a claimable job of the
pre-claimed type.
""",
},
{
"name": "objects",
"description": (
"""\
A [Data Repository Service (DRS)
object](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.1.0/docs/#_drs_datatypes)
represents content necessary for a workflow job to execute, and/or output from a job execution.
An object may be a *blob*, analogous to a file, or a *bundle*, analogous to a folder. Sites register
objects, and sites must ensure that these objects are accessible to the NMDC data broker.
An object may be associated with one or more object types, useful for triggering workflows.
"""
),
},
{
"name": "operations",
"description": """An operation is a resource for tracking the execution of a job.
When a job is claimed by a site for execution, an operation resource is created.
An operation is akin to a "promise" or "future" in that it should eventually resolve to either a
successful result, i.e. an execution resource, or to an error.
An operation is parameterized to return a result type, and a metadata type for storing progress
information, that are both particular to the job type.
Operations may be paused, resumed, and/or cancelled.
Operations may expire, i.e. not be stored indefinitely. In this case, it is recommended that
execution resources have longer lifetimes / not expire, so that information about successful results
of operations are available.
""",
},
{
"name": "queries",
"description": (
"""A query is an operation (find, update, etc.) against the metadata store.
Metadata -- for studies, biosamples, omics processing, etc. -- is used by sites to execute jobs,
as the parameterization of job executions may depend not only on the content of data objects, but
also on objects' associated metadata.
Also, the function of many workflows is to extract or produce new metadata. Such metadata products
should be registered as data objects, and they may also be supplied by sites to the runtime system
as an update query (if the latter is not done, the runtime system will sense the new metadata and
issue an update query).
"""
),
},
{
"name": "metadata",
"description": """
The [metadata endpoints](https://api.microbiomedata.org/docs#/metadata) can be used to get and filter metadata from
collection set types (including [studies](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Study.html),
[biosamples](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Biosample.html),
[data objects](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/DataObject.html), and
[activities](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Activity.html)).<br/>
The __metadata__ endpoints allow users to retrieve metadata from the data portal using the various GET endpoints
that are slightly different than the __find__ endpoints, but some can be used similarly. As with the __find__ endpoints,
parameters for the __metadata__ endpoints that do not have a red ___* required___ next to them are optional. <br/>
Unlike the compact syntax used in the __find__ endpoints, the syntax for the filter parameter of the metadata endpoints
uses [MongoDB-like language querying](https://www.mongodb.com/docs/manual/tutorial/query-documents/).
The applicable parameters of the __metadata__ endpoints, with acceptable syntax and examples, are in the table below.
<details>
<summary>More Details</summary>
| Parameter | Description | Syntax | Example |
| :---: | :-----------: | :-------: | :---: |
| collection_name | The name of the collection to be queried. For a list of collection names please see the [Database class](https://microbiomedata.github.io/nmdc-schema/Database/) of the NMDC Schema | String | `biosample_set` |
| filter | Allows conditions to be set as part of the query, returning only results that satisfy the conditions | [MongoDB-like query language](https://www.mongodb.com/docs/manual/tutorial/query-documents/). All strings should be in double quotation marks. | `{"lat_lon.latitude": {"$gt": 45.0}, "ecosystem_category": "Plants"}` |
| max_page_size | Specifies the maximum number of documents returned at a time | Integer | `25`
| page_token | Specifies the token of the page to return. If unspecified, the first page is returned. To retrieve a subsequent page, the value received as the `next_page_token` from the bottom of the previous results can be provided as a `page_token`. ![next_page_token](../_static/images/howto_guides/api_gui/metadata_page_token_param.png) | String | `nmdc:sys0ae1sh583`
| projection | Indicates the desired attributes to be included in the response. Helpful for trimming down the returned results | Comma-separated list of attributes that belong to the documents in the collection being queried | `name, ecosystem_type` |
| doc_id | The unique identifier of the item being requested. For example, the identifier of a biosample or an extraction | Curie e.g. `prefix:identifier` | `nmdc:bsm-11-ha3vfb58` |<br/>
<br/>
</details>
""",
},
{
"name": "find",
"description": """
The [find endpoints](https://api.microbiomedata.org/docs#/find:~:text=Find%20NMDC-,metadata,-entities.) are provided with
NMDC metadata entities already specified - where metadata about [studies](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Study.html),
[biosamples](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Biosample.html),
[data objects](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/DataObject.html), and
[activities](https://nmdc-documentation.readthedocs.io/en/latest/reference/metadata/Activity.html) can be retrieved using GET requests.
Each endpoint is unique and requires the applicable attribute names to be known in order to structure a query in a meaningful way.
Please note that endpoints with parameters that do not have a red ___* required___ label next to them are optional.<br/>
The applicable parameters of the ___find___ endpoints, with acceptable syntax and examples, are in the table below.
<details><summary>More Details</summary>
| Parameter | Description | Syntax | Example |
| :---: | :-----------: | :-------: | :---: |
| filter | Allows conditions to be set as part of the query, returning only results that satisfy the conditions | Comma separated string of attribute:value pairs. Can include comparison operators like >=, <=, <, and >. May use a `.search` after the attribute name to conduct a full text search of the field that are of type string. e.g. `attribute:value,attribute.search:value` | `ecosystem_category:Plants, lat_lon.latitude:>35.0` |
| search | Not yet implemented | Coming Soon | Not yet implemented |
| sort | Specifies the order in which the query returns the matching documents | Comma separated string of attribute:value pairs, where the value can be empty, `asc`, or `desc` (for ascending or descending order) e.g. `attribute` or `attribute:asc` or `attribute:desc`| `depth.has_numeric_value:desc, ecosystem_type` |
| page | Specifies the desired page number among the paginated results | Integer | `3` |
| per_page | Specifies the number of results returned per page. Maximum allowed is 2,000 | Integer | `50` |
| cursor | A bookmark for where a query can pick up where it has left off. To use cursor paging, set the `cursor` parameter to `*`. The results will include a `next_cursor` value in the response's `meta` object that can be used in the `cursor` parameter to retrieve the subsequent results ![next_cursor](../_static/images/howto_guides/api_gui/find_cursor.png) | String | `*` or `nmdc:sys0zr0fbt71` |
| group_by | Not yet implemented | Coming Soon | Not yet implemented |
| fields | Indicates the desired attributes to be included in the response. Helpful for trimming down the returned results | Comma-separated list of attributes that belong to the documents in the collection being queried | `name, ess_dive_datasets` |
| study_id | The unique identifier of a study | Curie e.g. `prefix:identifier` | `nmdc:sty-11-34xj1150` |
| sample_id | The unique identifier of a biosample | Curie e.g. `prefix:identifier` | `nmdc:bsm-11-w43vsm21` |
| data_object_id | The unique identifier of a data object | Curie e.g. `prefix:identifier` | `nmdc:dobj-11-7c6np651` |
| activity_id | The unique identifier for an NMDC workflow execution activity | Curie e.g. `prefix:identifier` | `nmdc:wfmgan-11-hvcnga50.1`|<br/>
<br/>
</details>
""",
},
{
"name": "runs",
"description": (
"[WORK IN PROGRESS] Run simple jobs. "
"For off-site job runs, keep the Runtime appraised of run events."
),
},
]
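
For reference, the two filter syntaxes documented in the `metadata` and `find` blurbs above can be exercised roughly as follows. This is an illustrative sketch only: the `/nmdcschema/{collection_name}` and `/biosamples` paths are assumptions based on the linked API docs, so verify them against https://api.microbiomedata.org/docs before relying on them; the parameter names and example values come from the tables in the blurbs.

```python
# Illustrative sketch of the two filter styles described in the blurbs above.
# Assumption: the endpoint paths below match the live API docs; the parameters
# (filter, max_page_size, per_page) are taken from the tables in the blurbs.
import json
import requests

BASE = "https://api.microbiomedata.org"

# "metadata" style: MongoDB-like filter document against a named collection.
metadata_resp = requests.get(
    f"{BASE}/nmdcschema/biosample_set",
    params={
        "filter": json.dumps(
            {"lat_lon.latitude": {"$gt": 45.0}, "ecosystem_category": "Plants"}
        ),
        "max_page_size": 25,
    },
)

# "find" style: compact attribute:value pairs, with comparison operators.
find_resp = requests.get(
    f"{BASE}/biosamples",
    params={"filter": "ecosystem_category:Plants, lat_lon.latitude:>35.0", "per_page": 50},
)

print(metadata_resp.status_code, list(metadata_resp.json().keys()))
print(find_resp.status_code, list(find_resp.json().keys()))
```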

An example of "parameter-specific annotations" is shown in this issue: #651
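
Concretely, the difference between the current tag-level blurbs and the desired parameter-level annotations looks roughly like this. This is a minimal sketch, assuming FastAPI's standard `openapi_tags` mechanism and `Query(description=...)`; the route, parameter names, and descriptions are hypothetical, not the actual Runtime code.

```python
# Minimal sketch of "tag blurb" vs. "parameter annotation" in FastAPI.
# Assumption: the Runtime registers tags_metadata via the standard
# `openapi_tags` argument; the endpoint and parameters below are hypothetical.
from typing import Optional

from fastapi import FastAPI, Query

tags_metadata = [
    {"name": "find", "description": "Find NMDC metadata entities."},
]

# Status quo: documentation lives in the tag blurb shown at the top of the
# Swagger UI section.
app = FastAPI(openapi_tags=tags_metadata)


@app.get("/hypothetical/find", tags=["find"])
def find_things(
    # Target of this issue: documentation attached to the parameter itself,
    # so Swagger UI shows it next to the input field.
    filter: Optional[str] = Query(
        None,
        description="Comma-separated attribute:value pairs, e.g. `ecosystem_category:Plants`.",
    ),
    per_page: int = Query(
        25, ge=1, le=2000, description="Number of results per page (maximum 2,000)."
    ),
):
    return {"filter": filter, "per_page": per_page}
```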

eecavanna added the documentation and cleanup 🧹 labels on Sep 2, 2024
eecavanna self-assigned this on Sep 2, 2024
eecavanna changed the title from "Convert tag metadata (freeform blurbs) into parameter annotations (Swagger)" to "Convert Swagger tag metadata (freeform blurbs) into parameter annotations" on Sep 2, 2024
eecavanna (Collaborator, Author) commented:

Some of the tag metadata was copied (at least partially) from this how-to guide: https://github.com/microbiomedata/NMDC_documentation/blob/main/docs/howto_guides/api_gui.md (via commit 1a9ff23).

eecavanna (Collaborator, Author) commented Sep 2, 2024

The FindRequest class (within which the definitions of the find-related parameters are consolidated) is implemented as a Pydantic model. According to https://stackoverflow.com/questions/64364499/set-description-for-query-parameter-in-swagger-doc-using-pydantic-model-fastapi, a Pydantic model can't be used directly to declare an endpoint's query parameters in a way that makes Swagger UI show the per-parameter metadata, although there is a workaround. At this point, I don't know why FindRequest was implemented as a Pydantic model (maybe it was to take advantage of some Pydantic feature).
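
For the record, the workaround described in the linked answer boils down to re-declaring the model's fields as `fastapi.Query` parameters inside a dependency. Below is a minimal sketch under that assumption, using a hypothetical `FindParams` stand-in rather than the real FindRequest fields.

```python
# Sketch of the Depends()-based workaround (hypothetical FindParams model;
# not the actual FindRequest fields).
from typing import Optional

from fastapi import Depends, FastAPI, Query
from pydantic import BaseModel

app = FastAPI()


class FindParams(BaseModel):
    filter: Optional[str] = None
    per_page: int = 25


def find_params(
    # Each field is re-declared as a Query so its description reaches the
    # generated OpenAPI spec (and therefore Swagger UI).
    filter: Optional[str] = Query(None, description="Filter conditions."),
    per_page: int = Query(25, le=2000, description="Results per page (max 2,000)."),
) -> FindParams:
    return FindParams(filter=filter, per_page=per_page)


@app.get("/hypothetical/find", tags=["find"])
def find_things(params: FindParams = Depends(find_params)):
    # The endpoint still receives a single validated object.
    return params
```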

ssarrafan commented:

Moving to next sprint @eecavanna

eecavanna (Collaborator, Author) commented Sep 18, 2024

I'll defer this until after the Berkeley Schema Roll Out (unless the Runtime code freeze allows for this type of change via an exception process—TBD).

CC: @aclum

turbomam (Member) commented:

@eecavanna @ssarrafan

I got an email in my LBL inbox that claimed to be a comment on this issue from a user named "levente". It consisted of a suggestion to click on a bit.ly link.

It was tagged as spam by Gmail.

Did any of you get that? Does it concern you?

eecavanna (Collaborator, Author) commented:

Hi @turbomam,

If the username was Klevente12 (which contains the substring "levente") and the comment date was September 2nd (or a few days before that), then yes: I saw a spam comment from that user and reported it to GitHub (the company), which confirmed it violated their terms. Here's an excerpt from the follow-up message I got from GitHub:

Our review of the account named in your report has concluded. We have determined that one or more violations of GitHub’s Terms of Service have occurred and have taken appropriate action in response.

When I see spam comments like that, I (a) refrain from engaging with them and (b) report them to GitHub using the action menu on the comment.

At this point, it's not something I'm worried about (any more than spam on other public forums).

eecavanna (Collaborator, Author) commented:

These are some web pages (related to this task) I want to preserve links to before I reboot my laptop:

ssarrafan commented:

@eecavanna can this be moved out of the sprint and labeled as backlog?

eecavanna (Collaborator, Author) commented:

If there is a board for the "sprint after next," I'd like this to be moved there instead of to a backlog. Otherwise, backlog is OK with me. The task is partially done; it just got lowered in priority in favor of other tasks.

ssarrafan commented:

Moved to sprint 48. Thanks Eric.
