Dependency versions: nmdc-runtime=1.9.1.dev86+g66b36e9
Metadata describing NMDC data (e.g. studies, biosamples, data objects, etc.) may be retrieved with GET requests, using the NMDC API Graphical User Interface (GUI). The API GUI provides a guided user interface for direct access to the NMDC data portal. It allows for:
- performing highly granular and targeted queries directly. This is especially helpful if a user has a query that may not be supported by the NMDC Data Portal yet.
- interactive exploration of querying capabilities. It provides code snippets that can be used in scripts for programmatic access, i.e. the request
curl
commands and URLs provided in the responses (please see the examples below).
Please note that the endpoints discussed in this documentation are targeted for users, such as NMDC data consumers. For documentation describing other endpoints, primarily used by developers, please see nmdc-runtime-docs.
Requests can include various parameters to filter, sort, and organize the requested information. Attribute names in the parameters will vary depending on the collection. The required syntax of the parameters will also vary, depending on if it is a find or a metadata endpoint. Find endpoints are designed to use more compact syntax (for example, filtering biosamples for an "Ecosystem Category" of "Plants" would look like ecosystem_category:Plants
using the GET /biosamples
endpoint). While metadata endpoints use MongoDB-like language querying (e.g. the same filter would look like {"ecosystem_category": "Plants"}
using the GET /nmdcshema/{collection_name}
endpoint with collection_name
set to biosample_set
).
The find endpoints are provided with NMDC metadata entities already specified - where metadata about studies, biosamples, data objects, and planned processes can be retrieved using GET requests.
The applicable parameters of the find endpoints, with acceptable syntax and examples, are in the table below.
Each endpoint is unique and requires the applicable attribute names to be known in order to structure a query in a meaningful way. Please note that endpoints with parameters that do not have a red * required
label next to them, are optional.
The GET /studies
endpoint is a general purpose way to retrieve NMDC studies based on parameters provided by the user. Studies can be filtered and sorted based on the applicable Study attributes.
If the study identifier is known, a study can be retrieved directly using the GET /studies/{study_id}
endpoint. Note that only one study can be retrieved at a time using this method.
The GET /biosamples
endpoint is a general purpose way to retrieve biosample metadata using user-provided filter and sort criteria. Please see the applicable Biosample attributes.
If the biosample identifier is known, a biosample can be retrieved directly using the GET /biosamples/{sample_id}
. Note that only one biosample metadata record can be retrieved at a time using this method.
To retrieve metadata about NMDC data objects (such as files, records, or omics data) the GET /data_objects
endpoint may be used along with various parameters. Please see the applicable Data Object attributes.
If the data object identifier is known, the metadata can be retrieved using the GET /data_objects/{data_object_id}
endpoint. Note that only one data object metadata record may be retrieved at a time using this method.
The GET /planned_processes
endpoint is a general way to fetch metadata about various planned processes (e.g. mass spectrometry, natural organic matter analysis, library preparation, etc.). Any "slot" (a.k.a. attribute) for PlannedProcess subclasses may be used in the filter and sort parameters. For example, attributes used in subclasses such as, MetabolomicsAnalysis (subclass of WorkflowExecution
which is a subclass of PlannedProcess
) or (Extraction)[https://microbiomedata.github.io/nmdc-schema/Extraction/] (subclass of MaterialProcessing
which is a subclass of PlannedProcess
), can be used as input criteria for the filter and sort parameters of this endpoint.
If the planned process identifier is known, the metadata can be retrieved using the GET /planned_processes/{planned_process_id}
endpoint. Note that only one metadata record for a planned process may be returned at a time using this method.
For more information and to see more examples of find endpoints outside of the autogenerated user interface, please visit: https://api.microbiomedata.org/search
Find Endpoint Example: get all studies that have EMSL (Environmental Molecular Sciences Laboratory) related funding
- Click on the drop down arrow to the right side of the
GET /studies
endpoint - Click Try it out in the upper right of the expanded endpoint box
- Enter in parameters. In this case, we will input
funding_sources.search:EMSL
into the filter parameter. The.search
performs a full text search to find studies withfunding_sources
that have the word "EMSL" in its value. - Click Execute
- View the results in JSON format, available to download by clicking Download, or copy the results by clicking the clipboard icon in the bottom right corner of the response. In this case, two studies were retrieved.
The metadata endpoints can be used to get and filter metadata from collection set types (including studies, biosamples, planned processes, and data objects as discussed in the find section).
Unlike the compact syntax used in the find endpoints, the syntax for the filter parameter of the metadata endpoints uses MongoDB-like language querying. The applicable parameters of the metadata endpoints, with acceptable syntax and examples, are in the table below.
Parameter | Description | Syntax | Example |
---|---|---|---|
collection_name | The name of the collection to be queried. For a list of collection names please see the Database class of the NMDC Schema | String | biosample_set |
filter | Allows conditions to be set as part of the query, returning only results that satisfy the conditions | MongoDB-like query language. All strings should be in double quotation marks. | {"lat_lon.latitude": {"$gt": 45.0}, "ecosystem_category": "Plants"} |
max_page_size | Specifies the maximum number of documents returned at a time | Integer | 25 |
page_token | Specifies the token of the page to return. If unspecified, the first page is returned. To retrieve a subsequent page, the value received as the next_page_token from the bottom of the previous results can be provided as a page_token . |
String | nmdc:sys0ae1sh583 |
projection | Indicates the desired attributes to be included in the response. Helpful for trimming down the returned results | Comma-separated list of attributes that belong to the documents in the collection being queried | name, ecosystem_type |
doc_id | The unique identifier of the item being requested. For example, the identifier of a biosample or an extraction | Curie e.g. prefix:identifier |
nmdc:bsm-11-ha3vfb58 |
The metadata endpoints allow users to retrieve metadata from the data portal using the various GET
endpoints that are slightly different than the find endpoints, but some can be used similarly. As with the find endpoints, parameters for the metadata endpoints that do not have a red * required
next to them are optional.
To view the NMDC Schema version the database is currently using, try executing the GET /nmdcschema/version
endpoint
To get the NMDC Database collection statistics, like the total count of records in a collection or the size of the collection, try executing the GET /nmdcschema/collection_stats
endpoint
The GET /nmdcschema/{collection_name}
endpoint is a general purpose way to retrieve metadata about a specified collection given user-provided filter and projection criteria. Please see the Collection Names that may be retrieved. Please note that metadata may only be retrieved about one collection at a time.
If the identifier of the record is known, the GET /nmdcshema/ids/{doc_id}
can be used to retrieve the specified record. Note that only one identifier may be used at a time, and therefore, only one record may be retrieved at a time using this method.
If both the identifier and the collection name of the desired record is known, the GET /nmdcschema/{collection_name}/{doc_id}
can be used to retrieve the record. The projection parameter is optionally available for this endpoint to retrieve only desired attributes from a record. Please note that only one record can be retrieved at one time using this method.
Metadata Endpoints Example 1: Get all of the biosamples that are part of the 1000 Soils Research Campaign Study sampled from Colorado
- Click on the drop down arrow to the right side of the
GET /nmdcschema/{collection_name}
endpoint - Click Try it out in the upper right of the expanded endpoint box
- In order to enter in the parameters, get the identifier for this study by navigating to the 1000 Soils Research Campaign study page in the data portal and copying the
ID
- Enter in the parameters in the
GET /nmdcschema/{collection_name}
endpoint. For this example, we will inputbiosample_set
into the collection_name parameter and{"part_of": "nmdc:sty-11-28tm5d36", "geo_loc_name.has_raw_value": {"$regex": "Colorado"}}
into the filter parameter. See the Biosample Class in the NMDC Schema to view the applicable biosample attributes (slots); for this example, they arepart_of
andgeo_loc_name.has_raw_value
. Note that$regex
conducts a full text search for the word "Colorado" in thegeo_loc_name.has_raw_value
attribute. - Click Execute
- View the results in JSON format, available to download by clicking Download; or copy the results by clicking the clipboard icon in the bottom right corner of the response. In this case, two studies were retrieved. Note that the curl and request URL are provided as well.