Search API Docs
Some resources:
- Documentation: https://data.microbiomedata.org/docs
Some API endpoints require authentication. If you are signed in to the data portal website with your ORCiD account, you are authenticated automatically, so as long as you have an active portal session you can use the Swagger API page.
To make authenticated calls programmatically, you need to obtain an access token:
- Log in to the NMDC Data portal: https://data.microbiomedata.org
- Navigate to your user page by clicking your name in the top-right corner.
- Under "Developer Tools," copy the refresh token
- Use the API to exchange the refresh token for an access token:
curl -X POST 'https://data.microbiomedata.org/auth/refresh' -H "Content-Type: application/json" -d '{"refresh_token": "PASTE_REFRESH_TOKEN_HERE"}'
- Extract the access token from the response and store it somewhere for reuse
- These instructions, along with details on how to use the access token in API calls, are documented on the User page under "Developer Tools"
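The token exchange can also be scripted. The following is a minimal sketch using only the Python standard library. It assumes that the /auth/refresh response returns the new token under an "access_token" key and that authenticated endpoints accept it as an "Authorization: Bearer ..." header. Neither detail is stated on this page, so check the Developer Tools section of your User page before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

PORTAL_URL = "https://data.microbiomedata.org"


def refresh_request(refresh_token: str) -> urllib.request.Request:
    """Build the same POST /auth/refresh call as the curl command above."""
    body = json.dumps({"refresh_token": refresh_token}).encode("utf-8")
    return urllib.request.Request(
        f"{PORTAL_URL}/auth/refresh",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_access_token(refresh_token: str) -> str:
    """Send the refresh request and return the access token.

    Assumption: the response JSON stores the token under "access_token".
    """
    with urllib.request.urlopen(refresh_request(refresh_token)) as response:
        return json.load(response)["access_token"]


def auth_headers(access_token: str) -> dict:
    # Assumption: authenticated endpoints accept a standard bearer token.
    return {"Authorization": f"Bearer {access_token}"}
```

Store the returned token somewhere reusable, such as an environment variable, and pass auth_headers(token) as the headers of later requests.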
- Study ID lookup: https://data.microbiomedata.org/details/study/nmdc:sty-11-r2h77870
- Sample ID lookup: https://data.microbiomedata.org/details/sample/nmdc:bsm-11-bd18tk15
- Sample search: https://data.microbiomedata.org/docs#/biosample/Search_for_biosamples_api_biosample_search_post
- Study search: https://data.microbiomedata.org/docs#/study/Search_for_studies_api_study_search_post
A search is performed by sending a POST request with a JSON body that describes the query.
- To see which fields you can search on, use https://data.microbiomedata.org/api/summary
- You can also learn interactively how to build search payloads by watching the requests the portal sends in the Network tab of Chrome DevTools.
Example 1:
{"conditions":[{"value":"nmdc:sty-11-r2h77870","table":"study","op":"==","field":"study_id"}],"data_object_filter":[]}
Example 2:
{"conditions":[{"op":"==","field":"omics_type","value":"Organic Matter Characterization","table":"omics_processing"}],"data_object_filter":[]}
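Each condition compares one field of one table with a value, and the examples here only use the "==" operator. Below is a small sketch of helpers that build these bodies. The helper names are hypothetical. The assumption that multiple conditions are combined with AND comes from the multi-criteria biosample example further down this page, not from a stated rule.

```python
from typing import Any, Dict


def condition(table: str, field: str, value: Any, op: str = "==") -> Dict[str, Any]:
    """One entry of "conditions": compare `field` in `table` with `value`.

    Only "==" appears in these docs; other operators are not assumed here.
    """
    return {"op": op, "field": field, "value": value, "table": table}


def search_payload(*conditions: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap conditions in a search body; the empty data_object_filter mirrors the examples."""
    return {"conditions": list(conditions), "data_object_filter": []}


# Example 1: a single study by ID
example_1 = search_payload(condition("study", "study_id", "nmdc:sty-11-r2h77870"))

# Example 2: omics processing records of one omics type
example_2 = search_payload(
    condition("omics_processing", "omics_type", "Organic Matter Characterization")
)
```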
The response JSON structure for most endpoints can be inferred from the TypeScript interfaces. For example, study search returns a SearchResponse<StudySearchResults>, that is, a SearchResponse whose generic result type is StudySearchResults.
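Below is a hedged Python approximation of the SearchResponse shape. It is based only on the top-level keys visible in the example responses on this page ("count" and "results"). The real TypeScript interface may define more. In those examples, count appears to be the total number of matches rather than the length of the returned page. SearchResponse here is a hypothetical Python mirror, and parse_search_response is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, TypedDict


class SearchResponse(TypedDict):
    # Rough mirror of SearchResponse<T>; the element type of "results"
    # (e.g. StudySearchResults) is kept as a plain dict here.
    count: int
    results: List[Dict[str, Any]]


def parse_search_response(raw: str) -> SearchResponse:
    """Parse a search response body and check its top-level shape."""
    data = json.loads(raw)
    if not (
        isinstance(data, dict)
        and isinstance(data.get("count"), int)
        and isinstance(data.get("results"), list)
    ):
        raise ValueError("unexpected search response shape")
    return SearchResponse(count=data["count"], results=data["results"])
```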
The following code snippet can help construct and run queries from Python.
import requests

API_URL = "https://data.microbiomedata.org/api"


class Query:
    # Subclasses set this to the table/endpoint name, e.g. "biosample".
    item = None

    def __init__(self):
        # Per-instance state: class-level lists would be shared by every query.
        self.offset = 0
        self.limit = 15
        self.filters = []
        self.results = None

    def __getitem__(self, key):
        # query[a:b] sets offset=a and limit=b-a; query[n] requests one record.
        if isinstance(key, slice):
            self.offset = key.start or 0
            if key.stop is None:
                self.limit = 100 - self.offset
            else:
                self.limit = key.stop - self.offset
        elif isinstance(key, int):
            self.offset = key
            self.limit = 1
        else:
            raise TypeError("Query indices must be integers or slices")
        self.results = None  # the window changed, so fetch again on iteration
        return self

    def __iter__(self):
        if self.results is None:
            response = self._request()
            response.raise_for_status()  # fail loudly instead of iterating over None
            self.results = response.json()["results"]
        yield from self.results

    def filter(self, **kwargs):
        # "item" names the table the other fields belong to;
        # it defaults to this query's own table.
        table = kwargs.pop("item", None) or self.item
        for field, value in kwargs.items():
            self.filters.append(dict(op="==", field=field, value=value, table=table))
        return self

    def _request(self):
        url = f"{API_URL}/{self.item}/search"
        params = dict(offset=self.offset, limit=self.limit)
        payload = dict(conditions=self.filters, data_object_filter=[])
        # json= serializes the body and sets Content-Type: application/json
        return requests.post(url, params=params, json=payload)


class BiosampleQuery(Query):
    item = "biosample"


class StudyQuery(Query):
    item = "study"


class OmicsProcessingQuery(Query):
    item = "omics_processing"
With that code loaded you can perform queries such as:
query = (BiosampleQuery()
         .filter(ecosystem_type="Soil")
         .filter(item="kegg_function", id="KEGG.ORTHOLOGY:K00003"))
for item in query[0:5]:
    print(item["name"])
Filters assume that fields belong to the current query type, which is why this example can filter on ecosystem_type without specifying item. Gene functions, on the other hand, must be joined in, so item="kegg_function" is needed to indicate that the id field applies to a gene function.
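If you would rather not use the Query class, you can build the same request directly. The sketch below shows the body that the chained filter calls above should produce, sent with the standard library instead of requests. This body is derived from the Query code, not from API documentation. search_request is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://data.microbiomedata.org/api"

# Conditions equivalent to
#   BiosampleQuery().filter(ecosystem_type="Soil")
#                   .filter(item="kegg_function", id="KEGG.ORTHOLOGY:K00003")
conditions = [
    {"op": "==", "field": "ecosystem_type", "value": "Soil", "table": "biosample"},
    {"op": "==", "field": "id", "value": "KEGG.ORTHOLOGY:K00003", "table": "kegg_function"},
]


def search_request(item, conditions, offset=0, limit=15):
    """Build (but do not send) a POST to /api/<item>/search."""
    query = urllib.parse.urlencode({"offset": offset, "limit": limit})
    body = json.dumps({"conditions": conditions, "data_object_filter": []}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/{item}/search?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = search_request("biosample", conditions, offset=0, limit=5)
# To send it: json.load(urllib.request.urlopen(req))["results"]
```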
In the BiosampleQuery example, the slice [0:5] sets offset=0 and limit=5, so the search retrieves only the first five matching biosamples.
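To retrieve every match instead of one window, you can step the offset forward until the reported count is reached. The sketch below assumes that "count" is the total number of matches and "results" holds the current page. fetch_page is a hypothetical callable that you supply, and the default page size of 100 is an assumption about what the server accepts.

```python
from typing import Any, Callable, Dict, Iterator


def iter_all_results(
    fetch_page: Callable[[int, int], Dict[str, Any]], page_size: int = 100
) -> Iterator[Dict[str, Any]]:
    """Yield every result by walking offset/limit windows until count is reached.

    fetch_page(offset, limit) must return a parsed search response with
    "count" (total matches) and "results" (this page only).
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results = page["results"]
        yield from results
        offset += len(results)
        # Stop on an empty page (guards against looping forever) or when done.
        if not results or offset >= page["count"]:
            return
```

For example, fetch_page could be lambda offset, limit: requests.post(url, params={"offset": offset, "limit": limit}, json=payload).json(), with url and payload built as in the examples on this page.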
This example prints the download URLs of the assembly contig FASTA files for metagenome data from soil samples:
query = (OmicsProcessingQuery()
         .filter(omics_type="Metagenome")
         .filter(item="biosample", ecosystem_type="Soil"))
for item in query[0:20]:
    for omics in item["omics_data"]:
        for output in omics["outputs"]:
            if output["file_type"] == "Assembly Contigs":
                print(f"https://data.microbiomedata.org{output['url']}")
This example finds studies with James Stegen as PI:
query = StudyQuery().filter(principal_investigator_name="James Stegen")
for item in query:
    print(item["name"])
The following examples search for biosamples using these criteria:
- PI name ("Mitchel J. Doktycz")
- Broad-scale Environmental Context ("terrestrial biome" [ENVO:00000446])
- Environmental medium ("bulk soil" [ENVO:00005802])
- Omics type ("Metagenome")
- Processing institution ("JGI")
Find all biosamples matching these criteria
POST
https://data-dev.microbiomedata.org/api/biosample/search
Payload
{
"conditions": [
{
"op": "==",
"field": "principal_investigator_name",
"value": "Mitchel J. Doktycz",
"table": "study"
},
{
"op": "==",
"field": "env_broad_scale",
"value": "terrestrial biome",
"table": "biosample"
},
{
"op": "==",
"field": "env_medium",
"value": "bulk soil",
"table": "biosample"
},
{
"op": "==",
"field": "omics_type",
"value": "Metagenome",
"table": "omics_processing"
},
{
"op": "==",
"field": "processing_institution",
"value": "JGI",
"table": "omics_processing"
}
],
"data_object_filter": []
}
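The facet and binned_facet requests below reuse these five conditions unchanged. Only the extra keys differ. Below is a hypothetical way to keep the conditions in one place in Python. The body keys are copied from the payloads on this page, and the constant and function names are illustrative.

```python
from typing import Any, Dict, List

# The five criteria above, shared by the search, facet and binned_facet examples.
DOKTYCZ_CONDITIONS: List[Dict[str, Any]] = [
    {"op": "==", "field": "principal_investigator_name", "value": "Mitchel J. Doktycz", "table": "study"},
    {"op": "==", "field": "env_broad_scale", "value": "terrestrial biome", "table": "biosample"},
    {"op": "==", "field": "env_medium", "value": "bulk soil", "table": "biosample"},
    {"op": "==", "field": "omics_type", "value": "Metagenome", "table": "omics_processing"},
    {"op": "==", "field": "processing_institution", "value": "JGI", "table": "omics_processing"},
]


def search_body(conditions: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Body for POST /api/biosample/search
    return {"conditions": conditions, "data_object_filter": []}


def facet_body(conditions: List[Dict[str, Any]], attribute: str) -> Dict[str, Any]:
    # Body for POST /api/biosample/facet
    return {"conditions": conditions, "attribute": attribute}


def binned_facet_body(
    conditions: List[Dict[str, Any]], attribute: str, resolution: str
) -> Dict[str, Any]:
    # Body for POST /api/biosample/binned_facet
    return {"attribute": attribute, "conditions": conditions, "resolution": resolution}
```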
Response (truncated)
{
"count": 119,
"results": [
{
"id": "nmdc:bsm-11-dd8yd668",
"name": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA - BESC-388-Co2_54_7 soil",
"description": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA",
"alternate_identifiers": [
"gold:Gb0291628",
"img.taxon:3300053075"
],
"annotations": {
"ph": 6.48,
"elev": 62,
"type": "nmdc:Biosample",
"zinc": "4.2708 mg/kg",
"depth": {
"type": "nmdc:QuantityValue",
"has_unit": "m",
"has_raw_value": "0.1",
"has_numeric_value": 0.1
},
"lbceq": "1531.2 ppm",
"calcium": "2044.05 mg/kg",
"habitat": "Soil",
"lat_lon": "44.5876 -123.1939",
"location": "USA",
"magnesium": "398.928 mg/kg",
"manganese": "18.3765 mg/kg",
"potassium": "184.092 mg/kg",
"samp_name": "BESC-388-Co2_54_7 soil",
"tot_nitro": "0.327 Percent",
"lbc_thirty": "528 ppm",
"geo_loc_name": "USA: Oregon",
"samp_taxon_id": "soil metagenome [NCBITaxon:410658]",
"collected_from": "nmdc:frsite-11-wdy01683",
"nitrate_nitrogen": "0.463 mg/kg",
"nitrite_nitrogen": "0 mg/kg",
"ammonium_nitrogen": "0 mg/kg",
"ncbi_taxonomy_name": "soil metagenome",
"sample_collection_site": "Bulk Soil"
},
"study_id": "nmdc:sty-11-r2h77870",
"depth": 0.1,
"env_broad_scale_id": "ENVO:00000446",
"env_local_scale_id": "ENVO:00000011",
"env_medium_id": "ENVO:00005802",
"longitude": -123.1939,
"latitude": 44.5876,
"add_date": "2021-05-01T00:00:00",
"mod_date": "2021-05-01T00:00:00",
"collection_date": "2020-09-03T00:00:00",
"ecosystem": "Environmental",
"ecosystem_category": "Terrestrial",
"ecosystem_type": "Soil",
"ecosystem_subtype": "Botanical garden",
"specific_ecosystem": "Bulk soil",
"open_in_gold": null,
"env_broad_scale": {
"id": "ENVO:00000446",
"label": "terrestrial biome",
"url": "http://purl.obolibrary.org/obo/ENVO:00000446",
"data": {}
},
"env_local_scale": {
"id": "ENVO:00000011",
"label": "garden",
"url": "http://purl.obolibrary.org/obo/ENVO:00000011",
"data": {}
},
"env_medium": {
"id": "ENVO:00005802",
"label": "bulk soil",
"url": "http://purl.obolibrary.org/obo/ENVO:00005802",
"data": {}
},
"env_broad_scale_terms": [
"entity",
"continuant",
"independent continuant",
"material entity",
"biome",
"terrestrial biome",
"environmental system",
"astronomical body part",
"environmental system determined by a quality",
"ecosystem",
"terrestrial ecosystem",
"system"
],
"env_local_scale_terms": [
"entity",
"continuant",
"independent continuant",
"material entity",
"geographic feature",
"anthropogenic geographic feature",
"garden",
"astronomical body part"
],
"env_medium_terms": [
"entity",
"continuant",
"independent continuant",
"material entity",
"soil",
"bulk soil",
"environmental material",
"astronomical body part"
],
"emsl_biosample_identifiers": [],
"omics_processing": [
{
"id": "nmdc:omprc-11-7w7kd053",
"name": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA - BESC-388-Co2_54_7 soil",
"description": "",
"alternate_identifiers": [],
"annotations": {
"type": "nmdc:NucleotideSequencing",
"omics_type": "Metagenome",
"instrument_name": "Illumina NovaSeq 6000",
"analyte_category": "Metagenome",
"ncbi_project_name": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA - BESC-388-Co2_54_7 soil",
"principal_investigator": "Mitchel Doktycz",
"processing_institution": "JGI"
},
"study_id": "nmdc:sty-11-r2h77870",
"biosample_inputs": [
{
"id": "nmdc:bsm-11-dd8yd668",
"name": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA - BESC-388-Co2_54_7 soil",
"description": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA",
"alternate_identifiers": [
"gold:Gb0291628",
"img.taxon:3300053075"
],
"annotations": {
"ph": 6.48,
"elev": 62,
"type": "nmdc:Biosample",
"zinc": "4.2708 mg/kg",
"depth": {
"type": "nmdc:QuantityValue",
"has_unit": "m",
"has_raw_value": "0.1",
"has_numeric_value": 0.1
},
"lbceq": "1531.2 ppm",
"calcium": "2044.05 mg/kg",
"habitat": "Soil",
"lat_lon": "44.5876 -123.1939",
"location": "USA",
"magnesium": "398.928 mg/kg",
"manganese": "18.3765 mg/kg",
"potassium": "184.092 mg/kg",
"samp_name": "BESC-388-Co2_54_7 soil",
"tot_nitro": "0.327 Percent",
"lbc_thirty": "528 ppm",
"geo_loc_name": "USA: Oregon",
"samp_taxon_id": "soil metagenome [NCBITaxon:410658]",
"collected_from": "nmdc:frsite-11-wdy01683",
"nitrate_nitrogen": "0.463 mg/kg",
"nitrite_nitrogen": "0 mg/kg",
"ammonium_nitrogen": "0 mg/kg",
"ncbi_taxonomy_name": "soil metagenome",
"sample_collection_site": "Bulk Soil"
},
"study_id": "nmdc:sty-11-r2h77870",
"depth": 0.1,
"env_broad_scale_id": "ENVO:00000446",
"env_local_scale_id": "ENVO:00000011",
"env_medium_id": "ENVO:00005802",
"longitude": -123.1939,
"latitude": 44.5876,
"add_date": "2021-05-01T00:00:00",
"mod_date": "2021-05-01T00:00:00",
"collection_date": "2020-09-03T00:00:00",
"ecosystem": "Environmental",
"ecosystem_category": "Terrestrial",
"ecosystem_type": "Soil",
"ecosystem_subtype": "Botanical garden",
"specific_ecosystem": "Bulk soil"
}
],
"add_date": "2021-05-01T00:00:00",
"mod_date": "2021-05-01T00:00:00",
"open_in_gold": null,
"biosample_ids": [],
"omics_data": [],
"outputs": []
}
],
"multiomics": 16
}
]
}
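Each biosample result nests its related records. The ENVO terms appear as objects with an id and a label, and omics_processing lists the processing records, each with its own biosample_inputs. The sketch below pulls a compact summary out of a parsed response. The function name is hypothetical, and it reads only keys that are visible in the example above.

```python
from typing import Any, Dict, List, Tuple


def summarize_biosamples(response: Dict[str, Any]) -> List[Tuple[str, str, List[str]]]:
    """Return (biosample id, env_medium label, omics_processing ids) per result."""
    rows = []
    for sample in response.get("results", []):
        medium = (sample.get("env_medium") or {}).get("label", "")
        omics_ids = [proc["id"] for proc in sample.get("omics_processing", [])]
        rows.append((sample["id"], medium, omics_ids))
    return rows
```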
Find the count of biosamples for each geographic location.
POST
https://data-dev.microbiomedata.org/api/biosample/facet
Payload
{
"conditions": [
{
"op": "==",
"field": "principal_investigator_name",
"value": "Mitchel J. Doktycz",
"table": "study"
},
{
"op": "==",
"field": "env_broad_scale",
"value": "terrestrial biome",
"table": "biosample"
},
{
"op": "==",
"field": "env_medium",
"value": "bulk soil",
"table": "biosample"
},
{
"op": "==",
"field": "omics_type",
"value": "Metagenome",
"table": "omics_processing"
},
{
"op": "==",
"field": "processing_institution",
"value": "JGI",
"table": "omics_processing"
}
],
"attribute": "geo_loc_name"
}
Response
{
"facets": {
"USA: Oregon": 108,
"USA: Oregon, Clatskanie": 1,
"USA: Tennessee": 10
}
}
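The facet counts add up to 108 + 1 + 10 = 119, the same as the count returned by the search request above. This is consistent with each matching biosample being counted once under its geo_loc_name. Values are grouped by exact string, so "USA: Oregon, Clatskanie" is counted separately from "USA: Oregon". Below is a short sketch that totals and ranks a facet response. rank_facets is a hypothetical helper.

```python
facets = {"USA: Oregon": 108, "USA: Oregon, Clatskanie": 1, "USA: Tennessee": 10}


def rank_facets(facets):
    """Sort facet values by descending count, breaking ties alphabetically."""
    return sorted(facets.items(), key=lambda kv: (-kv[1], kv[0]))


total = sum(facets.values())  # 108 + 1 + 10 = 119
```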
Find the counts of biosamples collected in each month
POST
https://data-dev.microbiomedata.org/api/biosample/binned_facet
Payload
{
"attribute": "collection_date",
"conditions": [
{
"op": "==",
"field": "principal_investigator_name",
"value": "Mitchel J. Doktycz",
"table": "study"
},
{
"op": "==",
"field": "env_broad_scale",
"value": "terrestrial biome",
"table": "biosample"
},
{
"op": "==",
"field": "env_medium",
"value": "bulk soil",
"table": "biosample"
},
{
"op": "==",
"field": "omics_type",
"value": "Metagenome",
"table": "omics_processing"
},
{
"op": "==",
"field": "processing_institution",
"value": "JGI",
"table": "omics_processing"
}
],
"resolution": "month"
}
Response
{
"facets": [
10,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
109
],
"bins": [
"2014-08-01T00:00:00",
"2014-09-01T00:00:00",
"2014-10-01T00:00:00",
"2014-11-01T00:00:00",
"2014-12-01T00:00:00",
"2015-01-01T00:00:00",
"2015-02-01T00:00:00",
"2015-03-01T00:00:00",
"2015-04-01T00:00:00",
"2015-05-01T00:00:00",
"2015-06-01T00:00:00",
"2015-07-01T00:00:00",
"2015-08-01T00:00:00",
"2015-09-01T00:00:00",
"2015-10-01T00:00:00",
"2015-11-01T00:00:00",
"2015-12-01T00:00:00",
"2016-01-01T00:00:00",
"2016-02-01T00:00:00",
"2016-03-01T00:00:00",
"2016-04-01T00:00:00",
"2016-05-01T00:00:00",
"2016-06-01T00:00:00",
"2016-07-01T00:00:00",
"2016-08-01T00:00:00",
"2016-09-01T00:00:00",
"2016-10-01T00:00:00",
"2016-11-01T00:00:00",
"2016-12-01T00:00:00",
"2017-01-01T00:00:00",
"2017-02-01T00:00:00",
"2017-03-01T00:00:00",
"2017-04-01T00:00:00",
"2017-05-01T00:00:00",
"2017-06-01T00:00:00",
"2017-07-01T00:00:00",
"2017-08-01T00:00:00",
"2017-09-01T00:00:00",
"2017-10-01T00:00:00",
"2017-11-01T00:00:00",
"2017-12-01T00:00:00",
"2018-01-01T00:00:00",
"2018-02-01T00:00:00",
"2018-03-01T00:00:00",
"2018-04-01T00:00:00",
"2018-05-01T00:00:00",
"2018-06-01T00:00:00",
"2018-07-01T00:00:00",
"2018-08-01T00:00:00",
"2018-09-01T00:00:00",
"2018-10-01T00:00:00",
"2018-11-01T00:00:00",
"2018-12-01T00:00:00",
"2019-01-01T00:00:00",
"2019-02-01T00:00:00",
"2019-03-01T00:00:00",
"2019-04-01T00:00:00",
"2019-05-01T00:00:00",
"2019-06-01T00:00:00",
"2019-07-01T00:00:00",
"2019-08-01T00:00:00",
"2019-09-01T00:00:00",
"2019-10-01T00:00:00",
"2019-11-01T00:00:00",
"2019-12-01T00:00:00",
"2020-01-01T00:00:00",
"2020-02-01T00:00:00",
"2020-03-01T00:00:00",
"2020-04-01T00:00:00",
"2020-05-01T00:00:00",
"2020-06-01T00:00:00",
"2020-07-01T00:00:00",
"2020-08-01T00:00:00",
"2020-09-01T00:00:00",
"2020-10-01T00:00:00"
]
}
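The response contains 74 counts in facets and 75 timestamps in bins, so bins appear to be bin edges: facets[i] would count samples collected from bins[i] up to bins[i + 1]. Under that reading, 10 samples fall in August 2014 and 109 in September 2020, for a total of 119 again. The biosample shown earlier has a collection_date of 2020-09-03, which fits the second of those bins. Below is a sketch that pairs each non-empty count with its interval under this half-open-interval assumption. nonempty_bins is a hypothetical helper.

```python
from typing import List, Tuple


def nonempty_bins(facets: List[int], bins: List[str]) -> List[Tuple[str, str, int]]:
    """Return (start, end, count) for every bin with a non-zero count.

    Assumes bins holds len(facets) + 1 edges, as in the response above.
    """
    if len(bins) != len(facets) + 1:
        raise ValueError("expected exactly one more bin edge than counts")
    return [
        (bins[i], bins[i + 1], count)
        for i, count in enumerate(facets)
        if count
    ]
```

With the full response parsed into a dict named resp, nonempty_bins(resp["facets"], resp["bins"]) should return the two intervals described above.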