Search API Docs
Some resources:
- Documentation: https://data.microbiomedata.org/docs
Some API endpoints require authentication. You are authenticated automatically if you are signed in with your ORCID account on the data portal website. As long as you have an active session in the portal, you can use the Swagger API page.
This API relies on cookie auth.
- Find the session cookie for your current browser login. In Chrome, open:
chrome://settings/siteData?searchSubpage=microbiomedata&search=cookies
- Your curl request will look like this:
curl -X 'GET' 'https://data.dev.microbiomedata.org/api/metadata_submission?offset=0&limit=25' -H 'accept: application/json' --cookie "session=PASTE_COOKIE_HERE"
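The same request can be made from Python. The sketch below is a minimal, standard-library-only equivalent of the curl command above. The helper names build_authenticated_request and fetch_json are illustrative and not part of the API, and the cookie value is a placeholder that you must replace with your own session cookie.

```python
import json
import urllib.request

# Same URL as the curl example above.
API_URL = "https://data.dev.microbiomedata.org/api/metadata_submission?offset=0&limit=25"


def build_authenticated_request(url, session_cookie):
    """Build a GET request that carries the portal session cookie."""
    return urllib.request.Request(
        url,
        headers={
            "accept": "application/json",
            "Cookie": f"session={session_cookie}",
        },
        method="GET",
    )


def fetch_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (performs a network call, so it is left commented out):
# request = build_authenticated_request(API_URL, "PASTE_COOKIE_HERE")
# print(fetch_json(request))
```

Building the request separately from sending it makes it easy to inspect the headers before anything goes over the network.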
- Study ID lookup: https://data.microbiomedata.org/api/study/gold:Gs0114663
- Sample ID lookup: https://data.microbiomedata.org/api/biosample/gold:Gb0126437
- Sample search: https://data.microbiomedata.org/docs#/biosample/Search_for_biosamples_api_biosample_search_post
- Study search: https://data.microbiomedata.org/docs#/study/Search_for_studies_api_study_search_post
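The two ID lookup links above follow the same pattern, /api/<item>/<id>. As a small sketch, assuming that pattern holds for other IDs, a lookup URL can be built like this. The helper name lookup_url is hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://data.microbiomedata.org/api"


def lookup_url(item, identifier):
    """Return the lookup URL for a study or biosample ID."""
    # Keep ":" unescaped so IDs such as gold:Gs0114663 match the links above.
    return f"{BASE_URL}/{item}/{quote(identifier, safe=':')}"


print(lookup_url("study", "gold:Gs0114663"))
print(lookup_url("biosample", "gold:Gb0126437"))
```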
A search is a POST request whose JSON body describes the query.
- To see which fields you can search on, use https://data.microbiomedata.org/api/summary
- You can also interactively learn how to build search payloads by using the Chrome debug tools network inspector.
Example 1:
{"conditions":[{"value":"gold:Gs0114675","table":"study","op":"==","field":"study_id"}],"data_object_filter":[]}
Example 2:
{"conditions":[{"op":"==","field":"omics_type","value":"Organic Matter Characterization","table":"omics_processing"}],"data_object_filter":[]}
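Both examples share one shape: a list of conditions plus an empty data_object_filter. A minimal sketch for assembling such payloads in Python follows. The helper names condition and search_payload are made up for illustration, and only the == operator shown in the examples is assumed to be available.

```python
import json


def condition(table, field, value, op="=="):
    """Describe one filter: compare `field` of `table` against `value`."""
    return {"op": op, "field": field, "value": value, "table": table}


def search_payload(*conditions):
    """Wrap conditions in the body shape used by the examples above."""
    return {"conditions": list(conditions), "data_object_filter": []}


# Rebuilds Example 1.
payload = search_payload(condition("study", "study_id", "gold:Gs0114675"))
print(json.dumps(payload))
```

The printed JSON can be pasted into the Swagger page or sent as the body of the POST request.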
The response JSON structure for most endpoints can be inferred from looking at the TypeScript interfaces. For example, study search returns a SearchResponse<StudySearchResults>, which can be interpreted as a SearchResponse where the generic result slot is typed as StudySearchResults.
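To make the idea of a generic result slot concrete, here is a rough Python analogue of such a response type. It is a sketch, not a copy of the actual interface: the results key is the one the Python snippet on this page reads, while the count field is an assumption and falls back to the number of returned results when absent.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class SearchResponse(Generic[T]):
    """Envelope around a page of results; T is the type of each result."""
    count: int  # assumed field, not confirmed by this page
    results: List[T]


def parse_search_response(payload: Dict[str, Any]) -> SearchResponse[Dict[str, Any]]:
    """Wrap a decoded JSON body, leaving each result as a plain dict."""
    results = payload["results"]
    return SearchResponse(count=payload.get("count", len(results)), results=results)


page = parse_search_response({"count": 2, "results": [{"name": "a"}, {"name": "b"}]})
print(page.count, [result["name"] for result in page.results])
```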
The following code snippet can help construct and run queries from Python.
import requests


class Query(object):
    """Chainable builder for /api/<item>/search requests."""

    item = None  # set by subclasses: the endpoint and the default filter table

    def __init__(self):
        self.offset = 0
        self.limit = 15
        # Per-instance list; a class-level list would be shared by every query.
        self.filters = []
        self.results = None

    def __getitem__(self, key):
        if isinstance(key, slice):
            self.offset = key.start or 0
            if key.stop is None:
                # Open-ended slices stop at result 100.
                self.limit = 100 - self.offset
            else:
                self.limit = key.stop - self.offset
        elif isinstance(key, int):
            self.offset = key
            self.limit = 1
        else:
            raise TypeError("Query indices must be integers or slices")
        self.results = None  # discard results fetched for a previous page
        return self

    def __iter__(self):
        if self.results is None:
            response = self._request()
            try:
                self.results = response.json()["results"]
            except (ValueError, KeyError):
                # Show the raw body to help debug a failed query.
                print(response.text)
                raise
        for result in self.results:
            yield result

    def filter(self, **kwargs):
        # "item" names the table the fields belong to; default is the query's own.
        table = kwargs.pop("item", None)
        if table is None:
            table = self.item
        for field, value in kwargs.items():
            self.filters.append(dict(op="==", field=field, value=value, table=table))
        self.results = None  # filters changed, so cached results are stale
        return self

    def _request(self):
        url = f"https://data.microbiomedata.org/api/{self.item}/search?offset={self.offset}&limit={self.limit}"
        # json= serializes the body and sets the Content-Type header.
        return requests.post(url, json=dict(conditions=self.filters))


class BiosampleQuery(Query):
    item = "biosample"


class StudyQuery(Query):
    item = "study"


class OmicsProcessingQuery(Query):
    item = "omics_processing"
With that code loaded you can perform queries such as:
query = (BiosampleQuery()
         .filter(ecosystem_type="Soil")
         .filter(item="gene_function", id="KEGG.ORTHOLOGY:K00003"))
for item in query[0:5]:
    print(item["name"])
Filters assume their fields belong to the current query type, so in this example ecosystem_type can be searched without specifying item. Gene functions, on the other hand, must be joined in, so item="gene_function" is needed to indicate that the id field applies to a gene function.
The slicing operation [0:5] on the query above sets an offset and limit on the search to retrieve partial results.
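As a rough illustration of that mapping, the standalone helper below turns an index or slice into the offset and limit query parameters. The name slice_to_page is hypothetical, and the cap of 100 for open-ended slices mirrors the constant used in the Query class above. Negative indices and slice steps are not handled.

```python
def slice_to_page(key, max_stop=100):
    """Translate an int or slice into an (offset, limit) pair."""
    if isinstance(key, int):
        # A single index fetches exactly one result.
        return key, 1
    offset = key.start or 0
    stop = key.stop if key.stop is not None else max_stop
    return offset, stop - offset


print(slice_to_page(slice(0, 5)))
print(slice_to_page(slice(10, None)))
print(slice_to_page(3))
```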
This example prints the download URLs of assembly contig fasta files of soil samples:
query = OmicsProcessingQuery().filter(omics_type="Metagenome").filter(item="biosample", ecosystem_type="Soil")
for item in query[0:20]:
    for omics in item["omics_data"]:
        for output in omics["outputs"]:
            if output["file_type"] == "Assembly Contigs":
                print(f"https://data.microbiomedata.org{output['url']}")
This example finds studies with James Stegen as PI:
query = StudyQuery().filter(principal_investigator_name="James Stegen")
for item in query:
    print(item["name"])