Search API Docs

Brandon Davis edited this page Jun 9, 2022 · 20 revisions

Some resources:

Authenticated endpoints

Some API endpoints require authentication. If you are signed in to the data portal website with your ORCID account, you are authenticated automatically: as long as your portal session is active, you can call these endpoints from the Swagger API page.

Programmatic Authentication

This API relies on cookie auth.

  1. Find the session cookie for your current browser login. In Chrome, open chrome://settings/siteData?searchSubpage=microbiomedata&search=cookies and copy the value of the cookie named session.
  2. Pass that value to curl with --cookie. Your request will look like this: curl -X 'GET' 'https://data.dev.microbiomedata.org/api/metadata_submission?offset=0&limit=25' -H 'accept: application/json' --cookie "session=PASTE_COOKIE_HERE"
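
The same request can be sketched in Python. This is a minimal example using only the standard library; the helper name build_submission_request is hypothetical, and the cookie value is a placeholder you would replace with your own. It only builds the request, so nothing is sent until you pass it to urlopen.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://data.dev.microbiomedata.org/api"


def build_submission_request(session_cookie, offset=0, limit=25):
    """Build (but do not send) the GET request from the curl example.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    query = urlencode({"offset": offset, "limit": limit})
    return Request(
        f"{BASE_URL}/metadata_submission?{query}",
        headers={
            "accept": "application/json",
            # Same cookie that curl sends with --cookie "session=..."
            "Cookie": f"session={session_cookie}",
        },
        method="GET",
    )


request = build_submission_request("PASTE_COOKIE_HERE")
print(request.full_url)
```

To send it, pass the request to urllib.request.urlopen(request) and decode the body with json.load.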

Direct ID lookup

Search

A search is a POST request with a JSON body that describes the query.

Example 1:

{"conditions":[{"value":"gold:Gs0114675","table":"study","op":"==","field":"study_id"}],"data_object_filter":[]}

Example 2:

{"conditions":[{"op":"==","field":"omics_type","value":"Organic Matter Characterization","table":"omics_processing"}],"data_object_filter":[]}

Response payload structure

The response JSON structure for most endpoints can be inferred from looking at the TypeScript interfaces.

For example, study search returns a SearchResponse<StudySearchResults>, which can be interpreted as a SearchResponse where the generic result slot is typed as StudySearchResults.

Python

The following code snippet can help construct and run queries from Python.

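import requests


class Query(object):
    """A lazily evaluated search against one table of the data portal API."""

    item = None  # table to search; set by each subclass

    def __init__(self):
        # Keep state per instance: a class-level list would be shared by every query.
        self.offset = 0
        self.limit = 15
        self.filters = []
        self.results = None

    def __getitem__(self, key):
        if isinstance(key, slice):
            self.offset = 0 if key.start is None else key.start
            # An open-ended slice such as [10:] stops at index 100.
            stop = 100 if key.stop is None else key.stop
            self.limit = stop - self.offset
        elif isinstance(key, int):
            self.offset = key
            self.limit = 1
        else:
            raise TypeError("Query indices must be integers or slices")
        self.results = None  # drop results fetched for a previous window
        return self

    def __iter__(self):
        if self.results is None:
            response = self._request()
            try:
                self.results = response.json()["results"]
            except (ValueError, KeyError, TypeError):
                # Not a search result (for example an error payload): show it and stop.
                print(response.text)
                return
        for result in self.results:
            yield result

    def filter(self, **kwargs):
        # "item" names the table the fields belong to; default to this query's table.
        if kwargs.get("item") is not None:
            table = kwargs["item"]
        else:
            table = self.item
        for arg in kwargs:
            if arg == "item":
                continue
            self.filters.append(dict(op="==", field=arg, value=kwargs[arg], table=table))
        self.results = None  # the filters changed, so cached results are stale
        return self

    def _request(self):
        url = f"https://data.microbiomedata.org/api/{self.item}/search"
        params = dict(offset=self.offset, limit=self.limit)
        # json= serializes the body and sets the Content-Type header.
        return requests.post(url, params=params, json=dict(conditions=self.filters))


class BiosampleQuery(Query):
    item = "biosample"


class StudyQuery(Query):
    item = "study"


class OmicsProcessingQuery(Query):
    item = "omics_processing"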

With that code loaded you can perform queries such as:

query = (BiosampleQuery()
    .filter(ecosystem_type="Soil")
    .filter(item="gene_function", id="KEGG.ORTHOLOGY:K00003"))

for item in query[0:5]:
    print(item["name"])

Filters assume that fields belong to the current query type, which is why this example can search by ecosystem_type without specifying an item. Gene functions, on the other hand, must be joined in, so item="gene_function" is needed to indicate that the id field applies to a gene function.
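
To make the default concrete, the sketch below mirrors the filter logic without sending a request and prints the conditions the query above would post. The helper build_conditions is hypothetical and exists only for this illustration.

```python
def build_conditions(default_table, *filter_calls):
    """Mirror Query.filter: each argument is the kwargs of one filter() call."""
    conditions = []
    for kwargs in filter_calls:
        # "item" overrides the table; otherwise use the query's own table.
        table = kwargs.get("item") or default_table
        for field, value in kwargs.items():
            if field == "item":
                continue
            conditions.append(dict(op="==", field=field, value=value, table=table))
    return conditions


conditions = build_conditions(
    "biosample",
    dict(ecosystem_type="Soil"),
    dict(item="gene_function", id="KEGG.ORTHOLOGY:K00003"),
)
for entry in conditions:
    print(entry)
```

The first condition targets the biosample table by default, while the second targets gene_function because item was given.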

The slicing operation [0:5] on the query above sets an offset and limit on the search to retrieve partial results.
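
The mapping from a slice to offset and limit can be sketched on its own. The function slice_to_window is a hypothetical name, and the cap of 100 for open-ended slices simply mirrors the constant used in the Query class above.

```python
def slice_to_window(key, max_stop=100):
    """Translate a slice into the (offset, limit) pair sent to the search API."""
    offset = 0 if key.start is None else key.start
    # An open-ended slice such as [10:] is capped at max_stop.
    stop = max_stop if key.stop is None else key.stop
    return offset, stop - offset


for key in (slice(0, 5), slice(10, 30), slice(40, None)):
    offset, limit = slice_to_window(key)
    print(f"offset={offset}&limit={limit}")
```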

This example prints the download URLs of the assembly contig FASTA files for metagenomes from soil samples:

query = OmicsProcessingQuery().filter(omics_type="Metagenome").filter(item="biosample", ecosystem_type="Soil")

for item in query[0:20]:
    for omics in item["omics_data"]:
        for output in omics["outputs"]:
            if output["file_type"] == "Assembly Contigs":
                print(f"https://data.microbiomedata.org{output['url']}")

This example finds studies with James Stegen as PI:

query = StudyQuery().filter(principal_investigator_name="James Stegen")

for item in query:
    print(item["name"])