
Search API Docs

Michael Nagler edited this page Dec 23, 2024 · 20 revisions

Some resources:

Authenticated endpoints

Some API endpoints require authentication. You are authenticated automatically if you are signed in to the data portal website with your ORCiD account. As long as you have an active session in the portal, you can use the Swagger API page.

Programmatic Authentication

To make authenticated calls programmatically, you need an access token.

  1. Log in to the NMDC Data portal: https://data.microbiomedata.org
  2. Navigate to your user page by clicking your name in the top right corner.
  3. Under "Developer Tools," copy the refresh token.
  4. Use the API to exchange the refresh token for an access token: curl -X POST 'https://data.microbiomedata.org/auth/refresh' -H "Content-Type: application/json" -d '{"refresh_token": "PASTE_REFRESH_TOKEN_HERE"}'
  5. Extract the access token from the response and store it for reuse.
  6. These instructions, along with details on how to use the access token in API calls, are also documented on the user page under "Developer Tools".
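The token exchange in step 4 can also be done from Python. This is a minimal standard-library sketch, not an official client. It assumes that the refresh response carries the token under an `access_token` key and that authenticated calls accept it as a Bearer token in the `Authorization` header; both are assumptions, so confirm them against the "Developer Tools" section of your user page. The function names are hypothetical.

```python
import json
import urllib.request

PORTAL = "https://data.microbiomedata.org"


def build_refresh_request(refresh_token: str) -> urllib.request.Request:
    """Build the same POST /auth/refresh request as the curl command in step 4."""
    body = json.dumps({"refresh_token": refresh_token}).encode("utf-8")
    return urllib.request.Request(
        f"{PORTAL}/auth/refresh",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def exchange_refresh_token(refresh_token: str) -> str:
    """Send the refresh request and return the access token (needs network access)."""
    with urllib.request.urlopen(build_refresh_request(refresh_token)) as resp:
        payload = json.load(resp)
    # ASSUMPTION: the response JSON stores the token under "access_token".
    return payload["access_token"]


def auth_headers(access_token: str) -> dict:
    """Headers for authenticated calls. ASSUMPTION: the API uses the Bearer scheme."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
```

For example, call `token = exchange_refresh_token("PASTE_REFRESH_TOKEN_HERE")` once, then pass `headers=auth_headers(token)` when calling an authenticated endpoint.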

Direct ID lookup

Search

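A search is a POST request to /api/<table>/search (for example, /api/study/search) with a JSON body that describes the query. The body's "conditions" list holds one entry per filter, each naming the table, field, operator, and value.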

Example 1 (find a study by its ID):

{"conditions":[{"value":"nmdc:sty-11-r2h77870","table":"study","op":"==","field":"study_id"}],"data_object_filter":[]}

Example 2 (find omics processing records by omics type):

{"conditions":[{"op":"==","field":"omics_type","value":"Organic Matter Characterization","table":"omics_processing"}],"data_object_filter":[]}
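Because every condition has the same four keys (op, field, value, table), the body is easy to build programmatically. A small sketch; the helper names `condition` and `search_body` are hypothetical and not part of any NMDC client:

```python
import json


def condition(table, field, value, op="=="):
    """One entry of the "conditions" list."""
    return {"op": op, "field": field, "value": value, "table": table}


def search_body(*conditions, data_object_filter=None):
    """Full search body; data_object_filter is empty in both examples above."""
    return {
        "conditions": list(conditions),
        "data_object_filter": data_object_filter or [],
    }


# Rebuild Example 1: look up one study by its ID.
example_1 = search_body(condition("study", "study_id", "nmdc:sty-11-r2h77870"))
print(json.dumps(example_1))
```

Key order in the printed JSON differs from Example 1, but the object is the same; JSON object key order is not significant.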

Response payload structure

The response JSON structure for most endpoints can be inferred from looking at the TypeScript interfaces.

For example, the study search returns a SearchResponse<StudySearchResults>, that is, a SearchResponse whose generic result type is StudySearchResults.
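The generic envelope can be mirrored in Python with a `Generic` dataclass. This sketch models only the two top-level keys visible in the example search response on this page (`count` and `results`); the real TypeScript interfaces may define more, and `StudyResult` is a loose placeholder for StudySearchResults rather than its actual shape.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class SearchResponse(Generic[T]):
    """Envelope seen in the example responses: total match count plus one page of results."""
    count: int
    results: List[T]


# Placeholder for StudySearchResults; each result is kept as a plain dict here.
StudyResult = Dict[str, Any]


def parse_search_response(payload: Dict[str, Any]) -> SearchResponse[StudyResult]:
    """Wrap a decoded JSON response in the typed envelope."""
    return SearchResponse(count=payload["count"], results=payload["results"])
```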

Python

The following code helps construct and run search queries from Python. It requires the requests package.

import requests

API_BASE = "https://data.microbiomedata.org/api"


class Query:
    item = None  # table name, set by each subclass (e.g. "biosample")

    def __init__(self):
        # Per-instance state: a class-level list would be shared by every query.
        self.offset = 0
        self.limit = 15
        self.filters = []
        self.results = None

    def __getitem__(self, key):
        # Slicing sets offset/limit for the request and clears any cached results.
        if isinstance(key, slice):
            start = 0 if key.start is None else key.start
            self.offset = start
            if key.stop is None:
                self.limit = max(100 - start, 0)
            else:
                self.limit = key.stop - start
        elif isinstance(key, int):
            self.offset = key
            self.limit = 1
        else:
            raise TypeError("Query indices must be integers or slices")
        self.results = None
        return self

    def __iter__(self):
        if self.results is None:
            response = self._request()
            response.raise_for_status()  # report HTTP errors instead of failing on missing keys
            self.results = response.json()["results"]
        for result in self.results:
            yield result

    def filter(self, **kwargs):
        # "item" selects the table the other fields belong to; default is this query's table.
        table = kwargs.pop("item", None) or self.item
        for field, value in kwargs.items():
            self.filters.append(dict(op="==", field=field, value=value, table=table))
        return self

    def _request(self):
        url = f"{API_BASE}/{self.item}/search"
        params = dict(offset=self.offset, limit=self.limit)
        payload = dict(conditions=self.filters, data_object_filter=[])
        # json= serializes the body and sets the Content-Type header.
        return requests.post(url, params=params, json=payload)

class BiosampleQuery(Query):
    item = "biosample"

class StudyQuery(Query):
    item = "study"

class OmicsProcessingQuery(Query):
    item = "omics_processing"

With that code loaded you can perform queries such as:

query = (BiosampleQuery()
    .filter(ecosystem_type="Soil")
    .filter(item="kegg_function", id="KEGG.ORTHOLOGY:K00003"))

for item in query[0:5]:
    print(item["name"])

Filters assume that their fields belong to the current query type, which is why this example can filter on ecosystem_type without specifying the item. Gene functions, on the other hand, must be joined in, so item="kegg_function" is needed to indicate that the id field applies to a gene function.
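To see which table each condition ends up on, the sketch below mirrors the table-selection rule of `Query.filter` outside the class and applies it to the example above. The function name `to_conditions` is hypothetical; it only illustrates the conditions the chained calls produce.

```python
def to_conditions(default_table, filter_calls):
    """Mirror Query.filter: each dict holds the kwargs of one .filter(...) call."""
    conditions = []
    for kwargs in filter_calls:
        kwargs = dict(kwargs)  # copy so the caller's dict is not modified
        # "item" overrides the table; otherwise the query's own table is used.
        table = kwargs.pop("item", None) or default_table
        for field, value in kwargs.items():
            conditions.append({"op": "==", "field": field, "value": value, "table": table})
    return conditions


conditions = to_conditions("biosample", [
    {"ecosystem_type": "Soil"},
    {"item": "kegg_function", "id": "KEGG.ORTHOLOGY:K00003"},
])
for c in conditions:
    print(c["table"], c["field"], c["value"])
```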

The slice [0:5] on the query above sets offset=0 and limit=5 on the search request, so only the first five results are retrieved.
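A slice fetches a single page. To walk through every match, one option is to keep advancing `offset` until the total `count` from the response is reached. This sketch rests on two assumptions: that `count` is the total number of matches (in the example search response further down it is 119 while `results` holds only one page), and that `fetch_page` is a hypothetical callable you supply, which performs the POST for a given offset and limit and returns the decoded JSON.

```python
from typing import Any, Callable, Dict, Iterator


def iter_all(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    page_size: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every result by requesting successive offset/limit pages."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results = page["results"]
        yield from results
        offset += len(results)
        # Stop at the reported total, or if the server returns an empty page.
        if not results or offset >= page["count"]:
            break
```

For example, `fetch_page` could wrap `requests.post` against /api/biosample/search with `params={"offset": offset, "limit": limit}` and the JSON body as `json=`.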

This example prints the download URLs of assembly contig FASTA files from soil metagenome samples:

query = OmicsProcessingQuery().filter(omics_type="Metagenome").filter(item="biosample", ecosystem_type="Soil")

for item in query[0:20]:
    for omics in item["omics_data"]:
        for output in omics["outputs"]:
            if output["file_type"] == "Assembly Contigs":
                print(f"https://data.microbiomedata.org{output['url']}")
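The nested loop can also be written as a pure function, which makes it easy to test against a saved response. It reads only the fields the loop above uses (omics_data, outputs, file_type, url); the function name `output_urls` is hypothetical.

```python
from typing import Any, Dict, List

PORTAL = "https://data.microbiomedata.org"


def output_urls(result: Dict[str, Any], file_type: str = "Assembly Contigs") -> List[str]:
    """Collect absolute download URLs for one result's outputs of the given file type."""
    urls = []
    for omics in result.get("omics_data", []):
        for output in omics.get("outputs", []):
            if output.get("file_type") == file_type:
                urls.append(f"{PORTAL}{output['url']}")
    return urls
```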

This example finds studies with James Stegen as PI:

query = StudyQuery().filter(principal_investigator_name="James Stegen")

for item in query:
    print(item["name"])

Full Example Searching for Biosamples

The following example searches for biosamples matching these criteria:

  • PI name ("Mitchel J. Doktycz")
  • Broad-scale environmental context ("terrestrial biome" [ENVO:00000446])
  • Environmental medium ("bulk soil" [ENVO:00005802])
  • Omics type ("Metagenome")
  • Processing institution ("JGI")


Search

Find all biosamples matching these criteria.

POST https://data-dev.microbiomedata.org/api/biosample/search


Payload
{
    "conditions": [
        {
            "op": "==",
            "field": "principal_investigator_name",
            "value": "Mitchel J. Doktycz",
            "table": "study"
        },
        {
            "op": "==",
            "field": "env_broad_scale",
            "value": "terrestrial biome",
            "table": "biosample"
        },
        {
            "op": "==",
            "field": "env_medium",
            "value": "bulk soil",
            "table": "biosample"
        },
        {
            "op": "==",
            "field": "omics_type",
            "value": "Metagenome",
            "table": "omics_processing"
        },
        {
            "op": "==",
            "field": "processing_institution",
            "value": "JGI",
            "table": "omics_processing"
        }
    ],
    "data_object_filter": []
}
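The same request can be sent from Python. This is a minimal standard-library sketch that reuses the data-dev host shown above (the production portal, https://data.microbiomedata.org/api, is the base URL used in the Python helper earlier on this page). The helper name `build_search_request` is hypothetical.

```python
import json
import urllib.request

API = "https://data-dev.microbiomedata.org/api"

# (table, field, value) triples from the payload above; every condition uses "==".
CRITERIA = [
    ("study", "principal_investigator_name", "Mitchel J. Doktycz"),
    ("biosample", "env_broad_scale", "terrestrial biome"),
    ("biosample", "env_medium", "bulk soil"),
    ("omics_processing", "omics_type", "Metagenome"),
    ("omics_processing", "processing_institution", "JGI"),
]

body = {
    "conditions": [
        {"op": "==", "field": field, "value": value, "table": table}
        for table, field, value in CRITERIA
    ],
    "data_object_filter": [],
}


def build_search_request(table: str, payload: dict) -> urllib.request.Request:
    """POST /api/<table>/search with a JSON body."""
    return urllib.request.Request(
        f"{API}/{table}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (requires network access):
# with urllib.request.urlopen(build_search_request("biosample", body)) as resp:
#     print(json.load(resp)["count"])
```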
Response (truncated)
{
  "count": 119,
  "results": [
    {
      "id": "nmdc:bsm-11-dd8yd668",
      "name": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA - BESC-388-Co2_54_7 soil",
      "description": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA",
      "alternate_identifiers": [
        "gold:Gb0291628",
        "img.taxon:3300053075"
      ],
      "annotations": {
        "ph": 6.48,
        "elev": 62,
        "type": "nmdc:Biosample",
        "zinc": "4.2708 mg/kg",
        "depth": {
          "type": "nmdc:QuantityValue",
          "has_unit": "m",
          "has_raw_value": "0.1",
          "has_numeric_value": 0.1
        },
        "lbceq": "1531.2 ppm",
        "calcium": "2044.05 mg/kg",
        "habitat": "Soil",
        "lat_lon": "44.5876 -123.1939",
        "location": "USA",
        "magnesium": "398.928 mg/kg",
        "manganese": "18.3765 mg/kg",
        "potassium": "184.092 mg/kg",
        "samp_name": "BESC-388-Co2_54_7 soil",
        "tot_nitro": "0.327 Percent",
        "lbc_thirty": "528 ppm",
        "geo_loc_name": "USA: Oregon",
        "samp_taxon_id": "soil metagenome [NCBITaxon:410658]",
        "collected_from": "nmdc:frsite-11-wdy01683",
        "nitrate_nitrogen": "0.463 mg/kg",
        "nitrite_nitrogen": "0 mg/kg",
        "ammonium_nitrogen": "0 mg/kg",
        "ncbi_taxonomy_name": "soil metagenome",
        "sample_collection_site": "Bulk Soil"
      },
      "study_id": "nmdc:sty-11-r2h77870",
      "depth": 0.1,
      "env_broad_scale_id": "ENVO:00000446",
      "env_local_scale_id": "ENVO:00000011",
      "env_medium_id": "ENVO:00005802",
      "longitude": -123.1939,
      "latitude": 44.5876,
      "add_date": "2021-05-01T00:00:00",
      "mod_date": "2021-05-01T00:00:00",
      "collection_date": "2020-09-03T00:00:00",
      "ecosystem": "Environmental",
      "ecosystem_category": "Terrestrial",
      "ecosystem_type": "Soil",
      "ecosystem_subtype": "Botanical garden",
      "specific_ecosystem": "Bulk soil",
      "open_in_gold": null,
      "env_broad_scale": {
        "id": "ENVO:00000446",
        "label": "terrestrial biome",
        "url": "http://purl.obolibrary.org/obo/ENVO:00000446",
        "data": {}
      },
      "env_local_scale": {
        "id": "ENVO:00000011",
        "label": "garden",
        "url": "http://purl.obolibrary.org/obo/ENVO:00000011",
        "data": {}
      },
      "env_medium": {
        "id": "ENVO:00005802",
        "label": "bulk soil",
        "url": "http://purl.obolibrary.org/obo/ENVO:00005802",
        "data": {}
      },
      "env_broad_scale_terms": [
        "entity",
        "continuant",
        "independent continuant",
        "material entity",
        "biome",
        "terrestrial biome",
        "environmental system",
        "astronomical body part",
        "environmental system determined by a quality",
        "ecosystem",
        "terrestrial ecosystem",
        "system"
      ],
      "env_local_scale_terms": [
        "entity",
        "continuant",
        "independent continuant",
        "material entity",
        "geographic feature",
        "anthropogenic geographic feature",
        "garden",
        "astronomical body part"
      ],
      "env_medium_terms": [
        "entity",
        "continuant",
        "independent continuant",
        "material entity",
        "soil",
        "bulk soil",
        "environmental material",
        "astronomical body part"
      ],
      "emsl_biosample_identifiers": [],
      "omics_processing": [
        {
          "id": "nmdc:omprc-11-7w7kd053",
          "name": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA - BESC-388-Co2_54_7 soil",
          "description": "",
          "alternate_identifiers": [],
          "annotations": {
            "type": "nmdc:NucleotideSequencing",
            "omics_type": "Metagenome",
            "instrument_name": "Illumina NovaSeq 6000",
            "analyte_category": "Metagenome",
            "ncbi_project_name": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA - BESC-388-Co2_54_7 soil",
            "principal_investigator": "Mitchel Doktycz",
            "processing_institution": "JGI"
          },
          "study_id": "nmdc:sty-11-r2h77870",
          "biosample_inputs": [
            {
              "id": "nmdc:bsm-11-dd8yd668",
              "name": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA - BESC-388-Co2_54_7 soil",
              "description": "Bulk soil microbial communities from poplar common garden site in Corvallis, Oregon, USA",
              "alternate_identifiers": [
                "gold:Gb0291628",
                "img.taxon:3300053075"
              ],
              "annotations": {
                "ph": 6.48,
                "elev": 62,
                "type": "nmdc:Biosample",
                "zinc": "4.2708 mg/kg",
                "depth": {
                  "type": "nmdc:QuantityValue",
                  "has_unit": "m",
                  "has_raw_value": "0.1",
                  "has_numeric_value": 0.1
                },
                "lbceq": "1531.2 ppm",
                "calcium": "2044.05 mg/kg",
                "habitat": "Soil",
                "lat_lon": "44.5876 -123.1939",
                "location": "USA",
                "magnesium": "398.928 mg/kg",
                "manganese": "18.3765 mg/kg",
                "potassium": "184.092 mg/kg",
                "samp_name": "BESC-388-Co2_54_7 soil",
                "tot_nitro": "0.327 Percent",
                "lbc_thirty": "528 ppm",
                "geo_loc_name": "USA: Oregon",
                "samp_taxon_id": "soil metagenome [NCBITaxon:410658]",
                "collected_from": "nmdc:frsite-11-wdy01683",
                "nitrate_nitrogen": "0.463 mg/kg",
                "nitrite_nitrogen": "0 mg/kg",
                "ammonium_nitrogen": "0 mg/kg",
                "ncbi_taxonomy_name": "soil metagenome",
                "sample_collection_site": "Bulk Soil"
              },
              "study_id": "nmdc:sty-11-r2h77870",
              "depth": 0.1,
              "env_broad_scale_id": "ENVO:00000446",
              "env_local_scale_id": "ENVO:00000011",
              "env_medium_id": "ENVO:00005802",
              "longitude": -123.1939,
              "latitude": 44.5876,
              "add_date": "2021-05-01T00:00:00",
              "mod_date": "2021-05-01T00:00:00",
              "collection_date": "2020-09-03T00:00:00",
              "ecosystem": "Environmental",
              "ecosystem_category": "Terrestrial",
              "ecosystem_type": "Soil",
              "ecosystem_subtype": "Botanical garden",
              "specific_ecosystem": "Bulk soil"
            }
          ],
          "add_date": "2021-05-01T00:00:00",
          "mod_date": "2021-05-01T00:00:00",
          "open_in_gold": null,
          "biosample_ids": [],
          "omics_data": [],
          "outputs": []
        }
      ],
      "multiomics": 16
    }
  ]
}
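Each result nests its ontology terms and linked omics processing records, so a short helper can flatten the fields of interest. This sketch reads only keys that appear in the example response above; the name `summarize_biosample` and the choice of fields are illustrative.

```python
from typing import Any, Dict


def summarize_biosample(result: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few top-level and nested fields from one biosample search result."""
    omics_types = {
        p["annotations"].get("omics_type")
        for p in result.get("omics_processing", [])
    }
    return {
        "id": result["id"],
        "study_id": result["study_id"],
        "env_medium": result["env_medium"]["label"],
        "lat_lon": (result["latitude"], result["longitude"]),
        "omics_types": sorted(omics_types - {None}),  # drop records without an omics_type
    }
```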

Facet

Find the count of biosamples for each geographic location.

POST https://data-dev.microbiomedata.org/api/biosample/facet


Payload
{
    "conditions": [
        {
            "op": "==",
            "field": "principal_investigator_name",
            "value": "Mitchel J. Doktycz",
            "table": "study"
        },
        {
            "op": "==",
            "field": "env_broad_scale",
            "value": "terrestrial biome",
            "table": "biosample"
        },
        {
            "op": "==",
            "field": "env_medium",
            "value": "bulk soil",
            "table": "biosample"
        },
        {
            "op": "==",
            "field": "omics_type",
            "value": "Metagenome",
            "table": "omics_processing"
        },
        {
            "op": "==",
            "field": "processing_institution",
            "value": "JGI",
            "table": "omics_processing"
        }
    ],
    "attribute": "geo_loc_name"
}
Response
{
  "facets": {
    "USA: Oregon": 108,
    "USA: Oregon, Clatskanie": 1,
    "USA: Tennessee": 10
  }
}
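The facet payload uses the same conditions as the search, with an "attribute" key naming the field to count. In this example the three counts add up to 119, matching the "count" returned by the search above. A sketch that derives a facet body from a search body and ranks the counts; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def facet_body(search_body: dict, attribute: str) -> dict:
    """Reuse the search conditions and add the attribute to count by."""
    return {"conditions": search_body["conditions"], "attribute": attribute}


def ranked_facets(response: dict) -> List[Tuple[str, int]]:
    """Sort facet values by count, largest first."""
    facets: Dict[str, int] = response["facets"]
    return sorted(facets.items(), key=lambda kv: kv[1], reverse=True)


example = {"facets": {"USA: Oregon": 108, "USA: Oregon, Clatskanie": 1, "USA: Tennessee": 10}}
for value, count in ranked_facets(example):
    print(f"{count:>4}  {value}")
```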

Binned Facet

Find the number of biosamples collected in each month.

POST https://data-dev.microbiomedata.org/api/biosample/binned_facet


Payload
{
    "attribute": "collection_date",
    "conditions": [
        {
            "op": "==",
            "field": "principal_investigator_name",
            "value": "Mitchel J. Doktycz",
            "table": "study"
        },
        {
            "op": "==",
            "field": "env_broad_scale",
            "value": "terrestrial biome",
            "table": "biosample"
        },
        {
            "op": "==",
            "field": "env_medium",
            "value": "bulk soil",
            "table": "biosample"
        },
        {
            "op": "==",
            "field": "omics_type",
            "value": "Metagenome",
            "table": "omics_processing"
        },
        {
            "op": "==",
            "field": "processing_institution",
            "value": "JGI",
            "table": "omics_processing"
        }
    ],
    "resolution": "month"
}
Response
{
  "facets": [
    10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 109
  ],
  "bins": [
    "2014-08-01T00:00:00",
    "2014-09-01T00:00:00",
    "2014-10-01T00:00:00",
    "2014-11-01T00:00:00",
    "2014-12-01T00:00:00",
    "2015-01-01T00:00:00",
    "2015-02-01T00:00:00",
    "2015-03-01T00:00:00",
    "2015-04-01T00:00:00",
    "2015-05-01T00:00:00",
    "2015-06-01T00:00:00",
    "2015-07-01T00:00:00",
    "2015-08-01T00:00:00",
    "2015-09-01T00:00:00",
    "2015-10-01T00:00:00",
    "2015-11-01T00:00:00",
    "2015-12-01T00:00:00",
    "2016-01-01T00:00:00",
    "2016-02-01T00:00:00",
    "2016-03-01T00:00:00",
    "2016-04-01T00:00:00",
    "2016-05-01T00:00:00",
    "2016-06-01T00:00:00",
    "2016-07-01T00:00:00",
    "2016-08-01T00:00:00",
    "2016-09-01T00:00:00",
    "2016-10-01T00:00:00",
    "2016-11-01T00:00:00",
    "2016-12-01T00:00:00",
    "2017-01-01T00:00:00",
    "2017-02-01T00:00:00",
    "2017-03-01T00:00:00",
    "2017-04-01T00:00:00",
    "2017-05-01T00:00:00",
    "2017-06-01T00:00:00",
    "2017-07-01T00:00:00",
    "2017-08-01T00:00:00",
    "2017-09-01T00:00:00",
    "2017-10-01T00:00:00",
    "2017-11-01T00:00:00",
    "2017-12-01T00:00:00",
    "2018-01-01T00:00:00",
    "2018-02-01T00:00:00",
    "2018-03-01T00:00:00",
    "2018-04-01T00:00:00",
    "2018-05-01T00:00:00",
    "2018-06-01T00:00:00",
    "2018-07-01T00:00:00",
    "2018-08-01T00:00:00",
    "2018-09-01T00:00:00",
    "2018-10-01T00:00:00",
    "2018-11-01T00:00:00",
    "2018-12-01T00:00:00",
    "2019-01-01T00:00:00",
    "2019-02-01T00:00:00",
    "2019-03-01T00:00:00",
    "2019-04-01T00:00:00",
    "2019-05-01T00:00:00",
    "2019-06-01T00:00:00",
    "2019-07-01T00:00:00",
    "2019-08-01T00:00:00",
    "2019-09-01T00:00:00",
    "2019-10-01T00:00:00",
    "2019-11-01T00:00:00",
    "2019-12-01T00:00:00",
    "2020-01-01T00:00:00",
    "2020-02-01T00:00:00",
    "2020-03-01T00:00:00",
    "2020-04-01T00:00:00",
    "2020-05-01T00:00:00",
    "2020-06-01T00:00:00",
    "2020-07-01T00:00:00",
    "2020-08-01T00:00:00",
    "2020-09-01T00:00:00",
    "2020-10-01T00:00:00"
  ]
}
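The binned response holds 74 counts and 75 dates, which suggests the dates are bin boundaries: count i covers the month from bins[i] up to bins[i + 1]. Under that reading, 10 samples fall in August 2014 and 109 in September 2020, and 10 + 109 = 119 matches the search count. This interpretation is inferred from the example, not documented behavior; the sketch below pairs counts with their boundaries and drops empty bins.

```python
from typing import List, Tuple


def nonempty_bins(response: dict) -> List[Tuple[str, str, int]]:
    """Return (bin_start, bin_end, count) for every bin with a non-zero count."""
    counts, edges = response["facets"], response["bins"]
    # ASSUMPTION: one more boundary than counts, as in the example response.
    if len(edges) != len(counts) + 1:
        raise ValueError("expected one more bin boundary than counts")
    return [(edges[i], edges[i + 1], n) for i, n in enumerate(counts) if n]
```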