API access to MongoDB collections #1070

turbomam · 2023-08-15T23:41:45Z

import pprint

import requests


class FastAPIClient:
    """Minimal client for a JSON API that paginates with continuation tokens."""

    def __init__(self, base_url, timeout=30):
        # Strip any trailing slash so joined URLs never contain "//".
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def _make_request(self, method, endpoint, params=None, data=None):
        url = f"{self.base_url}/{endpoint}"
        response = requests.request(method, url, params=params, json=data, timeout=self.timeout)
        # Raise on 4xx/5xx responses instead of trying to parse an error body.
        response.raise_for_status()
        return response.json()

    def get_paginated_data(self, endpoint, params=None, results_key='resources',
                           continuation_key='next_page_token',
                           continuation_parameter='page_token'):
        # Copy so the caller's dict is not mutated when the page token is added.
        params = dict(params or {})
        data = []

        while True:
            response = self._make_request('GET', endpoint, params=params)
            data.extend(response[results_key])

            # Stop when the token is absent or empty; a null token would otherwise loop forever.
            token = response.get(continuation_key)
            if not token:
                break
            params[continuation_parameter] = token

        return data


if __name__ == "__main__":
    client_base_url = "https://api.microbiomedata.org"
    endpoint_name = "nmdcschema/study_set"
    query_params = {
        "max_page_size": 20
    }

    client = FastAPIClient(client_base_url)
    paginated_data = client.get_paginated_data(endpoint=endpoint_name, params=query_params)
    pprint.pprint(paginated_data)

turbomam · 2023-08-16T14:09:32Z

This requirement traces back to the following targets in project.Makefile:

dump-validate-report-mongodb: mongodb-cleanup accepting_legacy_ids_all \
local/mongodb-collection-report.txt \
local/selected_mongodb_contents.json \
local/selected_mongodb_contents_jsonschema_check.txt \
linkml-validate-mongodb \
local/selected_mongodb_contents.json.gz

dump-validate-report-convert-mongodb: mongodb-cleanup \
local/selected_mongodb_contents_fully_repaired.yaml \
local/selected_mongodb_contents_fully_repaired.yaml.gz \
local/selected_mongodb_contents_fully_repaired.ttl \
local/selected_mongodb_contents_fully_repaired.ttl.gz

turbomam · 2023-08-16T14:10:43Z

These targets start with the mongodb_exporter CLI, which pyproject.toml defines as follows:

mongodb_exporter = "nmdc_schema.mongodb_direct_to_nmdc_Database_file:export_to_yaml"
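For readers unfamiliar with this syntax: a script entry in pyproject.toml maps a command name to a "module:function" reference. Below is a minimal sketch of how such a reference resolves to a callable, using only the standard library. `resolve_entry_point` is a hypothetical helper written for illustration, not part of nmdc-schema, and the demo uses a standard-library target so it runs without nmdc_schema installed.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'package.module:function' reference to the callable it names.

    Hypothetical helper for illustration; an installer generates a wrapper
    script that does roughly this and then calls the function.
    """
    module_path, _, attr = spec.partition(":")
    if not module_path or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


# Demo with a standard-library target instead of the nmdc_schema one.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

With nmdc_schema installed, the same call with "nmdc_schema.mongodb_direct_to_nmdc_Database_file:export_to_yaml" should return the function that the mongodb_exporter command runs.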

turbomam · 2023-08-16T14:11:24Z

We will be using methods from https://api.microbiomedata.org/docs

turbomam · 2023-08-16T14:16:54Z

There doesn't seem to be a method for getting collection names. For now, we may still need to get those from a direct MongoDB connection, which generally requires a NERSC SSH key, a NERSC tunnel, and MongoDB credentials.
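A minimal sketch of that direct-connection fallback, assuming an SSH tunnel already forwards the MongoDB server to a local port. `list_collections_with_counts` is a hypothetical helper; the host, port, database name, and credentials in the usage comment are placeholders, not real values. `list_collection_names()` and `estimated_document_count()` are standard PyMongo methods.

```python
def list_collections_with_counts(db):
    """Return {collection_name: estimated_document_count} for a PyMongo database.

    `db` is expected to behave like pymongo.database.Database; only
    list_collection_names() and estimated_document_count() are used.
    """
    return {
        name: db[name].estimated_document_count()
        for name in sorted(db.list_collection_names())
    }


# Usage (placeholder values; requires `pip install pymongo` and an open tunnel):
#
#   from pymongo import MongoClient
#   client = MongoClient("localhost", 27777, username="<user>",
#                        password="<password>", directConnection=True)
#   print(list_collections_with_counts(client["<database name>"]))
```

Taking the database object as an argument keeps the helper independent of how the tunnel and credentials are set up.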

turbomam · 2023-08-16T14:18:18Z

Could also use the mongodump or mongoexport commands. That would still require assembling the JSON files into LinkML-style JSON, even if it isn't validated "yet".
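A rough sketch of that assembly step, under several assumptions: each collection was exported to its own file named <collection>.json, each file holds a JSON array (as written by `mongoexport --jsonArray`), and the target is one JSON object keyed by collection name. `assemble_database` is a hypothetical helper, and dropping MongoDB's `_id` field is also an assumption about what the LinkML-style file should contain.

```python
import json
from pathlib import Path


def assemble_database(export_dir):
    """Combine per-collection JSON array files into one dict keyed by collection name.

    ASSUMPTIONS: files are named <collection>.json and each contains a JSON
    array of documents. The MongoDB `_id` field is dropped from every document.
    """
    database = {}
    for path in sorted(Path(export_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            documents = json.load(handle)
        database[path.stem] = [
            {key: value for key, value in doc.items() if key != "_id"}
            for doc in documents
        ]
    return database
```

The result can be written out with `json.dump(assemble_database("local/exports"), handle, indent=2)`, where the directory name is again a placeholder. Nothing here validates the documents against the schema.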

turbomam · 2023-08-16T14:19:54Z

object orientation:

Python dataclass?
Pydantic?

turbomam · 2023-08-22T13:05:59Z

Implemented in nmdc_schema/mongo_dump_api_emph.py from branch issue-1070-content-from-mongo

That script can't currently get functional_annotation_agg via the API and falls back to PyMongo.
That script still uses PyMongo to get collection names and estimated sizes. @dwinston recently enabled an API solution for this and I should switch.

eecavanna · 2023-09-05T21:23:40Z

That script still uses PyMongo to get collection names and estimated sizes. @dwinston recently enabled an API solution for this and I should switch.

Here's a link to the PR in the nmdc-runtime repo, in which that API solution was introduced: microbiomedata/nmdc-runtime#287

Here's a link to the API endpoint on Swagger UI (in production): https://api.microbiomedata.org/docs#/metadata/get_nmdc_database_collection_stats_nmdcschema_collection_stats_get

eecavanna · 2023-09-05T21:25:05Z

Issue cleanup note:

Update Issue title to be more actionable; e.g. "Use API to get MongoDB collection names".

aclum · 2023-11-01T18:17:48Z

Anything left to do here?

turbomam · 2023-11-01T18:23:05Z

I want to update mongo_dump_api_emph.py so that it can get per-collection document counts from https://api.microbiomedata.org/nmdcschema/collection_stats

I think there is an issue for that already, but I haven't found it yet. When I do, I will link it here and close this issue.
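A sketch of what that update might look like. The response shape is an assumption: the code treats the endpoint as returning a list of entries shaped like MongoDB `$collStats` output, each with an "ns" namespace and a "storageStats" object containing "count". That should be checked against the Swagger docs before relying on it. `document_counts` and `fetch_document_counts` are hypothetical names, not functions from mongo_dump_api_emph.py.

```python
COLLECTION_STATS_URL = "https://api.microbiomedata.org/nmdcschema/collection_stats"


def document_counts(stats):
    """Map collection name to document count.

    ASSUMPTION: `stats` is a list of entries shaped like MongoDB $collStats
    output, i.e. {"ns": "<db>.<collection>", "storageStats": {"count": N}}.
    Entries missing either field are skipped rather than guessed at.
    """
    counts = {}
    for entry in stats:
        namespace = entry.get("ns")
        count = entry.get("storageStats", {}).get("count")
        if namespace is None or count is None:
            continue
        # Drop the database prefix, e.g. "<db>.study_set" -> "study_set".
        counts[namespace.split(".", 1)[-1]] = count
    return counts


def fetch_document_counts(url=COLLECTION_STATS_URL, timeout=30):
    """Fetch the stats endpoint and reduce it to per-collection counts."""
    import requests  # imported here so document_counts works without requests installed

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return document_counts(response.json())
```

Keeping the parsing separate from the HTTP call means the assumed response shape can be corrected in one place once the real payload is confirmed.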

turbomam · 2023-11-01T18:31:59Z
