API endpoint to link data objects with upstream collections #355
This work is part of FY24 roadmap milestone "Add support for all queries available in the data portal via the public API" (4.8, #496).
After discussion with @PeopleMakeCulture, I'd like to generate a single derived collection that allows graph-like queries across all current collections' documents in a more streamlined manner (i.e., without complicated Mongo aggregation pipelines).
@jeffbaumes the approach here may benefit from the work you did to move the data portal's Postgres tables into Mongo. If you have any ideas here, please note them.
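A minimal sketch of the derived-collection idea, assuming an illustrative collection name (alldocs) and a type discriminator field that are not existing schema: copy every document from each *_set collection into one uniformly shaped collection, after which a provenance walk becomes a single $graphLookup instead of the chained $lookup/$project stages in the sample query below.

// Build the derived collection; slot names mirror those used in this issue.
["biosample_set", "pooling_set", "processed_sample_set", "extraction_set",
 "library_preparation_set", "omics_processing_set",
 "metagenome_annotation_activity_set", "data_object_set"].forEach(function (coll) {
  db.getCollection(coll).aggregate([
    { $project: { _id: 0, id: 1, type: { $literal: coll },
                  has_input: 1, has_output: 1, was_informed_by: 1,
                  soil_horizon: 1 } }, // plus any other query fields needed
    { $merge: { into: "alldocs" } }
  ]);
});

// One recursive traversal then replaces the repeated $lookup/$project pairs:
// start from biosample ids and keep matching has_input against the previous
// step's has_output values.
db.alldocs.aggregate([
  { $match: { type: "biosample_set",
              soil_horizon: { $in: ["O horizon", "M horizon"] } } },
  { $graphLookup: {
      from: "alldocs",
      startWith: "$id",
      connectFromField: "has_output",
      connectToField: "has_input",
      as: "downstream_processes"
  } }
]);
// Caveat: the OmicsProcessing -> annotation activity edge uses was_informed_by
// rather than has_input, so it would need a second traversal or normalization
// into a shared edge field when building alldocs.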
Sample query from @brynnz22: https://github.com/microbiomedata/notebook_hackathons/blob/soil-contig-tax/taxonomic_dist_by_soil_layer/python/mongodb_query.txt.js

db.getCollection("biosample_set").aggregate(
[
{ $match: { 'soil_horizon': { '$in': ['O horizon', 'M horizon'] } } },
{
$project: {
"id": 1,
"soil_horizon": 1
}
},
{
$lookup:
{
from: "pooling_set",
localField: "id",
foreignField: "has_input",
as: "pooling_set"
}
},
{
$project: {
"id": 1,
"soil_horizon": 1,
"pooling_set.has_input": 1,
"pooling_set.has_output": 1
}
},
{
$lookup:
{
from: "processed_sample_set",
localField: "pooling_set.has_output",
foreignField: "id",
as: "processed_sample_set"
}
},
{
$project: {
"id": 1,
"soil_horizon": 1,
"pooling_set.has_input": 1,
"pooling_set.has_output": 1,
"processed_sample_set.id": 1
}
},
{
$lookup:
{
from: "extraction_set",
localField: "processed_sample_set.id",
foreignField: "has_input",
as: "extraction_set"
}
},
{
$project: {
"id": 1,
"soil_horizon": 1,
"pooling_set.has_input": 1,
"pooling_set.has_output": 1,
"processed_sample_set.id": 1,
"extraction_set.has_input": 1,
"extraction_set.has_output": 1,
"extraction_set.id": 1
}
},
{
$lookup:
{
from: "processed_sample_set",
localField: "extraction_set.has_output",
foreignField: "id",
as: "processed_sample_set2"
}
},
{
$project: {
"id": 1,
"soil_horizon": 1,
"pooling_set.has_input": 1,
"pooling_set.has_output": 1,
"processed_sample_set.id": 1,
"extraction_set.has_input": 1,
"extraction_set.has_output": 1,
"extraction_set.id": 1,
"processed_sample_set2.id": 1
}
},
{
$lookup:
{
from: "library_preparation_set",
localField: "processed_sample_set2.id",
foreignField: "has_input",
as: "library_preparation_set"
}
},
{
$project: {
"id": 1,
"soil_horizon": 1,
"pooling_set.has_input": 1,
"pooling_set.has_output": 1,
"processed_sample_set.id": 1,
"extraction_set.has_input": 1,
"extraction_set.has_output": 1,
"extraction_set.id": 1,
"processed_sample_set2.id": 1,
"library_preparation_set.has_input": 1,
"library_preparation_set.has_output": 1,
"library_preparation_set.id": 1
}
},
{
$lookup:
{
from: "processed_sample_set",
localField: "library_preparation_set.has_output",
foreignField: "id",
as: "processed_sample_set3"
}
},
{
$project: {
"id": 1,
"soil_horizon": 1,
"pooling_set.has_input": 1,
"pooling_set.has_output": 1,
"processed_sample_set.id": 1,
"extraction_set.has_input": 1,
"extraction_set.has_output": 1,
"extraction_set.id": 1,
"processed_sample_set2.id": 1,
"library_preparation_set.has_input": 1,
"library_preparation_set.has_output": 1,
"library_preparation_set.id": 1,
"processed_sample_set3.id": 1
}
},
{
$lookup:
{
from: "omics_processing_set",
localField: "processed_sample_set3.id",
foreignField: "has_input",
as: "omics_processing_set"
}
},
{
$project: {
"id": 1,
"soil_horizon": 1,
"pooling_set.has_input": 1,
"pooling_set.has_output": 1,
"processed_sample_set.id": 1,
"extraction_set.has_input": 1,
"extraction_set.has_output": 1,
"extraction_set.id": 1,
"processed_sample_set2.id": 1,
"library_preparation_set.has_input": 1,
"library_preparation_set.has_output": 1,
"library_preparation_set.id": 1,
"processed_sample_set3.id": 1,
"omics_processing_set.has_input": 1,
"omics_processing_set.id": 1
}
},
{
$lookup:
{
from: "metagenome_annotation_activity_set",
localField: "omics_processing_set.id",
foreignField: "was_informed_by",
as: "metagenome_annotation_activity_set"
}
},
{
$project: {
"id": 1,
"soil_horizon": 1,
"pooling_set.has_input": 1,
"pooling_set.has_output": 1,
"processed_sample_set.id": 1,
"extraction_set.has_input": 1,
"extraction_set.has_output": 1,
"extraction_set.id": 1,
"processed_sample_set2.id": 1,
"library_preparation_set.has_input": 1,
"library_preparation_set.has_output": 1,
"library_preparation_set.id": 1,
"processed_sample_set3.id": 1,
"omics_processing_set.has_input": 1,
"omics_processing_set.id": 1,
"metagenome_annotation_activity_set.was_informed_by": 1,
"metagenome_annotation_activity_set.has_output": 1
}
},
{
$lookup:
{
from: "data_object_set",
localField: "metagenome_annotation_activity_set.has_output",
foreignField: "id",
as: "data_object_set"
}
},
{
$project: {
"id": 1,
"soil_horizon": 1,
"pooling_set.has_input": 1,
"pooling_set.has_output": 1,
"processed_sample_set.id": 1,
"extraction_set.has_input": 1,
"extraction_set.has_output": 1,
"extraction_set.id": 1,
"processed_sample_set2.id": 1,
"library_preparation_set.has_input": 1,
"library_preparation_set.has_output": 1,
"library_preparation_set.id": 1,
"processed_sample_set3.id": 1,
"omics_processing_set.has_input": 1,
"omics_processing_set.id": 1,
"metagenome_annotation_activity_set.was_informed_by": 1,
"metagenome_annotation_activity_set.has_output": 1,
"data_object_set.id": 1,
"data_object_set.data_object_type": "Scaffold Lineage tsv",
"data_object_set.url": 1
}
}
]
)
See work done by @brynnz22 and @kheal in this notebook to connect studies to taxonomic information: https://github.com/microbiomedata/notebook_hackathons/tree/main/taxonomic_dist_by_soil_layer
Duplicates #401
The needs here have not been addressed. In order to address this, we need:
Running list of difficult Mongo queries
https://docs.google.com/spreadsheets/d/1a9cN9ZDyjVOp6NtHiaUlpP_92sMInZtQV-Q-L5iWcOk/edit?usp=sharing
Please add to or edit the Google Sheet.
In working on the Jupyter notebooks and fielding user requests, we need some endpoints that make it easier to combine a study or biosample filter with workflow execution activities and/or data objects.
See related issue #246
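For illustration, such an endpoint might accept the study filter in the path and the data object filter as a query parameter; the route and parameter names below are hypothetical, not an existing API:

GET /data_objects/study/nmdc:styX?data_object_type=Scaffold%20Lineage%20tsv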
Example requests from Jupyter notebook work or user requests related to linking data objects:
- Given a Study.id and a DataObject.data_object_type, return the relevant DataObjects (#401). A sketch of the underlying traversal follows.
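A hedged sketch of the traversal such an endpoint might run, reusing collection and slot names from the sample query above. The direct biosample-to-OmicsProcessing hop is a simplification (the full chain may pass through pooling/extraction/library prep), and 'nmdc:styX' is a placeholder id:

// Given a study id and a data_object_type, collect the matching DataObjects.
db.biosample_set.aggregate([
  { $match: { part_of: "nmdc:styX" } },
  { $lookup: { from: "omics_processing_set", localField: "id",
               foreignField: "has_input", as: "omics" } },
  { $lookup: { from: "metagenome_annotation_activity_set",
               localField: "omics.id", foreignField: "was_informed_by",
               as: "activities" } },
  { $lookup: { from: "data_object_set", localField: "activities.has_output",
               foreignField: "id", as: "data_objects" } },
  { $project: { id: 1, data_objects: { $filter: {
      input: "$data_objects", as: "dobj",
      cond: { $eq: ["$$dobj.data_object_type", "Scaffold Lineage tsv"] } } } } }
]);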
Example searches for supporting API search to match data portal queries:
- Return studies and biosamples from study X that have a processing institution of Y: return the study based on {'id': 'nmdc:styX'}, return biosamples based on {'part_of': 'nmdc:styX'}, and trace through the PlannedProcess classes to determine which Biosamples (or ProcessedSamples derived from Biosamples) appear as has_input values on OmicsProcessing records where {'processing_institution': 'Y'}. A rough sketch follows.
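A rough sketch of that first search, assuming biosample ids appear directly in OmicsProcessing has_input ('Y' and 'nmdc:styX' are placeholders); has_input values that point at ProcessedSamples would need the recursive walk sketched earlier to resolve back to Biosamples:

db.omics_processing_set.aggregate([
  { $match: { processing_institution: "Y" } },
  { $lookup: { from: "biosample_set", localField: "has_input",
               foreignField: "id", as: "biosamples" } },
  { $unwind: "$biosamples" },
  { $match: { "biosamples.part_of": "nmdc:styX" } },
  { $project: { id: 1, "biosamples.id": 1, "biosamples.part_of": 1 } }
]);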
- Return studies and biosamples where the annotation results have a hit to 'KEGG.ORTHOLOGY:K00005'. An implementation sketch: search for 'KEGG.ORTHOLOGY:K00005' in the functional_annotation_agg slot gene_function_id, get the metagenome_annotation_id, then trace back from that WorkflowExecutionActivity to OmicsProcessing, through the PlannedProcess classes to Biosample/ProcessedSample, looping until reaching a Biosample, and from there to the Study. The first steps are sketched below.
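The first steps of that traversal, sketched with the slot names named above (the upstream loop back to Biosample and Study is elided):

// Step 1: annotation activities whose aggregated annotations hit the KO term.
var hits = db.functional_annotation_agg.distinct(
  "metagenome_annotation_id",
  { gene_function_id: "KEGG.ORTHOLOGY:K00005" }
);

// Step 2: walk back activity -> OmicsProcessing via was_informed_by.
db.metagenome_annotation_activity_set.aggregate([
  { $match: { id: { $in: hits } } },
  { $lookup: { from: "omics_processing_set", localField: "was_informed_by",
               foreignField: "id", as: "omics" } },
  { $project: { id: 1, "omics.id": 1, "omics.has_input": 1 } }
]);
// Step 3 (elided): resolve omics.has_input ids upstream through the
// PlannedProcess collections until they are Biosamples, then join to studies
// via the biosamples' part_of slot.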
@shreddd to determine if this can be worked on in the next month, in advance of the webinar with NEON.
cc @cmungall @brynnz22 @kheal