Define how the schema should be used to implement search based on metabolites, reactions, or pathways #176
This is exactly the kind of picture I was thinking of, thanks. It makes sense that these relationships are independent of a particular system like KEGG. Would an Organic Matter Classification analysis also link to Compound? My understanding is that those analyses could include many more compounds with different IDs. The same question applies to Lipidomics analysis. It's unclear whether all of these analysis types are targeted for Feb. It would be good to start understanding the least-surprise joins a user would expect when searching. For example, would searching by one Genome feature match any metaB analysis containing any Compound linked to the Function Descriptor of that Genome feature? If you bounce between joins enough (especially many-to-many ones), your search may start capturing much more than you expected.
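A minimal sketch of that Genome feature → Function Descriptor → Compound → metaB analysis join, assuming each link is a plain many-to-many table of IDs. The function names (`follow`, `analyses_for_feature`) and all IDs are hypothetical and only illustrate how results fan out across hops; this is not the NMDC search implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical many-to-many link tables: source id -> set of linked ids.
Links = Dict[str, Set[str]]


def follow(ids: Set[str], links: Links) -> Set[str]:
    """One join hop: every target linked from any id in `ids`."""
    out: Set[str] = set()
    for i in ids:
        out |= links.get(i, set())
    return out


def analyses_for_feature(
    feature: str,
    feature_to_function: Links,
    function_to_compound: Links,
    compound_to_analysis: Links,
) -> Tuple[Set[str], int]:
    """Genome feature -> Function Descriptor -> Compound -> metaB analysis.

    Returns the distinct analyses reached and the number of join paths
    that reached them, to make the many-to-many fan-out visible.
    """
    functions = follow({feature}, feature_to_function)
    compounds = follow(functions, function_to_compound)
    analyses = follow(compounds, compound_to_analysis)
    paths = sum(
        len(compound_to_analysis.get(c, set()))
        for f in functions
        for c in function_to_compound.get(f, set())
    )
    return analyses, paths


# Toy data (invented ids): one gene, two functions, three compounds.
feature_to_function = {"gene1": {"K1", "K2"}}
function_to_compound = {"K1": {"C1", "C2"}, "K2": {"C2", "C3"}}
compound_to_analysis = {"C1": {"A1"}, "C2": {"A1", "A2"}, "C3": {"A3"}}

analyses, paths = analyses_for_feature(
    "gene1", feature_to_function, function_to_compound, compound_to_analysis
)
print(sorted(analyses), paths)  # ['A1', 'A2', 'A3'] 6
```

Here six join paths collapse to three distinct analyses from a single gene. With real annotation tables each extra hop can multiply that count, so counting paths per hop is one way to check whether a proposed join still matches what a user would expect.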
@jeffbaumes Yes, both organic matter characterization and lipidomics would follow the same path through that diagram as metabolomics, and Yuri intends to use the same structure for including compound terms in all three.
This picture is really helpful, thanks, Chris! For Feb we have discussed metaG, metaT (if we've got processed data), metaP, and metaB, since we should have NMDC pipelines for each of those types. For lipidomics and organic matter, we'd like to include the data (with guidance on which data would be useful), but only show the data as being connected to a study. If there are pipelines and appropriate annotation information, then functional links would be great. I don't think it's a priority for February.
@jeffbaumes It looks like he's still working on it, but the document Chris linked has a user story section that you might find useful.
@jeffbaumes @dehays don't worry about the linked document for now, I updated the first comment to include the relevant text |
For orientation: the genome feature class corresponds to the main entry (one line) in a GFF3 file, and the link to a descriptor corresponds to column 9, the attributes column.
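A minimal sketch of that mapping, assuming function links are carried in the GFF3 reserved attributes `Dbxref` and `Ontology_term`. Which attribute keys the NMDC pipelines actually use is an assumption here, and the example line (source, coordinates, identifiers) is illustrative only. `GenomeFeature`, `parse_gff3_line`, and `descriptor_links` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import unquote


@dataclass
class GenomeFeature:
    seqid: str
    type: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, List[str]]  # parsed column 9


def parse_gff3_line(line: str) -> GenomeFeature:
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, _source, ftype, start, end, _score, strand, _phase, col9 = cols
    attributes: Dict[str, List[str]] = {}
    # Column 9: tag=value pairs separated by ';', multiple values by ','.
    # Split first, then percent-decode, so escaped separators survive.
    for pair in col9.split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return GenomeFeature(seqid, ftype, int(start), int(end), strand, attributes)


def descriptor_links(feature: GenomeFeature) -> List[str]:
    # Assumption: descriptor links live in Dbxref and Ontology_term.
    attrs = feature.attributes
    return attrs.get("Dbxref", []) + attrs.get("Ontology_term", [])


line = ("contig_1\tprodigal\tCDS\t100\t400\t.\t+\t0\t"
        "ID=gene1;Dbxref=KEGG:K00001,EC:1.1.1.1;Ontology_term=GO:0004022")
feature = parse_gff3_line(line)
print(feature.attributes["ID"], descriptor_links(feature))
# ['gene1'] ['KEGG:K00001', 'EC:1.1.1.1', 'GO:0004022']
```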
See this doc for the source of the image.
This shows a generalized annotation schema for functional annotation in NMDC (todo: link to actual schema). It is neutral with respect to the annotation system used. The various subclasses of ControlledTerm cover different aspects of function, and different systems may cover them differently (see the courier font text beside each box). For example, KEGG has reactions, pathways, compounds, and links between them.
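A minimal sketch of how system-neutral ControlledTerm subclasses could support search by metabolite, assuming subclasses named `Compound`, `Reaction`, and `Pathway` with the link fields `participants` and `reactions`. Those names, the helper functions, and all IDs are hypothetical, since the actual schema is not linked yet; any system (KEGG or otherwise) would populate the same classes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class ControlledTerm:
    id: str    # CURIE from whichever system supplied the term
    name: str


@dataclass(frozen=True)
class Compound(ControlledTerm):
    pass


@dataclass(frozen=True)
class Reaction(ControlledTerm):
    participants: FrozenSet[str] = frozenset()  # Compound ids


@dataclass(frozen=True)
class Pathway(ControlledTerm):
    reactions: FrozenSet[str] = frozenset()  # Reaction ids


def reactions_with_compound(compound_id: str,
                            reactions: Iterable[Reaction]) -> Set[str]:
    return {r.id for r in reactions if compound_id in r.participants}


def pathways_with_compound(compound_id: str, reactions: Iterable[Reaction],
                           pathways: Iterable[Pathway]) -> Set[str]:
    # Metabolite -> reactions it takes part in -> pathways containing them.
    rxn_ids = reactions_with_compound(compound_id, reactions)
    return {p.id for p in pathways if p.reactions & rxn_ids}


# Toy terms with invented ids (not real KEGG identifiers).
ethanol = Compound("cpd:ethanol", "ethanol")
reactions = [
    Reaction("rxn:adh", "ethanol -> acetaldehyde",
             frozenset({"cpd:ethanol", "cpd:acetaldehyde"})),
    Reaction("rxn:aldh", "acetaldehyde -> acetate",
             frozenset({"cpd:acetaldehyde", "cpd:acetate"})),
]
pathways = [
    Pathway("path:ethanol-oxidation", "ethanol oxidation",
            frozenset({"rxn:adh", "rxn:aldh"})),
    Pathway("path:other", "unrelated pathway", frozenset({"rxn:other"})),
]
print(pathways_with_compound(ethanol.id, reactions, pathways))
# {'path:ethanol-oxidation'}
```

Keeping the links on the terms themselves, rather than on a KEGG-specific table, is what lets the same search code run regardless of which system supplied the annotations.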
Very rough sketch of some user stories to help us think about how search would be implemented: