Indicate MIxS terms #555

Closed
ssarrafan opened this issue Dec 2, 2021 · 22 comments

@ssarrafan

Indicate when a term is from MIxS

Related to #448

@ssarrafan ssarrafan added this to the Sprint 9 milestone Dec 2, 2021
@ssarrafan
Author

Brandon, please add a T-shirt size and any questions. The team is trying to get an idea of the time and effort needed to do this.

@subdavis
Contributor

subdavis commented Dec 6, 2021

Please provide information/documentation about how to know if a term comes from MIxS. Is this consumable from code, or will we need to maintain our own mappings? Where should we be looking for this info?

Thanks.

@pvangay

pvangay commented Dec 6, 2021

@ssarrafan this should be reassigned to Mark M. and Bill. Mark is currently working with Montana to identify sources of each term (where they came from) for display on the metadata submission interface. Once that information is available, hopefully Bill can include that information in the schema so that Brandon can also display it via the portal.

@ssarrafan
Author

@pvangay ok I've assigned this to Bill and Mark. Is the expectation that this will be done this month or can I move this to the January sprint?

@pvangay

pvangay commented Dec 6, 2021

good question for @turbomam

@cmungall
Contributor

cmungall commented Dec 7, 2021

There are a variety of ways to programmatically extract this from the schema.

I can advise but need more information about the overall dataflow. I am assuming that for UI purposes you will want a ready-made JSON blob containing all metadata about a field, including source, description, hyperlinks for more info, etc. Our libs for doing this are Python, but we can easily precompile JSON for you.

@subdavis
Contributor

subdavis commented Dec 7, 2021

I'm already consuming the nmdc-schema repo as a git submodule, so any JSON file that exists in that repo is something I can grab and use. Other kinds of data (XML, YAML) would probably also be OK, but JSON is preferable.

@ssarrafan ssarrafan modified the milestones: Sprint 9, Sprint 10 Jan 4, 2022
@wdduncan

@subdavis The MIxS terms are in mixs.yaml, located in this directory:
https://github.com/microbiomedata/nmdc-schema/tree/main/src/schema

You can load the YAML directly yourself and convert it to JSON, or I can add a util to do this. Which do you prefer?

@ssarrafan
Author

@wdduncan can this issue be closed?

@wdduncan

@ssarrafan I do not know. What do you think @subdavis ?

@turbomam
Member

turbomam commented Jan 31, 2022

Sorry, I'm late to the game.

Where should it be indicated that a term comes from MIxS?

If a term is to be used in the NMDC DataHarmonizer, it will be marked with a disposition of borrowed or use as-is on the mixs_packages_x_slots tab of Soil-NMDC-Template_Compiled.

Slots/columns that are modifications of a MIxS slot appear in mixs_modified_slots.

I will be proposing a new structure for this Google Sheet soon, so some of that may become moot.

How the terms appear in DataHarmonizer will be determined by the section column in those two sheets. I believe @mslarae13 is assigning the MIxS as-is, borrowed, and modified terms to DH sections whose names will indicate which terms "come from" MIxS. @sujaypatil96 and I are working on the section assignment now.

@subdavis
Contributor

  • if possible, it would be great to have a json version of https://github.com/microbiomedata/nmdc-schema/blob/main/src/schema/mixs.yaml in the schema. A util function is only useful if the resulting json will be checked in. If a util is added that I need to run myself, it would be easier for me to convert the file myself.
  • I'm not totally clear on how this should look in the data portal. Where should this be indicated in the portal?

Should be a small amount of effort. Also, I won't be able to directly map lat_lon to latitude and longitude so we should talk about what sort of interventions are needed for edge cases like that.
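One way to handle an edge case like lat_lon is a small override table that is consulted before falling back to a same-name match. This is only a sketch under stated assumptions: the table `PORTAL_FIELD_OVERRIDES`, the helper `mixs_slot_for`, the field name `study_name`, and the idea that the portal keeps `latitude` and `longitude` as separate fields are all hypothetical and not taken from the portal code.

```python
# Hypothetical override table: portal field name -> MIxS slot name.
# Only the lat_lon case comes from the discussion; the names are illustrative.
PORTAL_FIELD_OVERRIDES = {
    "latitude": "lat_lon",
    "longitude": "lat_lon",
}


def mixs_slot_for(portal_field, mixs_slot_names):
    """Return the MIxS slot behind a portal field, or None if there is none."""
    # Use the override if one exists, otherwise try the field name itself.
    candidate = PORTAL_FIELD_OVERRIDES.get(portal_field, portal_field)
    return candidate if candidate in mixs_slot_names else None


# Stand-in for the slot names that would be read from the MIxS schema file.
mixs_slots = {"lat_lon", "elev"}
for field in ("latitude", "elev", "study_name"):
    print(field, "->", mixs_slot_for(field, mixs_slots))
```

Keeping the exceptions in one table means the default case stays a plain name lookup, and each intervention is visible in one place.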

@ssarrafan
Author

Based on the recent comments I will move this one to February. @turbomam and @wdduncan let me know if it should be in the backlog or assigned to someone else.

@ssarrafan ssarrafan modified the milestones: Sprint 10, Sprint 11 Feb 1, 2022
@wdduncan

wdduncan commented Feb 1, 2022

@subdavis I can create a JSON file in the GitHub repo, or you can convert it yourself. Just let me know which you prefer.
As for the lat_lon issue, I don't know what the best solution is. @dehays perhaps we can discuss this at the metadata meeting. @subdavis It would be helpful if you could attend the meeting too.

@mslarae13

I am really late to this game! But I saw the message in Slack and checked this out.
Is this for the DataHarmonizer or for Read the Docs / the schema definitions?

@wdduncan

wdduncan commented Feb 3, 2022

See the work discussed in this ticket: microbiomedata/nmdc-schema#252

@turbomam
Member

There are roughly 100 elements in src/schema/mixs.yaml that already have an in_subset, like environment for elev.

What are the consequences of assigning more than one subset?

Syntactically, in_subset is multivalued.
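The sketch below does not answer the question about consequences; it only illustrates what a multivalued in_subset implies for code that reads the schema, namely that it has to treat the value as a list. The slot definitions are hypothetical stand-ins shaped like parsed YAML, the subset name `mixs_term` is invented, and accepting a bare string as well as a list is a defensive assumption.

```python
from collections import Counter

# Hypothetical slot definitions, shaped like the parsed contents of mixs.yaml.
slots = {
    "elev": {"in_subset": ["environment"]},
    "lat_lon": {"in_subset": ["environment", "mixs_term"]},  # "mixs_term" is made up
    "depth": {},
}


def subsets_of(slot_def):
    """Return a slot's subsets as a list, whether stored as a string or a list."""
    value = slot_def.get("in_subset") or []
    return [value] if isinstance(value, str) else list(value)


# How many slots fall into each subset.
subset_counts = Counter(s for d in slots.values() for s in subsets_of(d))

# Slots that belong to more than one subset.
multi_subset_slots = sorted(n for n, d in slots.items() if len(subsets_of(d)) > 1)

print(dict(subset_counts))
print(multi_subset_slots)
```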

@ssarrafan
Author

@wdduncan and @turbomam can we close this issue? It seems the work is being tracked under nmdc-schema#252.

@ssarrafan
Author

Removed @wdduncan per his note.

Hi Set. Here is an update:
microbiomedata/nmdc-schema#134
This is an ongoing issue that will need to be passed on to Mark or Sujay (not sure which).
microbiomedata/nmdc-runtime#89
I updated the comment on this. I should be able to get the change sheet edit done before I leave.
microbiomedata/nmdc-schema#195
I am working with Sujay on this. I should be able to close before the week’s end.
microbiomedata/nmdc-runtime#46
This is an ongoing issue that will need to be passed on to Mark or Sujay (not sure which).
#555
There is a lot of conversation in this thread, but it looks like Mark has taken this one over.

@turbomam
Member

turbomam commented Apr 11, 2022

I think (but haven't proven yet) that all MIxS slots in https://github.com/microbiomedata/nmdc-schema/blob/issue-291-mixs-submod/src/schema/mixs_6_for_nmdc.yaml are annotated as follows:

from_schema: http://w3id.org/mixs/terms

Having said that, I'm pretty sure that those annotations appear in @sujaypatil96's new-ish gen-linkml JSON, but I haven't confirmed yet.
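If the annotation does appear in the generated JSON, a consumer could pick out MIxS slots along these lines. This is a minimal sketch, not the confirmed layout of the gen-linkml output: it assumes a top-level `slots` object keyed by slot name, the inline example data is invented, and `https://example.org/other-schema` is a placeholder. The metadata key is a parameter so it can be pointed at a different field if from_schema turns out not to be stable.

```python
import json

# Value quoted in the comment above for MIxS slots.
MIXS_SCHEMA_URI = "http://w3id.org/mixs/terms"


def mixs_slot_names(schema, key="from_schema"):
    """Return the sorted names of slots whose `key` equals the MIxS URI."""
    slots = schema.get("slots") or {}
    return sorted(
        name
        for name, definition in slots.items()
        if (definition or {}).get(key) == MIXS_SCHEMA_URI
    )


# Invented stand-in for a schema JSON file; the real layout may differ.
example_schema = json.loads("""
{
  "slots": {
    "elev": {"from_schema": "http://w3id.org/mixs/terms"},
    "lat_lon": {"from_schema": "http://w3id.org/mixs/terms"},
    "name": {"from_schema": "https://example.org/other-schema"}
  }
}
""")

print(mixs_slot_names(example_schema))
```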

@turbomam
Member

But the from_schema may change after further imports/merges.

Yes, switch to source

@turbomam
Member

The solution is in microbiomedata/nmdc-schema#292.

Closing this issue in anticipation of a merge in May 2022.
