Very basic slurping pipeline: initial design #37

Closed
matentzn opened this issue Jul 17, 2022 · 5 comments · Fixed by #36

@matentzn
Member

The purpose of this ticket is to create a very basic pipeline for slurping new terms into Mondo where they do not already exist. In terms of process, I think this would be a good start. I created a draft PR with pseudocode here:

#36

Basically, we:

  1. Look at all unmapped terms T
  2. If all of the parents of T are mapped, designate T for slurping (we only slurp a term if its parents are already slurped, iteratively)
  3. Extract basic information about T and export as ROBOT template
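
A minimal sketch of steps 1 and 2, not the code from the draft PR. The function name `slurp_candidates`, the `parents` dict, the `mapped` set, and the toy OMIM IDs are all hypothetical. It also assumes that a term with no parents counts as a candidate, which the steps above do not settle.

```python
from typing import Dict, List, Set


def slurp_candidates(parents: Dict[str, Set[str]], mapped: Set[str]) -> List[str]:
    """Return unmapped terms whose parents are all already mapped."""
    candidates = []
    for term, term_parents in parents.items():
        if term in mapped:
            continue  # step 1: only unmapped terms are considered
        # step 2: every parent must be mapped
        # (assumption: a term without parents passes this check vacuously)
        if all(parent in mapped for parent in term_parents):
            candidates.append(term)
    return sorted(candidates)


# Hypothetical toy hierarchy: term -> direct parents
parents = {
    "OMIM:1": set(),        # already mapped, so never a candidate
    "OMIM:2": {"OMIM:1"},   # parent is mapped, so designated for slurping
    "OMIM:3": {"OMIM:2"},   # parent is unmapped, so deferred
}
mapped = {"OMIM:1"}

print(slurp_candidates(parents, mapped))
```

Once `OMIM:2` has been slurped and mapped, calling the function again designates `OMIM:3`, which is the iterative behaviour described in step 2.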
@joeflack4
Contributor

@matentzn Thanks for the skeleton PR. I rebased from main and pushed, so if you want to check it out you might want to just delete your slurpdraft branch and check out my new one. I also pushed a small commit.

I understand (1).

I understand half of (2). I can designate a term, but I'm not sure how you'd like that expressed concretely as an output. Perhaps it is related to (3).

I'm not sure about (3). PR #32 has a TSV called a 'robot template' w/ the columns term_id and exclusion_reason. I'm looking at the documentation for robot --template, but I feel like this may lead me down the wrong track. Though, perhaps it's as simple as me creating a TSV with a column ID or term_id with all of the 'slurpable' candidates. I'm just wondering what I should do with the terms with parents that we haven't slurped? Ought I not include them in this output or some other output? I can also include LABEL and TYPE columns as mentioned in the docs, but maybe ID alone is enough for this?

@matentzn
Member Author

matentzn commented Jul 26, 2022

Hey @joeflack4, great that you've started working on this :)

(2) In the source ontology, say OMIM, you iterate through all terms X (e.g. OMIM:123). If and only if all parents of X (all Y's for which X subClassOf Y) are already in Mondo (i.e. there is a mapping for each of them in the SSSOM file), proceed. Else, ignore X for now (it will be picked up by the next run)!
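
The "already in Mondo" check needs the set of source IDs that have a mapping. A rough sketch of collecting them from an SSSOM TSV follows. It assumes the source term sits in the `object_id` column, which is not confirmed in this thread; the function name `mapped_source_ids` and the sample rows are hypothetical.

```python
import csv
from typing import Set


def mapped_source_ids(sssom_tsv: str, column: str = "object_id") -> Set[str]:
    """Collect the IDs found in one column of an SSSOM TSV string."""
    # SSSOM TSV files may start with metadata lines prefixed by '#'; skip them.
    lines = [ln for ln in sssom_tsv.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


# Hypothetical mapping file content
example = (
    "# mapping_set_id: https://example.org/hypothetical\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "MONDO:12\tskos:exactMatch\tOMIM:1\n"
    "MONDO:13\tskos:exactMatch\tOMIM:2\n"
)
mapped = mapped_source_ids(example)

# X may proceed only if every parent Y of X has a mapping.
parents_of_x = {"OMIM:1", "OMIM:2"}
print(parents_of_x <= mapped)
```

If the mappings are stored the other way round, passing `column="subject_id"` covers that case.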

(3) A robot template is simply a table that can be turned into OWL. I am thinking of this:

| ID | LABEL | Definition | Parents | xref |
| --- | --- | --- | --- | --- |
| ID | LABEL | A IAO:0000115 | SC % SPLIT=\| | A oboInOwl:hasDbXref |
| MONDO:123 | Fake disease | Fake disease is a deadly disease with a fake definition | MONDO:12\|MONDO:13 | OMIM:998 |

This table I can then load straight into Mondo.
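
As an illustration only, a table like this could be written as a TSV with Python's `csv` module. The first row holds the human-readable headers and the second row holds the ROBOT template strings, both taken from the table above. The function name `to_robot_template` and the dict keys (`id`, `label`, `definition`, `parents`, `xref`) are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable

HEADER = ["ID", "LABEL", "Definition", "Parents", "xref"]
TEMPLATE = ["ID", "LABEL", "A IAO:0000115", "SC % SPLIT=|", "A oboInOwl:hasDbXref"]


def to_robot_template(rows: Iterable[Dict]) -> str:
    """Serialize term records as a tab-separated ROBOT template."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)    # row 1: column names for humans
    writer.writerow(TEMPLATE)  # row 2: template strings read by ROBOT
    for row in rows:
        writer.writerow([
            row["id"],
            row["label"],
            row["definition"],
            "|".join(row["parents"]),  # matches SPLIT=| in the Parents column
            row["xref"],
        ])
    return buf.getvalue()


# The example row from the table above
out = to_robot_template([{
    "id": "MONDO:123",
    "label": "Fake disease",
    "definition": "Fake disease is a deadly disease with a fake definition",
    "parents": ["MONDO:12", "MONDO:13"],
    "xref": "OMIM:998",
}])
print(out)
```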

@joeflack4
Contributor

joeflack4 commented Jul 26, 2022

(2) Just ignore them for now then, got it!

(3) Ok, great; that's just what I was looking for. Just curious about the robot templates.
Q1: Does it look for specific column headers by name or by order of columns? Ah, I see now. Is this the reason for me adding {'term_id': 'ID' ... }? I suppose robot knows to look for this mapping of alternative column names.
Q2: I only see ID and LABEL in the robot template docs. Does this mean the other ones still need to be added to the docs, or are they not yet an official part of this feature?
Q3: Do you want your row 2 in this markdown table to appear exactly as-is as row 2 in my output? The one with values ID, LABEL, A IAO:0000115, SC % SPLIT=, ?

@matentzn
Member Author

Q2

No, everything that should be in the ROBOT template docs is there. Only look at the values in that dict of yours. ID, LABEL and TYPE are the only built-in ones; the rest are freely configurable by the user using template strings.

@matentzn
Member Author

Q3

There was a rendering error in the table above, which I have now fixed. Yeah, sure, we can start with that exact table and then work our way up.

@joeflack4 linked a pull request on Aug 3, 2022 that will close this issue