Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: refactor ancestry mapping to include distance from descendant node + implement functions to support curated list term mapping #96

Merged
merged 9 commits into from
Mar 8, 2024

Conversation

nayib-jose-gloria
Copy link
Collaborator

@nayib-jose-gloria nayib-jose-gloria commented Mar 6, 2024

Reason for Change

Changes

  • refactored ancestry pre-processing mapping to 1) get distance from term_id for each ancestor (to support determining LCA, term distances, highest level term ancestor). This required no longer using owlready2's built-in ancestors() mapping, as it did not provide distances.
  • implemented methods to 1) get distance between term IDs, 2) get LCA between term IDs, 3) get all matching high-level terms for term_ids given a curated list (this extends to input tissue_general, cell class, cell subclass, etc.), and 4) get highest level matching term for term_ids given a curated list

Notes for Reviewer

  • TODO: implement list input equivalent for LCA

…ode + implement functions to support curated list term mapping
nayib-jose-gloria and others added 5 commits March 7, 2024 08:42
# Conflicts:
#	api/python/src/cellxgene_ontology_guide/ontology_parser.py
#	api/python/tests/test_ontology_parser.py
#	artifact-schemas/all_ontology_schema.json
#	tools/ontology-builder/src/all_ontology_generator.py
# Conflicts:
#	api/python/src/cellxgene_ontology_guide/ontology_parser.py
#	api/python/tests/test_ontology_parser.py
#	artifact-schemas/all_ontology_schema.json
#	tools/ontology-builder/src/all_ontology_generator.py
ancestors: Dict[str, int] = dict()
queue = [(onto_class, 1)]
while queue:
term, distance = queue.pop(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to mentions how we are using a list as a queue and popping the first term.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels self-explanatory by the code, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a little extra to grep. Not required but might reduce cognitive load.

it will format the id from the form CL_xxxx to CL:xxxx
Returns a list of unique ancestor ontology term ids of the given onto class. Only returns those belonging to
ontology_name, it will format the id from the form CL_xxxx to CL:xxxx. Ancestors are returned in ascending order
of distance from the given term.

:param owlready2.entity.ThingClass onto_class: the class for which ancestors will be retrieved
:param str onto_name: only ancestors from this ontology will be kept

:rtype List[str]
:return list of ancestors (term ids), it could be empty
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update the return type

queue.append((parent, distance + 1))
ancestors[parent_name] = distance

# filter out ancestors that are not from the ontology we are currently processing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could this be done in the for loop? What are the edge cases that prevent this?

Copy link
Collaborator Author

@nayib-jose-gloria nayib-jose-gloria Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's doable but adds some complexity. The issue is, we still want to traverse the ancestors for non-ontology terms because, as it turns out, some non-ontology terms have ancestors that are a part of the ontology. For instance, the root node of EFO has 5 direct children, none of which are EFO terms. But, they all have EFO children. So, we still need to traverse these 5 non-EFO children to get to the root node and resolve the ontology graph. But we don't want to count them as ancestors. We also want to make sure we don't traverse them again after they were already visited.

We could add a second "visited_but_not_counted" set and update it in the for loop, rather than filtering out of the ancestor dict in a second loop. Whichever you prefer. More performant but more complicated loop logic to read

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let go for readability in the builder since it is a run once kinda deal.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no changes needed

ancestors_1 = self.cxg_schema.ontology(ontology.name)[term_id_1]["ancestors"] + {term_id_1: 0}
ancestors_2 = self.cxg_schema.ontology(ontology.name)[term_id_2]["ancestors"] + {term_id_2: 0}
common_ancestors = set(ancestors_1.keys()) & set(ancestors_2.keys())
min_sum_distances = min(common_ancestors, key=lambda x: ancestors_1[x] + ancestors_2[x])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The last two lines would be more efficient if you put them in a single for loop rather than looping through the common_ancestors twice. It would mean half as many dictionary lookups.

@@ -96,7 +188,8 @@ def get_terms_descendants(self, term_ids: List[str], include_self: bool = False)
for ontology in ontology_names:
for candidate_descendant, candidate_metadata in self.cxg_schema.ontology(ontology).items():
for ancestor_id in descendants_dict:
if ancestor_id in candidate_metadata["ancestors"]:
ancestors = candidate_metadata["ancestors"].keys()
if ancestor_id in ancestors:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You don't need to make this change. It should work with the original implementation.

@nayib-jose-gloria nayib-jose-gloria marked this pull request as ready for review March 7, 2024 23:01
@nayib-jose-gloria nayib-jose-gloria merged commit 7fc3562 into main Mar 8, 2024
4 checks passed
@nayib-jose-gloria nayib-jose-gloria deleted the nayib/curated-list-mapper-api branch March 8, 2024 13:15
@github-actions github-actions bot mentioned this pull request Mar 8, 2024
Bento007 pushed a commit that referenced this pull request Mar 15, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](python-api-v0.0.1...python-api-v0.1.0)
(2024-03-15)


### Features

* add data to the python package
([#87](#87))
([0eb6831](0eb6831))
* add is_valid_term_id method to OntologyParser
([#115](#115))
([72c2073](72c2073))
* include license file with python package
([#85](#85))
([2be3d81](2be3d81))
* load GH Release Assets for schema version in memory
([#72](#72))
([58bad0a](58bad0a))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* Support getting download link for ontology from source repo
([#86](#86))
([fd55b76](fd55b76))


### Misc

* clean-up ontology_parser single fetch and bulk fetch methods + account
for acceptable non-ontology terms
([#112](#112))
([2ef7435](2ef7435))
* **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in
/api/python
([#98](#98))
([dfe0b39](dfe0b39))


### BugFixes

* imports for api
([4cd3386](4cd3386))
* lint errors
([f5e4583](f5e4583))
* python api releases
([bf0477e](bf0477e))
* update requirements
([#114](#114))
([9888f3d](9888f3d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Bento007 pushed a commit that referenced this pull request Mar 15, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](python-api-v0.0.1...python-api-v0.1.0)
(2024-03-15)


### Features

* add data to the python package
([#87](#87))
([0eb6831](0eb6831))
* add is_valid_term_id method to OntologyParser
([#115](#115))
([72c2073](72c2073))
* include license file with python package
([#85](#85))
([2be3d81](2be3d81))
* load GH Release Assets for schema version in memory
([#72](#72))
([58bad0a](58bad0a))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* Support getting download link for ontology from source repo
([#86](#86))
([fd55b76](fd55b76))


### Misc

* clean-up ontology_parser single fetch and bulk fetch methods + account
for acceptable non-ontology terms
([#112](#112))
([2ef7435](2ef7435))
* **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in
/api/python
([#98](#98))
([dfe0b39](dfe0b39))


### BugFixes

* imports for api
([4cd3386](4cd3386))
* lint errors
([f5e4583](f5e4583))
* python api releases
([bf0477e](bf0477e))
* update requirements
([#114](#114))
([9888f3d](9888f3d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Bento007 pushed a commit that referenced this pull request Mar 15, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](python-api-v0.0.2...python-api-v0.1.0)
(2024-03-15)


### Features

* add data to the python package
([#87](#87))
([0eb6831](0eb6831))
* add is_valid_term_id method to OntologyParser
([#115](#115))
([72c2073](72c2073))
* include license file with python package
([#85](#85))
([2be3d81](2be3d81))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* Support getting download link for ontology from source repo
([#86](#86))
([fd55b76](fd55b76))


### Misc

* clean-up ontology_parser single fetch and bulk fetch methods + account
for acceptable non-ontology terms
([#112](#112))
([2ef7435](2ef7435))
* **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in
/api/python
([#98](#98))
([dfe0b39](dfe0b39))


### BugFixes

* imports for api
([4cd3386](4cd3386))
* update requirements
([#114](#114))
([9888f3d](9888f3d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@github-actions github-actions bot mentioned this pull request Mar 15, 2024
Bento007 pushed a commit that referenced this pull request Mar 15, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>python-api: 0.1.0</summary>

##
[0.1.0](python-api-v0.0.2...python-api-v0.1.0)
(2024-03-15)


### Features

* add data to the python package
([#87](#87))
([0eb6831](0eb6831))
* add is_valid_term_id method to OntologyParser
([#115](#115))
([72c2073](72c2073))
* include license file with python package
([#85](#85))
([2be3d81](2be3d81))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* Support getting download link for ontology from source repo
([#86](#86))
([fd55b76](fd55b76))


### Misc

* automate testpypi releases
([#118](#118))
([b5a1a66](b5a1a66))
* clean-up ontology_parser single fetch and bulk fetch methods + account
for acceptable non-ontology terms
([#112](#112))
([2ef7435](2ef7435))
* **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in
/api/python
([#98](#98))
([dfe0b39](dfe0b39))


### BugFixes

* imports for api
([4cd3386](4cd3386))
* update requirements
([#114](#114))
([9888f3d](9888f3d))
</details>

<details><summary>ontology-assets: 0.1.0</summary>

##
[0.1.0](ontology-assets-v0.0.1...ontology-assets-v0.1.0)
(2024-03-15)


### Features

* load GH Release Assets for schema version in memory
([#72](#72))
([58bad0a](58bad0a))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* **release:** generate descendant mapping for tissues and cells
([#100](#100))
([841fddf](841fddf))
* remove all-ontology.json.gz
([83fefd6](83fefd6))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))


### Misc

* update ontology decendant mappings
([#117](#117))
([48451af](48451af))


### BugFixes

* lint errors
([f5e4583](f5e4583))
* Schema format and validation fixes.
([#113](#113))
([0465ee7](0465ee7))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Bento007 pushed a commit that referenced this pull request Apr 9, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>assets: 0.4.0</summary>

##
[0.4.0](assets-v0.3.0...assets-v0.4.0)
(2024-04-09)


### Features

* add function to fetch curated ontology term lists
([#141](#141))
([5c7db62](5c7db62))
* fetch ontology term descriptions, if available
([#181](#181))
([0120377](0120377))
* load GH Release Assets for schema version in memory
([#72](#72))
([58bad0a](58bad0a))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* **release:** generate descendant mapping for tissues and cells
([#100](#100))
([841fddf](841fddf))
* remove all-ontology.json.gz
([83fefd6](83fefd6))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* upload assets on release
([#56](#56))
([84a1c5d](84a1c5d))


### Misc

* deprecate older version of cellxgene schema
([#172](#172))
([186e762](186e762))
* move curated lists to ontology-assets
([#48](#48))
([77916df](77916df))
* moving the generated artifacts
([c03c8e3](c03c8e3))
* release main
([#130](#130))
([0b37dc8](0b37dc8))
* release main
([#146](#146))
([4ca76f0](4ca76f0))
* release main
([#185](#185))
([9b2fe53](9b2fe53))
* release main
([#74](#74))
([e748fe9](e748fe9))
* release tsmith/release-assets
([63b782d](63b782d))
* release tsmith/release-assets
([#57](#57))
([6a6b02a](6a6b02a))
* update ontology decendant mappings
([#117](#117))
([48451af](48451af))
* update ontology decendant mappings
([#142](#142))
([fb23618](fb23618))
* update ontology decendant mappings
([#162](#162))
([12def74](12def74))
* update ontology descendant mappings
([#167](#167))
([5d3d097](5d3d097))
* update ontology descendant mappings
([#180](#180))
([65ca10f](65ca10f))


### BugFixes

* lint errors
([f5e4583](f5e4583))
* Schema format and validation fixes.
([#113](#113))
([0465ee7](0465ee7))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants