-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: refactor ancestry mapping to include distance from descendant node + implement functions to support curated list term mapping #96
Conversation
…ode + implement functions to support curated list term mapping
# Conflicts: # api/python/src/cellxgene_ontology_guide/ontology_parser.py # api/python/tests/test_ontology_parser.py # artifact-schemas/all_ontology_schema.json # tools/ontology-builder/src/all_ontology_generator.py
# Conflicts: # api/python/src/cellxgene_ontology_guide/ontology_parser.py # api/python/tests/test_ontology_parser.py # artifact-schemas/all_ontology_schema.json # tools/ontology-builder/src/all_ontology_generator.py
ancestors: Dict[str, int] = dict() | ||
queue = [(onto_class, 1)] | ||
while queue: | ||
term, distance = queue.pop(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment to mentions how we are using a list as a queue and popping the first term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels self-explanatory by the code, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a little extra to grep. Not required but might reduce cognitive load.
it will format the id from the form CL_xxxx to CL:xxxx | ||
Returns a list of unique ancestor ontology term ids of the given onto class. Only returns those belonging to | ||
ontology_name, it will format the id from the form CL_xxxx to CL:xxxx. Ancestors are returned in ascending order | ||
of distance from the given term. | ||
|
||
:param owlready2.entity.ThingClass onto_class: the class for which ancestors will be retrieved | ||
:param str onto_name: only ancestors from this ontology will be kept | ||
|
||
:rtype List[str] | ||
:return list of ancestors (term ids), it could be empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the return type
queue.append((parent, distance + 1)) | ||
ancestors[parent_name] = distance | ||
|
||
# filter out ancestors that are not from the ontology we are currently processing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could this be done in the for
loop? What are the edge cases that prevent this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's doable but adds some complexity. The issue is, we still want to traverse the ancestors for non-ontology terms because, as it turns out, some non-ontology terms have ancestors that are a part of the ontology. For instance, the root node of EFO has 5 direct children, none of which are EFO terms. But, they all have EFO children. So, we still need to traverse these 5 non-EFO children to get to the root node and resolve the ontology graph. But we don't want to count them as ancestors. We also want to make sure we don't traverse them again after they were already visited.
We could add a second "visited_but_not_counted" set and update it in the for loop, rather than filtering out of the ancestor dict in a second loop. Whichever you prefer. More performant but more complicated loop logic to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let go for readability in the builder since it is a run once kinda deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no changes needed
ancestors_1 = self.cxg_schema.ontology(ontology.name)[term_id_1]["ancestors"] + {term_id_1: 0} | ||
ancestors_2 = self.cxg_schema.ontology(ontology.name)[term_id_2]["ancestors"] + {term_id_2: 0} | ||
common_ancestors = set(ancestors_1.keys()) & set(ancestors_2.keys()) | ||
min_sum_distances = min(common_ancestors, key=lambda x: ancestors_1[x] + ancestors_2[x]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The last two lines would be more efficient if you put them in a single for loop rather than looping through the common_ancestors twice. It would mean half as many dictionary lookups.
@@ -96,7 +188,8 @@ def get_terms_descendants(self, term_ids: List[str], include_self: bool = False) | |||
for ontology in ontology_names: | |||
for candidate_descendant, candidate_metadata in self.cxg_schema.ontology(ontology).items(): | |||
for ancestor_id in descendants_dict: | |||
if ancestor_id in candidate_metadata["ancestors"]: | |||
ancestors = candidate_metadata["ancestors"].keys() | |||
if ancestor_id in ancestors: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You don't need to make this change. It should work with the original implementation.
🤖 I have created a release *beep* *boop* --- ## [0.1.0](python-api-v0.0.1...python-api-v0.1.0) (2024-03-15) ### Features * add data to the python package ([#87](#87)) ([0eb6831](0eb6831)) * add is_valid_term_id method to OntologyParser ([#115](#115)) ([72c2073](72c2073)) * include license file with python package ([#85](#85)) ([2be3d81](2be3d81)) * load GH Release Assets for schema version in memory ([#72](#72)) ([58bad0a](58bad0a)) * refactor ancestry mapping to include distance from descendant node + implement functions to support curated list term mapping ([#96](#96)) ([7fc3562](7fc3562)) * refer to ontology source filenames in ontology_info and return that in get_ontology_download_url ([#106](#106)) ([ff9d826](ff9d826)) * split all_ontology into individual files. ([#93](#93)) ([ead59e5](ead59e5)) * Support getting download link for ontology from source repo ([#86](#86)) ([fd55b76](fd55b76)) ### Misc * clean-up ontology_parser single fetch and bulk fetch methods + account for acceptable non-ontology terms ([#112](#112)) ([2ef7435](2ef7435)) * **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in /api/python ([#98](#98)) ([dfe0b39](dfe0b39)) ### BugFixes * imports for api ([4cd3386](4cd3386)) * lint errors ([f5e4583](f5e4583)) * python api releases ([bf0477e](bf0477e)) * update requirements ([#114](#114)) ([9888f3d](9888f3d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
🤖 I have created a release *beep* *boop* --- ## [0.1.0](python-api-v0.0.1...python-api-v0.1.0) (2024-03-15) ### Features * add data to the python package ([#87](#87)) ([0eb6831](0eb6831)) * add is_valid_term_id method to OntologyParser ([#115](#115)) ([72c2073](72c2073)) * include license file with python package ([#85](#85)) ([2be3d81](2be3d81)) * load GH Release Assets for schema version in memory ([#72](#72)) ([58bad0a](58bad0a)) * refactor ancestry mapping to include distance from descendant node + implement functions to support curated list term mapping ([#96](#96)) ([7fc3562](7fc3562)) * refer to ontology source filenames in ontology_info and return that in get_ontology_download_url ([#106](#106)) ([ff9d826](ff9d826)) * split all_ontology into individual files. ([#93](#93)) ([ead59e5](ead59e5)) * Support getting download link for ontology from source repo ([#86](#86)) ([fd55b76](fd55b76)) ### Misc * clean-up ontology_parser single fetch and bulk fetch methods + account for acceptable non-ontology terms ([#112](#112)) ([2ef7435](2ef7435)) * **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in /api/python ([#98](#98)) ([dfe0b39](dfe0b39)) ### BugFixes * imports for api ([4cd3386](4cd3386)) * lint errors ([f5e4583](f5e4583)) * python api releases ([bf0477e](bf0477e)) * update requirements ([#114](#114)) ([9888f3d](9888f3d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
🤖 I have created a release *beep* *boop* --- ## [0.1.0](python-api-v0.0.2...python-api-v0.1.0) (2024-03-15) ### Features * add data to the python package ([#87](#87)) ([0eb6831](0eb6831)) * add is_valid_term_id method to OntologyParser ([#115](#115)) ([72c2073](72c2073)) * include license file with python package ([#85](#85)) ([2be3d81](2be3d81)) * refactor ancestry mapping to include distance from descendant node + implement functions to support curated list term mapping ([#96](#96)) ([7fc3562](7fc3562)) * refer to ontology source filenames in ontology_info and return that in get_ontology_download_url ([#106](#106)) ([ff9d826](ff9d826)) * split all_ontology into individual files. ([#93](#93)) ([ead59e5](ead59e5)) * Support getting download link for ontology from source repo ([#86](#86)) ([fd55b76](fd55b76)) ### Misc * clean-up ontology_parser single fetch and bulk fetch methods + account for acceptable non-ontology terms ([#112](#112)) ([2ef7435](2ef7435)) * **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in /api/python ([#98](#98)) ([dfe0b39](dfe0b39)) ### BugFixes * imports for api ([4cd3386](4cd3386)) * update requirements ([#114](#114)) ([9888f3d](9888f3d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
🤖 I have created a release *beep* *boop* --- <details><summary>python-api: 0.1.0</summary> ## [0.1.0](python-api-v0.0.2...python-api-v0.1.0) (2024-03-15) ### Features * add data to the python package ([#87](#87)) ([0eb6831](0eb6831)) * add is_valid_term_id method to OntologyParser ([#115](#115)) ([72c2073](72c2073)) * include license file with python package ([#85](#85)) ([2be3d81](2be3d81)) * refactor ancestry mapping to include distance from descendant node + implement functions to support curated list term mapping ([#96](#96)) ([7fc3562](7fc3562)) * refer to ontology source filenames in ontology_info and return that in get_ontology_download_url ([#106](#106)) ([ff9d826](ff9d826)) * split all_ontology into individual files. ([#93](#93)) ([ead59e5](ead59e5)) * Support getting download link for ontology from source repo ([#86](#86)) ([fd55b76](fd55b76)) ### Misc * automate testpypi releases ([#118](#118)) ([b5a1a66](b5a1a66)) * clean-up ontology_parser single fetch and bulk fetch methods + account for acceptable non-ontology terms ([#112](#112)) ([2ef7435](2ef7435)) * **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in /api/python ([#98](#98)) ([dfe0b39](dfe0b39)) ### BugFixes * imports for api ([4cd3386](4cd3386)) * update requirements ([#114](#114)) ([9888f3d](9888f3d)) </details> <details><summary>ontology-assets: 0.1.0</summary> ## [0.1.0](ontology-assets-v0.0.1...ontology-assets-v0.1.0) (2024-03-15) ### Features * load GH Release Assets for schema version in memory ([#72](#72)) ([58bad0a](58bad0a)) * refactor ancestry mapping to include distance from descendant node + implement functions to support curated list term mapping ([#96](#96)) ([7fc3562](7fc3562)) * refer to ontology source filenames in ontology_info and return that in get_ontology_download_url ([#106](#106)) ([ff9d826](ff9d826)) * **release:** generate descendant mapping for tissues and cells ([#100](#100)) ([841fddf](841fddf)) * remove all-ontology.json.gz ([83fefd6](83fefd6)) * split all_ontology into individual files. ([#93](#93)) ([ead59e5](ead59e5)) ### Misc * update ontology decendant mappings ([#117](#117)) ([48451af](48451af)) ### BugFixes * lint errors ([f5e4583](f5e4583)) * Schema format and validation fixes. ([#113](#113)) ([0465ee7](0465ee7)) </details> --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
🤖 I have created a release *beep* *boop* --- <details><summary>assets: 0.4.0</summary> ## [0.4.0](assets-v0.3.0...assets-v0.4.0) (2024-04-09) ### Features * add function to fetch curated ontology term lists ([#141](#141)) ([5c7db62](5c7db62)) * fetch ontology term descriptions, if available ([#181](#181)) ([0120377](0120377)) * load GH Release Assets for schema version in memory ([#72](#72)) ([58bad0a](58bad0a)) * refactor ancestry mapping to include distance from descendant node + implement functions to support curated list term mapping ([#96](#96)) ([7fc3562](7fc3562)) * refer to ontology source filenames in ontology_info and return that in get_ontology_download_url ([#106](#106)) ([ff9d826](ff9d826)) * **release:** generate descendant mapping for tissues and cells ([#100](#100)) ([841fddf](841fddf)) * remove all-ontology.json.gz ([83fefd6](83fefd6)) * split all_ontology into individual files. ([#93](#93)) ([ead59e5](ead59e5)) * upload assets on release ([#56](#56)) ([84a1c5d](84a1c5d)) ### Misc * deprecate older version of cellxgene schema ([#172](#172)) ([186e762](186e762)) * move curated lists to ontology-assets ([#48](#48)) ([77916df](77916df)) * moving the generated artifacts ([c03c8e3](c03c8e3)) * release main ([#130](#130)) ([0b37dc8](0b37dc8)) * release main ([#146](#146)) ([4ca76f0](4ca76f0)) * release main ([#185](#185)) ([9b2fe53](9b2fe53)) * release main ([#74](#74)) ([e748fe9](e748fe9)) * release tsmith/release-assets ([63b782d](63b782d)) * release tsmith/release-assets ([#57](#57)) ([6a6b02a](6a6b02a)) * update ontology decendant mappings ([#117](#117)) ([48451af](48451af)) * update ontology decendant mappings ([#142](#142)) ([fb23618](fb23618)) * update ontology decendant mappings ([#162](#162)) ([12def74](12def74)) * update ontology descendant mappings ([#167](#167)) ([5d3d097](5d3d097)) * update ontology descendant mappings ([#180](#180)) ([65ca10f](65ca10f)) ### BugFixes * lint errors ([f5e4583](f5e4583)) * Schema format and validation fixes. ([#113](#113)) ([0465ee7](0465ee7)) </details> --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Reason for Change
Changes
Notes for Reviewer