Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: split all_ontology into individual files. #93

Merged
merged 8 commits into from
Mar 6, 2024

Conversation

Bento007
Copy link
Collaborator

@Bento007 Bento007 commented Mar 6, 2024

Reason for Change

Changes

  • add sematic_versioning as a requirement for all_ontology build and API. It is used to find the latest version of cellxgene_schema.
  • modify tools/ontology-builder/src/all_ontology_generator.py to generate a separate file for each ontology in ontology_info.json. By default all_ontology_generator.py generates json.gz file for the latest cellxgene_schema version.
  • update artifact-schemas/ontology_info_schema.json to support split ontology.json.gz formats
    • move ontologyCategory to the top level in artifact-schemas/all_ontology_schema.json. This is because each ontology is save into isn't own file, so the terms do not need to be nested under their ontology name.
  • update artifact-schemas/all_ontology_schema.json to support multiple cellxgene schema versions
    • nest ontology_info.json properties under a Cellxgene Schema version. This allows us to support multiple schema versions from ontology_info.json file.
      • It is used by tools/ontology-builder/src/all_ontology_generator.py to generate the ontology artifacts for the current schema version.
      • It is used by api/python/src/cellxgene_ontology_guide/supported_versions.py to get the schema version and determine what ontology versions to use.
  • update api/python/src/cellxgene_ontology_guide/ontology_parser.py to use CXGSchema to access ontology terms.
  • add api/python/src/cellxgene_ontology_guide/supported_versions.py to the ontologies associated with a schema version.
    • add caching of loaded ontology files to avoid excessive unzipping of ontology files.

Testing steps

  • Add unit tests for all new functionality
  • Updated unit tests for existing functionality.
  • Verified multiple ontology assets are created
  • Verified the API can used the multiple assets.

Notes for Reviewer

  • can artifact_download.py be removed?

@Bento007 Bento007 force-pushed the tsmith/split-all-ontology branch from 9334ae2 to 3916a1d Compare March 6, 2024 05:53
@Bento007 Bento007 force-pushed the tsmith/split-all-ontology branch from 30d4734 to 1c7c43b Compare March 6, 2024 06:08
Copy link
Collaborator

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think of splitting these out into 1 schema per ontology? Technically this schema would allow for efo-ontology-v1.json, for example, to have CL terms in it, which should not be the case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds overly verbose and prescriptive of the expected format. If there was actually a difference in the shape of the json files or how we generate them, I'd say split them up.

@@ -2,7 +2,7 @@

PACKAGE_ROOT = os.path.dirname(os.path.realpath(__file__))
DATA_ROOT = os.path.join(PACKAGE_ROOT, "data")
ALL_ONTOLOGY_FILENAME = "all_ontology.json.gz"
ONTOLOGY_FILENAME_SUFFIX = ".json.gz"
ONTOLOGY_INFO_FILENAME = "ontology_info.json"
ONTOLOGY_ASSET_RELEASE_URL = "https://github.com/chanzuckerberg/cellxgene-ontology-guide/releases/download"
SCHEMA_VERSION_TO_ONTOLOGY_ASSET_TAG = {"5.0.0": "ontology-assets-v0.0.1"}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this mapping now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing ONTOLOGY_ASSET_RELEASE_URL, and SCHEMA_VERSION_TO_ONTOLOGY_ASSET_TAG

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could keep this if we want to support loading files from non-packaged versions, and just say that requires a network call. But, that's not a requirement for MVP so we can get rid of it for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing for now. Less code to maintain

class CXGSchema:
"""A class to represent the ontology information used by a cellxgene schema version."""

def __init__(self, version: Optional[str] = None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets add a docstring explaining that version will default to latest, packaged version if not provided as input.

from entities import Ontology, OntologyFileType, OntologyVariant

from cellxgene_ontology_guide.supported_versions import CXGSchema


class OntologyParser:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on refactoring this class to also take an ontology_name, so we can load the ontology file to read from in init and not have to request that as an input in every method? i.e. they can initialize cl_parser = OntologyParser("5.0.0", "CL")

We can also take care of this in a follow-up PR, if you prefer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do that as a follow up

@Bento007 Bento007 merged commit ead59e5 into main Mar 6, 2024
4 of 5 checks passed
@Bento007 Bento007 deleted the tsmith/split-all-ontology branch March 6, 2024 18:41
@github-actions github-actions bot mentioned this pull request Mar 6, 2024
Bento007 pushed a commit that referenced this pull request Mar 15, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](python-api-v0.0.1...python-api-v0.1.0)
(2024-03-15)


### Features

* add data to the python package
([#87](#87))
([0eb6831](0eb6831))
* add is_valid_term_id method to OntologyParser
([#115](#115))
([72c2073](72c2073))
* include license file with python package
([#85](#85))
([2be3d81](2be3d81))
* load GH Release Assets for schema version in memory
([#72](#72))
([58bad0a](58bad0a))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* Support getting download link for ontology from source repo
([#86](#86))
([fd55b76](fd55b76))


### Misc

* clean-up ontology_parser single fetch and bulk fetch methods + account
for acceptable non-ontology terms
([#112](#112))
([2ef7435](2ef7435))
* **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in
/api/python
([#98](#98))
([dfe0b39](dfe0b39))


### BugFixes

* imports for api
([4cd3386](4cd3386))
* lint errors
([f5e4583](f5e4583))
* python api releases
([bf0477e](bf0477e))
* update requirements
([#114](#114))
([9888f3d](9888f3d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Bento007 pushed a commit that referenced this pull request Mar 15, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](python-api-v0.0.1...python-api-v0.1.0)
(2024-03-15)


### Features

* add data to the python package
([#87](#87))
([0eb6831](0eb6831))
* add is_valid_term_id method to OntologyParser
([#115](#115))
([72c2073](72c2073))
* include license file with python package
([#85](#85))
([2be3d81](2be3d81))
* load GH Release Assets for schema version in memory
([#72](#72))
([58bad0a](58bad0a))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* Support getting download link for ontology from source repo
([#86](#86))
([fd55b76](fd55b76))


### Misc

* clean-up ontology_parser single fetch and bulk fetch methods + account
for acceptable non-ontology terms
([#112](#112))
([2ef7435](2ef7435))
* **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in
/api/python
([#98](#98))
([dfe0b39](dfe0b39))


### BugFixes

* imports for api
([4cd3386](4cd3386))
* lint errors
([f5e4583](f5e4583))
* python api releases
([bf0477e](bf0477e))
* update requirements
([#114](#114))
([9888f3d](9888f3d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Bento007 pushed a commit that referenced this pull request Mar 15, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](python-api-v0.0.2...python-api-v0.1.0)
(2024-03-15)


### Features

* add data to the python package
([#87](#87))
([0eb6831](0eb6831))
* add is_valid_term_id method to OntologyParser
([#115](#115))
([72c2073](72c2073))
* include license file with python package
([#85](#85))
([2be3d81](2be3d81))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* Support getting download link for ontology from source repo
([#86](#86))
([fd55b76](fd55b76))


### Misc

* clean-up ontology_parser single fetch and bulk fetch methods + account
for acceptable non-ontology terms
([#112](#112))
([2ef7435](2ef7435))
* **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in
/api/python
([#98](#98))
([dfe0b39](dfe0b39))


### BugFixes

* imports for api
([4cd3386](4cd3386))
* update requirements
([#114](#114))
([9888f3d](9888f3d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@github-actions github-actions bot mentioned this pull request Mar 15, 2024
Bento007 pushed a commit that referenced this pull request Mar 15, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>python-api: 0.1.0</summary>

##
[0.1.0](python-api-v0.0.2...python-api-v0.1.0)
(2024-03-15)


### Features

* add data to the python package
([#87](#87))
([0eb6831](0eb6831))
* add is_valid_term_id method to OntologyParser
([#115](#115))
([72c2073](72c2073))
* include license file with python package
([#85](#85))
([2be3d81](2be3d81))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* Support getting download link for ontology from source repo
([#86](#86))
([fd55b76](fd55b76))


### Misc

* automate testpypi releases
([#118](#118))
([b5a1a66](b5a1a66))
* clean-up ontology_parser single fetch and bulk fetch methods + account
for acceptable non-ontology terms
([#112](#112))
([2ef7435](2ef7435))
* **deps-dev:** bump semantic-version from 2.8.5 to 2.10.0 in
/api/python
([#98](#98))
([dfe0b39](dfe0b39))


### BugFixes

* imports for api
([4cd3386](4cd3386))
* update requirements
([#114](#114))
([9888f3d](9888f3d))
</details>

<details><summary>ontology-assets: 0.1.0</summary>

##
[0.1.0](ontology-assets-v0.0.1...ontology-assets-v0.1.0)
(2024-03-15)


### Features

* load GH Release Assets for schema version in memory
([#72](#72))
([58bad0a](58bad0a))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* **release:** generate descendant mapping for tissues and cells
([#100](#100))
([841fddf](841fddf))
* remove all-ontology.json.gz
([83fefd6](83fefd6))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))


### Misc

* update ontology decendant mappings
([#117](#117))
([48451af](48451af))


### BugFixes

* lint errors
([f5e4583](f5e4583))
* Schema format and validation fixes.
([#113](#113))
([0465ee7](0465ee7))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Bento007 pushed a commit that referenced this pull request Apr 9, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>assets: 0.4.0</summary>

##
[0.4.0](assets-v0.3.0...assets-v0.4.0)
(2024-04-09)


### Features

* add function to fetch curated ontology term lists
([#141](#141))
([5c7db62](5c7db62))
* fetch ontology term descriptions, if available
([#181](#181))
([0120377](0120377))
* load GH Release Assets for schema version in memory
([#72](#72))
([58bad0a](58bad0a))
* refactor ancestry mapping to include distance from descendant node +
implement functions to support curated list term mapping
([#96](#96))
([7fc3562](7fc3562))
* refer to ontology source filenames in ontology_info and return that in
get_ontology_download_url
([#106](#106))
([ff9d826](ff9d826))
* **release:** generate descendant mapping for tissues and cells
([#100](#100))
([841fddf](841fddf))
* remove all-ontology.json.gz
([83fefd6](83fefd6))
* split all_ontology into individual files.
([#93](#93))
([ead59e5](ead59e5))
* upload assets on release
([#56](#56))
([84a1c5d](84a1c5d))


### Misc

* deprecate older version of cellxgene schema
([#172](#172))
([186e762](186e762))
* move curated lists to ontology-assets
([#48](#48))
([77916df](77916df))
* moving the generated artifacts
([c03c8e3](c03c8e3))
* release main
([#130](#130))
([0b37dc8](0b37dc8))
* release main
([#146](#146))
([4ca76f0](4ca76f0))
* release main
([#185](#185))
([9b2fe53](9b2fe53))
* release main
([#74](#74))
([e748fe9](e748fe9))
* release tsmith/release-assets
([63b782d](63b782d))
* release tsmith/release-assets
([#57](#57))
([6a6b02a](6a6b02a))
* update ontology decendant mappings
([#117](#117))
([48451af](48451af))
* update ontology decendant mappings
([#142](#142))
([fb23618](fb23618))
* update ontology decendant mappings
([#162](#162))
([12def74](12def74))
* update ontology descendant mappings
([#167](#167))
([5d3d097](5d3d097))
* update ontology descendant mappings
([#180](#180))
([65ca10f](65ca10f))


### BugFixes

* lint errors
([f5e4583](f5e4583))
* Schema format and validation fixes.
([#113](#113))
([0465ee7](0465ee7))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants