Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TermSetWrapper and write support #950

Merged
merged 74 commits into from
Sep 28, 2023
Merged

TermSetWrapper and write support #950

merged 74 commits into from
Sep 28, 2023

Conversation

mavaylon1
Copy link
Contributor

@mavaylon1 mavaylon1 commented Aug 29, 2023

Motivation

What was the reasoning behind this change? Please explain the changes briefly.

This will introduce the TermSetWrapper class. The point of this class is act similarly to DataIO in which it can be wrapped around objects to provide a TermSet to HDF5 datasets, and attributes.

The wrapper will hold the termset for validation for any dataset or attribute. Some concerns:

  1. This will require major changes in NWB so that docval still passes
  2. We need to be aware of write. The termset wrapper is not supposed to be written and so needs to be unwrapped.
  3. A lot of things will break and edge cases will come up with development. Adjust accordingly.

TODO:

  • Remove prior termset integration from Data.
  • Update old tests that use TermSet: container.py, table.py
  • Add tests for TermSetWrapper append/extend
  • Add tests for TermSetWrapper attributes and data (this is in test_resources)
  • Update tests for resources.py with the TermSetWrapper and new helper functions
  • Add tests for roundtrip/write/write with HERD (Partially Done)
  • Add tests for write_dataset
  • Add parameters on write to: This required changes to HDMFIO (in order to be later ported to HDMF_Zarr), changes to how HERD interacts with TermSet due to the wrapper, updates on add_ref_term_set, and assorted helper functions
  • Consider how to validate when users append new data to VectorData and DynamicTable: Currently, Termsets are integrated as a field on Data. We want to remove this and now only use the wrapper, such that we wrap the data within the VectorData. This requires us to have append and extend in the TermSetWrapper. This will also require changes to add_column and add_row in DynamicTable and add_row in VectorData (all require test changes).
  • Finalize Validation
  • Write compatible
  • Docval compatible
  • Tutorial
  • Explore how the wrapper would work on a dataset that also wants to use DataIO (Created ticket: [Feature]: TermSetWrapper with DataIO #954)

How to test the behavior?

Show how to reproduce the new behavior (can be a bug fix or a new feature)

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running ruff from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@mavaylon1 mavaylon1 changed the title working concept TermSetWrapper and write support Aug 29, 2023
@mavaylon1
Copy link
Contributor Author

Fix #928

@mavaylon1 mavaylon1 self-assigned this Sep 6, 2023
src/hdmf/utils.py Outdated Show resolved Hide resolved
@mavaylon1
Copy link
Contributor Author

Order of Operations for appending data:

  1. VectorData
    add_row ->self.append -> append in Data.
    We want to call the self.append in the wrapper if the wrapper is present.

  2. DynamicTable add_row
    We validate before going through the process of adding the data to the rows. This is a bulk check. Once that passes, the column add_row is called, which should work with the wrapper with the changes in VectorData. This does do another validation (not ideal). This can be avoided by adding a field that dictates the data already being validated, but it doesn't really matter: the user doesn't ever interact with this level of validation and it does not add that much time to have a second validation.

  3. DynamicTable add_column
    Can revert back to pre-termset integration.

@mavaylon1
Copy link
Contributor Author

The use of "fields" is not consistent, which makes searching for the wrapper more difficult.

@mavaylon1
Copy link
Contributor Author

The meaning of "attribute" is starting to take on multiple meanings. When we are sloppy we say attribute and field, but that is not always interchangeable. Attribute in HDF5 is not always a field. For example, in data we have a description attribute but the data itself is not, even though you can think of it as a python field. Now that you can wrap hdf5 dataset and hdf5 attributes.

@mavaylon1 mavaylon1 requested a review from rly September 28, 2023 03:21
tests/unit/test_term_set.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: Oliver Ruebel <[email protected]>
src/hdmf/term_set.py Outdated Show resolved Hide resolved
tests/unit/test_term_set.py Outdated Show resolved Hide resolved
src/hdmf/term_set.py Outdated Show resolved Hide resolved
tests/unit/test_io_hdf5_h5tools.py Outdated Show resolved Hide resolved
@mavaylon1 mavaylon1 requested a review from rly September 28, 2023 23:14
@rly
Copy link
Contributor

rly commented Sep 28, 2023

image

This is what I like to see. One of the U24 milestones is to reach 95% coverage by year 5 (ending Jan 2026). So we should strive to reach 100% patch coverage when possible.

@mavaylon1 mavaylon1 merged commit e1105e4 into dev Sep 28, 2023
25 of 26 checks passed
@rly rly deleted the wrapper branch October 3, 2023 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants