TermSetWrapper and write support #950

Merged · 74 commits · Sep 28, 2023

Commits
205f763
working concept
mavaylon1 Aug 29, 2023
c9a89cc
minor cleaning
mavaylon1 Aug 29, 2023
7980937
foo file
mavaylon1 Aug 29, 2023
5f02860
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 29, 2023
7154ac5
checkpoint
mavaylon1 Sep 6, 2023
a419902
Merge branch 'wrapper' of https://github.com/hdmf-dev/hdmf into wrapper
mavaylon1 Sep 6, 2023
561e279
checkpoint
mavaylon1 Sep 6, 2023
f677647
Update src/hdmf/utils.py
mavaylon1 Sep 6, 2023
a63fe06
clean up
mavaylon1 Sep 6, 2023
6e7bbc6
checkpoint
mavaylon1 Sep 6, 2023
a0fdb24
tests placeholders
mavaylon1 Sep 6, 2023
e7034de
checkpoint
mavaylon1 Sep 8, 2023
afe5dd5
placeholder
mavaylon1 Sep 11, 2023
92bf180
placeholder
mavaylon1 Sep 11, 2023
2c8d6da
placeholder
mavaylon1 Sep 11, 2023
b698f5e
working write and herd
mavaylon1 Sep 11, 2023
1b7b3d5
cleanup
mavaylon1 Sep 11, 2023
c2c53a1
checkpoint on updating append
mavaylon1 Sep 11, 2023
4c513e5
integrate append
mavaylon1 Sep 11, 2023
c257b02
Merge branch 'dev' into wrapper
mavaylon1 Sep 11, 2023
d870133
test checkpoint
mavaylon1 Sep 18, 2023
104a7aa
test checkpoint
mavaylon1 Sep 19, 2023
ae6655a
test fixes
mavaylon1 Sep 19, 2023
86d5aa8
termset tests
mavaylon1 Sep 19, 2023
5bf83de
termset tests
mavaylon1 Sep 19, 2023
4f5d833
termset tests
mavaylon1 Sep 19, 2023
abbf12a
checkpoint/remove field_name
mavaylon1 Sep 26, 2023
e0864e8
cleanup
mavaylon1 Sep 26, 2023
7534c0d
make sure things pass without bad tests
mavaylon1 Sep 26, 2023
ab64a7d
cleanup
mavaylon1 Sep 26, 2023
bcc69d7
temp fix for test
mavaylon1 Sep 26, 2023
a872b59
termset tutorial
mavaylon1 Sep 26, 2023
d1c987e
tests and bug fix on write
mavaylon1 Sep 27, 2023
da1b006
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 27, 2023
03a51cf
tests and bug fix on write
mavaylon1 Sep 27, 2023
8fa4a9a
Merge branch 'wrapper' of https://github.com/hdmf-dev/hdmf into wrapper
mavaylon1 Sep 27, 2023
0e5f96e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 27, 2023
0fe2fb8
ruff
mavaylon1 Sep 27, 2023
8899b77
Merge branch 'wrapper' of https://github.com/hdmf-dev/hdmf into wrapper
mavaylon1 Sep 27, 2023
b29a691
bug fix
mavaylon1 Sep 27, 2023
b3ac0a4
doc
mavaylon1 Sep 27, 2023
4556c2a
doc
mavaylon1 Sep 27, 2023
b87c323
Update test_docval.py
mavaylon1 Sep 27, 2023
c60a68b
tests
mavaylon1 Sep 27, 2023
8d11383
tests
mavaylon1 Sep 27, 2023
8718dae
tests
mavaylon1 Sep 27, 2023
9c28957
Update utils.py
mavaylon1 Sep 28, 2023
80c1b3e
Update utils.py
mavaylon1 Sep 28, 2023
f14efdf
Update utils.py
mavaylon1 Sep 28, 2023
83bf3b8
ryan feedback
mavaylon1 Sep 28, 2023
2aa51f6
Update src/hdmf/build/objectmapper.py
mavaylon1 Sep 28, 2023
622fcc1
Update docs/gallery/plot_term_set.py
mavaylon1 Sep 28, 2023
14fddb0
Update docs/gallery/plot_term_set.py
mavaylon1 Sep 28, 2023
661b958
Update docs/gallery/plot_term_set.py
mavaylon1 Sep 28, 2023
d910334
Update docs/gallery/plot_term_set.py
mavaylon1 Sep 28, 2023
4c7610f
tutorial
mavaylon1 Sep 28, 2023
6879676
Update CHANGELOG.md
mavaylon1 Sep 28, 2023
6191238
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 28, 2023
33ba2a5
test next
mavaylon1 Sep 28, 2023
b3895d9
Merge branch 'wrapper' of https://github.com/hdmf-dev/hdmf into wrapper
mavaylon1 Sep 28, 2023
753468f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 28, 2023
8db5c2d
format
mavaylon1 Sep 28, 2023
9a37ecf
format
mavaylon1 Sep 28, 2023
f2504e3
validation changes
mavaylon1 Sep 28, 2023
6d38277
Update tests/unit/test_term_set.py
rly Sep 28, 2023
b783ace
clean up
mavaylon1 Sep 28, 2023
be3a17c
Update io.py
rly Sep 28, 2023
f1732ac
Update CHANGELOG.md
rly Sep 28, 2023
5d67899
tuple change
mavaylon1 Sep 28, 2023
91fef88
Merge branch 'wrapper' of https://github.com/hdmf-dev/hdmf into wrapper
mavaylon1 Sep 28, 2023
2fc236f
Update tests/unit/test_term_set.py
rly Sep 28, 2023
317865c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 28, 2023
dc1d868
Update src/hdmf/term_set.py
rly Sep 28, 2023
3536940
test feedback
mavaylon1 Sep 28, 2023
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,9 @@

## HDMF 3.9.1 (Upcoming)

### Enhancements
- Updated `TermSet` to be used with `TermSetWrapper`, allowing for general use of validation for datasets and attributes. This also brings updates to `HERD` integration and updates on `write` to easily add references for wrapped datasets/attributes. @mavaylon1 [#950](https://github.com/hdmf-dev/hdmf/pull/950)

### Minor improvements
- Removed warning when namespaces are loaded and the attribute marking where the specs are cached is missing. @bendichter [#926](https://github.com/hdmf-dev/hdmf/pull/926)

110 changes: 59 additions & 51 deletions docs/gallery/plot_term_set.py
@@ -3,8 +3,9 @@
=======

This is a user guide for interacting with the
:py:class:`~hdmf.term_set.TermSet` class. The :py:class:`~hdmf.term_set.TermSet` type
is experimental and is subject to change in future releases. If you use this type,
:py:class:`~hdmf.term_set.TermSet` and :py:class:`~hdmf.term_set.TermSetWrapper` classes.
The :py:class:`~hdmf.term_set.TermSet` and :py:class:`~hdmf.term_set.TermSetWrapper` types
are experimental and are subject to change in future releases. If you use these types,
please provide feedback to the HDMF team so that we can improve the structure and
overall capabilities.

@@ -14,15 +15,18 @@
set of terms from brain atlases, species taxonomies, and anatomical, cell, and
gene function ontologies.

:py:class:`~hdmf.term_set.TermSet` serves two purposes: data validation and external reference
management. Users will be able to validate their data to their own set of terms, ensuring
Users will be able to validate their data and attributes to their own set of terms, ensuring
clean data to be used in line with the FAIR principles later on.
The :py:class:`~hdmf.term_set.TermSet` class allows for a reusable and sharable
pool of metadata to serve as references to any dataset.
The :py:class:`~hdmf.term_set.TermSet` class allows for a reusable and sharable
pool of metadata to serve as references for any dataset or attribute.
The :py:class:`~hdmf.term_set.TermSet` class is used closely with
:py:class:`~hdmf.common.resources.ExternalResources` to more efficiently map terms
to data. Please refer to the tutorial on ExternalResources to see how :py:class:`~hdmf.term_set.TermSet`
is used with :py:class:`~hdmf.common.resources.ExternalResources`.
:py:class:`~hdmf.common.resources.HERD` to more efficiently map terms
to data.

In order to actually use a :py:class:`~hdmf.term_set.TermSet`, users will use the
:py:class:`~hdmf.term_set.TermSetWrapper` to wrap data and attributes. The
:py:class:`~hdmf.term_set.TermSetWrapper` uses a user-provided :py:class:`~hdmf.term_set.TermSet`
to perform validation.

:py:class:`~hdmf.term_set.TermSet` is built upon the resources from LinkML, a modeling
language that uses YAML-based schema, giving :py:class:`~hdmf.term_set.TermSet`
@@ -68,7 +72,7 @@
import linkml_runtime # noqa: F401
except ImportError as e:
raise ImportError("Please install linkml-runtime to run this example: pip install linkml-runtime") from e
from hdmf.term_set import TermSet
from hdmf.term_set import TermSet, TermSetWrapper

try:
dir_path = os.path.dirname(os.path.abspath(__file__))
@@ -114,71 +118,75 @@
terms['Homo sapiens']

######################################################
# Validate Data with TermSet
# Validate Data with TermSetWrapper
# ----------------------------------------------------
# :py:class:`~hdmf.term_set.TermSet` has been integrated so that :py:class:`~hdmf.container.Data` and its
# subclasses support a term_set attribute. By having this attribute set, the data will be validated
# and all new data will be validated.
# :py:class:`~hdmf.term_set.TermSetWrapper` can be wrapped around data.
# To validate data, the user will set the data argument to the wrapped data; validation must pass
# for the data object to be created.
data = VectorData(
name='species',
description='...',
data=['Homo sapiens'],
term_set=terms)
data=TermSetWrapper(value=['Homo sapiens'], termset=terms)
)

######################################################
# Validate on append with TermSet
# Validate Attributes with TermSetWrapper
# ----------------------------------------------------
# As mentioned prior, when the term_set attribute is set, then all new data is validated. This is true for both
# append and extend methods.
# Similar to wrapping datasets, :py:class:`~hdmf.term_set.TermSetWrapper` can be wrapped around any attribute.
# To validate attributes, the user will set the attribute to the wrapped value; validation must pass
# for the object to be created.
data = VectorData(
name='species',
description=TermSetWrapper(value='Homo sapiens', termset=terms),
data=['Human']
)

######################################################
# Validate on append with TermSetWrapper
# ----------------------------------------------------
# As mentioned prior, when using a :py:class:`~hdmf.term_set.TermSetWrapper`, all new data is validated.
# This is true for adding new data with append and extend.
data = VectorData(
name='species',
description='...',
data=TermSetWrapper(value=['Homo sapiens'], termset=terms)
)

data.append('Ursus arctos horribilis')
data.extend(['Mus musculus', 'Myrmecophaga tridactyla'])

######################################################
# Validate Data in a DynamicTable with TermSet
# Validate Data in a DynamicTable
# ----------------------------------------------------
# Validating data with :py:class:`~hdmf.common.table.DynamicTable` is determined by which columns were
# initialized with the term_set attribute set. The data is validated when the columns are created or
# modified. Since adding the columns to a DynamicTable does not modify the data, validation is
# not being performed at that time.
# Validating data for :py:class:`~hdmf.common.table.DynamicTable` is determined by which columns were
# initialized with a :py:class:`~hdmf.term_set.TermSetWrapper`. The data is validated when the columns
# are created and modified using ``DynamicTable.add_row``.
col1 = VectorData(
name='Species_1',
description='...',
data=['Homo sapiens'],
term_set=terms,
data=TermSetWrapper(value=['Homo sapiens'], termset=terms),
)
col2 = VectorData(
name='Species_2',
description='...',
data=['Mus musculus'],
term_set=terms,
data=TermSetWrapper(value=['Mus musculus'], termset=terms),
)
species = DynamicTable(name='species', description='My species', columns=[col1,col2])

######################################################
# Validate new rows in a DynamicTable with TermSet
# ----------------------------------------------------
##########################################################
# Validate new rows in a DynamicTable with TermSetWrapper
# --------------------------------------------------------
# Validating new rows to :py:class:`~hdmf.common.table.DynamicTable` is simple. The
# :py:func:`~hdmf.common.table.DynamicTable.add_row` method will automatically check each column for a
# :py:class:`~hdmf.term_set.TermSet` (via the term_set attribute). If the attribute is set, the the data will be
# validated for that column using that column's :py:class:`~hdmf.term_set.TermSet`. If there is invalid data, the
# :py:class:`~hdmf.term_set.TermSetWrapper`. If a wrapper is being used, then the data will be
# validated for that column using that column's :py:class:`~hdmf.term_set.TermSet` from the
# :py:class:`~hdmf.term_set.TermSetWrapper`. If there is invalid data, the
# row will not be added and the user will be prompted to fix the new data in order to populate the table.
species.add_row(Species_1='Mus musculus', Species_2='Mus musculus')

######################################################
# Validate new columns in a DynamicTable with TermSet
# ----------------------------------------------------
# As mentioned prior, validating in a :py:class:`~hdmf.common.table.DynamicTable` is determined
# by the columns. The :py:func:`~hdmf.common.table.DynamicTable.add_column` method has a term_set attribute
# as if you were making a new instance of :py:class:`~hdmf.common.table.VectorData`. When set, this attribute
# will be used to validate the data. The column will not be added if there is invalid data.
col1 = VectorData(
name='Species_1',
description='...',
data=['Homo sapiens'],
term_set=terms,
)
species = DynamicTable(name='species', description='My species', columns=[col1])
species.add_column(name='Species_2',
description='Species data',
data=['Mus musculus'],
term_set=terms)
#############################################################
# Validate new columns in a DynamicTable with TermSetWrapper
# -----------------------------------------------------------
# To add a column that is validated using :py:class:`~hdmf.term_set.TermSetWrapper`,
# wrap the data in the :py:func:`~hdmf.common.table.DynamicTable.add_column`
# method as if you were making a new instance of :py:class:`~hdmf.common.table.VectorData`.
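The rewritten tutorial text above no longer shows the wrapped ``add_column`` call itself. A minimal sketch of what it might look like, reusing the ``terms`` TermSet and the ``species`` table built earlier in the tutorial (illustrative only, not part of the merged diff):

species.add_column(
    name='Species_2',
    description='Species data',
    # wrap the column data so add_column validates it against the TermSet
    data=TermSetWrapper(value=['Mus musculus'], termset=terms),
)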
2 changes: 1 addition & 1 deletion src/hdmf/__init__.py
@@ -3,7 +3,7 @@
from .container import Container, Data, DataRegion, HERDManager
from .region import ListSlicer
from .utils import docval, getargs
from .term_set import TermSet
from .term_set import TermSet, TermSetWrapper


@docval(
12 changes: 10 additions & 2 deletions src/hdmf/backends/hdf5/h5tools.py
@@ -17,6 +17,7 @@
from ...build import (Builder, GroupBuilder, DatasetBuilder, LinkBuilder, BuildManager, RegionBuilder,
ReferenceBuilder, TypeMap, ObjectMapper)
from ...container import Container
from ...term_set import TermSetWrapper
from ...data_utils import AbstractDataChunkIterator
from ...spec import RefSpec, DtypeSpec, NamespaceCatalog
from ...utils import docval, getargs, popargs, get_data_shape, get_docval, StrDataset
@@ -63,7 +64,7 @@
'doc': 'a pre-existing h5py.File, S3File, or RemFile object', 'default': None},
{'name': 'driver', 'type': str, 'doc': 'driver for h5py to use when opening HDF5 file', 'default': None},
{'name': 'herd_path', 'type': str,
'doc': 'The path to the HERD', 'default': None},)
'doc': 'The path to read/write the HERD file', 'default': None},)
def __init__(self, **kwargs):
"""Open an HDF5 file for IO.
"""
@@ -359,7 +360,10 @@
'default': True},
{'name': 'exhaust_dci', 'type': bool,
'doc': 'If True (default), exhaust DataChunkIterators one at a time. If False, exhaust them concurrently.',
'default': True})
'default': True},
{'name': 'herd', 'type': 'HERD',
'doc': 'A HERD object to populate with references.',
'default': None})
def write(self, **kwargs):
"""Write the container to an HDF5 file."""
if self.__mode == 'r':
@@ -1096,6 +1100,10 @@
data = data.data
else:
options['io_settings'] = {}
if isinstance(data, TermSetWrapper):
# This is for when the wrapped item is a dataset
# (refer to objectmapper.py for wrapped attributes)
data = data.value

Codecov: added line src/hdmf/backends/hdf5/h5tools.py#L1106 was not covered by tests.
attributes = builder.attributes
options['dtype'] = builder.dtype
dset = None
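For context on the unwrapping above: at write time the backend stores the wrapped values, not the wrapper object itself. A small sketch, assuming the ``terms`` TermSet from the tutorial above (illustrative only):

from hdmf.term_set import TermSetWrapper

wrapped = TermSetWrapper(value=['Homo sapiens'], termset=terms)
# h5tools replaces the wrapper with its underlying value before creating the dataset
data = wrapped.value  # ['Homo sapiens'] is what ends up in the HDF5 dataset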
31 changes: 20 additions & 11 deletions src/hdmf/backends/io.py
@@ -22,7 +22,7 @@
{"name": "source", "type": (str, Path),
"doc": "the source of container being built i.e. file path", 'default': None},
{'name': 'herd_path', 'type': str,
'doc': 'The path to the HERD', 'default': None},)
'doc': 'The path to read/write the HERD file', 'default': None},)
def __init__(self, **kwargs):
manager, source, herd_path = getargs('manager', 'source', 'herd_path', kwargs)
if isinstance(source, Path):
@@ -74,20 +74,29 @@

return container

@docval({'name': 'container', 'type': Container, 'doc': 'the Container object to write'}, allow_extra=True)
@docval({'name': 'container', 'type': Container, 'doc': 'the Container object to write'},
{'name': 'herd', 'type': 'HERD',
'doc': 'A HERD object to populate with references.',
'default': None}, allow_extra=True)
def write(self, **kwargs):
"""Write a container to the IO source."""
container = popargs('container', kwargs)
f_builder = self.__manager.build(container, source=self.__source, root=True)
self.write_builder(f_builder, **kwargs)
herd = popargs('herd', kwargs)

"""Optional: Write HERD."""
if self.herd_path is not None:
herd = container.get_linked_resources()
if herd is not None:
herd.to_zip(path=self.herd_path)
else:
msg = "Could not find linked HERD. Container was still written to IO source."
warn(msg)
# If HERD is not provided, create a new one, else extend existing one
if herd is None:
from hdmf.common import HERD
herd = HERD(type_map=self.manager.type_map)

Codecov: added lines src/hdmf/backends/io.py#L89-L90 were not covered by tests.

# add_ref_term_set to search for and resolve the TermSetWrapper
herd.add_ref_term_set(container) # container would be the NWBFile

Codecov: added line src/hdmf/backends/io.py#L93 was not covered by tests.
# write HERD
herd.to_zip(path=self.herd_path)

Codecov: added line src/hdmf/backends/io.py#L95 was not covered by tests.

"""Write a container to the IO source."""
f_builder = self.__manager.build(container, source=self.__source, root=True)
self.write_builder(f_builder, **kwargs)

@docval({'name': 'src_io', 'type': 'HDMFIO', 'doc': 'the HDMFIO object for reading the data to export'},
{'name': 'container', 'type': Container,
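A rough usage sketch of the new write path (the container name and import paths are assumptions to verify against your hdmf version; ``herd_path`` and the optional ``herd`` argument come from the docvals above):

from hdmf.common import get_manager
from hdmf.backends.hdf5 import HDF5IO

# my_container: any hdmf Container whose datasets/attributes use TermSetWrapper (placeholder name)
with HDF5IO('data.h5', manager=get_manager(), mode='w', herd_path='./HERD.zip') as io:
    # a HERD is created (or the one passed via herd= is extended), populated through
    # add_ref_term_set(), and saved to herd_path alongside the HDF5 file
    io.write(my_container)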
4 changes: 3 additions & 1 deletion src/hdmf/build/objectmapper.py
@@ -12,6 +12,7 @@
from .manager import Proxy, BuildManager
from .warnings import MissingRequiredBuildWarning, DtypeConversionWarning, IncorrectQuantityBuildWarning
from ..container import AbstractContainer, Data, DataRegion
from ..term_set import TermSetWrapper
from ..data_utils import DataIO, AbstractDataChunkIterator
from ..query import ReferenceResolver
from ..spec import Spec, AttributeSpec, DatasetSpec, GroupSpec, LinkSpec, RefSpec
@@ -564,6 +565,8 @@
msg = ("%s '%s' does not have attribute '%s' for mapping to spec: %s"
% (container.__class__.__name__, container.name, attr_name, spec))
raise ContainerConfigurationError(msg)
if isinstance(attr_val, TermSetWrapper):
attr_val = attr_val.value

Codecov: added line src/hdmf/build/objectmapper.py#L569 was not covered by tests.
if attr_val is not None:
attr_val = self.__convert_string(attr_val, spec)
spec_dt = self.__get_data_type(spec)
@@ -937,7 +940,6 @@
if attr_value is None:
self.logger.debug(" Skipping empty attribute")
continue

builder.set_attribute(spec.name, attr_value)

def __set_attr_to_ref(self, builder, attr_value, build_manager, spec):
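To tie the attribute path back to the tutorial: a short illustrative sketch (again assuming the ``terms`` TermSet defined there) of what the object mapper effectively does with a wrapped attribute before setting it on the builder:

from hdmf.term_set import TermSetWrapper

description = TermSetWrapper(value='Homo sapiens', termset=terms)
if isinstance(description, TermSetWrapper):
    # objectmapper.py unwraps attributes the same way h5tools.py unwraps datasets
    description = description.value  # plain 'Homo sapiens' is what gets written as the attribute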