Deprecate compound dtypes, TimeSeriesReferenceVectorData
#588
Comments
Thanks for the detailed issue. Just a couple of quick comments to provide a little bit of additional background.
The concept of referencing |
ha i knew i should have made this two separate issues. my mistake. the reason i linked the two things together here is mostly that the goal here is to think about ways that we can collapse the number of schema language constructs to simplify implementation/increase interoperability/reduce maintenance burden, so to take account of the ways that references can be made:
any others? So it seems like we currently have separate implementations for what amount to two fundamental reference operations:
currently several of the reference types are done by convention in the HDMF and pynwb implementation - not strictly based in the schema, but well defined nonetheless - could we refactor the notion of references in the schema lang to simplify them? It seems like HDF5 supports region references natively, and otherwise they can be serialized as a So then rather than having a specific This simplifies the various types of references that exist in a way that could be compatible with existing API, be grounded in the schema, and have a single method of serialization in non-hdf5 formats.
So then compound dtypes can be removed, and Currently I am not exactly sure how to implement a As usual i'm probably way off here, but just trying to brainstorm how we might simplify this bc it's definitely been the most complex part of implementing the format, and i'm thinking about the future of a very interoperable NWB, and ease of implementing bridges/adapters is key to that <3
aha, i wasn't sure this was how it was supposed to be since
thanks for the historical note, makes sense and i always love hearing about how the needs and constraints were approached given the state of the format at the time <3
implementing these now alongside dynamictables and it seems like these are mostly artifacts of history atp that persist bc of the relative rarity of intracellular ephys data nowadays.
Usage
compound dtypes are used 4 times in the schema:
TimeSeriesReferenceVectorData: nwb-schema/core/nwb.base.yaml (Line 9 in 63ac845)
ElectrodeGroup.position: nwb-schema/core/nwb.ecephys.yaml (Line 243 in 63ac845)
PlaneSegmentation.pixel_mask: nwb-schema/core/nwb.ophys.yaml (Line 156 in 63ac845)
PlaneSegmentation.voxel_mask: nwb-schema/core/nwb.ophys.yaml (Line 175 in 63ac845)
TimeSeriesReferenceVectorData is used in 4 places (but actually 2):
TimeIntervals.timeseries: nwb-schema/core/nwb.epoch.yaml (Line 25 in 63ac845)
IntracellularRecordingsTable:
IntracellularStimuliTable.stimulus: nwb-schema/core/nwb.icephys.yaml (Line 287 in 63ac845)
IntracellularStimuliTable.stimulus_template: nwb-schema/core/nwb.icephys.yaml (Line 290 in 63ac845)
IntracellularResponsesTable.response: nwb-schema/core/nwb.icephys.yaml (Line 304 in 63ac845)
Purpose
The function of compound dtypes is to group multiple values together into something that behaves like a tuple.
The function of TimeSeriesReferenceVectorData is to be a 1-d vector of references into arbitrary contiguous spans in arbitrary other datasets.
Degeneracy
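A minimal sketch of the tuple-like grouping a compound dtype provides, and of the same data carried as one plain column per field. The field names (x, y, weight) are an assumption modeled on what PlaneSegmentation.pixel_mask appears to store, and `PixelMaskEntry`, `to_columns` and `to_rows` are hypothetical helpers, not part of hdmf or pynwb:

```python
from typing import NamedTuple


class PixelMaskEntry(NamedTuple):
    """Stand-in for one element of a compound-dtype dataset (assumed fields)."""
    x: int
    y: int
    weight: float


# Compound form: a single dataset whose elements are tuple-like records.
compound_rows = [
    PixelMaskEntry(1, 2, 0.5),
    PixelMaskEntry(3, 4, 1.0),
    PixelMaskEntry(5, 6, 0.25),
]


def to_columns(rows):
    """Split records into one plain list per field (the multi-attribute form)."""
    return {field: [getattr(row, field) for row in rows]
            for field in PixelMaskEntry._fields}


def to_rows(columns):
    """Rebuild the records from the per-field lists."""
    per_field = (columns[field] for field in PixelMaskEntry._fields)
    return [PixelMaskEntry(*values) for values in zip(*per_field)]


columns = to_columns(compound_rows)

# The round trip loses nothing, so the two layouts carry the same information.
assert to_rows(columns) == compound_rows
```

The round trip only shows that the information content is the same; it says nothing about on-disk layout or read performance in any particular backend.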
compound dtypes are equivalently well-served by datasets with multiple attributes. There should be a single Position type and it should have x, y, z as attributes. the hdmf/nwb schema language is actually relatively uniquely able to do this with its *_type_inc that behaves both like a template inclusion as well as inheritance.
TimeSeriesReferenceVectorData is equivalent to DynamicTableRegion in functionality - Both are capable of representing ragged indexes into arbitrary objects.
Specifically this:
is functionally equivalent to this
with slightly different indexing semantics, and other more complex examples could make them entirely equivalent. This is especially true since DynamicTableRegion can also have a VectorIndex so it is a ragged array of indices into another table.
Problem
The many ways that it is possible to express references are the single greatest point of complexity in NWB and the nwb schema. it's possible to do a more or less linear translation of nwb schema language and nwb schema into other systems like linkml with the exception of references. Currently TimeSeriesReferenceVectorData is not even working in pynwb as far as i can tell, as an intracellular ephys response table just returns the literal HDMF dataset within a dataframe rather than the values of that dataset.
Relatively few backends support the idea of compound dtypes, causing problems like this where the special-cased compound dtype causes a tool-breaking perf problem. we should probably expect that to persist and hold us back if we want this format to survive into the future, since most other schema systems would just have a means of representing compound dtypes as a type with multiple attributes.
It seems to me like this was an older idea that has been surpassed by the dynamictable system, and keeping both around when they do the same thing probably multiplies the labor cost of maintaining the format by a nontrivial factor, as well as limits the number of contemporary formats it can be translated into. ik y'all are strapped for time but i think this would be one set of deprecations that would be net-positive for y'all in terms of "complex stuff for ya to deal with" <3
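To make the equivalence claimed under Degeneracy a bit more concrete, here is a rough sketch of resolving the same selections two ways: as (idx_start, count) spans, which is how a TimeSeriesReferenceVectorData row is assumed to be laid out here (minus the reference to the target object), and as a flat index list plus cumulative end offsets, which is how a DynamicTableRegion with a VectorIndex is assumed to work. All function names are hypothetical and nothing here calls hdmf or pynwb:

```python
def spans_to_ragged(spans):
    """Convert (idx_start, count) spans into a flat index list plus end offsets."""
    flat, ends = [], []
    for idx_start, count in spans:
        flat.extend(range(idx_start, idx_start + count))
        ends.append(len(flat))  # cumulative end offset of this row
    return flat, ends


def resolve_spans(target, spans):
    """Span-style lookup: each row is a contiguous slice of the target."""
    return [target[start:start + count] for start, count in spans]


def resolve_ragged(target, flat, ends):
    """Ragged-index lookup: each row is a run of explicit indices."""
    rows, start = [], 0
    for end in ends:
        rows.append([target[i] for i in flat[start:end]])
        start = end
    return rows


target = [10, 11, 12, 13, 14, 15, 16, 17]
spans = [(0, 3), (5, 2), (2, 1)]
flat, ends = spans_to_ragged(spans)

# Both representations select the same values from the target.
assert resolve_spans(target, spans) == resolve_ragged(target, flat, ends)
```

The sketch uses a single target for every row. Rows that each point at a different object, which is one place the "slightly different indexing semantics" could show up, are not covered.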