Store object_id with links if available #57

Merged
merged 12 commits into from
Oct 1, 2023

Conversation

@oruebel oruebel commented Dec 24, 2022

Fix #54

This PR updates the storage of links/references to add the following information:

  • object_id: Object id of the referenced object. May be None if the referenced object does not have an assigned object_id (e.g., when we reference a dataset with a fixed name but without an assigned data_type, or neurodata_type in the case of NWB).
  • source_object_id: Object id of the source Zarr file indicated by the source key. The source should always have an object_id (at least if the source file is a valid HDMF formatted file).
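
As a rough illustration of the two fields above, a stored reference could be represented as a plain dict. This is only a sketch, not the actual ZarrReference implementation: the `make_reference` helper, the example paths, and the generated UUIDs are hypothetical, and the `path` key is assumed to sit alongside the `source` key mentioned above.

```python
import json
import uuid
from typing import Optional


def make_reference(source: str, path: str,
                   object_id: Optional[str],
                   source_object_id: Optional[str]) -> dict:
    """Hypothetical helper: build a dict with the keys discussed in this PR."""
    return {
        "source": source,                      # file that holds the target
        "path": path,                          # assumed key: location of the target in that file
        "object_id": object_id,                # None if the target has no data_type
        "source_object_id": source_object_id,  # object id of the source file
    }


# Made-up ids for illustration only
file_id = str(uuid.uuid4())

# Reference to a typed object: both ids are set
typed_ref = make_reference(".", "/acquisition/ts", str(uuid.uuid4()), file_id)

# Reference to an untyped dataset: object_id is None
untyped_ref = make_reference(".", "/acquisition/ts/data", None, file_id)

# None is written as JSON null when the reference is serialized
print(json.dumps(untyped_ref, indent=2))
```

Both references share the same source_object_id because they point into the same file; only the object_id differs.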

TODO:

  • Updated the ZarrReference class to add source_object_id and object_id keys
  • Updated ZarrIO.__get_ref to populate the source_object_id and object_id keys
  • Updated the storage documentation to document the source_object_id and object_id keys and update examples
  • Update CHANGELOG
  • Update tests

@codecov-commenter

codecov-commenter commented Dec 24, 2022

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (e31f6a3) 85.66% compared to head (84240fb) 85.76%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev      #57      +/-   ##
==========================================
+ Coverage   85.66%   85.76%   +0.09%     
==========================================
  Files          13       13              
  Lines        3139     3189      +50     
==========================================
+ Hits         2689     2735      +46     
- Misses        450      454       +4     
| Files | Coverage Δ |
|---|---|
| src/hdmf_zarr/backend.py | 90.55% <100.00%> (+0.14%) ⬆️ |
| src/hdmf_zarr/utils.py | 96.49% <100.00%> (+0.69%) ⬆️ |
| tests/unit/base_tests_zarrio.py | 98.54% <100.00%> (+0.04%) ⬆️ |

... and 1 file with indirect coverage changes


@oruebel oruebel added category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of users labels Jan 6, 2023
@oruebel oruebel added this to the Next Release milestone Jan 6, 2023
@oruebel
Contributor Author

oruebel commented Sep 28, 2023

@mavaylon1 we should check whether this PR now also works with the fixes in #120. It would be nice if we could include this in the release as well if it works. Otherwise, it's fine to move this PR to the next release, but it would be nice to push this over the finish line.

@oruebel
Contributor Author

oruebel commented Oct 1, 2023

Aside from adding/updating unit tests to check that the values for the object_id and source_object_id fields are correct, this PR should be ready.

@oruebel
Contributor Author

oruebel commented Oct 1, 2023

adding/updating unit tests to check that the values for the object_id and source_object_id fields are correct

Done

@oruebel oruebel marked this pull request as ready for review October 1, 2023 12:23
@oruebel oruebel requested a review from mavaylon1 October 1, 2023 12:23
@mavaylon1
Contributor

@oruebel I wanted to ask about the case when the object_id is None. You said that would be the case when the data_type/neurodata_type is not assigned for a dataset. I was thinking of an example of what that would be. Say we have TimeSeries, which is a dataset. This has a type, but it also contains a dataset "data" that does not. This would be an example where we have "object_id" as None if that was the target, correct?

@oruebel
Contributor Author

oruebel commented Oct 1, 2023

This has a type, but it also contains a dataset "data" that does not. This would be an example where we have "object_id" as none if that was the target correct?

Correct. For TimeSeries.data the object_id would be None because it is just a dataset within a type. However, the source_object_id should always be present since the file is always represented by a Container. The object_id and source_object_id are not really being used right now, but will be useful to validate links and possibly in the future to be able to retrieve external links dynamically.
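
To make the validation idea concrete, here is a minimal sketch of how the two ids might be checked once a link has been resolved. It is not part of this PR or of hdmf-zarr: `check_link` and its arguments are hypothetical names, and the ids passed in are assumed to have been read from the resolved target and from the source file.

```python
from typing import List, Optional


def check_link(ref: dict,
               target_object_id: Optional[str],
               file_object_id: str) -> List[str]:
    """Hypothetical check: return a list of problems; an empty list means consistent."""
    problems = []
    # The source file is always a Container, so its id should always be stored
    if ref.get("source_object_id") is None:
        problems.append("missing source_object_id")
    elif ref["source_object_id"] != file_object_id:
        problems.append("source_object_id does not match the source file")
    # object_id may be None (e.g. TimeSeries.data); it must still agree with the target
    if ref.get("object_id") != target_object_id:
        problems.append("object_id does not match the target")
    return problems


# Link to an untyped dataset: object_id is None on both sides, so the link is consistent
ok_ref = {"source": ".", "path": "/acquisition/ts/data",
          "object_id": None, "source_object_id": "file-1"}
print(check_link(ok_ref, target_object_id=None, file_object_id="file-1"))

# Link whose source file has been replaced by a different file
stale_ref = {"source": ".", "path": "/acquisition/ts",
             "object_id": "obj-1", "source_object_id": "file-1"}
print(check_link(stale_ref, target_object_id="obj-1", file_object_id="file-2"))
```

A None object_id is treated as valid as long as the resolved target also has no object_id, which matches the TimeSeries.data case described above.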

@mavaylon1 mavaylon1 merged commit 0be6b04 into dev Oct 1, 2023
20 checks passed
@mavaylon1 mavaylon1 deleted the enh/add_oid_to_link_format branch October 1, 2023 21:41
Successfully merging this pull request may close these issues.

Save object id's as part of links and references
3 participants