reimplemented incremental io #501

Merged
merged 28 commits into from
Jun 10, 2024
Changes from 22 commits
Commits
5c73c84
implemented incremental io; tests missing
LucaMarconato Mar 21, 2024
b534f6e
added draft for write_metadata(); need to write new tests
LucaMarconato Mar 22, 2024
e7b9433
wip better explanation
LucaMarconato Mar 22, 2024
ebe9f4f
fixed bug wrong order of points columns after spatial query
LucaMarconato Mar 22, 2024
f5f1098
testing the copying of metadata and their inclusion in assert_element…
LucaMarconato Mar 22, 2024
695a1d8
improved readwrite tests
LucaMarconato Mar 22, 2024
a0974ab
added tests for incremental io
LucaMarconato Mar 22, 2024
b429cc9
implemented write_metadata for transformations
LucaMarconato Mar 22, 2024
d5be580
tests for incremental io of transformation, with separate validation for
LucaMarconato Mar 22, 2024
a9cb077
tests for IO and incremental IO of consolidated metadata
LucaMarconato Mar 22, 2024
e13075d
improved control over elements only on-disk/in-memory
LucaMarconato Mar 23, 2024
fbf9e3c
added tests for delete_element_from_disk
LucaMarconato Mar 23, 2024
ff85b2e
fix
LucaMarconato Mar 23, 2024
239e0a7
added _check_element_not_on_disk_with_different_type()
LucaMarconato Mar 23, 2024
6b8069b
updated changelog
LucaMarconato Mar 23, 2024
1bfe022
fixed changelog
LucaMarconato Mar 23, 2024
87dd1a8
attempt fix docs
LucaMarconato Mar 23, 2024
7f2ec2d
Update src/spatialdata/_io/_utils.py
LucaMarconato Mar 27, 2024
bee39fc
fixes from review
LucaMarconato Mar 27, 2024
30a30ff
Merge branch 'feature/incremental_io' of https://github.com/scverse/s…
LucaMarconato Mar 27, 2024
d277eea
Merge branch 'main' into feature/incremental_io
LucaMarconato Mar 27, 2024
57dd22e
update test read write on disk (#515)
ArneDefauw Mar 27, 2024
582622f
Update src/spatialdata/_core/spatialdata.py
LucaMarconato Mar 28, 2024
98037eb
list of names for write_element() and delete_element_from_disk()
LucaMarconato Apr 8, 2024
5ee00c5
Merge branch 'main' into feature/incremental_io
LucaMarconato Apr 8, 2024
f2bea77
improved docs
LucaMarconato Apr 8, 2024
2298752
code review from Giovanni
LucaMarconato Jun 10, 2024
2585216
Merge branch 'main' into feature/incremental_io
LucaMarconato Jun 10, 2024
105 changes: 54 additions & 51 deletions CHANGELOG.md
@@ -24,67 +24,70 @@ and this project adheres to [Semantic Versioning][].

 #### Major

-- Implemented support in SpatialData for storing multiple tables. These tables can annotate a SpatialElement but not
-  necessarily so.
-- Added SQL like joins that can be executed by calling one public function `join_sdata_spatialelement_table`. The
-  following joins are supported: `left`, `left_exclusive`, `right`, `right_exclusive` and `inner`. The function has
-  an option to match rows. For `left` only matching `left` is supported and for `right` join only `right` matching of
-  rows is supported. Not all joins are supported for `Labels` elements. The elements and table can either exist within
-  a `SpatialData` object or outside.
-- Added function `match_element_to_table` which allows the user to perform a right join of `SpatialElement`(s) with a
-  table with rows matching the row order in the table.
-- Increased in-memory vs on-disk control: changes performed in-memory (e.g. adding a new image) are not automatically
-  performed on-disk.
+- Implemented support in `SpatialData` for storing multiple tables.
+    - These tables can annotate a `SpatialElement` but now not necessarily so.
+    - Deprecated `.table` attribute in favor of `.tables` dict-like accessor.
+
+- Added join operations
+    - Added SQL like joins that can be executed by calling one public function `join_sdata_spatialelement_table`. The following joins are supported: `left`, `left_exclusive`, `right`, `right_exclusive` and `inner`. The function has an option to match rows. For `left` only matching `left` is supported and for `right` join only `right` matching of rows is supported. Not all joins are supported for `Labels` elements.
+    - Added function `match_element_to_table` which allows the user to perform a right join of `SpatialElement`(s) with a table with rows matching the row order in the table.
+
+- Incremental IO of data and metadata:
+    - Increased in-memory vs on-disk control: changes performed in-memory (e.g. adding a new image) are not automatically performed on-disk.
+    - Deprecated `add_image()`, `add_labels()`, `add_shapes()`, `add_points()` in favor of `.images`, `.labels`, `.shapes`, `.points` dict-like accessors.
+    - new methods `write_element()`, `write_transformations()`, `write_metadata()`, `remove_element_from_disk()`
+    - new methods `write_consolidated_metadata()` and `has_consolidated_metadata()`
+    - deprecated `save_transformations()`
+    - improved `__repr__()` with information on Zarr storage and Dask-backed files
+    - new utils `is_self_contained()`, `describe_elements_are_self_contained()`
+    - new utils `element_paths_in_memory()`, `element_paths_on_disk()`

 #### Minor

-- Added public helper function get_table_keys in spatialdata.models to retrieve annotation information of a given
-  table.
-- Added public helper function check_target_region_column_symmetry in spatialdata.models to check whether annotation
-  metadata in table.uns['spatialdata_attrs'] corresponds with respective columns in table.obs.
-- Added function validate_table_in_spatialdata in SpatialData to validate the annotation target of a table being
-  present in the SpatialData object.
-- Added function get_annotated_regions in SpatialData to get the regions annotated by a given table.
-- Added function get_region_key_column in SpatialData to get the region_key column in table.obs.
-- Added function get_instance_key_column in SpatialData to get the instance_key column in table.obs.
-- Added function set_table_annotates_spatialelement in SpatialData to either set or change the annotation metadata of
-  a table in a given SpatialData object.
-- Added table_name parameter to the aggregate function to allow users to give a custom table name to table resulting
-  from aggregation.
-- Added table_name parameter to the get_values function.
-- Added tables property in SpatialData.
-- Added tables setter in SpatialData.
-- Added gen_spatial_elements generator in SpatialData to generate the SpatialElements in a given SpatialData object.
-- Added gen_elements generator in SpatialData to generate elements of a SpatialData object including tables.
-- added SpatialData.subset() API
-- added SpatialData.locate_element() API
-- added utils function: transform_to_data_extent()
-- added utils function: are_extents_equal()
-- added utils function: postpone_transformation()
-- added utils function: remove_transformations_to_coordinate_system()
-- added utils function: get_centroids()
-- added utils function: deepcopy()
-- added operation: to_circles()
-- added testing utilities: assert_spatial_data_objects_are_identical(), assert_elements_are_identical(),
-  assert_elements_dict_are_identical()
-
-### Changed
+- Multiple table helper functions
+    - Added public helper function `get_table_keys()` in `spatialdata.models` to retrieve annotation information of a given table.
+    - Added public helper function `check_target_region_column_symmetry()` in `spatialdata.models` to check whether annotation
+      metadata in `table.uns['spatialdata_attrs']` corresponds with respective columns in `table.obs`.
+    - Added function `validate_table_in_spatialdata()` in SpatialData to validate the annotation target of a table being present in the `SpatialData` object.
+    - Added method `get_annotated_regions()` in `SpatialData` to get the regions annotated by a given table.
+    - Added method `get_region_key_column()` in `SpatialData` to get the region_key column in table.obs.
+    - Added method `get_instance_key_column()` in `SpatialData` to get the instance_key column in table.obs.
+    - Added method `set_table_annotates_spatialelement()` in `SpatialData` to either set or change the annotation metadata of a table in a given `SpatialData` object. - Added `table_name` parameter to the `aggregate()` function to allow users to give a custom table name to table resulting from aggregation.
+    - Added `table_name` parameter to the `get_values()` function.
+
+- Utils
+    - Added `gen_spatial_elements()` generator in SpatialData to generate the `SpatialElements` in a given `SpatialData` object.
+    - Added `gen_elements` generator in `SpatialData` to generate elements of a `SpatialData` object including tables.
+    - added `SpatialData.subset()` API
+    - added `SpatialData.locate_element()` API
+    - added utils function: `get_centroids()`
+    - added utils function: `deepcopy()`
+    - added operation: `to_circles()`
+    - documented previously-added `get_channels()` to retrieve the channel names of a raster element independently of it being single or multi-scale
+
+- Transformations-related
+
+    - added utils function: `transform_to_data_extent()`
+    - added utils function: `are_extents_equal()`
+    - added utils function: `postpone_transformation()`
+    - added utils function: `remove_transformations_to_coordinate_system()`
+
+- added testing utilities: `assert_spatial_data_objects_are_identical()`, `assert_elements_are_identical()`, `assert_elements_dict_are_identical()`
+
+### Changed/fixed

 #### Major

 - refactored data loader for deep learning
+- refactored `SpatialData.write()` to be more robust
+- generalized spatial queries to any combination of 2D/3D data and 2D/3D query region #409

 #### Minor

-- Changed the string representation of SpatialData to reflect the changes in regard to multiple tables.
-
-### Fixed
-
-#### Major
-
-- improved usability and robustness of sdata.write() when overwrite=True @aeisenbarth
-- generalized queries to any combination of 2D/3D data and 2D/3D query region #409
-- fixed warnings for categorical dtypes in tables in TableModel and PointsModel
+- Changed the string representation of `SpatialData` to reflect the changes in regard to multiple tables and incremental IO.
+- improved usability and robustness of `sdata.write()` when `overwrite=True` @aeisenbarth
+- fixed warnings for categorical dtypes in tables in `TableModel` and `PointsModel`
+- fixed wrong order of points after spatial queries

## [0.0.14] - 2023-10-11
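
The join types named in the changelog can be illustrated with a small, library-free sketch. It works on plain lists of instance ids rather than real `SpatialElement` or table objects. The helper `join_ids` and its return shape are assumptions made for illustration, and the semantics follow the usual SQL meaning of each join, which may differ in detail from what `join_sdata_spatialelement_table` returns.

```python
def join_ids(element_ids, table_ids, how):
    """Return (kept_element_ids, kept_table_ids) for a join on instance ids.

    Hypothetical helper for illustration only; not part of spatialdata.
    """
    element_set, table_set = set(element_ids), set(table_ids)
    if how == "left":
        # keep every element row, and only the table rows that annotate one
        return list(element_ids), [i for i in table_ids if i in element_set]
    if how == "left_exclusive":
        # keep only the element rows that no table row annotates
        return [i for i in element_ids if i not in table_set], []
    if how == "right":
        # keep every table row, and only the element rows it annotates
        return [i for i in element_ids if i in table_set], list(table_ids)
    if how == "right_exclusive":
        # keep only the table rows that annotate no element row
        return [], [i for i in table_ids if i not in element_set]
    if how == "inner":
        # keep only rows present on both sides
        return (
            [i for i in element_ids if i in table_set],
            [i for i in table_ids if i in element_set],
        )
    raise ValueError(f"Unsupported join type: {how!r}")


elements = [1, 2, 3, 4]  # instance ids present in the element
table = [3, 4, 5]  # instance ids annotated by the table
for how in ("left", "left_exclusive", "right", "right_exclusive", "inner"):
    print(how, join_ids(elements, table, how))
```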

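The separation between in-memory and on-disk state described under "Incremental IO" can be pictured with a toy model. The class `ToyStore` below is hypothetical and writes JSON files into a temporary directory instead of a Zarr store. Only the method names `write_element()` and `delete_element_from_disk()` are borrowed from this PR's commit messages; their real signatures and behavior in spatialdata may differ.

```python
import json
import tempfile
from pathlib import Path


class ToyStore:
    """Toy stand-in for an object whose elements live in memory and, optionally, on disk."""

    def __init__(self, path):
        self.path = Path(path)
        self.elements = {}  # in-memory elements, keyed by name

    def write_element(self, name):
        # persist a single in-memory element; nothing else is touched
        (self.path / f"{name}.json").write_text(json.dumps(self.elements[name]))

    def delete_element_from_disk(self, name):
        # remove the on-disk copy only; the in-memory element is kept
        (self.path / f"{name}.json").unlink()

    def element_names_on_disk(self):
        return sorted(p.stem for p in self.path.glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    store = ToyStore(tmp)
    store.elements["image"] = [[0, 1], [2, 3]]
    on_disk_before = store.element_names_on_disk()  # adding in memory writes nothing
    store.write_element("image")
    on_disk_after = store.element_names_on_disk()  # the explicit write persists it
    store.delete_element_from_disk("image")
    on_disk_final = store.element_names_on_disk()
    still_in_memory = "image" in store.elements

print(on_disk_before, on_disk_after, on_disk_final, still_in_memory)
```

The point of the design is that every disk operation is explicit: assigning to the in-memory mapping never has a side effect on storage.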
3 changes: 2 additions & 1 deletion docs/api.md
@@ -6,7 +6,7 @@

 ## SpatialData

-The `SpatialData` class.
+The `SpatialData` class (follow the link to explore its methods).

 ```{eval-rst}
 .. autosummary::
@@ -83,6 +83,7 @@ The elements (building-blocks) that constitute `SpatialData`.
     get_spatial_axes
     points_geopandas_to_dask_dataframe
     points_dask_dataframe_to_geopandas
+    get_channels
 ```

 ## Transformations
9 changes: 6 additions & 3 deletions src/spatialdata/_core/_deepcopy.py
@@ -90,14 +90,17 @@ def _(element: MultiscaleSpatialImage) -> MultiscaleSpatialImage:
 def _(gdf: GeoDataFrame) -> GeoDataFrame:
     new_gdf = _deepcopy(gdf)
     # temporary fix for https://github.com/scverse/spatialdata/issues/286.
-    new_attrs = _deepcopy(gdf.attrs)
-    new_gdf.attrs = new_attrs
+    new_gdf.attrs = _deepcopy(gdf.attrs)
     return new_gdf


 @deepcopy.register(DaskDataFrame)
 def _(df: DaskDataFrame) -> DaskDataFrame:
-    return PointsModel.parse(df.compute().copy(deep=True))
+    # bug: the parser may change the order of the columns
+    new_ddf = PointsModel.parse(df.compute().copy(deep=True))
+    # the problem is not .copy(deep=True), but the parser, which discards some metadata https://github.com/scverse/spatialdata/issues/503#issuecomment-2015275322
+    new_ddf.attrs = _deepcopy(df.attrs)
+    return new_ddf


@deepcopy.register(AnnData)
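
The `DaskDataFrame` change in this hunk re-attaches `attrs` after parsing because, per the inline comment, the parser discards some metadata. A minimal, library-free sketch of the same pattern follows. `Frame`, `parse` and `deepcopy_element` are hypothetical stand-ins (for a dataframe, `PointsModel.parse` and the `deepcopy` dispatcher respectively) and are not spatialdata code.

```python
from copy import deepcopy as _deepcopy
from functools import singledispatch


class Frame:
    """Toy container holding data plus free-form metadata in `attrs`."""

    def __init__(self, data, attrs=None):
        self.data = data
        self.attrs = attrs if attrs is not None else {}


def parse(data):
    # stand-in for a parser that rebuilds the object and loses its metadata
    return Frame(list(data))


@singledispatch
def deepcopy_element(element):
    # fallback for types without a dedicated handler: the standard deep copy
    return _deepcopy(element)


@deepcopy_element.register(Frame)
def _(frame):
    new_frame = parse(_deepcopy(frame.data))
    # the parser dropped attrs, so copy them over explicitly
    new_frame.attrs = _deepcopy(frame.attrs)
    return new_frame


original = Frame([1, 2, 3], attrs={"transform": {"global": "identity"}})
copied = deepcopy_element(original)
print(copied.attrs == original.attrs, copied.attrs is original.attrs)
```

Copying `attrs` with a deep copy, rather than assigning the same dictionary, keeps later edits to the copy's metadata from leaking back into the original.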
12 changes: 11 additions & 1 deletion src/spatialdata/_core/_elements.py
@@ -32,10 +32,20 @@ def __init__(self, shared_keys: set[str | None]) -> None:
         self._shared_keys = shared_keys
         super().__init__()

+    @staticmethod
+    def _check_valid_name(name: str) -> None:
+        if not isinstance(name, str):
+            raise TypeError(f"Name must be a string, not {type(name).__name__}.")
+        if len(name) == 0:
+            raise ValueError("Name cannot be an empty string.")
+        if not all(c.isalnum() or c in "_-" for c in name):
+            raise ValueError("Name must contain only alphanumeric characters, underscores, and hyphens.")
+
     @staticmethod
     def _check_key(key: str, element_keys: Iterable[str], shared_keys: set[str | None]) -> None:
+        Elements._check_valid_name(key)
         if key in element_keys:
-            warn(f"Key `{key}` already exists. Overwriting it.", UserWarning, stacklevel=2)
+            warn(f"Key `{key}` already exists. Overwriting it in-memory.", UserWarning, stacklevel=2)
         else:
             if key in shared_keys:
                 raise KeyError(f"Key `{key}` already exists.")
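
To see which keys the new `_check_valid_name` accepts, the check can be exercised outside the class. The snippet below copies its three conditions into a free function; the wrapper `is_valid_name` is a hypothetical helper added only for this demonstration.

```python
def check_valid_name(name):
    # same three checks as Elements._check_valid_name in this hunk
    if not isinstance(name, str):
        raise TypeError(f"Name must be a string, not {type(name).__name__}.")
    if len(name) == 0:
        raise ValueError("Name cannot be an empty string.")
    if not all(c.isalnum() or c in "_-" for c in name):
        raise ValueError("Name must contain only alphanumeric characters, underscores, and hyphens.")


def is_valid_name(name):
    """Hypothetical convenience wrapper: True if the name passes, False otherwise."""
    try:
        check_valid_name(name)
    except (TypeError, ValueError):
        return False
    return True


for candidate in ["blobs_image", "cells-2", "my image", "a/b", "", 42]:
    print(repr(candidate), is_valid_name(candidate))
```

Because `str.isalnum()` is Unicode-aware, non-ASCII letters such as `é` also pass this check; spaces, slashes and dots do not.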