Document enumeration path map in the spec. #5203

Merged 2 commits on Jul 25, 2024
9 changes: 9 additions & 0 deletions format_spec/array_schema.md
@@ -54,6 +54,15 @@ The array schema file consists of a single [generic tile](./generic_tile.md), wi
| Label 1 | [Dimension Label](#dimension_label) | First dimension label |
| … | … | … |
| Label N | [Dimension Label](#dimension_label) | Nth dimension label |
| Num enumerations | `uint32_t` | Number of [enumerations](./enumeration.md) in the array |
| Enumeration name length 1 | `uint32_t` | The number of characters in the enumeration 1 name |
| Enumeration name 1 | `uint8_t[]` | The name of enumeration 1 |
| Enumeration filename length 1 | `uint32_t` | The number of characters in the enumeration 1 filename |
| Enumeration filename 1 | `uint8_t[]` | The name of the file in the `__enumerations` subdirectory that contains enumeration 1's data |
| … | … | … |
| Enumeration name length N | `uint32_t` | The number of characters in the enumeration N name |
| Enumeration name N | `uint8_t[]` | The name of enumeration N |
| Enumeration filename length N | `uint32_t` | The number of characters in the enumeration N filename |
| Enumeration filename N | `uint8_t[]` | The name of the file in the `__enumerations` subdirectory that contains enumeration N's data |
| CurrentDomain | [CurrentDomain](./current_domain.md) | The array current domain |
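
The enumeration rows above describe a count followed by length-prefixed name/filename pairs. A minimal sketch of that layout is shown below. It is not the TileDB `Serializer`/`Deserializer` API: `put_u32`, `get_u32`, `put_str`, `get_str`, `encode_enumeration_paths` and `decode_enumeration_paths` are hypothetical helpers, the integers are written in host byte order as an assumption, and a `std::map` stands in for the schema's map type.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: append a uint32_t in host byte order (an assumption).
static void put_u32(std::vector<uint8_t>& buf, uint32_t v) {
  uint8_t bytes[sizeof(v)];
  std::memcpy(bytes, &v, sizeof(v));
  buf.insert(buf.end(), bytes, bytes + sizeof(v));
}

// Hypothetical helper: read a uint32_t at `pos` and advance `pos`.
static uint32_t get_u32(const std::vector<uint8_t>& buf, std::size_t& pos) {
  if (pos > buf.size() || buf.size() - pos < sizeof(uint32_t))
    throw std::runtime_error("truncated buffer");
  uint32_t v;
  std::memcpy(&v, buf.data() + pos, sizeof(v));
  pos += sizeof(v);
  return v;
}

// Write a string as a uint32_t length followed by its raw bytes.
static void put_str(std::vector<uint8_t>& buf, const std::string& s) {
  put_u32(buf, static_cast<uint32_t>(s.size()));
  buf.insert(buf.end(), s.begin(), s.end());
}

// Read a length-prefixed string at `pos` and advance `pos`.
static std::string get_str(const std::vector<uint8_t>& buf, std::size_t& pos) {
  uint32_t len = get_u32(buf, pos);
  if (buf.size() - pos < len)
    throw std::runtime_error("truncated buffer");
  std::string s(reinterpret_cast<const char*>(buf.data() + pos), len);
  pos += len;
  return s;
}

// Num enumerations, then (name length, name, filename length, filename) per entry.
std::vector<uint8_t> encode_enumeration_paths(
    const std::map<std::string, std::string>& paths) {
  std::vector<uint8_t> buf;
  put_u32(buf, static_cast<uint32_t>(paths.size()));
  for (const auto& entry : paths) {
    put_str(buf, entry.first);   // enumeration name
    put_str(buf, entry.second);  // enumeration filename
  }
  return buf;
}

// Inverse of encode_enumeration_paths.
std::map<std::string, std::string> decode_enumeration_paths(
    const std::vector<uint8_t>& buf) {
  std::size_t pos = 0;
  std::map<std::string, std::string> paths;
  uint32_t num = get_u32(buf, pos);
  for (uint32_t i = 0; i < num; ++i) {
    std::string name = get_str(buf, pos);
    paths[name] = get_str(buf, pos);
  }
  return paths;
}
```

Each entry costs 8 bytes of length prefixes plus the bytes of the two strings, so a reader can skip or validate entries without knowing anything about the enumeration data itself.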

## Domain
30 changes: 20 additions & 10 deletions tiledb/sm/array_schema/array_schema.cc
@@ -188,8 +188,8 @@ ArraySchema::ArraySchema(
dim_map_[dim->name()] = dim;
}

- for (auto& [enmr_name, enmr_uri] : enumeration_path_map_) {
- (void)enmr_uri;
+ for (auto& [enmr_name, enmr_filename] : enumeration_path_map_) {
+ (void)enmr_filename;
enumeration_map_[enmr_name] = nullptr;
}

@@ -753,6 +753,16 @@ bool ArraySchema::is_nullable(const std::string& name) const {
// dimension_label #1
// dimension_label #2
// ...
+ // enumeration_num (uint32_t)
+ // enumeration_name_length #1 (uint32_t)
+ // enumeration_name_chars #1 (string)
+ // enumeration_filename_length #1 (uint32_t)
+ // enumeration_filename_chars #1 (string)
+ // enumeration_name_length #2 (uint32_t)
+ // enumeration_name_chars #2 (string)
+ // enumeration_filename_length #2 (uint32_t)
+ // enumeration_filename_chars #2 (string)
+ // ...
Comment on lines +756 to +765
Member Author

I believe these inline storage format comments should be removed. There must be a single source of truth for the storage format, and that will be in format_spec.

// current_domain
void ArraySchema::serialize(Serializer& serializer) const {
// Write version, which is always the current version. Despite
@@ -812,14 +822,14 @@ void ArraySchema::serialize(Serializer& serializer) const {
utils::safe_integral_cast<size_t, uint32_t>(enumeration_map_.size());

serializer.write<uint32_t>(enmr_num);
- for (auto& [enmr_name, enmr_uri] : enumeration_path_map_) {
+ for (auto& [enmr_name, enmr_filename] : enumeration_path_map_) {
auto enmr_name_size = static_cast<uint32_t>(enmr_name.size());
serializer.write<uint32_t>(enmr_name_size);
serializer.write(enmr_name.data(), enmr_name_size);

- auto enmr_uri_size = static_cast<uint32_t>(enmr_uri.size());
- serializer.write<uint32_t>(enmr_uri_size);
- serializer.write(enmr_uri.data(), enmr_uri_size);
+ auto enmr_filename_size = static_cast<uint32_t>(enmr_filename.size());
+ serializer.write<uint32_t>(enmr_filename_size);
+ serializer.write(enmr_filename.data(), enmr_filename_size);
}

// Serialize array current domain information
@@ -1367,11 +1377,11 @@ shared_ptr<ArraySchema> ArraySchema::deserialize(
std::string enmr_name(
deserializer.get_ptr<char>(enmr_name_size), enmr_name_size);

- auto enmr_path_size = deserializer.read<uint32_t>();
- std::string enmr_path_name(
- deserializer.get_ptr<char>(enmr_path_size), enmr_path_size);
+ auto enmr_filename_size = deserializer.read<uint32_t>();
+ std::string enmr_filename(
+ deserializer.get_ptr<char>(enmr_filename_size), enmr_filename_size);

- enumeration_path_map[enmr_name] = enmr_path_name;
+ enumeration_path_map[enmr_name] = enmr_filename;
}
}

2 changes: 1 addition & 1 deletion tiledb/sm/array_schema/array_schema.h
@@ -721,7 +721,7 @@ class ArraySchema {
tdb::pmr::unordered_map<std::string, shared_ptr<const Enumeration>>
enumeration_map_;

- /** A map of Enumeration names to Enumeration URIs */
+ /** A map of Enumeration names to Enumeration filenames */
tdb::pmr::unordered_map<std::string, std::string> enumeration_path_map_;

/** The filter pipeline run on offset tiles for var-length attributes. */
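
Because the map stores bare filenames rather than full URIs, a reader has to combine each filename with the location of the `__enumerations` subdirectory before loading the data. The sketch below illustrates that step under stated assumptions: `enumeration_uri` is a hypothetical helper, not a TileDB function, and the parent directory passed in is whatever directory holds `__enumerations` for the array being opened.

```cpp
#include <string>

// Hypothetical helper (not part of TileDB): build the location of an
// enumeration's data file from the directory that contains the
// `__enumerations` subdirectory and the filename stored in the schema.
std::string enumeration_uri(
    const std::string& parent_dir, const std::string& filename) {
  std::string uri = parent_dir;
  // Add a separator only when the parent does not already end with one.
  if (!uri.empty() && uri.back() != '/') {
    uri += '/';
  }
  uri += "__enumerations/";
  uri += filename;
  return uri;
}
```

With this shape, the stored value stays valid when the array is opened from a different location, since only `parent_dir` changes.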