feat: Add String type with Utf8Raw encoding to Bigtable API #1419

Merged: 5 commits, merged on May 28, 2024
protos/google/bigtable/admin/v2/types.proto: 34 changes (31 additions, 3 deletions)
@@ -41,18 +41,18 @@ option ruby_package = "Google::Cloud::Bigtable::Admin::V2";
// * Natural sort: Does the encoded value sort consistently with the original
// typed value? Note that Bigtable will always sort data based on the raw
// encoded value, *not* the decoded type.
-// - Example: STRING values sort in the same order as their UTF-8 encodings.
+// - Example: BYTES values sort in the same order as their raw encodings.
// - Counterexample: Encoding INT64 to a fixed-width STRING does *not*
// preserve sort order when dealing with negative numbers.
// INT64(-1) > INT64(-2), but STRING("-00001") < STRING("-00002").
-// - The overall encoding chain sorts naturally if *every* link does.
+// - The overall encoding chain has this property if *every* link does.
// * Self-delimiting: If we concatenate two encoded values, can we always tell
// where the first one ends and the second one begins?
// - Example: If we encode INT64s to fixed-width STRINGs, the first value
// will always contain exactly N digits, possibly preceded by a sign.
// - Counterexample: If we concatenate two UTF-8 encoded STRINGs, we have
// no way to tell where the first one ends.
-// - The overall encoding chain is self-delimiting if *any* link is.
+// - The overall encoding chain has this property if *any* link does.
// * Compatibility: Which other systems have matching encoding schemes? For
// example, does this encoding have a GoogleSQL equivalent? HBase? Java?
message Type {
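
The natural-sort and self-delimiting properties described in this comment can be checked with a short sketch. It is plain Python, not Bigtable client code; `WIDTH`, `encode_fixed`, and `decode_concatenated` are hypothetical names, and the 5-digit width is an assumption chosen for readability. Byte-wise, `-00001` still sorts before `00001` because `-` (0x2D) precedes `0` (0x30), so the ordering break shows up between two negatives: -1 > -2, yet `-00001` sorts before `-00002`.

```python
# Sketch only: a hypothetical fixed-width decimal encoding for integers,
# used to check the natural-sort and self-delimiting properties.
# This is not a Bigtable encoding.

WIDTH = 5  # hypothetical digit count N; real INT64 values would need 19


def encode_fixed(n: int) -> bytes:
    """Encode n as an optional '-' followed by exactly WIDTH decimal digits."""
    if abs(n) >= 10**WIDTH:
        raise ValueError(f"{n} does not fit in {WIDTH} digits")
    sign = "-" if n < 0 else ""
    return f"{sign}{abs(n):0{WIDTH}d}".encode("ascii")


def decode_concatenated(data: bytes) -> list[int]:
    """Split a concatenation of encode_fixed() outputs back into integers.

    No separator is needed: each value is an optional sign followed by
    exactly WIDTH digits, so the encoding is self-delimiting.
    """
    values = []
    i = 0
    while i < len(data):
        negative = data[i:i + 1] == b"-"
        start = i + 1 if negative else i
        n = int(data[start:start + WIDTH].decode("ascii"))
        values.append(-n if negative else n)
        i = start + WIDTH
    return values


nums = [2, -1, 1, -2]
print(sorted(nums))                    # [-2, -1, 1, 2]
print(sorted(nums, key=encode_fixed))  # [-1, -2, 1, 2]: negatives reversed

packed = b"".join(encode_fixed(n) for n in [-2, 1, 30])
print(decode_concatenated(packed))     # [-2, 1, 30]
```

Sorting by the encoded bytes is what Bigtable would do with such values, which is why this encoding fails the natural-sort test even though it passes the self-delimiting one.
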
@@ -78,6 +78,31 @@ message Type {
Encoding encoding = 1;
}

+// String
+// Values of type `String` are stored in `Value.string_value`.
+message String {
+// Rules used to convert to/from lower level types.
+message Encoding {
+// UTF-8 encoding
+// * Natural sort? No (ASCII characters only)
+// * Self-delimiting? No
+// * Compatibility?
+// - BigQuery Federation `TEXT` encoding
+// - HBase `Bytes.toBytes`
+// - Java `String#getBytes(StandardCharsets.UTF_8)`
+message Utf8Raw {}
+
+// Which encoding to use.
+oneof encoding {
+// Use `Utf8Raw` encoding.
+Utf8Raw utf8_raw = 1;
+}
+}
+
+// The encoding to use when converting to/from lower level types.
+Encoding encoding = 1;
+}
+
// Int64
// Values of type `Int64` are stored in `Value.int_value`.
message Int64 {
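
Reading `Utf8Raw` as "store the `String` as its UTF-8 bytes with nothing else attached", a minimal Python sketch looks like this. `utf8_raw_encode` and `utf8_raw_decode` are hypothetical helper names, not part of any Bigtable client library; the sketch only illustrates the encoding and why it is not self-delimiting.

```python
# Sketch of the Utf8Raw idea: a String value becomes its raw UTF-8 bytes,
# with no length prefix and no terminator. Helper names are hypothetical.


def utf8_raw_encode(value: str) -> bytes:
    # For well-formed strings this matches Java's
    # value.getBytes(StandardCharsets.UTF_8).
    return value.encode("utf-8")


def utf8_raw_decode(data: bytes) -> str:
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError.
    return data.decode("utf-8")


encoded = utf8_raw_encode("h\u00e9llo")
print(encoded)                                   # b'h\xc3\xa9llo'
print(utf8_raw_decode(encoded) == "h\u00e9llo")  # True

# Not self-delimiting: two different pairs of values concatenate to the
# same bytes, so a reader cannot tell where the first value ends.
first = utf8_raw_encode("ab") + utf8_raw_encode("c")
second = utf8_raw_encode("a") + utf8_raw_encode("bc")
print(first == second)                           # True
```
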
@@ -140,6 +165,9 @@ message Type {
// Bytes
Bytes bytes_type = 1;

+// String
+String string_type = 2;
+
// Int64
Int64 int64_type = 5;
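
With `string_type` added, a `Type` can select the new `String` kind and, inside it, the `utf8_raw` member of the `encoding` oneof. As an illustration, assuming the standard proto3 JSON mapping (field names converted to lowerCamelCase, an empty message written as `{}`), such a value would look like the JSON below. The Python only builds and prints that JSON by hand; it does not use generated Bigtable classes.

```python
import json

# Hypothetical Type value that selects the `string_type` field and the
# `utf8_raw` encoding, in proto3 JSON form (lowerCamelCase keys; the empty
# Utf8Raw message is written as {}). Built by hand, not by a Bigtable client.
string_type = {"stringType": {"encoding": {"utf8Raw": {}}}}

print(json.dumps(string_type))
# {"stringType": {"encoding": {"utf8Raw": {}}}}
```
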
