diff --git a/README.md b/README.md
index 80388ad..90c5a95 100644
--- a/README.md
+++ b/README.md
@@ -7,21 +7,61 @@
 [crate downloads]: https://img.shields.io/crates/d/jsonb.svg
-
-`jsonb` is a jsonb implementation written in Rust. It provides a fast, lightweight, and easy-to-use API for working with jsonb data.
+`jsonb` is a binary format `JSON` representation inspired by [PostgreSQL](https://www.postgresql.org/docs/current/datatype-json.html) and [CockroachDB](https://www.cockroachlabs.com/docs/stable/jsonb). It provides a fast, lightweight and easy-to-use API for working with `JSON` data.
 ## Features
-- Fast performance: `jsonb` is designed to be highly performant, allowing you to work with large jsonb data sets with ease.
-- Easy to use API: `jsonb` provides a simple and intuitive API for working with jsonb data, making it easy to get started.
-- Safe and secure: `jsonb` is written in Rust, which provides memory safety and thread safety guarantees, making it a safe choice for handling sensitive data.
-- Flexible: `jsonb` supports a wide range of data types and can be used to store complex data structures.
+- Good compatibility: `jsonb` fully supports the `JSON` standard and can be used to store complex data structures.
+- Fast performance: `jsonb` is designed for high performance, allowing you to work with large `JSON` data sets with ease.
+- Easy to use: `jsonb` provides a number of built-in functions to support various operations, and also supports the `JSONPath` syntax for selecting and extracting subset elements.
+- Safe and secure: `jsonb` is written in Rust, which provides memory and thread safety guarantees, making it a safe choice for handling sensitive data.
+
+## Encoding format
+
+The `jsonb` encoding format is a tree-like structure. Each node contains a container header, a number of JEntry headers, and nested encoding values.
+
+- 32-bit container header. 3 bits identify the type of value, including `scalar`, `object` and `array`, and 29 bits identify the number of JEntries in the `array` or `object`. The root node of the `jsonb` value is always a container header.
+  - `scalar` container header: `0x20000000`
+  - `object` container header: `0x40000000`
+  - `array` container header: `0x80000000`
+- 32-bit JEntry header. 1 bit identifies whether the JEntry stores a length or an offset, 3 bits identify the type of value, including `null`, `string`, `number`, `false`, `true` and `container`, and the remaining 28 bits identify the length or offset of the encoding value.
+  - `null` JEntry header: `0x00000000`
+  - `string` JEntry header: `0x10000000`
+  - `number` JEntry header: `0x20000000`
+  - `false` JEntry header: `0x30000000`
+  - `true` JEntry header: `0x40000000`
+  - `container` JEntry header: `0x50000000`
+- Encoding value. Different types of JEntry header have different encoding values.
+  - `null`, `true`, `false`: no encoding value, identified by the JEntry header.
+  - `string`: a normal UTF-8 string.
+  - `number`: an encoded number to represent uint64s, int64s and float64s.
+  - `container`: a nested `json` value with a recursive structure.
+#### An encoding example
+
+```text
+// JSON value
+[false, 10, {"k":"v"}]
+
+// JSONB encoding
+0x80000003    array container header (3 JEntries)
+0x30000000    false JEntry header (no encoding value)
+0x20000002    number JEntry header (encoding value length 2)
+0x5000000e    container JEntry header (encoding value length 14)
+0x500a        number encoding value (10)
+0x40000001    object container header (1 JEntry)
+0x10000001    string key JEntry header (encoding value length 1)
+0x10000001    string value JEntry header (encoding value length 1)
+0x6b          string encoding value ("k")
+0x76          string encoding value ("v")
+```
-## JSONB value struct
+## Jsonb value
-``` rust
-// JSONB value
+The `jsonb` value is an enumeration that represents all kinds of `JSON` values and serves as an intermediate representation for converting other data types to the `jsonb` binary format.
+
+```rust
+// jsonb value
 #[derive(Clone, PartialEq, Eq)]
 pub enum Value<'a> {
     Null,
@@ -33,3 +73,73 @@ pub enum Value<'a> {
 }
 ```
+## Built-in functions
+
+`jsonb` implements a number of commonly used built-in functions. Since most functions only operate on a subset of the value, they can use the container headers and JEntry headers to efficiently skip over intermediate parts of the `jsonb` value. This avoids time-consuming deserialisation and keeps these operations fast. For more information, see https://docs.rs/jsonb/latest/jsonb/#functions
+
+## SQL/JSONPath
+
+[SQL/JSONPath](https://www.iso.org/standard/67367.html) is a query language used to select and extract a subset of elements from a `jsonb` value.
+
+#### Operators
+
+The following operators have been implemented:
+
+| Operator                 | Description                                                  | Examples           |
+|--------------------------|--------------------------------------------------------------|--------------------|
+| `$`                      | The root element                                             | `$`                |
+| `@`                      | The current element in the filter expression                 | `$.event?(@ == 1)` |
+| `.*`                     | Selecting all elements in an Object                          | `$.*`              |
+| `.<name>`                | Selecting the element that matches the name in an Object     | `$.event`          |
+| `:<name>`                | Alias of `.<name>`                                           | `$:event`          |
+| `["<name>"]`             | Alias of `.<name>`                                           | `$["event"]`       |
+| `[*]`                    | Selecting all elements in an Array                           | `$[*]`             |
+| `[<pos>, ..]`            | Selecting 0-based `n-th` elements in an Array                | `$[1, 2]`          |
+| `[last - <pos>, ..]`     | Selecting `n-th` element before the last element in an Array | `$[0, last - 1]`   |
+| `[<pos1> to <pos2>, ..]` | Selecting all elements of a range in an Array                | `$[1 to last - 2]` |
+| `?(<expr>)`              | Selecting all elements that match the filter expression      | `$?(@.price < 10)` |
+
+## Examples
+
+```rust
+fn main() {
+    let json = r#"
+        {
+            "name":"Fred",
+            "phones":[
+                {
+                    "type":"home",
+                    "number":3720453
+                },
+                {
+                    "type": "work",
+                    "number":5062051
+                }
+            ]
+        }"#;
+
+    let path = r#"$.phones[*]?(@.number == 3720453)"#;
+
+    // parse JSON string to jsonb value
+    let value = jsonb::parse_value(json.as_bytes()).unwrap();
+    // encode jsonb value to jsonb binary value
+    let jsonb = value.to_vec();
+    // parse JSONPath string
+    let json_path = jsonb::jsonpath::parse_json_path(path.as_bytes()).unwrap();
+    // select subset value from jsonb binary value
+    let mut sub_jsonb = Vec::new();
+    let mut sub_offsets = Vec::new();
+    jsonb::get_by_path(&jsonb, json_path, &mut sub_jsonb, &mut sub_offsets);
+
+    // value={"number":3720453,"type":"home"}
+    println!("value={}", jsonb::to_string(&sub_jsonb));
+}
+```
+
+## Contributing
+
+`jsonb` is an open source project and all kinds of contributions are welcome! You can help with ideas, code or documentation.
+
+## License
+
+Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)
diff --git a/src/lib.rs b/src/lib.rs
index bbf6112..2c2db4b 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -12,6 +12,55 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
+//! `jsonb` is a binary format `JSON` representation inspired by [PostgreSQL](https://www.postgresql.org/docs/current/datatype-json.html) and [CockroachDB](https://www.cockroachlabs.com/docs/stable/jsonb). It provides a fast, lightweight and easy-to-use API for working with `JSON` data.
+//!
+//! ## Features
+//!
+//! - Good compatibility: `jsonb` fully supports the `JSON` standard and can be used to store complex data structures.
+//! - Fast performance: `jsonb` is designed for high performance, allowing you to work with large `JSON` data sets with ease.
+//! - Easy to use: `jsonb` provides a number of built-in functions to support various operations, and also supports the `JSONPath` syntax for selecting and extracting subset elements.
+//! - Safe and secure: `jsonb` is written in Rust, which provides memory and thread safety guarantees, making it a safe choice for handling sensitive data.
+//!
+//! ## Encoding format
+//!
+//! The `jsonb` encoding format is a tree-like structure. Each node contains a container header, a number of JEntry headers, and nested encoding values.
+//!
+//! - 32-bit container header. 3 bits identify the type of value, including `scalar`, `object` and `array`, and 29 bits identify the number of JEntries in the `array` or `object`. The root node of the `jsonb` value is always a container header.
+//!   - `scalar` container header: `0x20000000`
+//!   - `object` container header: `0x40000000`
+//!   - `array` container header: `0x80000000`
+//! - 32-bit JEntry header. 1 bit identifies whether the JEntry stores a length or an offset, 3 bits identify the type of value, including `null`, `string`, `number`, `false`, `true` and `container`, and the remaining 28 bits identify the length or offset of the encoding value.
+//!   - `null` JEntry header: `0x00000000`
+//!   - `string` JEntry header: `0x10000000`
+//!   - `number` JEntry header: `0x20000000`
+//!   - `false` JEntry header: `0x30000000`
+//!   - `true` JEntry header: `0x40000000`
+//!   - `container` JEntry header: `0x50000000`
+//! - Encoding value. Different types of JEntry header have different encoding values.
+//!   - `null`, `true`, `false`: no encoding value, identified by the JEntry header.
+//!   - `string`: a normal UTF-8 string.
+//!   - `number`: an encoded number to represent uint64s, int64s and float64s.
+//!   - `container`: a nested `json` value with a recursive structure.
+//!
+//! #### An encoding example
+//!
+//! ```text
+//! // JSON value
+//! [false, 10, {"k":"v"}]
+//!
+//! // JSONB encoding
+//! 0x80000003    array container header (3 JEntries)
+//! 0x30000000    false JEntry header (no encoding value)
+//! 0x20000002    number JEntry header (encoding value length 2)
+//! 0x5000000e    container JEntry header (encoding value length 14)
+//! 0x500a        number encoding value (10)
+//! 0x40000001    object container header (1 JEntry)
+//! 0x10000001    string key JEntry header (encoding value length 1)
+//! 0x10000001    string value JEntry header (encoding value length 1)
+//! 0x6b          string encoding value ("k")
+//! 0x76          string encoding value ("v")
+//! ```
+
 #![allow(clippy::uninlined_format_args)]
 mod constants;
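
The header layout documented in the `//!` comments above can be illustrated with a small standalone sketch. This is not the crate's implementation: the constant names, enums and the `decode_container_header` / `decode_jentry` helpers are hypothetical, and the masks are simply derived from the stated bit widths (3 + 29 bits for a container header, 1 + 3 + 28 bits for a JEntry header).

```rust
// Sketch only: names and masks are assumptions derived from the documented
// bit layout, not copied from the crate's source.
const CONTAINER_TYPE_MASK: u32 = 0xE000_0000; // top 3 bits: value type
const CONTAINER_LEN_MASK: u32 = 0x1FFF_FFFF; // low 29 bits: number of JEntries
const JENTRY_OFFSET_FLAG: u32 = 0x8000_0000; // top bit: length or offset
const JENTRY_TYPE_MASK: u32 = 0x7000_0000; // next 3 bits: value type
const JENTRY_LEN_MASK: u32 = 0x0FFF_FFFF; // low 28 bits: length or offset

#[derive(Debug, PartialEq)]
enum ContainerKind {
    Scalar,
    Object,
    Array,
}

#[derive(Debug, PartialEq)]
enum JEntryKind {
    Null,
    String,
    Number,
    False,
    True,
    Container,
}

/// Splits a container header into its kind and JEntry count.
fn decode_container_header(header: u32) -> Option<(ContainerKind, u32)> {
    let kind = match header & CONTAINER_TYPE_MASK {
        0x2000_0000 => ContainerKind::Scalar,
        0x4000_0000 => ContainerKind::Object,
        0x8000_0000 => ContainerKind::Array,
        _ => return None,
    };
    Some((kind, header & CONTAINER_LEN_MASK))
}

/// Splits a JEntry header into its kind, the offset flag, and the length or offset.
fn decode_jentry(header: u32) -> Option<(JEntryKind, bool, u32)> {
    let kind = match header & JENTRY_TYPE_MASK {
        0x0000_0000 => JEntryKind::Null,
        0x1000_0000 => JEntryKind::String,
        0x2000_0000 => JEntryKind::Number,
        0x3000_0000 => JEntryKind::False,
        0x4000_0000 => JEntryKind::True,
        0x5000_0000 => JEntryKind::Container,
        _ => return None,
    };
    let is_offset = header & JENTRY_OFFSET_FLAG != 0;
    Some((kind, is_offset, header & JENTRY_LEN_MASK))
}

fn main() {
    // Header words taken from the `[false, 10, {"k":"v"}]` encoding example.
    println!("{:?}", decode_container_header(0x8000_0003));
    for word in [0x3000_0000u32, 0x2000_0002, 0x5000_000e] {
        println!("{:?}", decode_jentry(word));
    }
}
```

Because every header carries the length (or offset) of its value, a reader can add up those lengths to jump straight to the entry it needs without decoding the values in between.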