doc: fix missed reference and typos (#109)
liuq19 authored Sep 6, 2024
1 parent 99b37cd commit 140ac5f
Showing 10 changed files with 25 additions and 24 deletions.
9 changes: 7 additions & 2 deletions README.md
@@ -13,9 +13,9 @@ English | [中文](README_ZH.md)

A fast Rust JSON library based on SIMD. It has some references to other open-source libraries like [sonic_cpp](https://github.com/bytedance/sonic-cpp), [serde_json](https://github.com/serde-rs/json), [sonic](https://github.com/bytedance/sonic), [simdjson](https://github.com/simdjson/simdjson), [rust-std](https://github.com/rust-lang/rust/tree/master/library/core/src/num) and more.

-***For Golang users to use `sonic_rs`, please see [for_Golang_user.md](docs/for_Golang_user.md)***
+***For Golang users to use `sonic_rs`, please see [for_Golang_user.md](https://github.com/cloudwego/sonic-rs/blob/main/docs/for_Golang_user.md)***

-***For users to migrate from `serde_json` to `sonic_rs`, can see [serdejson_compatibility](docs/serdejson_compatibility.md)***
+***For users to migrate from `serde_json` to `sonic_rs`, can see [serdejson_compatibility](https://github.com/cloudwego/sonic-rs/blob/main/docs/serdejson_compatibility.md)***

## Requirements/Notes

@@ -463,5 +463,10 @@ Thanks the following open-source libraries. sonic-rs has some references to othe

We rewrote many SIMD algorithms from sonic-cpp/sonic/simdjson/yyjson for performance. We reused the de/ser codes and modified necessary parts from serde_json to make high compatibility with `serde`. We reused part codes about floating parsing from rust-std to make it more accurate.

+Referenced papers:
+1. [Parsing Gigabytes of JSON per Second](https://arxiv.org/abs/1902.08318)
+2. [JSONSki: streaming semi-structured data with bit-parallel fast-forwarding](https://dl.acm.org/doi/10.1145/3503222.3507719)
+
+
## Contributing
Please read [CONTRIBUTING.md](CONTRIBUTING.md) for information on contributing to sonic-rs.
9 changes: 7 additions & 2 deletions README_ZH.md
@@ -14,9 +14,9 @@
sonic-rs is a high-performance JSON library based on SIMD. It draws on other open-source libraries such as [sonic_cpp](https://github.com/bytedance/sonic-cpp), [serde_json](https://github.com/serde-rs/json), [sonic](https://github.com/bytedance/sonic), [simdjson](https://github.com/simdjson/simdjson), [rust-std](https://github.com/rust-lang/rust/tree/master/library/core/src/num), and others.


-***Golang users migrating to Rust and `sonic_rs`: please see [for_Golang_user_zh.md](docs/for_Golang_user_zh.md)***
+***Golang users migrating to Rust and `sonic_rs`: please see [for_Golang_user_zh.md](https://github.com/cloudwego/sonic-rs/blob/main/docs/for_Golang_user_zh.md)***

-***Users migrating from `serde_json` to `sonic_rs`: please see [serdejson_compatibility](docs/serdejson_compatibility.md)***
+***Users migrating from `serde_json` to `sonic_rs`: please see [serdejson_compatibility](https://github.com/cloudwego/sonic-rs/blob/main/docs/serdejson_compatibility.md)***

## ***Requirements/Notes***

@@ -455,6 +455,11 @@ Thanks the following open-source libraries. sonic-rs has some references to othe

For performance, we rewrote many SIMD algorithms from sonic-cpp/sonic/simdjson/yyjson. We reused the de/serialization code from serde_json and modified the necessary parts to be highly compatible with serde. We reused part of the floating-point parsing code from rust-std to make it more accurate.

+Referenced papers:
+1. [Parsing Gigabytes of JSON per Second](https://arxiv.org/abs/1902.08318)
+2. [JSONSki: streaming semi-structured data with bit-parallel fast-forwarding](https://dl.acm.org/doi/10.1145/3503222.3507719)
+
+
## How to Contribute

Please read [CONTRIBUTING.md](CONTRIBUTING.md)
2 changes: 1 addition & 1 deletion ROADMAP.md
@@ -17,7 +17,7 @@ This document shows key roadmap of `sonic-rs` development. It may help users kno

0. ~~make sonic-rs support stable Rust~~

-1. optimize the performance in aarch64 (WIP: 50%)
+1. ~~optimize the performance in aarch64 (WIP: 50%)~~

2. runtime CPU detection

6 changes: 1 addition & 5 deletions docs/for_Golang_user.md
@@ -14,11 +14,7 @@ Corresponding API references:

- Parsing into Golang `interface{}/any` or sonic-go `ast.Node`:

-It is recommended to replace it with `sonic_rs::Value` for better performance.
-
-***if the json has duplicated keys, pls use `serde_json::Value`, because `sonic_rs::Value` not maintain a hashmap inner***
-
-***even though use `serde_json::Value`, still can be parsed use `sonic_rs::from_str/from_slice`***
+It is recommended to replace it with `sonic_rs::Value` for better performance.

- Using `gjson.Get` or `jsonparser.Get` APIs:

6 changes: 1 addition & 5 deletions docs/for_Golang_user_zh.md
@@ -16,14 +16,10 @@

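It is recommended to replace it with `sonic_rs::Value` for better performance.

-***If the JSON has duplicate keys, `serde_json::Value` is recommended, because `sonic_rs::Value` does not build a hash table internally***
-
-***Even when using `serde_json::Value`, you can still parse with `sonic_rs::from_str/from_slice`, which performs somewhat better than the native parser***
-
- Using APIs such as `gjson.Get` or `jsonparser.Get`:
The gjson/jsonparser get APIs do not perform strict JSON validation themselves, so `sonic_rs::get_unchecked` can serve as a drop-in replacement. The sonic_rs get API returns a `Result<LazyValue>`; if the field is not found, it returns an error.

-`LazyValue` can use `as_bool, as_str`, etc. to further ***parse the JSON into the corresponding type**
+`LazyValue` can use `as_bool, as_str`, etc. to further **parse the JSON into the corresponding type**

If you only need the original raw JSON, ***without parsing***, use APIs such as `as_raw_str, as_raw_faststr`. See the example: [get_from.rs](../examples/get_from.rs)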

4 changes: 2 additions & 2 deletions docs/performance.md
@@ -4,7 +4,7 @@ This document will introduce some performance optimization details of sonic-rs (

## Get fields from JSON/parsing JSON on-demand

-The on-demand parsing algorithm focuses on skipping unnecessary fields, and the challenge lies in skipping JSON containers, including JSON Objects and JSON Arrays. This is because we need to pay attention to the brackets in the JSON string, such as `{ "key": "value {}"}`. We utilize the SIMD instructions to calculate the bitmap of the string, and then by counting the number of brackets, we can skip the entire JSON container.
+The on-demand parsing algorithm focuses on skipping unnecessary fields, and the challenge lies in skipping JSON containers, including JSON Objects and JSON Arrays. This is because we need to pay attention to the brackets in the JSON string, such as `{ "key": "value {}"}`. We utilize the SIMD instructions to calculate the bitmap of the string, and then by counting the number of brackets, we can skip the entire JSON container. Reference the paper [JSONSki](https://dl.acm.org/doi/10.1145/3503222.3507719).
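
The bracket-counting idea in this paragraph can be sketched without SIMD. The following is a minimal scalar sketch, not the sonic-rs implementation: `skip_container` is a hypothetical helper name, and it walks the bytes one at a time where the real code would build string and bracket bitmaps with SIMD instructions. It assumes well-formed JSON and only shows why brackets inside strings must be ignored while counting.

```rust
/// Hypothetical helper for illustration only; not part of sonic-rs.
/// Returns the index one past the bracket that closes the container
/// starting at `start`, or `None` if `start` is not `{`/`[` or the
/// input ends before the container is closed.
fn skip_container(json: &[u8], start: usize) -> Option<usize> {
    let (open, close) = match *json.get(start)? {
        b'{' => (b'{', b'}'),
        b'[' => (b'[', b']'),
        _ => return None,
    };
    let mut depth = 0usize; // number of currently open brackets
    let mut in_string = false; // true while inside a JSON string
    let mut escaped = false; // true right after a backslash in a string
    for (i, &b) in json.iter().enumerate().skip(start) {
        if in_string {
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue; // brackets inside strings are never counted
        }
        if b == b'"' {
            in_string = true;
        } else if b == open {
            depth += 1;
        } else if b == close {
            depth -= 1;
            if depth == 0 {
                return Some(i + 1);
            }
        }
    }
    None
}
```

Calling `skip_container(br#"{ "key": "value {}"}"#, 0)` skips the whole object even though the string value contains `{}`. A SIMD version would reach the same answer by masking out the in-string bits of the bracket bitmaps and counting the remaining set bits per block.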

The overall algorithm is as follows:

@@ -118,7 +118,7 @@ In addition, we also optimize for compact JSON and cases where there's only one

## Float number parsing using SIMD

-Parsing floating-point numbers is one of the most time-consuming operations in JSON parsing. For 16-length number strings, we can directly use SIMD instructions for parsing, as it can read ASCII number characters and accumulate them step by step. Refer to [simd_str2int](https://github.com/cloudwego/sonic-rs/blob/main/src/util/arch/x86_64.rs#L115) for the specific algorithm. This algorithm comes from [sonic-cpp](https://github.com/bytedance/sonic-cpp/blob/master/include/sonic/internal/arch/sse/str2int.h).
+Parsing floating-point numbers is one of the most time-consuming operations in JSON parsing. For 16-length number strings, we can directly use SIMD instructions for parsing, as it can read ASCII number characters and accumulate them step by step. Refer to [simd_str2int](https://github.com/cloudwego/sonic-rs/blob/main/src/util/arch/x86_64.rs#L115) for the specific algorithm. This algorithm comes from [sonic-cpp](https://github.com/bytedance/sonic-cpp/blob/master/include/sonic/internal/arch/sse/str2int.h).

When parsing floating-point numbers, we only need to consider 17 significant digit bits for 64-bit floating-point numbers according to the IEEE754 specification. Thus, in this function, we employ a switch table to decrease unnecessary SIMD instructions.
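
To make "read ASCII number characters and accumulate them step by step" concrete, here is a scalar sketch of the result a 16-digit SIMD routine computes. It is an illustration under assumptions, not the linked `simd_str2int` code: `parse_16_digits` is a hypothetical name, and the loop handles one digit per iteration where a SIMD version would subtract `'0'` from all 16 lanes at once and combine neighbouring lanes with multiply-add steps.

```rust
/// Hypothetical helper for illustration only; not part of sonic-rs.
/// Parses exactly 16 ASCII digits into a `u64`, or returns `None`
/// if any byte is not an ASCII digit.
fn parse_16_digits(chunk: &[u8; 16]) -> Option<u64> {
    let mut acc: u64 = 0;
    for &b in chunk {
        // Map '0'..='9' to 0..=9; any other byte lands outside that range.
        let digit = b.wrapping_sub(b'0');
        if digit > 9 {
            return None;
        }
        // 16 decimal digits always fit in a u64, so this cannot overflow.
        acc = acc * 10 + u64::from(digit);
    }
    Some(acc)
}
```

Sixteen digits fit comfortably in a `u64`, which is why a single 128-bit register of ASCII digits can be converted in one pass before the 17th significant digit is considered.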

4 changes: 2 additions & 2 deletions docs/performance_zh.md
@@ -4,7 +4,7 @@

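## On-demand parsing

-How do we implement a faster on-demand parsing algorithm? The key to on-demand parsing performance is skipping unneeded fields, and the difficulty lies in skipping JSON containers, including JSON Objects and JSON arrays, because we need to pay attention to brackets inside JSON strings, such as `"{ "key": "value {}"}`. We use SIMD instructions to compute a bitmap of the string, then skip the entire JSON container by counting the brackets.
+How do we implement a faster on-demand parsing algorithm? The key to on-demand parsing performance is skipping unneeded fields, and the difficulty lies in skipping JSON containers, including JSON Objects and JSON arrays, because we need to pay attention to brackets inside JSON strings, such as `"{ "key": "value {}"}`. We use SIMD instructions to compute a bitmap of the string, then skip the entire JSON container by counting the brackets. See the paper [JSONSki](https://dl.acm.org/doi/10.1145/3503222.3507719).

The overall algorithm is as follows: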

@@ -122,7 +122,7 @@ The whitespace characters in the JSON specification are: ` `, `\n`, '\r', '\t`. Using SIMD instructions to skip
```


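-Number strings of length 16 can be parsed directly with SIMD instructions, which read the ASCII digit characters and accumulate them step by step. For the specific algorithm, see [simd_str2int](https://github.com/cloudwego/sonic-rs/blob/main/src/util/arch/x86_64.rs#L115). This algorithm comes from [sonic-cpp](https://github.com/bytedance/sonic-cpp/blob/master/include/sonic/internal/arch/sse/str2int.h). When parsing floating-point numbers, according to the IEEE754 specification we only need to consider 17 significant digits for 64-bit floating-point numbers. Therefore, this function uses a switch table to reduce unnecessary SIMD instructions.
+Number strings of length 16 can be parsed directly with SIMD instructions, which read the ASCII digit characters and accumulate them step by step. For the specific algorithm, see [simd_str2int](https://github.com/cloudwego/sonic-rs/blob/main/src/util/arch/x86_64.rs#L115). This algorithm comes from [sonic-cpp](https://github.com/bytedance/sonic-cpp/blob/master/include/sonic/internal/arch/sse/str2int.h). When parsing floating-point numbers, according to the IEEE754 specification we only need to consider 17 significant digits for 64-bit floating-point numbers. Therefore, this function uses a switch table to reduce unnecessary SIMD instructions.


## Serializing JSON strings using SIMD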
2 changes: 1 addition & 1 deletion docs/serdejson_compatibility.md
@@ -1,6 +1,6 @@
# A quick guide to migrate from serde_json

-The goal of sonic-rs is performance and easiness (more APIs and ALLINONE) to use. Otherwise, reconmend to use `serde_json`.
+The goal of sonic-rs is performance and easiness (more APIs and ALLINONE) to use. Otherwise, recommended to use `serde_json`.

Just replace as follows:

1 change: 0 additions & 1 deletion docs/value_design.md
@@ -2,4 +2,3 @@

# A new and user-friendly areana-based document design

-TODO: ^_^ ...
6 changes: 3 additions & 3 deletions src/reader.rs
@@ -245,10 +245,10 @@ impl<'a> Reader<'a> for Read<'a> {
}
}

-    fn validate_utf8(&mut self, allowd_space: (usize, usize)) -> Result<()> {
-        if self.next_invalid_utf8 < allowd_space.0 {
+    fn validate_utf8(&mut self, allowed_space: (usize, usize)) -> Result<()> {
+        if self.next_invalid_utf8 < allowed_space.0 {
             Err(invalid_utf8(self.slice, self.next_invalid_utf8))
-        } else if self.next_invalid_utf8 < allowd_space.1 {
+        } else if self.next_invalid_utf8 < allowed_space.1 {
             // this space is allowed, should update the next invalid utf8 position
             self.next_invalid_utf8 = match from_utf8(&self.slice[self.index..]) {
                 Ok(_) => usize::MAX,
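
The hunk above is cut off after the `Ok(_)` arm, so the rest of the match is not visible here. As a rough sketch of how a "next invalid UTF-8 position" can be recomputed with the standard library, the following uses `std::str::from_utf8` and `Utf8Error::valid_up_to`. The function name `next_invalid_utf8` and the `Err` arm are assumptions made for illustration; they are not taken from `src/reader.rs`.

```rust
use std::str::from_utf8;

/// Hypothetical free function for illustration only; the `Err` arm is an
/// assumption, not the code from sonic-rs.
/// Returns the absolute index of the first invalid UTF-8 byte at or after
/// `index`, or `usize::MAX` when the rest of the slice is valid UTF-8.
fn next_invalid_utf8(slice: &[u8], index: usize) -> usize {
    match from_utf8(&slice[index..]) {
        // Nothing invalid remains, so no later position can fail validation.
        Ok(_) => usize::MAX,
        // `valid_up_to` is relative to the sub-slice, so add `index` back.
        Err(e) => index + e.valid_up_to(),
    }
}
```

Caching this position lets later validation calls compare two indices without rescanning the input each time.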
