parser.rl: parse_string: implement a fast path #689

Merged on Nov 6, 2024 (1 commit)

casperisfine

If we assume most strings don't contain any escape sequences, we can skip
the costly unescaping work entirely whenever that assumption holds.
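To make the idea concrete, here is a minimal C sketch of a string decoder with such a fast path. It is not the actual `parser.rl` code: `decode_json_string` and its `used_fast_path` out-parameter are hypothetical names, the input is assumed to be the already-validated bytes between the quotes, and `\uXXXX` escapes are deliberately left out. A single `memchr` scan decides whether any unescaping is needed; with no backslash present, the bytes are copied verbatim.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the json gem's API.
 * Decodes a JSON string body (the bytes between the quotes, assumed already
 * validated by the lexer) into a new NUL-terminated buffer owned by the caller.
 * Sets *used_fast_path to 1 when no unescaping was needed, 0 otherwise.
 * Returns NULL on allocation failure or on an escape this sketch does not
 * handle (\uXXXX is intentionally omitted). */
char *decode_json_string(const char *start, size_t len, int *used_fast_path)
{
    assert(used_fast_path != NULL);

    /* Unescaping never makes the string longer, so len + 1 bytes suffice. */
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;

    /* Fast path: no backslash means no escape sequence, copy verbatim. */
    if (memchr(start, '\\', len) == NULL) {
        memcpy(out, start, len);
        out[len] = '\0';
        *used_fast_path = 1;
        return out;
    }

    /* Slow path: walk the bytes and translate each escape sequence. */
    *used_fast_path = 0;
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (start[i] != '\\') {
            out[o++] = start[i];
            continue;
        }
        if (++i == len) { free(out); return NULL; } /* dangling backslash */
        switch (start[i]) {
        case '"':  out[o++] = '"';  break;
        case '\\': out[o++] = '\\'; break;
        case '/':  out[o++] = '/';  break;
        case 'b':  out[o++] = '\b'; break;
        case 'f':  out[o++] = '\f'; break;
        case 'n':  out[o++] = '\n'; break;
        case 'r':  out[o++] = '\r'; break;
        case 't':  out[o++] = '\t'; break;
        default:   free(out); return NULL; /* \uXXXX or invalid escape */
        }
    }
    out[o] = '\0';
    return out;
}
```

For example, `decode_json_string("hello", 5, &fast)` returns a copy of `hello` with `fast` set to 1, while a body containing a literal backslash followed by `n` goes through the slow path and comes back with a real newline. The caller owns the returned buffer and must `free` it.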

Before:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   884.000 i/100ms
                  oj   789.000 i/100ms
          Oj::Parser   943.000 i/100ms
           rapidjson   584.000 i/100ms
Calculating -------------------------------------
                json      8.897k (± 1.3%) i/s  (112.40 μs/i) -     45.084k in   5.068520s
                  oj      7.967k (± 1.5%) i/s  (125.52 μs/i) -     40.239k in   5.051985s
          Oj::Parser      9.564k (± 1.4%) i/s  (104.56 μs/i) -     48.093k in   5.029626s
           rapidjson      5.947k (± 1.4%) i/s  (168.16 μs/i) -     29.784k in   5.009437s

Comparison:
                json:     8896.5 i/s
          Oj::Parser:     9563.8 i/s - 1.08x  faster
                  oj:     7966.8 i/s - 1.12x  slower
           rapidjson:     5946.7 i/s - 1.50x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    83.000 i/100ms
                  oj    64.000 i/100ms
          Oj::Parser    77.000 i/100ms
           rapidjson    54.000 i/100ms
Calculating -------------------------------------
                json    823.083 (± 1.8%) i/s    (1.21 ms/i) -      4.150k in   5.043805s
                  oj    632.538 (± 1.4%) i/s    (1.58 ms/i) -      3.200k in   5.060073s
          Oj::Parser    769.122 (± 1.8%) i/s    (1.30 ms/i) -      3.850k in   5.007501s
           rapidjson    548.494 (± 1.5%) i/s    (1.82 ms/i) -      2.754k in   5.022153s

Comparison:
                json:      823.1 i/s
          Oj::Parser:      769.1 i/s - 1.07x  slower
                  oj:      632.5 i/s - 1.30x  slower
           rapidjson:      548.5 i/s - 1.50x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    41.000 i/100ms
                  oj    34.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    427.162 (± 1.2%) i/s    (2.34 ms/i) -      2.173k in   5.087666s
                  oj    351.463 (± 2.8%) i/s    (2.85 ms/i) -      1.768k in   5.035149s
          Oj::Parser    461.849 (± 3.7%) i/s    (2.17 ms/i) -      2.340k in   5.074461s
           rapidjson    395.155 (± 1.8%) i/s    (2.53 ms/i) -      1.989k in   5.034927s

Comparison:
                json:      427.2 i/s
          Oj::Parser:      461.8 i/s - 1.08x  faster
           rapidjson:      395.2 i/s - 1.08x  slower
                  oj:      351.5 i/s - 1.22x  slower
```

After:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   953.000 i/100ms
                  oj   813.000 i/100ms
          Oj::Parser   956.000 i/100ms
           rapidjson   563.000 i/100ms
Calculating -------------------------------------
                json      9.525k (± 1.2%) i/s  (104.98 μs/i) -     47.650k in   5.003252s
                  oj      8.117k (± 0.5%) i/s  (123.20 μs/i) -     40.650k in   5.008283s
          Oj::Parser      9.590k (± 3.2%) i/s  (104.27 μs/i) -     48.756k in   5.089794s
           rapidjson      6.020k (± 0.9%) i/s  (166.10 μs/i) -     30.402k in   5.050155s

Comparison:
                json:     9525.3 i/s
          Oj::Parser:     9590.1 i/s - same-ish: difference falls within error
                  oj:     8116.7 i/s - 1.17x  slower
           rapidjson:     6020.5 i/s - 1.58x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    87.000 i/100ms
                  oj    64.000 i/100ms
          Oj::Parser    75.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    866.563 (± 0.8%) i/s    (1.15 ms/i) -      4.350k in   5.020138s
                  oj    643.567 (± 0.8%) i/s    (1.55 ms/i) -      3.264k in   5.072101s
          Oj::Parser    777.346 (± 3.5%) i/s    (1.29 ms/i) -      3.900k in   5.023933s
           rapidjson    557.158 (± 0.7%) i/s    (1.79 ms/i) -      2.805k in   5.034731s

Comparison:
                json:      866.6 i/s
          Oj::Parser:      777.3 i/s - 1.11x  slower
                  oj:      643.6 i/s - 1.35x  slower
           rapidjson:      557.2 i/s - 1.56x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    41.000 i/100ms
                  oj    35.000 i/100ms
          Oj::Parser    40.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    429.216 (± 1.2%) i/s    (2.33 ms/i) -      2.173k in   5.063351s
                  oj    354.755 (± 1.1%) i/s    (2.82 ms/i) -      1.785k in   5.032374s
          Oj::Parser    465.114 (± 3.7%) i/s    (2.15 ms/i) -      2.360k in   5.081634s
           rapidjson    387.135 (± 1.3%) i/s    (2.58 ms/i) -      1.950k in   5.037787s

Comparison:
                json:      429.2 i/s
          Oj::Parser:      465.1 i/s - 1.08x  faster
           rapidjson:      387.1 i/s - 1.11x  slower
                  oj:      354.8 i/s - 1.21x  slower
```
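The gain for the `json` gem itself can be read off the two runs by dividing the After i/s by the Before i/s. A small C sketch of that arithmetic, with the figures copied from the Comparison sections above (the struct and function names are only illustrative):

```c
#include <assert.h>
#include <stdio.h>

/* Iterations per second reported for the `json` gem in the
 * Before/After Comparison sections (copied from the benchmark output). */
struct bench_result {
    const char *file;
    double before_ips;
    double after_ips;
};

static const struct bench_result json_results[] = {
    { "activitypub.json",  8896.5, 9525.3 },
    { "twitter.json",       823.1,  866.6 },
    { "citm_catalog.json",  427.2,  429.2 },
};

/* Relative throughput gain in percent: (after / before - 1) * 100. */
double speedup_percent(const struct bench_result *r)
{
    assert(r->before_ips > 0.0);
    return (r->after_ips / r->before_ips - 1.0) * 100.0;
}

/* Prints one line per benchmark file with its signed gain. */
void print_speedups(void)
{
    size_t n = sizeof json_results / sizeof json_results[0];
    for (size_t i = 0; i < n; i++)
        printf("%-18s %+5.1f%%\n", json_results[i].file,
               speedup_percent(&json_results[i]));
}
```

This works out to roughly +7.1% on activitypub, +5.3% on twitter and +0.5% on citm_catalog. The citm_catalog gain is smaller than the ±1.2% error reported for both of its runs, so it is effectively within noise.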
@casperisfine

Not too sure why it helps quite a lot on activitypub, less on twitter, and barely at all on citm_catalog.

Alright, after instrumenting, here is how often each string-parsing path is taken:

citm_catalog.json:

  • cache_hit: 25576
  • fastpath: 1027
  • slowpath: 1

twitter.json:

  • cache_hit: 13345
  • fastpath: 4442
  • slowpath: 312

activitypub.json:

  • cache_hit: 878
  • fastpath: 598
  • slowpath: 59

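So that explains why they don't all benefit equally: on citm_catalog almost every string is a cache hit, so the new fast path rarely runs, whereas activitypub sends a much larger share of its strings down it.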
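The same kind of back-of-the-envelope check applies to the instrumented counts. Assuming each counter is the number of strings that took that path (the exact instrumentation isn't shown in the PR), this C sketch computes the fraction of each file's strings that went through the fast path; the names are illustrative:

```c
#include <assert.h>
#include <stdio.h>

/* Per-file counters from the instrumented run, assumed to be
 * mutually exclusive string counts. */
struct path_counts {
    const char *file;
    unsigned cache_hit, fastpath, slowpath;
};

static const struct path_counts counts[] = {
    { "citm_catalog.json", 25576, 1027,   1 },
    { "twitter.json",      13345, 4442, 312 },
    { "activitypub.json",    878,  598,  59 },
};

/* Percentage of all counted strings that took the fast path. */
double fastpath_share(const struct path_counts *c)
{
    unsigned total = c->cache_hit + c->fastpath + c->slowpath;
    assert(total > 0);
    return 100.0 * c->fastpath / total;
}

/* Prints the fast-path share and total string count for each file. */
void print_shares(void)
{
    size_t n = sizeof counts / sizeof counts[0];
    for (size_t i = 0; i < n; i++)
        printf("%-18s fastpath %4.1f%% of %u strings\n", counts[i].file,
               fastpath_share(&counts[i]),
               counts[i].cache_hit + counts[i].fastpath + counts[i].slowpath);
}
```

That gives about 3.9% for citm_catalog, 24.5% for twitter and 39.0% for activitypub, the same ordering as the speedups in the benchmarks.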

byroot merged commit d315ac8 into ruby:master on Nov 6, 2024 (36 checks passed).