Support stdio in std/misc/json #739

Merged
merged 10 commits into from
Sep 6, 2023

Conversation

vyzo
Collaborator

@vyzo vyzo commented Jul 24, 2023

See #734.
Currently based on top of #740; I will rebase once that merges to master.

Summary of changes:

  • Reworks the json library module to support both ports and stdio
  • Breaking change: write-json-alist is removed; it is an internal detail that should never have been exposed

TBD:

  • update documentation
  • add some more tests

@vyzo vyzo added this to the Gerbil18 milestone Jul 24, 2023
@vyzo
Collaborator Author

vyzo commented Jul 25, 2023

Obligatory benchmarks:

Port

$ /tmp/json-benchmark-static port ~/Downloads/large-file.json
(time (call-with-input-file _file461_ std/text/json/api#read-json))
    0.542142 secs real time
    0.542195 secs cpu time (0.461861 user, 0.080334 system)
    29 collections accounting for 0.197161 secs real time (0.192309 user, 0.004895 system)
    1413609784 bytes allocated
    77822 minor faults
    no major faults
(time (call-with-output-file _tmp465_ (lambda (_g466468_) (std/text/json/api#write-json__% _object463_ _g466468_))))
    0.349361 secs real time
    0.349443 secs cpu time (0.325637 user, 0.023806 system)
    no collections
    195863480 bytes allocated
    6474 minor faults
    no major faults

Buffered String IO

$ /tmp/json-benchmark-static stdio ~/Downloads/large-file.json
(time (std/text/json/json-benchmark#call-with-buffered-file-reader _file451_ std/text/json/api#read-json))
    0.585576 secs real time
    0.585866 secs cpu time (0.502309 user, 0.083557 system)
    29 collections accounting for 0.209739 secs real time (0.194723 user, 0.015119 system)
    1413826552 bytes allocated
    83232 minor faults
    no major faults
(time (std/text/json/json-benchmark#call-with-buffered-file-writer _tmp455_ (lambda (_g456458_) (std/text/json/api#write-json__% _object453_ _g456458_))))
    0.280695 secs real time
    0.280884 secs cpu time (0.277164 user, 0.003720 system)
    no collections
    185637944 bytes allocated
    2574 minor faults
    no major faults

vyzo commented Jul 26, 2023

Some more benchmarks after tuning the bounds checking, this time using a very large file (125 MB).
The decoder is about 7% slower than Marc's C code, while the encoder is over 20% faster; we are doing something right.

Port

$ /tmp/json-benchmark port ~/Downloads/very-large-file.json
(time (call-with-input-file _file461_ std/text/json/api#read-json))
    2.723470 secs real time
    2.723387 secs cpu time (2.339546 user, 0.383841 system)
    40 collections accounting for 1.049149 secs real time (1.004368 user, 0.044715 system)
    7067819224 bytes allocated
    388484 minor faults
    no major faults
(time (call-with-output-file _tmp465_ (lambda (_g466468_) (std/text/json/api#write-json__% _object463_ _g466468_))))
    1.758368 secs real time
    1.749159 secs cpu time (1.549515 user, 0.199644 system)
    no collections
    963882776 bytes allocated
    30623 minor faults
    no major faults

Buffered String IO

$ /tmp/json-benchmark stdio ~/Downloads/very-large-file.json
(time (std/text/json/json-benchmark#call-with-buffered-file-reader _file451_ std/text/json/api#read-json))
    2.928129 secs real time
    2.928009 secs cpu time (2.531820 user, 0.396189 system)
    40 collections accounting for 1.131799 secs real time (1.092533 user, 0.039235 system)
    7068037288 bytes allocated
    415724 minor faults
    no major faults
(time (std/text/json/json-benchmark#call-with-buffered-file-writer _tmp455_ (lambda (_g456458_) (std/text/json/api#write-json__% _object453_ _g456458_))))
    1.410870 secs real time
    1.410811 secs cpu time (1.334912 user, 0.075899 system)
    no collections
    961611384 bytes allocated
    16814 minor faults
    no major faults
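
As a rough check of the percentages quoted for this run, the sketch below recomputes them from the real-time figures printed above. It is written in Python purely as a calculator, and it assumes that the Port run is the baseline the comment compares against; the helper name relative_change is made up for this illustration.

```python
# Real-time figures (seconds) copied from the very-large-file runs above.
port = {"read": 2.723470, "write": 1.758368}
stdio = {"read": 2.928129, "write": 1.410870}


def relative_change(baseline: float, candidate: float) -> float:
    """Signed change of candidate relative to baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Positive means the buffered string IO path took longer than the port path.
read_change = relative_change(port["read"], stdio["read"])
write_change = relative_change(port["write"], stdio["write"])

# Throughput view of the encoder: how many times faster the write finished.
write_speedup = port["write"] / stdio["write"]

print(f"decoder elapsed time vs port: {read_change:+.1f}%")
print(f"encoder elapsed time vs port: {write_change:+.1f}%")
print(f"encoder speedup over port:    {write_speedup:.2f}x")
```

Measured as elapsed time, the encoder comes out a little under 20% lower; measured as a throughput ratio it is about 1.25x. Which way "faster" is computed matters at this margin.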

vyzo commented Jul 26, 2023

Optimistically unrolling the loop for mostly-ASCII decoding brings it within about 5%:

$ /tmp/json-benchmark stdio ~/Downloads/very-large-file.json
(time (std/text/json/json-benchmark#call-with-buffered-file-reader _file451_ std/text/json/api#read-json))
    2.877477 secs real time
    2.877508 secs cpu time (2.505714 user, 0.371794 system)
    40 collections accounting for 1.151969 secs real time (1.137607 user, 0.014381 system)
    7068048552 bytes allocated
    415748 minor faults
    no major faults
(time (std/text/json/json-benchmark#call-with-buffered-file-writer _tmp455_ (lambda (_g456458_) (std/text/json/api#write-json__% _object453_ _g456458_))))
    1.382397 secs real time
    1.366885 secs cpu time (1.298993 user, 0.067892 system)
    no collections
    959151976 bytes allocated
    16513 minor faults
    no major faults
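
For readers unfamiliar with the technique, here is a minimal sketch of what an optimistic, unrolled ASCII fast path in a UTF-8 decoder can look like. It is not the code from this PR: it is a Python illustration with a hypothetical function name and an arbitrary unroll width of four bytes, and it leans on the standard library to decode and validate the non-ASCII code points.

```python
def decode_utf8_ascii_fast_path(data: bytes) -> str:
    """Decode UTF-8, trying four ASCII bytes per iteration before falling back."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        # Optimistic path: if the next four bytes are all ASCII (high bit
        # clear), emit them with a single combined check.
        if i + 4 <= n:
            b0, b1, b2, b3 = data[i], data[i + 1], data[i + 2], data[i + 3]
            if (b0 | b1 | b2 | b3) < 0x80:
                out.append(chr(b0))
                out.append(chr(b1))
                out.append(chr(b2))
                out.append(chr(b3))
                i += 4
                continue
        # General path: decode exactly one code point, whatever its length.
        lead = data[i]
        if lead < 0x80:
            length = 1
        elif lead >> 5 == 0b110:
            length = 2
        elif lead >> 4 == 0b1110:
            length = 3
        elif lead >> 3 == 0b11110:
            length = 4
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        # bytes.decode validates continuation bytes and truncated sequences.
        out.append(data[i:i + length].decode("utf-8"))
        i += length
    return "".join(out)


sample = '{"name": "Zoë", "city": "東京", "note": "plain ascii tail"}'
print(decode_utf8_ascii_fast_path(sample.encode("utf-8")) == sample)
```

The point of the combined check is that mostly-ASCII input, which is typical for JSON, pays one branch per four characters instead of one per character; any non-ASCII byte simply drops that iteration to the general path.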

vyzo commented Jul 26, 2023

Interestingly, on the smaller file, where less time is spent in GC, the string buffered reader is toe to toe (within 1%) with the port reader:

$ /tmp/json-benchmark stdio ~/Downloads/large-file.json
(time (std/text/json/json-benchmark#call-with-buffered-file-reader _file451_ std/text/json/api#read-json))
    0.556950 secs real time
    0.556928 secs cpu time (0.456605 user, 0.100323 system)
    29 collections accounting for 0.205059 secs real time (0.201356 user, 0.003701 system)
    1413822296 bytes allocated
    83230 minor faults
    no major faults
(time (std/text/json/json-benchmark#call-with-buffered-file-writer _tmp455_ (lambda (_g456458_) (std/text/json/api#write-json__% _object453_ _g456458_))))
    0.294428 secs real time
    0.278578 secs cpu time (0.262919 user, 0.015659 system)
    no collections
    197908600 bytes allocated
    4086 minor faults
    no major faults

vyzo commented Jul 26, 2023

I think I am done optimizing the beast; it's fast enough now.

vyzo commented Jul 26, 2023

For both the small and the large file, the time differential in the decoder is about the same as the differential in GC time.
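
A rough way to check this against the numbers posted in this thread is to compare the two gaps directly. The Python sketch below does that with the real-time and collection-time figures from the port runs and the most recent buffered string IO runs. Which runs the comment refers to is an assumption, the small-file port figure comes from the earlier -static binary, and the helper name gc_share is hypothetical.

```python
# (real time, time spent in collections), in seconds, copied from the thread.
runs = {
    "large-file.json": {
        "port": (0.542142, 0.197161),
        "stdio": (0.556950, 0.205059),
    },
    "very-large-file.json": {
        "port": (2.723470, 1.049149),
        "stdio": (2.877477, 1.151969),
    },
}


def gc_share(port, stdio):
    """Return the decoder time gap, the GC time gap, and GC gap / time gap."""
    time_gap = stdio[0] - port[0]
    gc_gap = stdio[1] - port[1]
    return time_gap, gc_gap, gc_gap / time_gap


shares = {}
for name, run in runs.items():
    time_gap, gc_gap, share = gc_share(run["port"], run["stdio"])
    shares[name] = share
    print(f"{name}: decoder gap {time_gap:.3f}s, GC gap {gc_gap:.3f}s, "
          f"GC share {share:.0%}")
```

On these figures the GC gap accounts for roughly half to two-thirds of the decoder gap, so the two differentials are of the same order on both files.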

vyzo commented Jul 26, 2023

I am thinking of breaking this into two PRs, one for strio and one for the JSON codecs; that will be much easier to follow and reason about.

vyzo commented Jul 26, 2023

I'll also pile up some improvements I am planning for the bio interface; I will cherry-pick those and pull them into #728.

@vyzo vyzo mentioned this pull request Jul 26, 2023
2 tasks
@vyzo vyzo changed the base branch from stdio to stdio-strio July 26, 2023 20:41
@vyzo vyzo requested review from fare, ober and drewc July 26, 2023 20:43
@vyzo vyzo force-pushed the json-stdio branch 2 times, most recently from deda067 to 6439acd Compare July 27, 2023 07:13
@vyzo vyzo force-pushed the json-stdio branch 2 times, most recently from c42dadf to 69358d6 Compare July 27, 2023 07:25
@vyzo vyzo force-pushed the stdio-strio branch 2 times, most recently from 0bf7c27 to 9d7bd68 Compare July 27, 2023 12:11

vyzo commented Jul 27, 2023

We have a new entrant, the bio buffered reader/writer, which is the fastest of all for json parsing.

$ /tmp/json-benchmark bio ~/Downloads/large-file.json
(time (std/text/json/json-benchmark#call-with-buffered-reader _file474_ std/text/json/api#read-json))
    0.514235 secs real time
    0.514194 secs cpu time (0.447235 user, 0.066959 system)
    29 collections accounting for 0.190660 secs real time (0.189526 user, 0.001124 system)
    1413630800 bytes allocated
    78063 minor faults
    no major faults
(time (std/text/json/json-benchmark#call-with-buffered-writer _tmp478_ (lambda (_g479481_) (std/text/json/api#write-json__% _object476_ _g479481_))))
    0.268870 secs real time
    0.268842 secs cpu time (0.244994 user, 0.023848 system)
    no collections
    194400440 bytes allocated
    6176 minor faults
    no major faults

vyzo commented Jul 27, 2023

I think we got a winner here.

@fare
Collaborator

fare commented Aug 7, 2023

Can you complete this PR?

vyzo commented Aug 7, 2023

yes, soon; the base needs a bit of work.

@vyzo vyzo force-pushed the json-stdio branch 2 times, most recently from ccc867f to 213ea16 Compare August 21, 2023 16:00

vyzo commented Aug 21, 2023

rebased.

@vyzo vyzo marked this pull request as ready for review August 21, 2023 16:38

vyzo commented Aug 21, 2023

This is ready.

@vyzo vyzo force-pushed the json-stdio branch 3 times, most recently from f177a95 to 17393eb Compare September 4, 2023 17:05
Base automatically changed from stdio-strio to master September 6, 2023 17:51
@vyzo vyzo merged commit 4700826 into master Sep 6, 2023
@vyzo vyzo deleted the json-stdio branch September 6, 2023 19:29
2 participants