std.json: Unify stringify and writeStream #16405
I am basically completely uninvolved with Zig development, but I have used JSON a lot and I agree with you. Sometimes when working with C structs it is desirable to output some fields manually via emitNumber/emitString etc. and to just json.stringify the rest. I would just say that merging objectField and emitXXXXXX into one call is confusing, since it does not clearly differentiate (for the user) which one is the field name and which one is the field value. Maybe these should be different. But other than that, thumbs up for this one!
    if (self.int.negated) try w.writeAll("-");
    try jsw.emitNumber(self.int.value);
I'm pretty sure this was broken, because the old `emitNumber()` would render large integers in quotes, resulting in `-"12345678901234567890"`. The new code will render it as `"-12345678901234567890"`.
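To make the difference concrete, here is a minimal sketch of the two renderings. Both helper names are hypothetical and exist only for this illustration; neither is code from the PR or from `std.json`. The sketch assumes the integer is stored as a sign flag plus a magnitude, as in the snippet under review.

```zig
const std = @import("std");

/// Hypothetical helper mimicking the old behavior described above: the sign
/// is written first and the quoted magnitude afterwards, so the minus sign
/// ends up outside the quotes.
fn renderSignOutside(buf: []u8, negated: bool, magnitude: u64) ![]u8 {
    const sign: []const u8 = if (negated) "-" else "";
    return std.fmt.bufPrint(buf, "{s}\"{d}\"", .{ sign, magnitude });
}

/// Hypothetical helper for the intended behavior: the sign is part of the
/// quoted text, so the result is a single JSON string.
fn renderSignInside(buf: []u8, negated: bool, magnitude: u64) ![]u8 {
    const sign: []const u8 = if (negated) "-" else "";
    return std.fmt.bufPrint(buf, "\"{s}{d}\"", .{ sign, magnitude });
}

test "the sign belongs inside the quotes" {
    var buf: [32]u8 = undefined;

    const broken = try renderSignOutside(&buf, true, 12345678901234567890);
    try std.testing.expectEqualStrings("-\"12345678901234567890\"", broken);

    const fixed = try renderSignInside(&buf, true, 12345678901234567890);
    try std.testing.expectEqualStrings("\"-12345678901234567890\"", fixed);
}
```

The first form is not valid JSON, because a bare minus sign followed by a string is not a JSON value.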
The null hypothesis is to implement something with minimal resource usage, and a debug version that requires an allocator and tracks more state to provide debugging tools is extra. So I suggest that the unqualified function name be the one that does not have an allocator parameter, since it's not actually needed. If you can provide assertions without an allocator parameter then you can base it on the optimization mode and not require the user to make a choice. "safe" and "unsafe" are not really the concepts we use in Zig. Instead, the concepts are:
Therefore I think The use case of writing JSON text with a compile-time known upper bound nesting depth, with assertions to prevent mistakes, is regressed in this PR since it requires an allocator now. In summary, there is not really a way to ask for an allocator that is "for safety checks", and these changes are problematic because, as you suspected, there is not a simple way to automatically do the correct thing for each build mode. So very likely, in practice, this change would result in many users generating invalid JSON (via the unchecked function) using zig standard library and thinking zig is a silly language that can't even figure out text processing, and another set of users (using the checked function) would notice that Zig's json processing is slower than its competitors, and think zig is a silly language that can't even have high performance of simple text processing.
@andrewrk I believe I've addressed all your concerns. The default now does safety checks up to 256 nesting levels (which I believe is the same memory size as an empty The only thing missing perhaps is setting the 256 depth to
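The idea discussed here, checking begin/end pairing up to a compile-time nesting limit without any allocator, can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `NestingChecker`, `Kind`, `begin`, and `end` are hypothetical names, and the real code may use a more compact representation.

```zig
const std = @import("std");

/// Hypothetical sketch: tracks which containers are open in a fixed-size
/// array whose length is known at compile time, so no allocator is needed.
fn NestingChecker(comptime max_depth: usize) type {
    return struct {
        const Self = @This();
        const Kind = enum { object, array };

        stack: [max_depth]Kind = undefined,
        depth: usize = 0,

        /// Records that a container of the given kind was opened.
        fn begin(self: *Self, kind: Kind) void {
            std.debug.assert(self.depth < max_depth);
            self.stack[self.depth] = kind;
            self.depth += 1;
        }

        /// Asserts that the innermost open container has the given kind,
        /// then closes it.
        fn end(self: *Self, kind: Kind) void {
            std.debug.assert(self.depth > 0);
            std.debug.assert(self.stack[self.depth - 1] == kind);
            self.depth -= 1;
        }
    };
}

test "balanced nesting passes the checks" {
    var checker = NestingChecker(256){};
    checker.begin(.object);
    checker.begin(.array);
    checker.end(.array);
    checker.end(.object);
    try std.testing.expectEqual(@as(usize, 0), checker.depth);
}
```

Because the checks are plain `std.debug.assert` calls, they are active in Debug and ReleaseSafe builds, while in ReleaseFast and ReleaseSmall a failed assertion is undefined behavior that the optimizer is free to assume away. That ties the checking to the build mode rather than to a choice the caller has to make.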
I don't know how to debug the CI failures. I tried attaching a debugger, and it looks like it might be in I'll try bisecting the problem in my code,
ccee113 to fd34cbe
I rebased this on your behalf and then force pushed. If you're getting "segmentation fault" rather than a stack trace with a debug build, it very likely means a stack overflow in the compiler. I'm guessing you added some comptime checks that are triggering #13724. Another clue is
Edit: gdb could not handle the stack overflow but lldb did handle it, and it is indeed stack overflow:
After this (at the top) it repeats until stack overflow.
If it's any help, the following stripped down example reproduces the same issue:

    pub fn writeStreamMaxDepth(out_stream: anytype, comptime max_depth: ?usize) WriteStream(
        @TypeOf(out_stream),
        if (max_depth) |d| .{ .checked_to_fixed_depth = d } else .assumed_correct,
    ) {
        return .{};
    }

    pub fn WriteStream(
        comptime OutStream: type,
        comptime safety_checks: union(enum) {
            checked_to_arbitrary_depth,
            checked_to_fixed_depth: usize,
            assumed_correct,
        },
    ) type {
        _ = OutStream;
        _ = safety_checks;
        return struct {};
    }

The part confusing Autodoc seems to be related to the inferred type of the

    pub fn writeStreamMaxDepth(out_stream: anytype, comptime max_depth: ?usize) WriteStream(
        @TypeOf(out_stream),
        @as(SafetyChecks, if (max_depth) |d| .{ .checked_to_fixed_depth = d } else .assumed_correct),
    ) {
        return .{};
    }

    pub const SafetyChecks = union(enum) {
        checked_to_arbitrary_depth,
        checked_to_fixed_depth: usize,
        assumed_correct,
    };

    pub fn WriteStream(
        comptime OutStream: type,
        comptime safety_checks: SafetyChecks,
    ) type {
        _ = OutStream;
        _ = safety_checks;
        return struct {};
    }

Unfortunately this is as far as my understanding goes. I don't really understand why the
Rebase the branch on the latest master; I've fixed the infinite loop in 426e737.
fd34cbe to 14abf70
Thanks for your help @ianprime0509 and @kristoff-it! ❤️
something about making a safety check null
Yeah it would be idiomatic to avoid safety checks in the unsafe release modes, and of course it is always idiomatic to not store dead memory. However, that is fine to leave as a follow-up enhancement.
Nice work!
I request you please either rebase the commits and clean up the messages, or squash the whole branch upon merging.
I've added the breaking changes summary to my 0.11 std.json upgrade guide here: https://gist.github.com/thejoshwolfe/0642d7f42485d4d632a22ffe05dd6f29
Inspired by this comment:

zig/src/Autodoc.zig, lines 946 to 953 in 3ec3374

and my own experience adding a 5th copypaste of code that looks like this:

I figured it was time to implement `stringify()` using `writeStream()`, and that's what this PR does. The user-supplied `pub fn jsonStringify` method that plugs into this mechanism now takes a `json.WriteStream` instead of a regular `out_stream` and `options` or whatever. This means the implementation of custom stringifiers gets dramatically simpler:

This PR almost entirely rewrites `json.WriteStream`.

The new API does away with the finely specified `arrayElem`, `emitNumber`, `emitString`, etc, and instead you just call `objectField(s)` for object keys, and `write(x)` for every other value. You still call `beginArray`/`endArray` for arrays and `beginObject`/`endObject` for objects, and then `write(x)` for everything that was previously supported only by the `stringify()` entrypoint, such as structs and unions and everything. There is a grammar in the `WriteStream` docs that specifies how you must call the functions.

This comment is also realized in this PR:

zig/lib/std/json/write_stream.zig, lines 21 to 22 in 546212f

, although the design has some questionable details. There are two different functions to call depending on whether you want the safety, which is primarily because you only need an allocator for the safe version. Should this instead be implemented by checking the build mode or something? Please advise.
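The call pattern described above (`objectField` for keys, `write` for every value, and a `jsonStringify` method that receives the write stream) might look like the following sketch. It assumes `std.json` roughly as it shipped in Zig 0.11 and the releases immediately after it, including `std.json.writeStream(out_stream, options)` and `deinit()` on the returned stream; later releases reorganized this module, so treat the exact calls as version-dependent. `Point` is a hypothetical type used only for illustration.

```zig
const std = @import("std");

/// Hypothetical type used only for this illustration.
const Point = struct {
    x: i32,
    y: i32,

    /// Custom stringifier: it receives the write stream itself and emits
    /// the point as a two-element array instead of the default object.
    pub fn jsonStringify(self: @This(), jws: anytype) !void {
        try jws.beginArray();
        try jws.write(self.x);
        try jws.write(self.y);
        try jws.endArray();
    }
};

test "objectField for keys, write for values" {
    var buf: [128]u8 = undefined;
    var fbs = std.io.fixedBufferStream(&buf);

    // Empty options select the default, minified output.
    var jws = std.json.writeStream(fbs.writer(), .{});
    defer jws.deinit();

    try jws.beginObject();
    try jws.objectField("name");
    try jws.write("origin");
    try jws.objectField("at");
    try jws.write(Point{ .x = 0, .y = 0 }); // dispatches to Point.jsonStringify
    try jws.endObject();

    try std.testing.expectEqualStrings(
        "{\"name\":\"origin\",\"at\":[0,0]}",
        fbs.getWritten(),
    );
}
```

Writing into a fixed buffer keeps the example free of allocators; any writer should work in its place.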
Breaking changes to `StringifyOptions`

The structure of `StringifyOptions` is overhauled to be more flat.

- `escape_unicode` moved to the top level (because it applies to object keys, it's still relevant when in strings-as-arrays mode, so it should be accessible outside the `.String` string mode.). TODO: rename to `ensure_ascii`, like the Python option. Since ASCII is a subset of Unicode, the name `escape_unicode` isn't precisely what it's doing. Also TODO: change `0x7f` to be considered a non-ascii character, which would slightly change the behavior.
- `escape_solidus` removed. I'm pretty sure the motivation for this option in Zig was because the JSON spec allows it to be escaped, and I'm pretty sure that's because server-generated JSON content inside `<script>` tags wanted to escape `<\/script>`, which is very far removed from anything that Zig cares about. A proper solution to escaping dangerous web characters would look something a lot more like Go's implementation, which escapes `<>&` and some obscure whitespace stuff instead of `/`. For now, this can be done as a post-processing pass over the string, and if we want a more performant single-pass solution integrated with `std.json`, we can add that later.
- `string = .Array` replaced by `.emit_strings_as_arrays = true`. It's unclear why we even have this option, but I didn't have a compelling reason to remove it.
- `whitespace.indent_level` removed. The internal bookkeeping is now being done with a field inside the `WriteStream` instead of a mutable field in the options object. Technically, this removes the ability for a caller to give additional indentation levels, which can easily be implemented again, but I'd like to see a use case for that before we add it back. (And if we do add it back, should it be a slice instead of a number?)
- `whitespace.separator` and `whitespace.indent` combined into `.whitespace = .minified` vs `.whitespace = .indent_2` etc. I'm guessing that we don't need to support a usecase where we've got indentation but no separator after the `:`, or a separator after the `:` but no indentation. Seems like you either want minified or pretty printed. Please let me know if there's another use case I'm not considering. We've got 1, 2, 3, 4, and 8 space indentation and 1 tab indentation. Do we need anything else? This is probably all we need, but it's easy to add more.
- `emit_null_optional_fields` is unchanged. ... However, I think we should change the behavior to omit optional fields that are null only when their default value is actually null. That's TODO for another time though.

The default whitespace formatting in various apis is now unified to be consistent. Previously: the default `StringifyOptions{}` had minified whitespace, but `StringifyOptions{ .whitespace = .{} }` gave 4-space pretty printing, and then `WriteStream` defaulted to using 1-space pretty printing. Now the default everywhere is minified whitespace, and switching to pretty printing is as simple as changing `.{}` to `.{ .whitespace = .indent_2 }`.
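As a rough illustration of the post-processing pass suggested for the removed `escape_solidus` option, the sketch below rewrites every `</` in already-generated JSON text as `<\/`. The helper name `escapeClosingTags` is hypothetical and not part of `std.json`. It covers only the `</script>` case mentioned above, not the wider set of characters that Go escapes.

```zig
const std = @import("std");

/// Hypothetical helper, not part of std.json: returns a copy of `json_text`
/// with every "</" replaced by "<\/". In valid JSON a '<' can only occur
/// inside a string, where "\/" is a legal escape for '/', so the decoded
/// values are unchanged. The caller owns the returned slice.
fn escapeClosingTags(allocator: std.mem.Allocator, json_text: []const u8) ![]u8 {
    return std.mem.replaceOwned(u8, allocator, json_text, "</", "<\\/");
}

test "escape closing tags after stringifying" {
    const allocator = std.testing.allocator;
    const escaped = try escapeClosingTags(allocator, "{\"html\":\"</script>\"}");
    defer allocator.free(escaped);
    try std.testing.expectEqualStrings("{\"html\":\"<\\/script>\"}", escaped);
}
```

This costs a second pass and an extra allocation, which is the trade-off the description accepts until a single-pass option is integrated.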