-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YAML revamp #5007
YAML revamp #5007
Conversation
I couldn't find any of the benchmarks in that file or as part of the commit message. Can you add that to the PR? Thank you. |
@luislavena I hit "submit" before time. I'm still editing the description :) |
I updated the description. This is now ready to review 😄 |
src/yaml/any.cr
Outdated
# | ||
# Raises if the underlying value is not an `Array`. | ||
def [](index : Int) : YAML::Any | ||
# Raises if the underlying value is not an `Array` nor a `Has. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[...] nor a `Has.
->
[...] nor a
Hash
.
src/yaml/any.cr
Outdated
else | ||
raise "Expected Hash for #[](key : String), not #{object.class}" | ||
raise "Expected Array or hash for #[](index_or_key), not #{object.class}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hash -> Hash
src/yaml/from_yaml.cr
Outdated
{% end %} | ||
{% end %} | ||
|
||
pull.raise("error deserailizing alias") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize exception message?
src/yaml/pull_parser.cr
Outdated
end | ||
end | ||
|
||
raise("error deserailizing alias") if raise_on_alias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize exception message?
@@ -0,0 +1,36 @@ | |||
module YAML |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not leave 'em in LibYAML
and use alias
instead of moving all the enums?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because API docs don't show for lib
types.
src/yaml/mapping.cr
Outdated
return obj | ||
end | ||
|
||
instance = allocate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't instance
be a fresh variable? Similarly obj
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. There's no risk of clashing other variables.
src/yaml/mapping.cr
Outdated
instance | ||
end | ||
|
||
def initialize(%pull : ::YAML::PullParser, _dummy : Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of _dummy
argument here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just from description:
However, right now the compiler doesn't allow both a new and an initialize with the same argument "overloadness": the initialize wins and the new is hidden. This might be fixed later. For now, to make them different overloads, I added a _dummy argument at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair 'nuff, could we have TODO
or FIXME
comment then to keep those edgy cases at bay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/yaml/pull_parser.cr
Outdated
|
||
private def record_anchor(object_id, crystal_type_id) | ||
anchor = self.anchor | ||
return unless anchor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simplify it to:
return unless anchor = self.anchor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it harder to understand like that.
src/yaml/any.cr
Outdated
value ? Any.new(value) : nil | ||
else | ||
raise "Expected Hash for #[]?(key : String), not #{object.class}" | ||
raise "Expected Array for #[]?(index : Int), not #{object.class}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expected Array for #[]?(index : Int), not #{object.class}
->
Expected Array or Hash for #[]?(index_or_key), not #{object.class}
src/yaml/schema/core.cr
Outdated
# ``` | ||
def self.parse_scalar(string : String) : Type | ||
case string | ||
when .empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put this with the other null checks?
when "~", "null", "Null", "NULL", .empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is to make it more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I totally forgot I could write it like that. I'll change it :-)
Great job @asterite. The code reorganization looks good. More than anything, I am just happy to see these fixes get in. Was support added for |
@jwaldrip Do you know what's the difference between the YAML 1.1 and 1.2 specs? Maybe the regexes for the core type? I can see here that booleans are just variants of I think I like version 1.2 better because it's simpler. It's a bit strange to deserialize "yes" as But I don't know if that's the difference. Is there anything else? (in any case, you are totally right that we should mention which version we support) About the tags:
|
To a bit a more to the above: I'd really prefer to implement 1.2, making "y", "yes", "on", etc., be booleans is super strange, given that JSON is so popular and However, YAML 1.2 doesn't support timestamp, right? So we shouldn't deserialize Time at all? |
Actually, we are using I'll soon add support for all types defines here. I also checked pyyaml (which uses libyaml) and they also implicitly parse times like in this PR and like in Ruby's Psych implementation, so I guess it's safe if we do the same thing. |
@asterite, Even that Cheers. If I can chime in, I would definitely like Thank you for working on this! ❤️ ❤️ ❤️ |
Also |
I actually tried to make |
@asterite, array of single element hashes makes sense. Did you happen to implement |
@jwaldrip Yes, it decodes to |
Hmmm... I managed to implement everything (not pushed yet) except merge. Well, merge actually works with I'll explain the difficulty. For example this: development: &development
host: foo.com
port: 1234
test:
<<: *development
port: 4567 With this: require "yaml"
class HTTPSetting
YAML.mapping(name: String, port: Int32)
end We want to The way it currently works is that when So I'm thinking the only solution to this is to parse YAML first to an intermediate structure, and This implies a medium-sized refactor, and of course it's a breaking change. It will also slow down parsing a bit because of the intermediate structure, but I don't think it will get that bad. But it's better to be feature complete than fast but incomplete. That is, unless someone has a better idea of how to solve this issue :-) |
src/yaml/any.cr
Outdated
value ? Any.new(value) : nil | ||
else | ||
raise "Expected Hash for #[]?(key : String), not #{object.class}" | ||
raise "Expected Array or Hash for #[]?(index : Int), not #{object.class}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argument declaration should be index_or_int
instead of index : Int
.
src/yaml/pull_parser.cr
Outdated
def read_null_or | ||
if kind == EventKind::SCALAR && (value = self.value).nil? || (value && value.empty?) | ||
if kind == EventKind::SCALAR && scalar_style.plain? && value.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if kind.scalar? && ...
?
src/yaml/pull_parser.cr
Outdated
expect_kind expected_kind | ||
read_next | ||
end | ||
|
||
def read_raw | ||
case kind | ||
when EventKind::SCALAR | ||
when .scalar? | ||
self.value.not_nil!.tap { read_next } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.not_nil!
call here is no longer needed.
src/yaml/pull_parser.cr
Outdated
@@ -188,25 +320,25 @@ class YAML::PullParser | |||
|
|||
def read_raw(io) | |||
case kind | |||
when EventKind::SCALAR | |||
when .scalar? | |||
self.value.not_nil!.inspect(io) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@asterite surely we only have to store the YAML structures which are tagged? |
So... this PR is now really big, but all the changes are needed to correctly (Conclusion: YAML is hard, but also fun ^_^) At this point, I think it's better to browse the implementation, or the specs, Parsing and dumping YAML is now done in a completely different way. For parsing (with For parsing with For dumping, we have objects implement By the way, this is also how Ruby does it, probably because of the same This should be slightly slower than before, and a bit more memory will be One of the features this refactor supports is merging with "<<". For example, require "yaml"
class Setting
YAML.mapping(host: String, port: Int32)
end
yaml = <<-YAML
development: &development
host: foo.com
port: 1234
test:
<<: *development
port: 4567
YAML
settings = Hash(String, Setting).from_yaml(yaml)
p settings
# =>
# {"development" => #<Setting:0x106f7fd80 @host="foo.com", @port=1234>,
# "test" => #<Setting:0x106f7fd20 @host="foo.com", @port=4567>} This works because when the implementation of Another advantage is that when deserializing a union of types there's Finally, here's the updated benchmark result:
The previous one was:
We can see it's a bit slower, but still around 9~10 times faster than |
def Union.new(ctx : YAML::ParseContext, node : YAML::Nodes::Node) | ||
if node.is_a?(YAML::Nodes::Alias) | ||
{% for type in T %} | ||
{% if type < ::Reference %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you describe this trick? looks like you want !type.is_a?(Reference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means "if the type inherits from Reference". If so, we can check for an alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like Crystal need explicit method named .reference?
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but note that this only works in macros. In runtime this isn't possible (yet)
Now fixes #2873 too :-) |
- Add support for YAML core schema - Add support for possibly other schemas via the abstract YAML::Parser class - Add missing arguments to YAML::Builder (tag, anchor, scalar) - Add support for parsing and generating recursive data structures - Add support for merge (<<) - Move LibYAML enums to the YAML namespace
I decided to remove |
LGTM |
LGTM |
Looks really great, thanks @asterite and @jwaldrip for the hard work! Just as a note: We should consider applying an independent YAML test suite like yaml/yaml-test-suite (though it seems to be incomplete) to make sure that the implementation meets the standard. This doesn't need to be in this PR, obviously. |
🕚 to 🚢 |
@asterite @bcardiff @mverzilli merging this before the weekend will be a nice way to end the week! 😄 |
raise YAML::ParseException.new(ex.message.not_nil!, *location) | ||
end | ||
def {{type.id}}.new(ctx : YAML::ParseContext, node : YAML::Nodes::Node) | ||
{{type.id}}.new parse_scalar(ctx, node, Int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asterite I don't follow how/if UInt64
is supported. The call to parse_scalar uses Int64
to check if the parsed type matches, but UInt64 won't fit.
require "yaml"
typeof(YAML::Schema::Core.parse_scalar(UInt64::MAX.to_s)) # => Bool | Float64 | Int64 | String | Time | Nil
YAML::Schema::Core.parse_scalar(UInt64::MAX.to_s).class # => Float64
Note that
it_parses_scalar UInt64::MAX.to_s, UInt64::MAX
passes since UInt64::MAX == UInt64::MAX.to_f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's supported if you do UInt64.from_yaml(...)
, but parse_scalar
only returns one of the Type
values, so just Int64
is supported (I don't know if it overflows or it just returns a string in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns a Float64 now :-)
def to_yaml(yaml : YAML::Builder) | ||
yaml.scalar Time::Format::ISO_8601_DATE_TIME.format(self) | ||
def to_yaml(yaml : YAML::Nodes::Builder) | ||
if kind.utc? || kind.unspecified? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be encapsulated in a format somehow? It seems pretty useful :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! We can probably do that later. Also, right now just 3 decimals are shown but after we merge the PR that makes Time hold nanoseconds we can show all 9 digits.
The build failed: https://travis-ci.org/crystal-lang/crystal/jobs/282301194 Maybe some conflict between this PR and #5022? |
This is a big PR so I'll explain everything in detail.
What's wrong with the current YAML implementation?
Several things:
YAML.parse
,YAML.dump
andYAML.mapping
only support the failsafe schema, when the core schema is used by default in every other programming language. The core schema supports many things, for example "null" will parse tonil
, "0b01" will parse as a binary integer, and "2010-01-02" will parse as a time.YAML.mapping
, and dumping viaYAML.dump
/#to_yaml
fails on recursive types: anchors and aliases are simply ignored.YAML::Builder
to adjust the style, anchors and tags of sequences, mappings and scalars.This PR fixes all of that.
I'll explain what changed in each of the files in this PR, but first I'll give a general overview.
General overview
YAML.parse
now delegates toYAML::Schema::Core::Parser.parse
. The idea is that all logic related to the core schema is underYAML::Schema::Core
. This letsYAML::PullParser
keep its simplicity.YAML::Builder
andYAML::PullParser
can track anchors, aliases and recursive data structures. At first I thought about making a layer on top of these types. However, anchors, aliases and recursive types are a core part of the YAML specification, and an implementation can't simply ignore that. That's why parsing and emitting YAML must take that into account.File by file explanation
You can use this explanation as you traverse the diff (same order as there).
spec/std/yaml/any_spec.cr
YAML::Any
now wraps more types, for exampleInt64
,Float64
and time, so methods to access these values need to be tested.Additionally, YAML mappings (
Hash
in Crystal) can have complex keys, not just strings. That's why we should be able to index with1
into aHash
.A few specs changed because what previously returned a String now returns an Int.
spec/std/yaml/builder_spec.cr
YAML::Builder
can now be specified the styles, anchors and tags of mappings, sequences and scalars, and this is tested.spec/std/yaml/mapping_spec.cr
YAML.mapping
can now serialize and deserialize recursive structures, and this is testedspec/std/yaml/schema/core_spec.cr
Here we test parsing scalars into the different types (
Nil
,Bool
,Int64
,Float64
,Time
, etc.) according to the core schema. The nice thing about this is that this can almost be tested independently of the other YAML classes and structs.spec/std/yaml/serialization_spec.cr
#to_yaml
andYAML.dump
now use the core schema by default, so some variants of the different types (likenull
andNull
fornil
) are tested. Not every variant is tested because this is already covered incore_spec.cr
.Additionally we test that recursive arrays and hashes can be serialized.
spec/std/yaml/yaml_spec.cr
Some tests changed because of the core schema. I didn't add more specs to test that, for example, "~" parses into
nil
because that is covered incore_spec.cr
. If we assume the parsing delegates to the core schema parser it's a bit redundant to test this twice.src/yaml.cr
Clarify in the docs that the core schema is used, and change
YAML::Type
accordingly.src/yaml/any.cr
See the explanation for
any_spec.cr
src/yaml/builder.cr
See the explanation for
builder_spec.cr
.Additionally, the builder now tracks info to be able to dump recursive data. This is all explained in the comments.
If you implement
to_yaml
for aReference
type you need to useregister
when needed. However,to_yaml
is implemented for the basic Crystal types, andYAML.mapping
provides a correct implementation, so needing to learn this should be pretty rare (but it's still explained).src/yaml/enums.cr
I moved the enums from
LibYAML
into theYAML
namespace, so that wan can more easily use them without needing to know aboutLibYAML
.src/yaml/from_yaml.cr
Implementation of
from_yaml
for many Crystal types.Most of this is done with the helper method
parse_scalar
in that same file. We simply parse a scalar according to the core schema from the pull parser, and if it's of the type we were trying to deserialize, we return it. Otherwise it's an error.This might seem a bit slow, to parse a scalar for example to deserialize
nil
. However,parse_scalar
doesn't allocate memory, so it's fast. I include some benchmarks at the end.Some other changes include tracking of types to allow recursiveness.
src/yaml/lib_yaml.cr
Changed because the enums were moved to the
YAML
schema.src/yaml/mapping.cr
Changed to support recursion.
To implement this is a bit tricky. First, we can't implement this just with
initialize
because we want to return a different object if we find it in the anchors. So, we need to define anew
method. However, right now the compiler doesn't allow both anew
and aninitialize
with the same argument "overloadness": theinitialize
wins and thenew
is hidden. This might be fixed later. For now, to make them different overloads, I added a_dummy
argument at the end.src/yaml/parser.cr
Now the Parser is an abstract class to possibly support more schemas. Reads the docs to understand how. There are also two implementations with this: failsafe and core.
According to YAML's spec there are three recommended schemas:
For Failsafe the spec says:
That's why I see no reason not to support this, specially since it's so easy with how
YAML::Parser
was refactored.Ideally I'd like to support
JSON
schema too, shouldn't be that hard, except I don't understand well the spec 😊src/yaml/pull_parser.cr
This type had a few changes, some breaking changes too:
anchor
).foo?
instead of the long enum name when possibleTo support recursive types I use a a bit of low-level knowledge of Crystal, mainly
object_id
andcrystal_type_id
.src/yaml/schema/core.cr
This implements all the logic related to the core schema. Provides a few public methods, and a parser.
To parse a scalar no Regex is used. And no memory is allocated. It's fast. I'll show a benchmark in the end.
src/yaml/schema/core/time_parser.cr
Specialized time parser. See the docs for why I didn't use
Time.parse
.src/yaml/schema/fail_safe.cr
Failsafe implementation. Doesn't use
YAML::Any
. Might be useful to some (for exampleshards
could use this if it previously didn't need the core schema).src/yaml/to_yaml.cr
Updated to support recursive types. Also changed
String
to emit double quoted for reserved scalars in the core schema. And forTime
there's custom logic to use the nicest output format.How is this different than #4838
First, I'd like to thank you, @jwaldrip, because you was the first who decided to start doing something about this missing YAML feature. The code is great, but I personally imagined it organized a bit differently. In the process I looked at your code (I even copied most of
YAML::Any
from it).So, this PR is similar to #4838 but with a few differences:
I'd still like to thank @jwaldrip in the changelog entry because without it, I wouldn't have done anything (no pressure to provide an implementation).
Benchmarks
I used this benchmark:
I get:
We can run almost the same code in Ruby. Make sure to change
parse
toload
becauseparse
just builds a YAML nodes tree. In Ruby I get:So more than 10x faster to dump and parse.
Can we do it faster? Maybe. But for me, if we are already much faster than Ruby it's enough (if this parsing speed is good-enough for Ruby, for us it's more than good-enough)
Fixes #4384
Fixes #1745
Fixes #2873
Fixes #3101
Fixes #4929
Closes #4838