Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML fixes #4838

Closed
wants to merge 31 commits into from
Closed

YAML fixes #4838

wants to merge 31 commits into from

Conversation

jwaldrip
Copy link

No description provided.

nil
else
raise YAML::ParseException.new("Expected nil, not #{value}", 0, 0)
end
end

def Bool.new(pull : YAML::PullParser)
pull.read_scalar == "true"
case value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need

case value = pull.read_scalar

here

@akzhan
Copy link
Contributor

akzhan commented Aug 17, 2017

You must add examples to yaml_spec.

value = pull.read_scalar
if value.empty?
case value = pull.read_scalar
when "", "NULL", "Null", "null"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be necessary because libyaml recognizes null values and it looks wrong, because it would also recognize a quoted string "null" as nil value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should, but it doesn't I will have to look more in depth to libyaml again tonight. But I was running into issues where when I inspect the return value of read_scalar I get "null".

@straight-shoota
Copy link
Member

straight-shoota commented Aug 17, 2017

The parsers for Bool and Nil are wrong because they don't check whether it's a plain value or an escaped string: "true" would be recognized as true value, "false" as a boolean. They are broken in the current implementation, too:

require "yaml"
class Foo
  YAML.mapping(bool: Bool, null: Nil)
end

pp Foo.from_yaml(%(bool: true\nnil: null))  # => #<Foo @bool=true, @null=nil>
pp Foo.from_yaml(%(bool: "true"\nnull: "")) # => #<Foo @bool=true, @null=nil>

carc.in

YAML's boolean type only recognizes true and false as valid values (spec) and only null as null value (spec). It's only the Language-Independent Types for YAML which add additional aliases (bool, null). If these aliases should be supported, then ~ would also need to be added as null value.

@jwaldrip
Copy link
Author

@akzhan the details of the yaml spec are cited in the commit messages.

@jwaldrip
Copy link
Author

@straight-shoota ruby implements the language independent types.

$ ruby -r yaml -e "puts YAML.load 'foo: YES'"
{"foo"=>true}

$ ruby -r yaml -e "puts YAML.load 'foo: ~'"
{"foo"=>nil}

@straight-shoota
Copy link
Member

straight-shoota commented Aug 17, 2017

Oh, and YAML.parse needs to respect the plain values as well:

pp YAML.parse(%(bool: "true"\nnull: "")) # => {"bool" => "true", "null" => ""}
pp YAML.parse(%(bool: true\nnil: null)) # => {"bool" => "true", "nil" => "null"}

@RX14
Copy link
Contributor

RX14 commented Aug 17, 2017

There should be specs for all these special cases.

@jwaldrip
Copy link
Author

@straight-shoota The following is wrong

require "yaml"
class FooBar
  YAML.mapping({
    foo: Nil,
    bar: Bool
  })
end

yaml = FooBar.from_yaml <<-yaml
foo: null
bar: true
yaml

puts yaml
$ crystal run ./test.cr
Expected nil, not null (YAML::ParseException)
0x109617295: *CallStack::unwind:Array(Pointer(Void)) at ??
0x109617231: *CallStack#initialize:Array(Pointer(Void)) at ??
0x109617208: *CallStack::new:CallStack at ??
0x109616245: *raise<YAML::ParseException>:NoReturn at ??
0x1096399b9: *Nil::new<YAML::PullParser>:Nil at ??
0x109662adf: *FooBar#initialize<YAML::PullParser>:Bool at ??
0x109662a11: *FooBar::new<YAML::PullParser>:FooBar at ??
0x109662960: *FooBar@Object::from_yaml<String>:FooBar at ??
0x109605d3d: __crystal_main at ??
0x1096160a8: main at ??

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Aug 17, 2017

First and foremost, the current parser relies on libyaml which is a YAML 1.1 parser, not a YAML 1.2 parser. Trying to parse YAML 1.2 on top of a YAML 1.1 may prove problematic, if not impossible.

Now, you can't assume a scalar value that looks like true, false or null to be a Bool or Nil. The value was unquoted by the parser, so it could have been a explicit string. For example we can't assume 11038400529011648716446655440000 to be a (big) integer, because it could be an UUID (without - which is a valid format). You must check the scalar style before trying any type transformation.

It seems libyaml does report the scalar style, but this isn't exposed by the YAML::PullParser#read_scalar wrapper. Maybe we could introduce a YAML::PullParser#style in the same vein as YAML::PullParser#value, so the scalar style could be checked (before reading the scalar). For example the following should parse all YAML 1.2 implicit types and special values, if and only if the scalar has a plain style:

class YAML::PullParser
  def style
    @event.data.scalar.style
  end
end

style, str = parser.style, parser.read_scalar

if style.plain?
  case str
  when "true", "True", "TRUE"
    true
  when "false", "False", "FALSE"
    false
  when "null", "Null", "NULL", "~", ""
    nil
  when ".inf", ".Inf", ".INF", "+.inf", "+.Inf", "+.INF"
    Float64::INFINITY
  when "-.inf", "-.Inf", "-.INF"
    -Float64::INFINITY
  when ".nan", ".NaN", ".NAN"
    Float64::NAN
  else
    str.to_i?(whitespace: false, underscore: false) ||
      str.to_f?(whitespace: false, underscore: false) ||
      str
  end
else
  str
end

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my previous comment: the scalar style must be plain for any transformation to actually happen, otherwise it was a string.

Furthermore, yes, on, no and off do not represent boolean values as per the YAML 1.2 specification, and are thus invalid.

nil
else
raise YAML::ParseException.new("Expected nil, not #{value}", 0, 0)
end
end

def Bool.new(pull : YAML::PullParser)
pull.read_scalar == "true"
case value = pull.read_scalar
when "true", "True", "TRUE", "on", "On", "ON", "y", "Y", "yes", "Yes", "YES"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10.3.2 states that only "true", "True" and "TRUE" represent the boolean true value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values are from the Language-Independent Types for YAML. see my comment above.

Copy link
Author

@jwaldrip jwaldrip Aug 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@straight-shoota correct, @ysbaddaden also the 1.2 spec is not in libyaml. libyaml is YAML 1.1

case value = pull.read_scalar
when "true", "True", "TRUE", "on", "On", "ON", "y", "Y", "yes", "Yes", "YES"
true
when "false", "False", "FALSE", "off", "Off", "OFF", "n", "N", "no", "No", "NO"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10.3.2 states that only "false", "False" and "FALSE" represent the boolean false value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values are from the Language-Independent Types for YAML. see my comment above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@straight-shoota correct, @ysbaddaden also the 1.2 spec is not in libyaml. libyaml is YAML 1.1

@jwaldrip
Copy link
Author

@ysbaddaden made a ton of changes doing just that, I can remove the YAML 1.1 boolean types, I also need to make the inclusions for the other types you mentioned. What do you think of the changes thus far?

@jwaldrip
Copy link
Author

@ysbaddaden I am actually going to push back on the YAML 1.2 compliance. libyaml is only 1.1 compliant, and unless we roll our own parser in the future, I think we need to stay inline with the libyaml dependency.

@jwaldrip jwaldrip force-pushed the yaml-spec branch 2 times, most recently from bee14b7 to 60135da Compare August 20, 2017 01:27
@jwaldrip
Copy link
Author

@ysbaddaden @asterite @RX14

Any other changes I need to make here?

@akzhan
Copy link
Contributor

akzhan commented Aug 20, 2017

@jwaldrip I'm just the stranger but Travis reported:

Using compiled compiler at `.build/crystal'
Error in spec/std_spec.cr:2: while requiring "./std/**"
require "./std/**"
^
Syntax error in spec/std/yaml/from_yaml_spec.cr:223: Multiple assignment count mismatch
    values = "", "NULL", "Null", "null", "~"
    ^
Makefile:108: recipe for target '.build/std_spec' failed

src/yaml.cr Outdated
when ".nan", ".NaN", ".NAN"
"NaN"
else
if value.to_i64?
Copy link
Member

@straight-shoota straight-shoota Aug 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure: Are the number formats accepted by to_i64 and to_f64 identical to the YAML number format?

Copy link
Author

@jwaldrip jwaldrip Aug 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@straight-shoota For the basic types yes, it does not accept the base variants. I was looking into adding that, but would make for a much larger refactor. Ruby doesn't even implement them correctly.

Copy link
Member

@straight-shoota straight-shoota Aug 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about both false positives and false negatives. For starters, whitespace argument should be false (this particularly shouldn't really matter, but whitespaces are not allowed in YAML numbers) and prefix should be true as well as underscore and strict (which is true by default).
YAML int type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you must specify to_i?(whitespace: false, underscore: false) and keep strict and prefix enabled, otherwise integers and floats with leading/suffix spaces or underscores that are valid Crystal numbers would get parsed, even though they're invalid YAML numbers.

src/yaml.cr Outdated
@@ -147,4 +148,28 @@ module YAML
def self.dump(object, io : IO)
object.to_yaml(io)
end

# Determines if the scaler is reserved
def self.reserved_scalar(value : String)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method shouldn't return strings but symbols or enum values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this method much. For example it tries to linearize the scalar value (using a non-standard naming: infinity, NaN), but sometimes returns a detected tag (int, float) which means the integer/float number will have to be parsed again.

Copy link
Author

@jwaldrip jwaldrip Aug 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ysbaddaden I am not sure I understand. infinity and NaN are represented with .nan and .inf how do you mean int/float will have to be parsed again?

Copy link
Member

@straight-shoota straight-shoota Aug 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think he's referring to the last branch, where you're calling value.to_i64? and value.to_f64? where you're parsing a number just to throw it away and (maybe) parse it again. This is bad.
In most cases, this method is only used to check if the plain scalar has a specific value. This could be implemented directly. It'd also a simplification: In Nil.new you only need to know if it is a representation of nil, there is no need to test if it might be true or a number at this point. String.to_yaml is the only place where the current behaviour - a full check over all possible plain values - is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I would move each simplified case into each Class#from_yaml, then have String#to_yaml implement a complete check for reserved scalar values.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ysbaddaden something like this. Although I havent tested it yet, it would fail because not all subclasses support the required arguments for new.

def Class.new(pull : YAML::PullParser)
  return pull.read_scalar if pull.style != LibYAML::ScalarStyle::PLAIN
  location = pull.location
  case value = pull.read_plain_scalar
  when /^(y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON)$/
    true
  when /^(n|N|no|No|NO|false|False|FALSE|off|Off|OFF)$/
    false
  when /^[-+]?0b[0-1_]$/ # base2 int
    new(value, base: 2, underscore: true)
  when /^[-+]?0[0-7_]+$/ # base8 int
    new(value, base: 8, underscore: true)
  when /^[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?$/ # Base 10 int/float
    new(value, underscore: true)
  when /^[-+]?0x[0-9a-fA-F_]+$/ # base16 int
    new(value, base: 16, underscore: true)
  when /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+\.[0-9_]*$/ # Base 60 int/float
    deg, min, sec = value.split(":")
      .map_with_index { |v| new(v, underscore: true, whitepsace: false) / 60 ** i }
    deg >= 0 ? deg + (min + sec) : deg - (min + sec)
  when /^[-+]?\.(inf|Inf|INF)$/ # (infinity)
    new(value.sub(/^([-+])?\.(inf|Inf|INF)$/, "\\1Infinity"))
  when /^\.(nan|NaN|NAN)$/ # (not a number)
    new("NaN")
  when /^()~|null|Null|NULL)$/
    nil
  else
    raise YAML::ParseException.new("Expected #{self.name}, not #{value}", *location)
  end
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this?? You should separate the parsing - not make it even worse by using unnecessary regular expressions everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't introduce new regexes into hot paths.

location = pull.location
style = pull.data.scalar.style
value = pull.read_scalar
return value unless style == LibYAML::ScalarStyle::PLAIN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is badly structured. Please refactor this to express the purpose better:

if style == LibYAML::ScalarStyle::PLAIN && YAML.reserved_scalar(value)
  YAML::ParseException.new("Expected string, not #{value}", *location)
end
return value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plain style merely says: plain value without formatting, no quotes, no multiline, ... Even when it looks like a value, it can perfectly be a string. A good example is Shards' SPEC where version: 1.0 is valid and 1.0 will be a string!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ysbaddaden incorrect, 1.0 is reserved and is not a string, it can only be a float. I can give you numerous examples from other languages that have properly implemented yaml. Even ruby treats this accordingly.

$ ruby -r yaml -e 'puts YAML.load("1.0").class'
Float
$ ruby -r yaml -e 'puts YAML.load(%("1.0")).class'
String

Time::Format::ISO_8601_DATE_TIME.parse(pull.read_scalar)
location = pull.location
begin
Time::Format::ISO_8601_DATE_TIME.parse(pull.read_scalar)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YAML timestamp type uses a specific subset of ISO 8601 which requires a custom parser. This doesn't need to be implemented in this PR but I'd recommend putting in a TODO note.

require "yaml"

private class ObjectStub
def self.new(pp : YAML::PullParser)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to use a more descriptive variable name, even in such a simple case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, parser is a better name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im going to use pull as thats how its referenced everywhere else.


private class ObjectStub
def self.new(pp : YAML::PullParser)
pp.read_scalar.should eq "hello"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's bad design to have the actual test in a stub object it should be directly in the it spec.

Nil.from_yaml(nil.to_yaml).should eq nil
end

it "should return nil if #{value}" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add delimiters around #{value} to make the empty string recognizable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just use .inspect to quote it automatically.

Bool.from_yaml(true.to_yaml).should eq true
end

values = ["true", "True", "TRUE", "on", "On", "ON", "y", "Y", "yes", "Yes", "YES"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like adding so many specs for each single plain value. It would probably be better to put all variations together in one spec to have three specs with inner loops of 11 values instead of 33 individual specs which test exactly the same behaviour. This just adds dozens of mostly meaningless specs. I think the different keywords can be safely combined into a single spec.
This applies to the other ocations as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@straight-shoota Done, see changes

Copy link
Contributor

@Fryguy Fryguy Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only downside to this is that if there is a particular failure on one of the values, but not the other values, I don't think you'll be able to differentiate that in the error message. That is you'll see something like this:

  1) Bool.from_yaml true values should return true if a truthy value
     Failure/Error: Bool.from_yaml(value).should eq true

       expected: true
            got: false

     # ./spec/std/yaml/from_yaml_spec.cr:69

When I've done this in other projects, I've always been forced to append some custom message to the expectation that says "for value #{foo}", so I can tell them apart.

Additionally, if the first value fails, then it never tests the other values, essentially making it fail-fast.

@straight-shoota Thoughts in general to this? Maybe it doesn't apply in this particular case, but I'm curious what you think overall.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fryguy I originally had it where each value would generate its own test @straight-shoota recommended I inline it into a single test, I do agree with @Fryguy though... even though each one would generate its own test if multiple tests fail, you are able to fix them completely without having to run the test each time.

end

it "should parse a double quoted string" do
String.from_yaml("\"hello\"").should eq "hello"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use alternative string delimiters to avoid escaping: %("hello")

end

context "integers" do
it "should raise if an int is passed" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec should not wrap other specs.

end

context "floats" do
it "should raise if an int is passed" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito

src/yaml.cr Outdated

# Determines if the scaler is reserved
def self.reserved_scalar(value : String)
case value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would downcase the value to simplify this logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, downcasing would incorrectly recognize nULL as a null value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK. I just checked the spec and you are right. Ruby seems to implement it wrong:

require "yaml"

YAML.load("[nUll, yEs, 1, 2]") # => [nil, true, 1, 2]

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going a good way, though it's becoming a little complex. My biggest issue is disallowing to parse strings that look like another type, but would still be valid (1.0 can be a plain string).

require "yaml"

private class ObjectStub
def self.new(pp : YAML::PullParser)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, parser is a better name.

context "integers" do
it "should raise if an int is passed" do
expect_raises(YAML::ParseException) do
String.from_yaml("1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually String.from_yaml("1") should pass, since a non-quoted 1 can be a plain string.

A string doesn't depend on the scalar style (plain, quoted, ...).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me an example of another language that allows plain scalar values to be strings? I cannot seem to find one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwaldrip Other languages are dynamic. If you say "I want this attribute to be a String" (because of YAML.mapping) and I write 1.2 in it, then it must be considered as the string "1.2". In Ruby this is not possible because there's no such thing as a mapping, 1.2 will always be interpreted as a float.

context "floats" do
it "should raise if an int is passed" do
expect_raises(YAML::ParseException) do
String.from_yaml("1.1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: 1.1 can be a plain string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me an example of another language that allows plain scalar values to be strings? I cannot seem to find one.

Copy link
Member

@straight-shoota straight-shoota Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ysbaddaden I'm not sure this is true. Any evidence for this? I couldn't find any specific rules about that, but such an ambiguity would feel strange. Of course, you could always parse any YAML value as a string because they are encoded as strings.

src/yaml.cr Outdated
when ".nan", ".NaN", ".NAN"
"NaN"
else
if value.to_i64?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you must specify to_i?(whitespace: false, underscore: false) and keep strict and prefix enabled, otherwise integers and floats with leading/suffix spaces or underscores that are valid Crystal numbers would get parsed, even though they're invalid YAML numbers.

src/yaml.cr Outdated
@@ -147,4 +148,28 @@ module YAML
def self.dump(object, io : IO)
object.to_yaml(io)
end

# Determines if the scaler is reserved
def self.reserved_scalar(value : String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this method much. For example it tries to linearize the scalar value (using a non-standard naming: infinity, NaN), but sometimes returns a detected tag (int, float) which means the integer/float number will have to be parsed again.

location = pull.location
style = pull.data.scalar.style
value = pull.read_scalar
return value unless style == LibYAML::ScalarStyle::PLAIN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plain style merely says: plain value without formatting, no quotes, no multiline, ... Even when it looks like a value, it can perfectly be a string. A good example is Shards' SPEC where version: 1.0 is valid and 1.0 will be a string!

@jwaldrip
Copy link
Author

@ysbaddaden incorrect, 1.0 is reserved and is not a valid string, it can only be a float. I can give you numerous examples from other languages that have properly implemented yaml. Even ruby treats this accordingly.

$ ruby -r yaml -e 'puts YAML.load("1.0").class'
Float
$ ruby -r yaml -e 'puts YAML.load(%("1.0")).class'
String

location = pull.location
begin
value = pull.read_plain_scalar
return NAN if YAML.reserved_scalar(value) == :nan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of three calls to the same method the result should be stored in a variable.

Copy link
Contributor

@ysbaddaden ysbaddaden Aug 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or better, as an example of removing reserved_scalar:

case value = pull.read_plain_scalar
when ".nan", ".NaN", ".NAN" then NAN
when ".inf", ".Inf", ".INF", "+.inf", "+.Inf", "+.INF" then INFINITY
when "-.inf", "-.Inf", "-.INF" then -INFINITY
else new(value)
end

Each possible value is detailed here, not hidden elsewhere; no useless checks (for other types); no intermediary values (symbols or strings).

@jwaldrip
Copy link
Author

How can we get back on track here. I don't agree with some of @ysbaddaden's suggestions. Can we get someone like @asterite to weigh in here? The method of matching with regex comes closest to matching the YAML spec.

@straight-shoota
Copy link
Member

The regexes are used to describe the value format, but you don't have to implement it that way. It makes particularly no sense to do this for numbers because there are dedicated parse methods in Crystal. No need for regexes. And you don't need them to match full strings either.

@RX14
Copy link
Contributor

RX14 commented Aug 22, 2017

The fact is that regexes are slow for matching short strings, so we can't use them.

@asterite
Copy link
Member

@jwaldrip If you'd like my opinion, the first or second commit you sent were fine: match the value in each T.from_yaml. I don't think there's a need to check which specific "reserved" scalar it is in each type. For example, there's no point in knowing that the scalar is a :float when trying to parse Nil from YAML.

And no regexes (like you had initially), I did a small benchmark and regexes are slower here (and consume more memory). I don't know what were the problems that led to these many changes.

@straight-shoota
Copy link
Member

@jwaldrip Maybe you misunderstood @ysbaddaden. He suggested to move the checks into each (individual) Class#from_yaml (where Class is to be substituted). You instead put all into Class.from_yaml literally.

src/yaml.cr Outdated
!reserved.nil?
case {value[0]?, value[1]?, value[2]?, value[3]?, value[4]?}
when {.try(&.ascii_number?), .try(&.ascii_number?), .try(&.ascii_number?), .try(&.ascii_number?), '-'}
!!(Time::Format::ISO_8601_DATE_TIME.parse(value) rescue false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parenthesis should not be needed here, because false is a Bool anyway

src/yaml.cr Outdated
{'+', '.', .try(&.ascii_number?), _, _}
clean_value = value.gsub('_', "")
!!clean_value.to_f64? || !!clean_value.to_i64?(prefix: true)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should return false if none of the cases matches.

@asterite
Copy link
Member

@jwaldrip I'm pretty sure this review process has become a PITA for you. I'll probably take this over, if that's OK with you.

@jwaldrip
Copy link
Author

I am happy as long as the changes get in. I don't mind making them... it is a learning process. What is left to get this thing approved and merged?

@asterite
Copy link
Member

@jwaldrip I now think YAML.parse should support the core schema by default. But this only changes how YAML.mapping works. So most changes should be done in YAML::PullParser, I think. In any case, don't worry, we could probably merge this like that and later refactor it. But I don't know yet.

@RX14
Copy link
Contributor

RX14 commented Sep 10, 2017

I've thought "just this minor change then i'll merge it" several times now but I keep thinking something more...

@straight-shoota
Copy link
Member

Same for me 😆

@jwaldrip How do you feel about @RX14's suggestion to move the parsing logic into read_next? Are you willing to implement it? Or are there any other considerations about this?

@asterite
Copy link
Member

@straight-shoota I'll tackle this issue myself in small chunks

@straight-shoota
Copy link
Member

Why not let @jwaldrip complete it? As I understand it, he is happy to continue working and he has already put much effort in it.

@asterite
Copy link
Member

@straight-shoota Fine by me, though I can see an infinite number of change requests coming (like in the past history of this PR).

@asterite
Copy link
Member

I think the code is good enough now. One thing I'd like to improve is moving all code related to the YAML core schema to YAML::Schema::Core: read_timestamp and others just don't belong in the pull parser. Same goes for YAML.reserved_value?. And I'm not sure about those advance flags either.

In any case, we can merge this and I can later make those changes. Or you can make them. Or I can build a separate PR based on your code. What do you prefer?

I also still have a big doubt about String.from_yaml. For example this:

# YAML file
version: 1.2

If I parse it from a dynamic language the value will be the float 1.2. But this is because parsing is always done with the core schema. In such cases one can probably do yaml["version"].to_s to allow such things. In Crystal, if we disallow that, one will have to write:

# YAML file
version: "1.2"

and it's uglier to write. And this might actually break existing shards!

One can use YAML::Any in that case, though, and then invoke to_s on it, but maybe it's ugly. Maybe not, I don't know :-)

And we can always update shards to make these cases work, so it might not be a big deal.

@jwaldrip
Copy link
Author

@asterite I think we should merge is as is. As we discussed before, other languages just support the core schema, so that seems like a further enhancement to support more than one schema. The advance would make sense pulled out if all the read_type methods were not in the pull parser. But as you said, I think this is good for now. It is a vast improvement over the current state of the current YAML implementation. Once this is in, we can work on further enhancements.

@jwaldrip
Copy link
Author

jwaldrip commented Sep 17, 2017

@asterite also, just because existing shards were using a method that was not native the core YAML spec, does not mean you should make an exception to implementing it the right way. The language is still in alpha and breaking changes should be expected. Also, shards that have a full semver version should be just fine as version: 1.2.3 would parse as a string, not a float.

asterite
asterite previously approved these changes Sep 17, 2017
@asterite
Copy link
Member

There's still time until the next release so this doesn't have to be rushed. Let's not merge yet. In the meantime I'll try to implement the same but with a different code organization, and maybe support for both failsafe and core schemas. If I fail (or if my code doesn't end up being liked) we can merge this. In any case we'll thank you in the changelog :)

@jwaldrip
Copy link
Author

@asterite Code organization I can understand. Supporting both 1.2 and 1.1 schemas I can understand. But why the need for the failsafe schema? Who would use it. Are we over optimizing by including more than one schema to begin with? Almost every other language with YAML in the standard lib supports just one schema. I think we should strive for parity before iterating further. It just sounds like "gold plating" and "scope creep" to me.

@asterite asterite dismissed their stale review September 18, 2017 19:11

Can be implemented in a better way

@RX14
Copy link
Contributor

RX14 commented Sep 18, 2017

We've all suffered trying to keep track of this PR for too long, we should merge this PR using this PR's approach, and then add support for other schemas later.

Regarding the arguments in this PR regarding number parsing, my opinion is that making 1.2 be accepted as a string if String.from_yaml is used is simply leaking crystal's parser details into how YAML is parsed. In my mind, the pull parser works (conceptually) by reading the next token, deducing it's type, and converting to a crystal value. read_string then simply asserts that the next token is a string, and if so it returns the crystal representation (obviously this doesn't have to be the reality for performance reasons). Making something which can parse as a string or number based on how the parser is used feels wrong and a leaky abstraction. Not to mention incompatible with other languages.

@asterite
Copy link
Member

I don't want to merge this. Please don't merge it.

In the past I've merged PRs just because it seems there was a rush to merge things. This led to code that's hard to understand, refactor and maintain. I'm not saying that this is the case in this PR. I just want to be cautious. There's really no rush.

I'm also preparing a different PR where the full core YAML schema is implemented, is faster than this PR and is better organized in terms of type definitions and methods. We can later review both and choose which one we think is more suitable.

@RX14 My approach does what you suggest. It greatly simplifies the code.

@RX14
Copy link
Contributor

RX14 commented Sep 19, 2017

@asterite sorry, I misread and thought your idea was simply adding support for other schemas based on this PR. If you have a better idea of how to organise the code then we should wait for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants