YAML fixes #4838

jwaldrip · 2017-08-17T01:47:44Z

No description provided.

akzhan · 2017-08-17T09:00:34Z

src/yaml/from_yaml.cr

    nil
  else
    raise YAML::ParseException.new("Expected nil, not #{value}", 0, 0)
  end
 end

 def Bool.new(pull : YAML::PullParser)
-  pull.read_scalar == "true"
+  case value


you need

case value = pull.read_scalar

here

akzhan · 2017-08-17T09:02:06Z

You must add examples to yaml_spec.

straight-shoota · 2017-08-17T12:32:21Z

src/yaml/from_yaml.cr

-  value = pull.read_scalar
-  if value.empty?
+  case value = pull.read_scalar
+  when "", "NULL", "Null", "null"


This should not be necessary because libyaml recognizes null values and it looks wrong, because it would also recognize a quoted string "null" as nil value.

It should, but it doesn't I will have to look more in depth to libyaml again tonight. But I was running into issues where when I inspect the return value of read_scalar I get "null".

straight-shoota · 2017-08-17T12:48:17Z

The parsers for Bool and Nil are wrong because they don't check whether it's a plain value or an escaped string: "true" would be recognized as true value, "false" as a boolean. They are broken in the current implementation, too:

require "yaml"
class Foo
  YAML.mapping(bool: Bool, null: Nil)
end

pp Foo.from_yaml(%(bool: true\nnil: null))  # => #<Foo @bool=true, @null=nil>
pp Foo.from_yaml(%(bool: "true"\nnull: "")) # => #<Foo @bool=true, @null=nil>

carc.in

YAML's boolean type only recognizes true and false as valid values (spec) and only null as null value (spec). It's only the Language-Independent Types for YAML which add additional aliases (bool, null). If these aliases should be supported, then ~ would also need to be added as null value.

jwaldrip · 2017-08-17T14:17:19Z

@akzhan the details of the yaml spec are cited in the commit messages.

jwaldrip · 2017-08-17T14:24:52Z

@straight-shoota ruby implements the language independent types.

$ ruby -r yaml -e "puts YAML.load 'foo: YES'"
{"foo"=>true}

$ ruby -r yaml -e "puts YAML.load 'foo: ~'"
{"foo"=>nil}

straight-shoota · 2017-08-17T14:33:54Z

Oh, and YAML.parse needs to respect the plain values as well:

pp YAML.parse(%(bool: "true"\nnull: "")) # => {"bool" => "true", "null" => ""}
pp YAML.parse(%(bool: true\nnil: null)) # => {"bool" => "true", "nil" => "null"}

RX14 · 2017-08-17T15:06:22Z

There should be specs for all these special cases.

jwaldrip · 2017-08-17T15:08:50Z

@straight-shoota The following is wrong

require "yaml"
class FooBar
  YAML.mapping({
    foo: Nil,
    bar: Bool
  })
end

yaml = FooBar.from_yaml <<-yaml
foo: null
bar: true
yaml

puts yaml

$ crystal run ./test.cr
Expected nil, not null (YAML::ParseException)
0x109617295: *CallStack::unwind:Array(Pointer(Void)) at ??
0x109617231: *CallStack#initialize:Array(Pointer(Void)) at ??
0x109617208: *CallStack::new:CallStack at ??
0x109616245: *raise<YAML::ParseException>:NoReturn at ??
0x1096399b9: *Nil::new<YAML::PullParser>:Nil at ??
0x109662adf: *FooBar#initialize<YAML::PullParser>:Bool at ??
0x109662a11: *FooBar::new<YAML::PullParser>:FooBar at ??
0x109662960: *FooBar@Object::from_yaml<String>:FooBar at ??
0x109605d3d: __crystal_main at ??
0x1096160a8: main at ??

ysbaddaden · 2017-08-17T16:13:43Z

First and foremost, the current parser relies on libyaml which is a YAML 1.1 parser, not a YAML 1.2 parser. Trying to parse YAML 1.2 on top of a YAML 1.1 may prove problematic, if not impossible.

Now, you can't assume a scalar value that looks like true, false or null to be a Bool or Nil. The value was unquoted by the parser, so it could have been a explicit string. For example we can't assume 11038400529011648716446655440000 to be a (big) integer, because it could be an UUID (without - which is a valid format). You must check the scalar style before trying any type transformation.

It seems libyaml does report the scalar style, but this isn't exposed by the YAML::PullParser#read_scalar wrapper. Maybe we could introduce a YAML::PullParser#style in the same vein as YAML::PullParser#value, so the scalar style could be checked (before reading the scalar). For example the following should parse all YAML 1.2 implicit types and special values, if and only if the scalar has a plain style:

class YAML::PullParser
  def style
    @event.data.scalar.style
  end
end

style, str = parser.style, parser.read_scalar

if style.plain?
  case str
  when "true", "True", "TRUE"
    true
  when "false", "False", "FALSE"
    false
  when "null", "Null", "NULL", "~", ""
    nil
  when ".inf", ".Inf", ".INF", "+.inf", "+.Inf", "+.INF"
    Float64::INFINITY
  when "-.inf", "-.Inf", "-.INF"
    -Float64::INFINITY
  when ".nan", ".NaN", ".NAN"
    Float64::NAN
  else
    str.to_i?(whitespace: false, underscore: false) ||
      str.to_f?(whitespace: false, underscore: false) ||
      str
  end
else
  str
end

ysbaddaden

As per my previous comment: the scalar style must be plain for any transformation to actually happen, otherwise it was a string.

Furthermore, yes, on, no and off do not represent boolean values as per the YAML 1.2 specification, and are thus invalid.

ysbaddaden · 2017-08-17T16:18:02Z

src/yaml/from_yaml.cr

    nil
  else
    raise YAML::ParseException.new("Expected nil, not #{value}", 0, 0)
  end
 end

 def Bool.new(pull : YAML::PullParser)
-  pull.read_scalar == "true"
+  case value = pull.read_scalar
+  when "true", "True", "TRUE", "on", "On", "ON", "y", "Y", "yes", "Yes", "YES"


10.3.2 states that only "true", "True" and "TRUE" represent the boolean true value.

These values are from the Language-Independent Types for YAML. see my comment above.

@straight-shoota correct, @ysbaddaden also the 1.2 spec is not in libyaml. libyaml is YAML 1.1

ysbaddaden · 2017-08-17T16:18:16Z

src/yaml/from_yaml.cr

+  case value = pull.read_scalar
+  when "true", "True", "TRUE", "on", "On", "ON", "y", "Y", "yes", "Yes", "YES"
+    true
+  when "false", "False", "FALSE", "off", "Off", "OFF", "n", "N", "no", "No", "NO"


10.3.2 states that only "false", "False" and "FALSE" represent the boolean false value.

These values are from the Language-Independent Types for YAML. see my comment above.

@straight-shoota correct, @ysbaddaden also the 1.2 spec is not in libyaml. libyaml is YAML 1.1

jwaldrip · 2017-08-19T14:14:30Z

@ysbaddaden made a ton of changes doing just that, I can remove the YAML 1.1 boolean types, I also need to make the inclusions for the other types you mentioned. What do you think of the changes thus far?

jwaldrip · 2017-08-19T14:43:43Z

@ysbaddaden I am actually going to push back on the YAML 1.2 compliance. libyaml is only 1.1 compliant, and unless we roll our own parser in the future, I think we need to stay inline with the libyaml dependency.

jwaldrip · 2017-08-20T01:27:53Z

@ysbaddaden @asterite @RX14

Any other changes I need to make here?

akzhan · 2017-08-20T04:06:23Z

@jwaldrip I'm just the stranger but Travis reported:

Using compiled compiler at `.build/crystal'
Error in spec/std_spec.cr:2: while requiring "./std/**"
require "./std/**"
^
Syntax error in spec/std/yaml/from_yaml_spec.cr:223: Multiple assignment count mismatch
    values = "", "NULL", "Null", "null", "~"
    ^
Makefile:108: recipe for target '.build/std_spec' failed

straight-shoota · 2017-08-20T12:59:52Z

src/yaml.cr

+    when ".nan", ".NaN", ".NAN"
+      "NaN"
+    else
+      if value.to_i64?


I'm not sure: Are the number formats accepted by to_i64 and to_f64 identical to the YAML number format?

@straight-shoota For the basic types yes, it does not accept the base variants. I was looking into adding that, but would make for a much larger refactor. Ruby doesn't even implement them correctly.

I'm concerned about both false positives and false negatives. For starters, whitespace argument should be false (this particularly shouldn't really matter, but whitespaces are not allowed in YAML numbers) and prefix should be true as well as underscore and strict (which is true by default).
YAML int type

Yes, you must specify to_i?(whitespace: false, underscore: false) and keep strict and prefix enabled, otherwise integers and floats with leading/suffix spaces or underscores that are valid Crystal numbers would get parsed, even though they're invalid YAML numbers.

straight-shoota · 2017-08-20T13:01:09Z

src/yaml.cr

@@ -147,4 +148,28 @@ module YAML
  def self.dump(object, io : IO)
    object.to_yaml(io)
  end
+
+  # Determines if the scaler is reserved
+  def self.reserved_scalar(value : String)


This method shouldn't return strings but symbols or enum values.

I'm not sure I like this method much. For example it tries to linearize the scalar value (using a non-standard naming: infinity, NaN), but sometimes returns a detected tag (int, float) which means the integer/float number will have to be parsed again.

@ysbaddaden I am not sure I understand. infinity and NaN are represented with .nan and .inf how do you mean int/float will have to be parsed again?

I think he's referring to the last branch, where you're calling value.to_i64? and value.to_f64? where you're parsing a number just to throw it away and (maybe) parse it again. This is bad.
In most cases, this method is only used to check if the plain scalar has a specific value. This could be implemented directly. It'd also a simplification: In Nil.new you only need to know if it is a representation of nil, there is no need to test if it might be true or a number at this point. String.to_yaml is the only place where the current behaviour - a full check over all possible plain values - is needed.

Exactly. I would move each simplified case into each Class#from_yaml, then have String#to_yaml implement a complete check for reserved scalar values.

@ysbaddaden something like this. Although I havent tested it yet, it would fail because not all subclasses support the required arguments for new.

def Class.new(pull : YAML::PullParser) return pull.read_scalar if pull.style != LibYAML::ScalarStyle::PLAIN location = pull.location case value = pull.read_plain_scalar when /^(y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON)$/ true when /^(n|N|no|No|NO|false|False|FALSE|off|Off|OFF)$/ false when /^[-+]?0b[0-1_]$/ # base2 int new(value, base: 2, underscore: true) when /^[-+]?0[0-7_]+$/ # base8 int new(value, base: 8, underscore: true) when /^[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?$/ # Base 10 int/float new(value, underscore: true) when /^[-+]?0x[0-9a-fA-F_]+$/ # base16 int new(value, base: 16, underscore: true) when /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+\.[0-9_]*$/ # Base 60 int/float deg, min, sec = value.split(":") .map_with_index { |v| new(v, underscore: true, whitepsace: false) / 60 ** i } deg >= 0 ? deg + (min + sec) : deg - (min + sec) when /^[-+]?\.(inf|Inf|INF)$/ # (infinity) new(value.sub(/^([-+])?\.(inf|Inf|INF)$/, "\\1Infinity")) when /^\.(nan|NaN|NAN)$/ # (not a number) new("NaN") when /^()~|null|Null|NULL)$/ nil else raise YAML::ParseException.new("Expected #{self.name}, not #{value}", *location) end end

What is this?? You should separate the parsing - not make it even worse by using unnecessary regular expressions everywhere.

Please don't introduce new regexes into hot paths.

straight-shoota · 2017-08-20T13:05:18Z

src/yaml/from_yaml.cr

+  location = pull.location
+  style = pull.data.scalar.style
+  value = pull.read_scalar
+  return value unless style == LibYAML::ScalarStyle::PLAIN


This is badly structured. Please refactor this to express the purpose better:

if style == LibYAML::ScalarStyle::PLAIN && YAML.reserved_scalar(value) YAML::ParseException.new("Expected string, not #{value}", *location) end return value

The plain style merely says: plain value without formatting, no quotes, no multiline, ... Even when it looks like a value, it can perfectly be a string. A good example is Shards' SPEC where version: 1.0 is valid and 1.0 will be a string!

@ysbaddaden incorrect, 1.0 is reserved and is not a string, it can only be a float. I can give you numerous examples from other languages that have properly implemented yaml. Even ruby treats this accordingly.

$ ruby -r yaml -e 'puts YAML.load("1.0").class' Float $ ruby -r yaml -e 'puts YAML.load(%("1.0")).class' String

straight-shoota · 2017-08-20T13:10:05Z

src/yaml/from_yaml.cr

-  Time::Format::ISO_8601_DATE_TIME.parse(pull.read_scalar)
+  location = pull.location
+  begin
+    Time::Format::ISO_8601_DATE_TIME.parse(pull.read_scalar)


YAML timestamp type uses a specific subset of ISO 8601 which requires a custom parser. This doesn't need to be implemented in this PR but I'd recommend putting in a TODO note.

straight-shoota · 2017-08-20T13:12:32Z

spec/std/yaml/from_yaml_spec.cr

+require "yaml"
+
+private class ObjectStub
+  def self.new(pp : YAML::PullParser)


I'd recommend to use a more descriptive variable name, even in such a simple case.

Yes, parser is a better name.

Im going to use pull as thats how its referenced everywhere else.

straight-shoota · 2017-08-20T13:14:49Z

spec/std/yaml/from_yaml_spec.cr

+
+private class ObjectStub
+  def self.new(pp : YAML::PullParser)
+    pp.read_scalar.should eq "hello"


I think it's bad design to have the actual test in a stub object it should be directly in the it spec.

straight-shoota · 2017-08-20T13:16:25Z

spec/std/yaml/from_yaml_spec.cr

+      Nil.from_yaml(nil.to_yaml).should eq nil
+    end
+
+    it "should return nil if #{value}" do


Please add delimiters around #{value} to make the empty string recognizable.

Or just use .inspect to quote it automatically.

straight-shoota · 2017-08-20T13:22:31Z

spec/std/yaml/from_yaml_spec.cr

+      Bool.from_yaml(true.to_yaml).should eq true
+    end
+
+    values = ["true", "True", "TRUE", "on", "On", "ON", "y", "Y", "yes", "Yes", "YES"]


I don't like adding so many specs for each single plain value. It would probably be better to put all variations together in one spec to have three specs with inner loops of 11 values instead of 33 individual specs which test exactly the same behaviour. This just adds dozens of mostly meaningless specs. I think the different keywords can be safely combined into a single spec.
This applies to the other ocations as well.

@straight-shoota Done, see changes

The only downside to this is that if there is a particular failure on one of the values, but not the other values, I don't think you'll be able to differentiate that in the error message. That is you'll see something like this:

1) Bool.from_yaml true values should return true if a truthy value Failure/Error: Bool.from_yaml(value).should eq true expected: true got: false # ./spec/std/yaml/from_yaml_spec.cr:69

When I've done this in other projects, I've always been forced to append some custom message to the expectation that says "for value #{foo}", so I can tell them apart.

Additionally, if the first value fails, then it never tests the other values, essentially making it fail-fast.

@straight-shoota Thoughts in general to this? Maybe it doesn't apply in this particular case, but I'm curious what you think overall.

@Fryguy I originally had it where each value would generate its own test @straight-shoota recommended I inline it into a single test, I do agree with @Fryguy though... even though each one would generate its own test if multiple tests fail, you are able to fix them completely without having to run the test each time.

straight-shoota · 2017-08-20T13:24:01Z

spec/std/yaml/from_yaml_spec.cr

+  end
+
+  it "should parse a double quoted string" do
+    String.from_yaml("\"hello\"").should eq "hello"


Better use alternative string delimiters to avoid escaping: %("hello")

straight-shoota · 2017-08-20T13:26:18Z

spec/std/yaml/from_yaml_spec.cr

+  end
+
+  context "integers" do
+    it "should raise if an int is passed" do


This spec should not wrap other specs.

straight-shoota · 2017-08-20T13:26:40Z

spec/std/yaml/from_yaml_spec.cr

+  end
+
+  context "floats" do
+    it "should raise if an int is passed" do


asterite · 2017-08-20T13:29:58Z

src/yaml.cr

+
+  # Determines if the scaler is reserved
+  def self.reserved_scalar(value : String)
+    case value


I would downcase the value to simplify this logic

No, downcasing would incorrectly recognize nULL as a null value.

Ah, OK. I just checked the spec and you are right. Ruby seems to implement it wrong:

require "yaml" YAML.load("[nUll, yEs, 1, 2]") # => [nil, true, 1, 2]

ysbaddaden

I think this is going a good way, though it's becoming a little complex. My biggest issue is disallowing to parse strings that look like another type, but would still be valid (1.0 can be a plain string).

ysbaddaden · 2017-08-20T16:32:37Z

spec/std/yaml/from_yaml_spec.cr

+require "yaml"
+
+private class ObjectStub
+  def self.new(pp : YAML::PullParser)


Yes, parser is a better name.

ysbaddaden · 2017-08-20T16:34:33Z

spec/std/yaml/from_yaml_spec.cr

+  context "integers" do
+    it "should raise if an int is passed" do
+      expect_raises(YAML::ParseException) do
+        String.from_yaml("1")


Actually String.from_yaml("1") should pass, since a non-quoted 1 can be a plain string.

A string doesn't depend on the scalar style (plain, quoted, ...).

Can you give me an example of another language that allows plain scalar values to be strings? I cannot seem to find one.

@jwaldrip Other languages are dynamic. If you say "I want this attribute to be a String" (because of YAML.mapping) and I write 1.2 in it, then it must be considered as the string "1.2". In Ruby this is not possible because there's no such thing as a mapping, 1.2 will always be interpreted as a float.

ysbaddaden · 2017-08-20T16:35:02Z

spec/std/yaml/from_yaml_spec.cr

+  context "floats" do
+    it "should raise if an int is passed" do
+      expect_raises(YAML::ParseException) do
+        String.from_yaml("1.1")


Same here: 1.1 can be a plain string.

Can you give me an example of another language that allows plain scalar values to be strings? I cannot seem to find one.

@ysbaddaden I'm not sure this is true. Any evidence for this? I couldn't find any specific rules about that, but such an ambiguity would feel strange. Of course, you could always parse any YAML value as a string because they are encoded as strings.

ysbaddaden · 2017-08-20T16:40:27Z

src/yaml.cr

+    when ".nan", ".NaN", ".NAN"
+      "NaN"
+    else
+      if value.to_i64?


Yes, you must specify to_i?(whitespace: false, underscore: false) and keep strict and prefix enabled, otherwise integers and floats with leading/suffix spaces or underscores that are valid Crystal numbers would get parsed, even though they're invalid YAML numbers.

ysbaddaden · 2017-08-20T16:42:58Z

src/yaml.cr

@@ -147,4 +148,28 @@ module YAML
  def self.dump(object, io : IO)
    object.to_yaml(io)
  end
+
+  # Determines if the scaler is reserved
+  def self.reserved_scalar(value : String)


I'm not sure I like this method much. For example it tries to linearize the scalar value (using a non-standard naming: infinity, NaN), but sometimes returns a detected tag (int, float) which means the integer/float number will have to be parsed again.

ysbaddaden · 2017-08-20T16:49:23Z

src/yaml/from_yaml.cr

+  location = pull.location
+  style = pull.data.scalar.style
+  value = pull.read_scalar
+  return value unless style == LibYAML::ScalarStyle::PLAIN


The plain style merely says: plain value without formatting, no quotes, no multiline, ... Even when it looks like a value, it can perfectly be a string. A good example is Shards' SPEC where version: 1.0 is valid and 1.0 will be a string!

jwaldrip · 2017-08-20T17:40:42Z

@ysbaddaden incorrect, 1.0 is reserved and is not a valid string, it can only be a float. I can give you numerous examples from other languages that have properly implemented yaml. Even ruby treats this accordingly.

$ ruby -r yaml -e 'puts YAML.load("1.0").class'
Float
$ ruby -r yaml -e 'puts YAML.load(%("1.0")).class'
String

straight-shoota · 2017-08-20T17:51:33Z

src/yaml/from_yaml.cr

+    location = pull.location
+    begin
+      value = pull.read_plain_scalar
+      return NAN if YAML.reserved_scalar(value) == :nan


Instead of three calls to the same method the result should be stored in a variable.

Or better, as an example of removing reserved_scalar:

case value = pull.read_plain_scalar when ".nan", ".NaN", ".NAN" then NAN when ".inf", ".Inf", ".INF", "+.inf", "+.Inf", "+.INF" then INFINITY when "-.inf", "-.Inf", "-.INF" then -INFINITY else new(value) end

Each possible value is detailed here, not hidden elsewhere; no useless checks (for other types); no intermediary values (symbols or strings).

jwaldrip · 2017-08-22T02:49:06Z

How can we get back on track here. I don't agree with some of @ysbaddaden's suggestions. Can we get someone like @asterite to weigh in here? The method of matching with regex comes closest to matching the YAML spec.

straight-shoota · 2017-08-22T05:14:51Z

The regexes are used to describe the value format, but you don't have to implement it that way. It makes particularly no sense to do this for numbers because there are dedicated parse methods in Crystal. No need for regexes. And you don't need them to match full strings either.

RX14 · 2017-08-22T09:58:04Z

The fact is that regexes are slow for matching short strings, so we can't use them.

asterite · 2017-08-22T10:43:40Z

@jwaldrip If you'd like my opinion, the first or second commit you sent were fine: match the value in each T.from_yaml. I don't think there's a need to check which specific "reserved" scalar it is in each type. For example, there's no point in knowing that the scalar is a :float when trying to parse Nil from YAML.

And no regexes (like you had initially), I did a small benchmark and regexes are slower here (and consume more memory). I don't know what were the problems that led to these many changes.

straight-shoota · 2017-08-22T11:19:19Z

@jwaldrip Maybe you misunderstood @ysbaddaden. He suggested to move the checks into each (individual) Class#from_yaml (where Class is to be substituted). You instead put all into Class.from_yaml literally.

straight-shoota · 2017-09-10T09:25:55Z

src/yaml.cr

-    !reserved.nil?
+    case {value[0]?, value[1]?, value[2]?, value[3]?, value[4]?}
+    when {.try(&.ascii_number?), .try(&.ascii_number?), .try(&.ascii_number?), .try(&.ascii_number?), '-'}
+      !!(Time::Format::ISO_8601_DATE_TIME.parse(value) rescue false)


Parenthesis should not be needed here, because false is a Bool anyway

straight-shoota · 2017-09-10T09:27:01Z

src/yaml.cr

+         {'+', '.', .try(&.ascii_number?), _, _}
+      clean_value = value.gsub('_', "")
+      !!clean_value.to_f64? || !!clean_value.to_i64?(prefix: true)
+    end


This method should return false if none of the cases matches.

… yaml-spec

asterite · 2017-09-10T18:45:25Z

@jwaldrip I'm pretty sure this review process has become a PITA for you. I'll probably take this over, if that's OK with you.

jwaldrip · 2017-09-10T19:40:28Z

I am happy as long as the changes get in. I don't mind making them... it is a learning process. What is left to get this thing approved and merged?

asterite · 2017-09-10T20:01:36Z

@jwaldrip I now think YAML.parse should support the core schema by default. But this only changes how YAML.mapping works. So most changes should be done in YAML::PullParser, I think. In any case, don't worry, we could probably merge this like that and later refactor it. But I don't know yet.

RX14 · 2017-09-10T22:40:32Z

I've thought "just this minor change then i'll merge it" several times now but I keep thinking something more...

straight-shoota · 2017-09-11T08:25:39Z

Same for me 😆

@jwaldrip How do you feel about @RX14's suggestion to move the parsing logic into read_next? Are you willing to implement it? Or are there any other considerations about this?

asterite · 2017-09-11T11:50:13Z

@straight-shoota I'll tackle this issue myself in small chunks

straight-shoota · 2017-09-11T12:13:18Z

Why not let @jwaldrip complete it? As I understand it, he is happy to continue working and he has already put much effort in it.

asterite · 2017-09-11T12:44:50Z

@straight-shoota Fine by me, though I can see an infinite number of change requests coming (like in the past history of this PR).

… yaml-spec

asterite · 2017-09-17T12:25:55Z

I think the code is good enough now. One thing I'd like to improve is moving all code related to the YAML core schema to YAML::Schema::Core: read_timestamp and others just don't belong in the pull parser. Same goes for YAML.reserved_value?. And I'm not sure about those advance flags either.

In any case, we can merge this and I can later make those changes. Or you can make them. Or I can build a separate PR based on your code. What do you prefer?

I also still have a big doubt about String.from_yaml. For example this:

# YAML file
version: 1.2

If I parse it from a dynamic language the value will be the float 1.2. But this is because parsing is always done with the core schema. In such cases one can probably do yaml["version"].to_s to allow such things. In Crystal, if we disallow that, one will have to write:

# YAML file
version: "1.2"

and it's uglier to write. And this might actually break existing shards!

One can use YAML::Any in that case, though, and then invoke to_s on it, but maybe it's ugly. Maybe not, I don't know :-)

And we can always update shards to make these cases work, so it might not be a big deal.

jwaldrip · 2017-09-17T22:20:23Z

@asterite I think we should merge is as is. As we discussed before, other languages just support the core schema, so that seems like a further enhancement to support more than one schema. The advance would make sense pulled out if all the read_type methods were not in the pull parser. But as you said, I think this is good for now. It is a vast improvement over the current state of the current YAML implementation. Once this is in, we can work on further enhancements.

jwaldrip · 2017-09-17T22:22:13Z

@asterite also, just because existing shards were using a method that was not native the core YAML spec, does not mean you should make an exception to implementing it the right way. The language is still in alpha and breaking changes should be expected. Also, shards that have a full semver version should be just fine as version: 1.2.3 would parse as a string, not a float.

asterite · 2017-09-18T10:48:30Z

There's still time until the next release so this doesn't have to be rushed. Let's not merge yet. In the meantime I'll try to implement the same but with a different code organization, and maybe support for both failsafe and core schemas. If I fail (or if my code doesn't end up being liked) we can merge this. In any case we'll thank you in the changelog :)

jwaldrip · 2017-09-18T15:02:21Z

@asterite Code organization I can understand. Supporting both 1.2 and 1.1 schemas I can understand. But why the need for the failsafe schema? Who would use it. Are we over optimizing by including more than one schema to begin with? Almost every other language with YAML in the standard lib supports just one schema. I think we should strive for parity before iterating further. It just sounds like "gold plating" and "scope creep" to me.

… yaml-spec

Can be implemented in a better way

RX14 · 2017-09-18T22:49:06Z

We've all suffered trying to keep track of this PR for too long, we should merge this PR using this PR's approach, and then add support for other schemas later.

Regarding the arguments in this PR regarding number parsing, my opinion is that making 1.2 be accepted as a string if String.from_yaml is used is simply leaking crystal's parser details into how YAML is parsed. In my mind, the pull parser works (conceptually) by reading the next token, deducing it's type, and converting to a crystal value. read_string then simply asserts that the next token is a string, and if so it returns the crystal representation (obviously this doesn't have to be the reality for performance reasons). Making something which can parse as a string or number based on how the parser is used feels wrong and a leaky abstraction. Not to mention incompatible with other languages.

asterite · 2017-09-18T23:59:57Z

I don't want to merge this. Please don't merge it.

In the past I've merged PRs just because it seems there was a rush to merge things. This led to code that's hard to understand, refactor and maintain. I'm not saying that this is the case in this PR. I just want to be cautious. There's really no rush.

I'm also preparing a different PR where the full core YAML schema is implemented, is faster than this PR and is better organized in terms of type definitions and methods. We can later review both and choose which one we think is more suitable.

@RX14 My approach does what you suggest. It greatly simplifies the code.

RX14 · 2017-09-19T10:20:53Z

@asterite sorry, I misread and thought your idea was simply adding support for other schemas based on this PR. If you have a better idea of how to organise the code then we should wait for that.

jwaldrip force-pushed the yaml-spec branch from 3683bbc to 41384f4 Compare August 17, 2017 02:38

akzhan suggested changes Aug 17, 2017

View reviewed changes

straight-shoota reviewed Aug 17, 2017

View reviewed changes

ysbaddaden requested changes Aug 17, 2017

View reviewed changes

jwaldrip force-pushed the yaml-spec branch from c49759c to 93b5de0 Compare August 19, 2017 14:11

jwaldrip force-pushed the yaml-spec branch 2 times, most recently from bee14b7 to 60135da Compare August 20, 2017 01:27

straight-shoota reviewed Aug 20, 2017

View reviewed changes

straight-shoota requested changes Aug 20, 2017

View reviewed changes

asterite requested changes Aug 20, 2017

View reviewed changes

ysbaddaden requested changes Aug 20, 2017

View reviewed changes

straight-shoota reviewed Aug 20, 2017

View reviewed changes

akzhan approved these changes Sep 10, 2017

View reviewed changes

straight-shoota reviewed Sep 10, 2017

View reviewed changes

straight-shoota requested changes Sep 10, 2017

View reviewed changes

jwaldrip added 2 commits September 10, 2017 09:37

return false, remove parens

50d5705

Merge branch 'master' of https://github.com/crystal-lang/crystal into…

e2c8a1a

… yaml-spec

jwaldrip added 3 commits September 16, 2017 10:49

make YAML.parse support the core spec

69ee625

Merge branch 'yaml-spec' of github.com:jwaldrip/crystal into yaml-spec

31ca301

Merge branch 'master' of https://github.com/crystal-lang/crystal into…

6a5b820

… yaml-spec

asterite previously approved these changes Sep 17, 2017

View reviewed changes

Merge branch 'master' of https://github.com/crystal-lang/crystal into…

7a36cdc

… yaml-spec

asterite mentioned this pull request Sep 19, 2017

YAML revamp #5007

Merged

mverzilli closed this in #5007 Oct 2, 2017

jwaldrip mentioned this pull request Jun 29, 2019

Incorrect Parsing of YAML Hash with Union Value #7936

Closed

YAML fixes #4838

YAML fixes #4838

Conversation

jwaldrip commented Aug 17, 2017

Choose a reason for hiding this comment

akzhan commented Aug 17, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

straight-shoota commented Aug 17, 2017 • edited Loading

jwaldrip commented Aug 17, 2017

jwaldrip commented Aug 17, 2017

straight-shoota commented Aug 17, 2017 • edited Loading

RX14 commented Aug 17, 2017

jwaldrip commented Aug 17, 2017

ysbaddaden commented Aug 17, 2017 • edited Loading

ysbaddaden left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwaldrip Aug 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwaldrip commented Aug 19, 2017

jwaldrip commented Aug 19, 2017

jwaldrip commented Aug 20, 2017

akzhan commented Aug 20, 2017

straight-shoota Aug 20, 2017 • edited Loading

Choose a reason for hiding this comment

jwaldrip Aug 20, 2017 • edited Loading

Choose a reason for hiding this comment

straight-shoota Aug 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwaldrip Aug 20, 2017 • edited Loading

Choose a reason for hiding this comment

straight-shoota Aug 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Fryguy Aug 23, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ysbaddaden left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

straight-shoota Aug 22, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwaldrip commented Aug 20, 2017

straight-shoota commented Aug 17, 2017 •

edited

Loading

straight-shoota commented Aug 17, 2017 •

edited

Loading

ysbaddaden commented Aug 17, 2017 •

edited

Loading

jwaldrip Aug 20, 2017 •

edited

Loading

straight-shoota Aug 20, 2017 •

edited

Loading

jwaldrip Aug 20, 2017 •

edited

Loading

straight-shoota Aug 20, 2017 •

edited

Loading

jwaldrip Aug 20, 2017 •

edited

Loading

straight-shoota Aug 20, 2017 •

edited

Loading

Fryguy Aug 23, 2017 •

edited

Loading

straight-shoota Aug 22, 2017 •

edited

Loading

ysbaddaden Aug 21, 2017 •

edited

Loading

RX14 commented Sep 10, 2017 •

edited

Loading