Force double quotes on integers and null in string #112

jnystad · 2022-06-30T15:01:25Z

Fixes #106

Also fixes quoting for integer values in strings.

I guess the double.TryParse was an optimization to avoid parsing long string values unnecessarily? If so, this might impact performance when serializing a lot of longer string values. I could add similar parsing attempts for bools and integers (the null check is trivial), but I'm not sure the C# parsers match the YAML spec exactly.
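For context on what the fix has to detect: a string such as `123` or `null`, written as a plain (unquoted) scalar, is read back as an integer or null rather than a string, so it must be quoted. The Python sketch below illustrates that check under the assumption that the YAML 1.2 core schema tag-resolution rules apply; SharpYaml's own Schema classes may differ, and `needs_quotes` / `emit_string` are hypothetical helper names, not SharpYaml API.

```python
import re

# Plain-scalar patterns from the YAML 1.2 core schema tag-resolution rules.
# A string matching any of them would be read back as null/bool/int/float,
# not as a string, if it were emitted unquoted.
CORE_SCHEMA_PATTERNS = [
    r"null|Null|NULL|~",                                     # null
    r"true|True|TRUE|false|False|FALSE",                     # bool
    r"[-+]?[0-9]+",                                          # int, base 10
    r"0o[0-7]+",                                             # int, base 8
    r"0x[0-9a-fA-F]+",                                       # int, base 16
    r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?",  # float
    r"[-+]?(\.inf|\.Inf|\.INF)",                             # infinity
    r"\.nan|\.NaN|\.NAN",                                    # not a number
]
NON_STRING = re.compile("|".join(f"(?:{p})" for p in CORE_SCHEMA_PATTERNS))


def needs_quotes(value: str) -> bool:
    """Return True if `value` would not round-trip as a string when emitted plain."""
    # The empty plain scalar also resolves to null.
    return value == "" or NON_STRING.fullmatch(value) is not None


def emit_string(value: str) -> str:
    """Emit a string scalar, double-quoting it only when a plain scalar would be misread."""
    # Escaping of characters inside the quotes is omitted to keep the sketch short.
    return f'"{value}"' if needs_quotes(value) else value
```

With this sketch, `emit_string("123")` and `emit_string("null")` come back double-quoted while `emit_string("hello")` stays plain. Note that `yes` stays plain here because it is only a boolean under YAML 1.1 rules, not the 1.2 core schema.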

xoofx · 2022-06-30T15:33:58Z

Hm, I think it was there to avoid calling Schema.TryParse, which is a lot more costly than the early-exit double.TryParse.

jnystad · 2022-07-01T08:34:46Z

Yes, that's what I thought. Any suggestion to fix? Perhaps parsing a substring would be sufficient.
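One possible take on the "check only a substring" idea is to look at just the first character. Assuming the YAML 1.2 core schema, every plain scalar that resolves to something other than a string starts with one of `-+.0123456789nNtTfF~` (or is empty), so any other leading character can skip the expensive check entirely. This is a Python sketch, not SharpYaml code; `might_be_non_string` and `needs_quotes` are hypothetical names, and a custom Schema with different resolution rules could invalidate the shortcut.

```python
from typing import Callable

# Under the YAML 1.2 core schema, every plain scalar that resolves to something
# other than a string (null, ~, true/false variants, ints, floats, .inf, .nan)
# starts with one of these characters, or is empty.
POSSIBLE_FIRST_CHARS = frozenset("-+.0123456789nNtTfF~")


def might_be_non_string(value: str) -> bool:
    """Cheap pre-filter: False means `value` is certainly a plain string;
    True means the full (more expensive) schema check still has to run."""
    if not value:
        return True  # the empty plain scalar resolves to null
    return value[0] in POSSIBLE_FIRST_CHARS


def needs_quotes(value: str, full_check: Callable[[str], bool]) -> bool:
    """Run the expensive `full_check` only when the first character cannot rule it out."""
    return might_be_non_string(value) and full_check(value)
```

Long strings that happen to start with a digit or with `n`, `t`, or `f` still fall through to the full check, so how much this saves depends on the data being serialized.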

nickcampau · 2022-07-01T12:40:56Z

I ran into a problem yesterday that was caused by this issue. To my utter amazement, this PR had been opened just a few hours before. I want to give a shout-out to @jnystad 🥇: your timing couldn't have been more perfect. I was literally cloning the repo to look into fixing this issue when I saw this PR.

I'd like to suggest that this PR not wait for any further performance enhancements, since it addresses the main issue of YAML non-compliance. Instead, a new PR could be created to investigate those concerns and improve performance (if possible).

xoofx · 2022-07-01T14:58:30Z

> I'd like to suggest that this PR not wait for any further performance enhancements, since it addresses the main issue of YAML non-compliance. Instead, a new PR could be created to investigate those concerns and improve performance (if possible).

This behavior has been around for the past 5 years, so there is no urgency to land this PR.

> Yes, that's what I thought. Any suggestion to fix? Perhaps parsing a substring would be sufficient.

Could you run a benchmark before/after using BenchmarkDotNet on a significant yml file to see the impact?

jnystad · 2022-07-04T11:57:50Z

Given the customizable nature of SharpYaml, and specifically since the Schema in SerializerContext can be overridden, I think any optimizations added here may conflict with custom Schema handling and therefore do not belong here.

Also, I'm not sure I have the overview of the library to be certain I'm creating useful performance tests.

A preliminary investigation using Stopwatch and loops indicates that the actual Serialize step has a negligible performance cost compared to initializing a new Serializer (i.e. creating a new Serializer and running Serialize takes ~500ms on a large object, while the next thousand calls with the same Serializer take ~5ms in total).

Also, I don't think the regexes used by the real parser are significantly slower than any other check one could do, at least not at a scale that matters. They probably return quickly regardless of string length, given the limited set of values they can match.

But as I said, I don't know the internal mechanics well enough to determine whether I'm actually testing this properly (e.g. is there some caching whereby feeding the same deserialized object to the same Serializer skips the actual serialization?).

Regarding the urgency, I wonder how existing users of SharpYaml and YamlDotNet (which has the same flaw, at least with integer values) live with this. Perhaps they override the default behaviour (as I have done where I'm currently using SharpYaml, until this fix is merged).

xoofx · 2022-07-04T17:36:20Z

> A preliminary investigation using Stopwatch and loops indicates that the actual Serialize step has a negligible performance cost compared to initializing a new Serializer (i.e. creating a new Serializer and running Serialize takes ~500ms on a large object, while the next thousand calls with the same Serializer take ~5ms in total).

AFAIR, it is recommended to use one Serializer per thread and keep it around, as it stores metadata that you really don't want to recompute.
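The "one Serializer per thread, kept around" pattern can be sketched with thread-local storage. This Python illustration is only an analogy: `ExpensiveSerializer` is a hypothetical stand-in for a serializer that is costly to construct, not SharpYaml's actual class, and `get_serializer` is a hypothetical helper.

```python
import threading


class ExpensiveSerializer:
    """Hypothetical stand-in for a serializer that is costly to construct
    (e.g. because it builds type metadata); not SharpYaml's actual API."""

    def serialize(self, obj: object) -> str:
        return repr(obj)


# Each thread sees its own attributes on this object.
_local = threading.local()


def get_serializer() -> ExpensiveSerializer:
    """Return this thread's serializer, creating it on first use and reusing it afterwards."""
    serializer = getattr(_local, "serializer", None)
    if serializer is None:
        serializer = ExpensiveSerializer()
        _local.serializer = serializer
    return serializer
```

Repeated calls on the same thread return the same instance, so the construction cost is paid once per thread, while separate threads never share an instance.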

> Also, I don't think the regexes used by the real parser are significantly slower than any other check one could do, at least not at a scale that matters. They probably return quickly regardless of string length, given the limited set of values they can match.

Right, this could also be further optimized with the .NET 7+ regex optimizations in the future.

Anyway, thanks for the fix, I'm going to merge it. The performance regression might be negligible, and if someone notices it, we will have a candidate to fix it 😅

jnystad · 2022-07-05T10:04:10Z

Thanks!

Force double quotes on integers and null in string

da2c0e8

xoofx added the bug label Jul 3, 2022

xoofx merged commit a8fcaa5 into xoofx:master Jul 4, 2022

dciborow mentioned this pull request Nov 1, 2023

Issue with loadYamlContent function Azure/bicep#12279

Closed
