Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numbers in scientific notation without dot are parsed as string #173

Open
coldfix opened this issue Jun 19, 2018 · 13 comments
Open

Numbers in scientific notation without dot are parsed as string #173

coldfix opened this issue Jun 19, 2018 · 13 comments
Labels

Comments

@coldfix
Copy link

coldfix commented Jun 19, 2018

Hi,

it appears that PyYAML requires a dot in the mantissa as well as a sign in the exponent in order to parse numbers in scientific notation as numbers and otherwise parses them as string:

>>> import yaml
>>> yaml.safe_load('1e+10')
'1e+10'
>>> yaml.safe_load('1.e+10')
10000000000.0
>>> yaml.safe_load('1.0e10')
'1.0e10'

According to YAML 1.2 spec the dot is optional as in JSON:

Either 0, .inf, -.inf, .nan, or scientific notation matching the regular expression -? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?.

Furthermore, JSON makes even the sign in the exponent optional: https://www.json.org/number.gif

Since YAML 1.2 aspires to be a superset of JSON, I believe it should make the sign optional as well.

Others than ran into similar problems:

https://stackoverflow.com/questions/30458977/yaml-loads-5e-6-as-string-and-not-a-number

Best, Thomas

@perlpunk
Copy link
Member

Since YAML 1.2 aspires to be a superset of JSON, I believe it should make the sign optional as well.

The sign is optional.
This is the regex for the YAML JSON Schema:
-? ( 0 | [1-9] [0-9]* ) ( \. [0-9]* )? ( [eE] [-+]? [0-9]+ )?
And this one for Core Schema:
[-+]? ( \. [0-9]+ | [0-9]+ ( \. [0-9]* )? ) ( [eE] [-+]? [0-9]+ )?

You were quoting the canonical form.

@coldfix
Copy link
Author

coldfix commented Jul 19, 2018

Oh, my mistake. So I guess both changes are justified.

@ischoegl
Copy link

ischoegl commented Mar 28, 2019

I ran into the same issue (which I consider a bug). The stackoverflow solution provides a simple fix that I hope will find its way upstream sooner than later (unfortunately it was not adopted in the step from 3.12 to 5.1)

@ischoegl
Copy link

For anyone running into similar issues, the derived package ruamel.yaml resolves issues like the one stated above.

@GenevieveBuckley
Copy link

It seems like it's not just the presence or absence or a decimal point that affects things, but also whether there is an explicit sign (positive or negative) preceeding the exponent.

I'm unsure if this is also addressed by PR #174 or if it could be included.

In [1]: import yaml

In [2]: yaml_str = ('''
   ...: hello:
   ...:   - 5.0E6
   ...:   - 5.0e6
   ...:   - 5.E6
   ...:   - 5.e6
   ...:   - 5E6
   ...:   - 5e6
   ...: ''')

In [3]: yaml.safe_load(yaml_str)
Out[3]: {'hello': ['5.0E6', '5.0e6', '5.E6', '5.e6', '5E6', '5e6']}

In [4]: yaml_str = ('''
   ...: hello:
   ...:   - 5.0E-6
   ...:   - 5.0e-6
   ...:   - 5.E-6
   ...:   - 5.e-6
   ...:   - 5E-6
   ...:   - 5e-6
   ...: ''')

In [5]: yaml.safe_load(yaml_str)
Out[5]: {'hello': [5e-06, 5e-06, 5e-06, 5e-06, '5E-6', '5e-6']}

In [6]: yaml_str = ('''
   ...: hello:
   ...:   - 5.0E+6
   ...:   - 5.0e+6
   ...:   - 5.E+6
   ...:   - 5.e+6
   ...:   - 5E+6
   ...:   - 5e+6
   ...: ''')

In [7]: yaml.safe_load(yaml_str)
Out[7]: {'hello': [5000000.0, 5000000.0, 5000000.0, 5000000.0, '5E+6', '5e+6']}

Reference to original discussion: microscopium/microscopium#159 (comment)

@AlexanderNaehring
Copy link

AlexanderNaehring commented Nov 7, 2019

Issue is still present, I use the fix found on Stackoverflow, manually adding the new implicit resolver.

@jbresciani
Copy link

I stumbled into this one today. we use a python script to alter a kubernetes manifest. it uses yaml.safe_load() to read in the manifest file, alters the dictionary and yal.dump() 's it back to the yaml. Today that manifest had an annotation of githash: "3900e128", it loaded into the script properly but when it was dumped back out it became githash: 3900e128 which kubectl then read in as serviceTag: 3.9e+131

The following is a trimmed down recreation of what happens

#/usr/bin/env python
import yaml

data = yaml.safe_load("githash: '3900e128'")
print(data)
print(yaml.dump(data))

pyyaml appears to be considering this as a string and therefore removing the quotes and the output yaml ends up being
githash: 3900e128

when it should be since it was quoted in the origional yaml file.
githash: "3900e128"

I can work around it using the following

#/usr/bin/env python
import yaml


def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')

yaml.add_representer(str, quoted_presenter)

data = yaml.safe_load("githash: '3900e128'")
print(data)
print(yaml.dump(data))

which outputs
"githash": "3900e128"

@perlpunk
Copy link
Member

Since pyyaml implements YAML 1.1, this behaviour is actually correct.
It would be better to implement resolvers and representers for YAML 1.2 and let users choose the version.
I had already implemented the resolvers in a proof of concept.
It still depends on @ingydotnet 's plans about the safe load issue, because my changes will need to rearrange the methods a bit and might conflict with those changes.

EdDev added a commit to EdDev/kubernetes-nmstate that referenced this issue Mar 3, 2021
In scenarios where an interface name is composed of numbers in
scientific notation without a dot, the application panics with a failure
to extract a string type from a Go interface (which is actually a float64).

In order to fix the problem, the parsing of the interface name is performed
using a dedicated interface-state structure.

Note: this change does not solve the problem of incorrectly representing
valid scientific numeric values like `10e+02` as strings (they are now
represented back as full integers: "1000").
This is due to the YAML-to-JSON conversion which does not obey to
the defined member type.

ref: yaml/pyyaml#173

Signed-off-by: Edward Haas <[email protected]>
kubevirt-bot pushed a commit to nmstate/kubernetes-nmstate that referenced this issue Mar 3, 2021
* state: Interpret basic state through an explicit structure

In order to ease the interpretation of the state yaml data, use a type
structure that represents the basic root items of state:
- interfaces
- routes:
  - config
  - running

In order to keep the generic nature of these items content,
they use as value a generic Go interface{}.

The filter helper functions, with this change, have been also made
immutable. They now return the filtered values.

Signed-off-by: Edward Haas <[email protected]>

* state, filter: Introduce unit tests for numeric iface names

Interfaces names with numeric values are accepted by the kernel.
Introduce tests that checks the behavior when using numeric names.

Signed-off-by: Edward Haas <[email protected]>

* state, filter: Do not panic on interpretation of ifaces names

In scenarios where an interface name is composed of numbers in
scientific notation without a dot, the application panics with a failure
to extract a string type from a Go interface (which is actually a float64).

In order to fix the problem, the parsing of the interface name is performed
using a dedicated interface-state structure.

Note: this change does not solve the problem of incorrectly representing
valid scientific numeric values like `10e+02` as strings (they are now
represented back as full integers: "1000").
This is due to the YAML-to-JSON conversion which does not obey to
the defined member type.

ref: yaml/pyyaml#173

Signed-off-by: Edward Haas <[email protected]>

* state, filter: Fix interpretation of scientific numeric iface names

This change solves the problem of incorrectly representing
valid scientific numeric valuesi without a dot in them (e.g. `10e20`)
as strings (e.g. represented back as `1000`).

The problem originates from the YAML-to-JSON conversion which does not obey
to the defined member type.

Fixed by using the go-yaml package (which does not perform such a
conversion from YAML to JSON).
The go-yaml is used just to unmarshal the name from the original raw
byte stream and update the state structure with the proper name string.

Signed-off-by: Edward Haas <[email protected]>
@kislyuk
Copy link

kislyuk commented Apr 28, 2021

It still depends on @ingydotnet 's plans about the safe load issue

It appears @ingydotnet has not committed to this repository in 3 months now, and there are a number of increasingly pressing quality-of-life issues stemming from the lack of YAML 1.2 support. What is the strategy for landing YAML 1.2 support in PyYAML?

@nitzmahone
Copy link
Member

I'm not speaking for everyone, but IMO: it'll land when it's ready. There are no paid contributors or maintainers on this project, and since this is generally considered the "stable" Python YAML parser, if the API changes required to support YAML 1.2 break the world, it's going to be hell for everyone involved (even if we have "well, it's a major release, just deal" to hide behind).

@perlpunk has been doing some great work recently to get the functionality in there, and I'm still trying to find some time to really sit down and bash around with it to see what we can do to preserve backward compatibility for at least the common use cases.

@kislyuk
Copy link

kislyuk commented Apr 28, 2021

@nitzmahone thanks for that context, and thanks to @perlpunk for all the hard work as well. The main reason I was asking was to see if the release planning process was stuck waiting for input from an absent contributor. Is there anything specific that external contributors can do to speed up the process (PRs that could be accepted for specific features or refactors)? Even an interim release where the 1.2 grammar is available, but off by default, would be terrific.

@perlpunk
Copy link
Member

perlpunk commented Apr 28, 2021

@kislyuk

Even an interim release where the 1.2 grammar is available, but off by default, would be terrific.

That's the current plan. I think defaulting to 1.2, if ever, needs to wait for a while. It would break a lot of stuff =)

I'm trying to figure out a syntax with which people can use the 1.2 schema, but without adding too much classes into pyyaml itself. #512 is a draft that adds a lot of classes, but I'm working on top of that branch for an alternative.

You say "1.2 grammar". Do you think of something specific in the grammar? Because I'm really only working on the schema. #512 has some details on that.

edit: fixed PR number

aksg87 added a commit to aksg87/adpkd-segmentation-pytorch that referenced this issue May 3, 2021
@coldfix
Copy link
Author

coldfix commented Jun 30, 2023

Superseded by #555.

Edit: Sorry, wanted to close the associated PR instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants