Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Float parsing is unnecessarily restrictive #290

Closed
rogpeppe opened this issue Nov 13, 2017 · 14 comments · Fixed by carvel-dev/ytt#483
Closed

Float parsing is unnecessarily restrictive #290

rogpeppe opened this issue Nov 13, 2017 · 14 comments · Fixed by carvel-dev/ytt#483

Comments

@rogpeppe
Copy link
Contributor

rogpeppe commented Nov 13, 2017

PR #171 restricted the possible float syntax, but it's not clear that it was the right thing to do.
The YAML spec itself provides a regexp for floats (-?[1-9](\.[0-9]*[1-9])?(e[-+][1-9][0-9]*)?) that is cast into doubt by the example immediately following (2.3e4) which is not matched by the regexp.

Also, JSON allows floats of the form 2e5 and YAML is supposed to be able to read JSON.

As it seems other implementations treat [0-9]+e[0-9]+ as a string, perhaps we should do that too, despite JSON-incompatibility, but treat it as a float if there's a decimal point in the mantissa or a + or - sign before the exponent.

@rogpeppe
Copy link
Contributor Author

@alexjh @aarondl What do you think?

@alexjh
Copy link

alexjh commented Nov 21, 2017

treat it as a float if there's a decimal point in the mantissa or a + or - sign before the exponent.

@rogpeppe I think that's reasonable from our use case. A git commit ID shouldn't have those characters :)

@ensonic
Copy link

ensonic commented Dec 5, 2017

The current version seems to fail on "0." which should definitely be accepted.

yaml: unmarshal errors:↵  line 3: cannot unmarshal !!str `0.` into float64"

@rogpeppe
Copy link
Contributor Author

rogpeppe commented Dec 6, 2017

The current version seems to fail on "0." which should definitely be accepted.

Why do you say that it should definitely be accepted? That string doesn't match the regular expression provided in the standard, which includes mandatory digit after the decimal point.

@ensonic
Copy link

ensonic commented Dec 7, 2017

Why do you say that it should definitely be accepted? That string doesn't match the regular expression provided in the standard, which includes mandatory digit after the decimal point.

That part of the regexp is (\.[0-9]*[1-9])? That means: dot + [0-9] zero or many times + [1-9], aka if al you have is '0' after the dot, you can leave it out. Do I miss anything?

We have a tool that write yaml and we now patched it as a hotfix.

@aarondl
Copy link

aarondl commented Dec 7, 2017

@ensonic I think you're actually reading the regexp wrong. It specifically has an unqualified [1-9] (no *, ? or + afterwards) and that means that you must have exactly one digit that is 1-9 following a .

Here's a regexp tester showing a negative match for 0.:
https://regex101.com/r/5sY1K4/1

Note that I put a 0 at the beginning of the regexp to allow us to test with 0.0, 0.1, 0. etc.

@rogpeppe
Copy link
Contributor Author

rogpeppe commented Dec 7, 2017

On reading the spec a bit more closely, the regexp currently used is the canonical pattern for numbers; the actual regexp is specified in section 10.3.2 and is -?(0|[1-9][0-9]*)(\.[0-9]*)?([eE][-+]?[0-9]+)?, which does actually allow a trailing dot. Apologies to @ensonic - it looks like 0. should be accepted (but there may be a backward compatibility issue here if YAML previously output by go-yaml did not quote strings of the form [0-9]\..

@ensonic
Copy link

ensonic commented Dec 8, 2017

He, no worries. My brain actually misinterpreted the regexp above. But indeed on yaml.org (http://yaml.org/type/float.html) they list a different regexp. In any case, just wanted to let you know that we're not blocked.

@niemeyer
Copy link
Contributor

This issue seems somewhat theoretical right now. If someone has actual issues with the current behavior, please let us know and we'll look into it.

@niemeyer
Copy link
Contributor

Actually, reopening. Went looking further, and the changes in PR #171, and the tests added there, indeed disagree with the specification. So this is not just unnecessarily restrictive, it's plain wrong if we intend to implement version 1.2 of the spec. Will look further into that for v3.

@niemeyer niemeyer reopened this Mar 26, 2018
@rogpeppe
Copy link
Contributor Author

FWIW PR #171 (commit 4c78c97) seems to have caused breakage beyond the desired fix issue.
For example 2.2e11 parses as a string not a float (this breaks the gopkg.in/charm.v6 tests when using yaml tip).
Weirdly, it doesn't happen when parsing 2.2e11 at the top level.

This was the code I used to bisect, FWIW:

package main

import (
	"fmt"
	"os"

	yaml "gopkg.in/yaml.v2"
)

func main() {
	var x []interface{}
	err := yaml.Unmarshal([]byte(`- 2.2e11`), &x)
	if err != nil || len(x) != 1 {
		fmt.Println("error or not slice")
		os.Exit(1)
	}
	_, ok := x[0].(float64)
	if !ok {
		fmt.Println("not float64")
		os.Exit(1)
	}
}

rogpeppe added a commit to rogpeppe-contrib/charm that referenced this issue Jun 12, 2018
Previously it looked like docker resources were supported
but they didn't work because the filename parameter
was mandatory.

Also make the tests pass on yaml.v2 tip; see go-yaml/yaml#290.
ExternalReality pushed a commit to ExternalReality/charm that referenced this issue Jul 3, 2018
Previously it looked like docker resources were supported
but they didn't work because the filename parameter
was mandatory.

Also make the tests pass on yaml.v2 tip; see go-yaml/yaml#290.
@erikgrinaker
Copy link

This hit us as well - we can no longer use e.g. 10e9 as a shorthand for 10 GB, and 10000000000 is not as readable.

@niemeyer
Copy link
Contributor

Fixed in 7b8349a. Sorry for the trouble.

@niemeyer niemeyer changed the title float parsing is unnecessarily restrictive Float parsing is unnecessarily restrictive Mar 20, 2019
niemeyer added a commit that referenced this issue Apr 3, 2019
The regular expression is copy & pasted form the one in the spec.
The change suggested in #171 and integrated was improper.

Closes #290.
@rehevkor5
Copy link

on yaml.org (http://yaml.org/type/float.html) they list a different regexp

Right, but that is for Yaml 1.1, whereas the [+-] is optional in YAML 1.2's JSON schema tag resolution and YAML 1.2s core schema tag resolution.

Regarding decimal and plus/minus characters,

A git commit ID shouldn't have those characters

Right. However, since the decimal and the [+-] is not required, YAML parses certain unquoted hex strings (git commits included) as floats, for example: 504e199. This might be surprising if your YAML serialization library is not explicitly quoting strings or tagging them. Just a FYI if you're getting errors like "error unmarshaling JSON" with type mismatches.

I feel like this may be a weakness in the YAML 1.2 spec under 10.1.1.3 Generic String where it describes the Canonical Form glibly as "The obvious." I am not very familiar with how the spec works, but it seems to me that if there is no clearly described canonical form for strings, serialization tools are liable to produce output (like the example above) which does not produce accurate round-trip ser/deser. I plan to ask what they think about that on the YAML mailing list.

pivotaljohn pushed a commit to carvel-dev/ytt that referenced this issue Sep 4, 2021
The regular expression is copy & pasted form the one in the spec.
The change suggested in go-yaml/yaml#171 and integrated was improper.

Closes go-yaml/yaml#290

(cherry-pick of go-yaml/yaml@7b8349a)

Signed-off-by: John Ryan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants