-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
anchors: Add new rule to detect undeclared or duplicated anchors #550
Conversation
According to the YAML specification [^1]: - > It is an error for an alias node to use an anchor that does not > previously occur in the document. The `forbid-undeclared-aliases` option checks that aliases do have a matching anchor declared previously in the document. Since this is required by the YAML spec, this option is enabled by default. - > The alias refers to the most recent preceding node having the same > anchor. This means that having a same anchor repeated in a document is allowed. However users could want to avoid this, so the new option `forbid-duplicated-anchors` allows that. It's disabled by default. - > It is not an error to specify an anchor that is not used by any > alias node. This means that it's OK to declare anchors but don't have any alias referencing them. However users could want to avoid this, so a new option (e.g. `forbid-unused-anchors`) could be implemented in the future. See #537. Fixes #395 Closes #420 [^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had a chance to try it out with actual yaml files. I think it looks good so far though. The unit test currently tests following scenarios:
- rule disabled
- rule enabled with option forbid-undeclared-aliases but disabled option forbid-duplicated-anchors
- rule enabled with option forbid-duplicated-anchors but disabled option forbid-undeclared-aliases
Should there be another test with both options enabled? Just to cover all the scenarios.
Also, like the yaml samples; they cover all the different types of contents for each tests I think.
self.check('---\n' | ||
'- &b true\n' | ||
'- &i 42\n' | ||
'- &s hello\n' | ||
'- &f_m {k: v}\n' | ||
'- &f_s [1, 2]\n' | ||
'- *b\n' | ||
'- *i\n' | ||
'- *s\n' | ||
'- *f_m\n' | ||
'- *f_s\n' | ||
'---\n' # redeclare anchors in a new document | ||
'- &b true\n' | ||
'- &i 42\n' | ||
'- &s hello\n' | ||
'- *b\n' | ||
'- *i\n' | ||
'- *s\n' | ||
'---\n' | ||
'block mapping: &b_m\n' | ||
' key: value\n' | ||
'extended:\n' | ||
' <<: *b_m\n' | ||
' foo: bar\n' | ||
'---\n' | ||
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n' | ||
'...\n', conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the yaml sample to be hard to read, mostly because of single-quotes and new-line character (\n
) on every line. Some samples are are JSON instead of yaml-way. It also makes the self.check(...)
a bit hard to read in my opinion, but it's probably a personal preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. This style is common to all other rules tests, so I guess it's more a general question. In my case, I got used to this notation and can read it easily.
'---\n' | ||
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n' | ||
'...\n', conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must be missing something. Is it a valid yaml? I think *x: 4
is invalid, is it? I tried the yaml-to-json converting tool and also this online yamllint. Both tools are unable to process it because of *x: 4
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is valid YAML 🙂
Anchor x
equals string "b"
, so *x: 4
means b: 4
.
You can try by yourself:
$ echo '{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}' | python -c 'import sys, yaml, json; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=4)'
{
"a": 1,
"b": 4,
"c": 3,
"e": 3
}
PS: Don't trust online YAML linters...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh... Got it. Thanks for explaining. I have not used anchors that way yet. Learned something new today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*x: something
is not valid YAML.
It should be
*x : something
.
It is required to put a space after the alias.
It's just that libyaml and PyYAML didn't implement this correctly.
The reason is that a colon can be part of an anchor name, but libyaml and PyYAML don't allow this, so they also don't require a space between the alias and the colon.
To be compatible with all YAML processors, add a space.
A linter where the underlying parser does not error on this should probably error or warn on this itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem6=(24, 7), | ||
problem7=(27, 36)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the referenced problem lines be commented on for consistency like the other problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, but I don't feel the need for that: I only put comments on problem lines that are tricky to understand/maintain. If you look at other tests files, comments are rare, and placed only if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your feedback @amimas 👍
I would be best if you could try it, because once merged and released, I won't be able spend much more time on it.
Should there be another test with both options enabled? Just to cover all the scenarios.
I asked myself the same question while writing them, but concluded no, otherwise it would create a lot of repetition especially when you add your third option forbid-unused-anchors
. And each option should work individually so I think we're OK. What do you think?
self.check('---\n' | ||
'- &b true\n' | ||
'- &i 42\n' | ||
'- &s hello\n' | ||
'- &f_m {k: v}\n' | ||
'- &f_s [1, 2]\n' | ||
'- *b\n' | ||
'- *i\n' | ||
'- *s\n' | ||
'- *f_m\n' | ||
'- *f_s\n' | ||
'---\n' # redeclare anchors in a new document | ||
'- &b true\n' | ||
'- &i 42\n' | ||
'- &s hello\n' | ||
'- *b\n' | ||
'- *i\n' | ||
'- *s\n' | ||
'---\n' | ||
'block mapping: &b_m\n' | ||
' key: value\n' | ||
'extended:\n' | ||
' <<: *b_m\n' | ||
' foo: bar\n' | ||
'---\n' | ||
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n' | ||
'...\n', conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. This style is common to all other rules tests, so I guess it's more a general question. In my case, I got used to this notation and can read it easily.
'---\n' | ||
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n' | ||
'...\n', conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is valid YAML 🙂
Anchor x
equals string "b"
, so *x: 4
means b: 4
.
You can try by yourself:
$ echo '{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}' | python -c 'import sys, yaml, json; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=4)'
{
"a": 1,
"b": 4,
"c": 3,
"e": 3
}
PS: Don't trust online YAML linters...
problem6=(24, 7), | ||
problem7=(27, 36)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, but I don't feel the need for that: I only put comments on problem lines that are tricky to understand/maintain. If you look at other tests files, comments are rare, and placed only if needed.
I'll merge this now so it can go in the same release as #548 (comment). |
Thanks for merging it. Apologies for the delay here. Have been busy lately and didn't get a chance to look into it further. |
Although accepted by PyYAML, `{*x: 4}` is not valid YAML: it should be noted `{*x : 4}`. The reason is that a colon can be part of an anchor name. See this comment from Tina Müller for more details: #550 (comment) Even if it's not a problem for yamllint, let's fix our tests to include valid YAML snippets.
Although accepted by PyYAML, `{*x: 4}` is not valid YAML: it should be noted `{*x : 4}`. The reason is that a colon can be part of an anchor name. See this comment from Tina Müller for more details: #550 (comment) Even if it's not a problem for yamllint, let's fix our tests to include valid YAML snippets.
Although accepted by PyYAML, `{*x: 4}` is not valid YAML: it should be noted `{*x : 4}`. The reason is that a colon can be part of an anchor name. See this comment from Tina Müller for more details: #550 (comment) Even if it's not a problem for yamllint, let's fix our tests to include valid YAML snippets.
In the rare case when the key before `:` is an alias (e.g. `{*x : 4}`), the space before `:` is required (although this requirement is not enforced by PyYAML), the reason being that a colon can be part of an anchor name. Consequently, this commit adapts the `colons` rule to avoid failures when this happens. See this comment from Tina Müller for more details: #550 (comment)
Although accepted by PyYAML, `{*x: 4}` is not valid YAML: it should be noted `{*x : 4}`. The reason is that a colon can be part of an anchor name. See this comment from Tina Müller for more details: #550 (comment) Even if it's not a problem for yamllint, let's fix our tests to include valid YAML snippets.
In the rare case when the key before `:` is an alias (e.g. `{*x : 4}`), the space before `:` is required (although this requirement is not enforced by PyYAML), the reason being that a colon can be part of an anchor name. Consequently, this commit adapts the `colons` rule to avoid failures when this happens. See this comment from Tina Müller for more details: #550 (comment)
According to the YAML specification 1:
The
forbid-undeclared-aliases
option checks that aliases do have a matching anchor declared previously in the document. Since this is required by the YAML spec, this option is enabled by default.This means that having a same anchor repeated in a document is allowed. However users could want to avoid this, so the new option
forbid-duplicated-anchors
allows that. It's disabled by default.This means that it's OK to declare anchors but don't have any alias referencing them. However users could want to avoid this, so a new option (e.g.
forbid-unused-anchors
) could be implemented in the future. See feat: add rule for unused anchors #537.Fixes #395
Closes #420
Footnotes
https://yaml.org/spec/1.2.2/#71-alias-nodes ↩