Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement basic input validation for built-in attributes #57321

Merged
merged 2 commits into from
Jan 16, 2019

Conversation

petrochenkov
Copy link
Contributor

@petrochenkov petrochenkov commented Jan 4, 2019

Correct top-level shape (#[attr] vs #[attr(...)] vs #[attr = ...]) is enforced for built-in attributes, built-in attributes must also fit into the "meta-item" syntax (aka the "classic attribute syntax").

For some subset of attributes (found by crater run), errors are lowered to deprecation warnings.

NOTE: This PR previously included #57367 as well.

@petrochenkov petrochenkov added T-lang Relevant to the language team, which will review and decide on the PR/issue. S-waiting-on-crater Status: Waiting on a crater run to be completed. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Jan 4, 2019
@petrochenkov
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented Jan 4, 2019

⌛ Trying commit 3b37f15 with merge b9139d2...

bors added a commit that referenced this pull request Jan 4, 2019
Implement basic input validation for built-in attributes + Stabilize `unrestricted_attribute_tokens`

Based on #57272

---

In accordance with the plan in https://internals.rust-lang.org/t/unrestricted-attribute-tokens-feature-status/8561:
- The correct top-level shape (`#[attr]` vs `#[attr(...)]` vs `#[attr = ...]`) is enforced for built-in attributes.
- For non-built-in non-macro attributes:
    - The key-value form is restricted to bare minimum required to support what we support on stable - unsuffixed literals (#34981).
    - Arbitrary token streams in remaining forms (`#[attr(token_stream)]`, `#[attr{token_stream}]`, `#[attr[token_stream]]` ) are now allowed without a feature gate, like in macro attributes.

This will close #55208 once completed.
Need to go through crater first.
@petrochenkov
Copy link
Contributor Author

cc @CAD97

@bors
Copy link
Contributor

bors commented Jan 4, 2019

☀️ Test successful - status-travis
State: approved= try=True


#[derive] //~ WARNING empty trait list in `derive`
struct Foo;
// compile-pass

#[derive()] //~ WARNING empty trait list in `derive`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw... I tried to look for the reason why this happens instead of unused_attributes but got lost... (cc #54651).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hardcoded warning in libsyntax/ext/derive.rs.
derive is removed during expansion currently (

let item = self.fully_configure(item)
.map_attrs(|mut attrs| { attrs.retain(|a| a.path != "derive"); attrs });
) so it doesn't survive until unused attribute checking, IIRC.

Expansion should degrade #[derive(Macro1, Macro2, ...)] into #[derive()] instead with derive attribute itself marked as used when it applies a macro.

src/test/ui/proc-macro/attribute.rs Outdated Show resolved Hide resolved
@Centril
Copy link
Contributor

Centril commented Jan 4, 2019

  • The correct top-level shape (#[attr] vs #[attr(...)] vs #[attr = ...]) is enforced for built-in attributes, built-in attributes also must fit into the "meta-item" syntax (aka the "classic attribute syntax").

I really enjoyed this and the improved error messages as well as the improvements to feature_gate.rs.

  • For non-built-in non-macro attributes:
    • The key-value form is restricted to bare minimum required to support what we support on stable - unsuffixed literals (Allow all literals in attributes (Tracking Issue for RFC #1559) #34981).
    • Arbitrary token streams in remaining forms (#[attr(token_stream)], #[attr{token_stream}], #[attr[token_stream]] ) are now allowed without a feature gate, like in macro attributes.

My understanding is that the stable grammar of attributes then are:

OuterAttr = "#" attr:Attr;
InnerAttr = "#" "!" attr:Attr;
Attr = "[" path:Path input:AttrInput "]";
AttrInput =
  | {}
  | "(" TOKEN_STREAM ")"
  | "[" TOKEN_STREAM "]"
  | "{" TOKEN_STREAM "}"
  | "=" LITERAL
  ;

Is that correct?

This will close #55208 once completed.

I have some thoughts on this:

  • The restrictions and improvements to error messages you are making I think we should make right now since they are fantastic and correct.
  • However, it seems that the more liberal attribute grammar (i.e. the stabilization) is quite a nontrivial change that, while I entirely agree with it, should probably go through an RFC.
  • But I understand that you don't enjoy writing prose so maybe T-Lang (e.g. me or some other member) or someone else can assist with the RFC authorship.
  • I do think | "=" TOKEN_TREE should be allowed at the same time as the other liberalizations take effect as I think #[foo = 1 + 1] is often more idiomatic and we should not stabilize the unidiomatic variant first and have the community rush to use that one instead of the idiomatic variant.

Need to go through crater first.

Regardless of what I've written here let's do that.

@Centril
Copy link
Contributor

Centril commented Jan 4, 2019

Assigning to r? @nikomatsakis for technical review
(I did review everything and it looked great but I'm not an expert...).

@petrochenkov
Copy link
Contributor Author

@craterbot run start=master#c0bbc3927e28c22edefe6a1353b5ecc95ea9a104 end=try#b9139d2caca1db46014a9c302d5c47cfae0d8ae0 mode=check-only

@craterbot
Copy link
Collaborator

👌 Experiment pr-57321 created and queued.
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot removed the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Jan 4, 2019
@craterbot
Copy link
Collaborator

🚧 Experiment pr-57321 is now running on agent aws-1.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jan 4, 2019

@Centril

My understanding is that the stable grammar of attributes then are:

Correct.

But I understand that you don't enjoy writing prose so maybe T-Lang (e.g. me or some other member) or someone else can assist with the RFC authorship.

In this case the prose is already kind of written actually, in https://internals.rust-lang.org/t/unrestricted-attribute-tokens-feature-status/8561/3.

However, it seems that the more liberal attribute grammar (i.e. the stabilization) is quite a nontrivial change that, while I entirely agree with it, should probably go through an RFC.

Hm, I feel exactly oppositely, this is the most trivial part of the proposal.
The summary of the stabilization change is that we allow inert proc macro helpers to use same syntax as proc macro attributes themselves.
That's actually already true de-facto for non-derive helper attributes, because proc macros must remove them and feature gating is done after expansion (oops), so the relaxed syntax is already usable on stable.
FCP would be enough, I think.

The restrictions and improvements to error messages you are making I think we should make right now since they are fantastic and correct.

This also needs some attention and FCP, since this is some change of direction.
Previously attribute checking was pretty relaxed in most cases (garbage input is either recognized and used or ignored), but for some attributes input was randomly checked and errors reported.
With this work the intent changes to checking the input grammar (and eventually the attribute target) for all built-in attributes.
I think this is an improvement, but who knows what others think.

I do think | "=" TOKEN_TREE should be allowed at the same time as the other liberalizations take effect as I think #[foo = 1 + 1] is often more idiomatic and we should not stabilize the unidiomatic variant first and have the community rush to use that one instead of the idiomatic variant.

I wouldn't want to delay simple and already semi-stable delimited attributes on this.
There are so many options for expanding from #[foo = LITERAL].

  • We can decide that key-value attribute form is legacy, and do not extend it.
  • We can make key-value attributes "expression-oriented" and extend the grammar to #[foo = EXPR], leaving other things like #[foo = TYPE] unsupported.
  • We can make key-value attributes "token-oriented" and extend the grammar to #[foo = TOKEN_TREE], leaving key-value attributes with multiple tokens #[foo = a b c] unsupported (including multi-token expressions).
  • We can make key-value attributes accept almost arbitrary token streams with single special token , (comma) acting as a terminator (#[foo = TOKEN_STREAM, bar = TOKEN_STREAM]).

I think this does requires an RFC and some thoughts about the future, but I wouldn't want to want to work on it myself, since it's not high priority.

@Centril
Copy link
Contributor

Centril commented Jan 4, 2019

Hm, I feel exactly oppositely, this is the most trivial part of the proposal.
The summary of the stabilization change is that we allow inert proc macro helpers to use same syntax as proc macros themselves.
That's actually already true de-facto for non-derive helper attributes, because proc macros must remove them and feature gating is done after expansion (oops), so the relaxed syntax is already usable on stable.
FCP would be enough, I think.

Fair.

This also needs some attention and FCP, since this is some change of direction.

Hah; that was implied, sorry... :)
Let's wait for crater and then I'll fcp-merge.

(I propose that we do what the PR description suggests; tho I would split off the #[$path = $token_tree] form to another feature gate rather than remove it from nightly...)

I think this is an improvement, but who knows what others think.

It definitely is an improvement; the improved diagnostics are particularly great.

I wouldn't want to delay simple and already semi-stable delimited attributes on this.
There are so many options for expanding from #[foo = LITERAL].

  • We can decide that key-value attribute form is legacy, and do not extend it.
  • We can make key-value attributes "expression-oriented" and extend the grammar to #[foo = EXPR], leaving other things like #[foo = TYPE] unsupported.
  • We can make key-value attributes accept almost arbitrary token streams with single special token , (comma) acting as a terminator (#[foo = TOKEN_STREAM, bar = TOKEN_STREAM]).

I think this does requires an RFC and some thoughts about the future, but I wouldn't want to want to work on it myself, since it's not high priority.

Also fair... I might have some time to work on this, or @CAD97 can do it if they prefer (we can collaborate ofc...).

@craterbot
Copy link
Collaborator

🎉 Experiment pr-57321 is completed!
📊 985 regressed and 0 fixed (50551 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the blacklist!
ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Jan 5, 2019
@petrochenkov
Copy link
Contributor Author

985 regressed

O_O
I guess I should have used a forbid-by-default lint instead of an error after all.
Ok, activating plan B and downloading/grepping the logs would probably be faster than running crater again.

@Centril
Copy link
Contributor

Centril commented Jan 5, 2019

I guess I should have used a forbid-by-default lint instead of an error after all.
Ok, activating plan B and downloading/grepping the logs would probably be faster than running crater again.

We might want to analyze these to categorize the regressions and see what the roots are before going for plan B... After doing that we might want to allow some new forms (and give them correct semantics rather than just ignoring...) and interpret them correctly or not based on the analysis. From a quick look I saw a lot of #[link = "dl"] originating with a single root crate.

@petrochenkov
Copy link
Contributor Author

We might want to analyze these to categorize the regressions and see what the roots

That's exactly the purpose of the plan B.
To categorize the root regressions I need to download and grep the logs :)

@Centril
Copy link
Contributor

Centril commented Jan 5, 2019

That's exactly the purpose of the plan B.
To categorize the root regressions I need to download and grep the logs :)

Oh, OK =P
Do your thing ^.^

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jan 5, 2019

First of all - no regressions from restricting key-value attributes to literals.

Beside that, here's the list of offending attributes with some statistics (statistics include non-root regressions):

126 #[deprecated = "Deprecated by RFC3445"]
 21 #[deprecated = "For Zone signing DNSKEY should be used"]
  3 #[deprecated = "just use Client::query"]
  2 #[deprecated = "just use query(...) from `Client`"]
 21 #[deprecated = "obsolete, transform_from instead"]
  2 #[deprecated = "use `error` instead."]
  6 #[deprecated = "Use blacklist_type instead"]
 69 #[deprecated = "Use upgrade instead"]
  6 #[deprecated = "use whitelist_function instead"]
  6 #[deprecated = "use whitelist_type instead"]
  6 #[deprecated = "use whitelist_var instead"]
 25 #[deprecated = "will be removed post 0.9.x"]
  2 #[deprecated = "You probably want the `Gcode::major_number()` and `Gcode::minor_number()` methods instead"]
  5 #[deprecated = "use SIG0 or DNSSec constructors"]
  5 #[deprecated = "will be removed post 0.9.x"]
  2 #[deprecated = "This was a failed attempt at finding a suitable abstraction. The async-codec crate might be what you need instead."]
 21 #[deprecated = "Deprecated by RFC3007"]
 84 #[deprecated = "Magically disappeared from the Vulkan specs"]
  3 #[deprecated = "Moved to spirit::utils"]
 21 #[deprecated = "obsolete, transform_from instead"]
 21 #[deprecated = "Renamed to COLOR_SPACE_SRGB_NONLINEAR_KHR"]
 18 #[deprecated = "Renamed to DEBUG_REPORT_OBJECT_TYPE_DEBUG_REPORT_CALLBACK_EXT_EXT"]
  1 #[deprecated = "see trust_dns::client::ClientFuture"]
  6 #[deprecated = "will be removed post 0.9.x, use RecordSet"]
  5 #[derive(::EnumSetType, Debug)]
  8 #[ignore(dead_code)]
  1 #[ignore(note = "compile-only test")]
  2 #[ignore = "Unimplemented: https://github.com/alexcrichton/toml-rs/pull/264#issuecomment-431707209"]
  4 #[inline = "always"]
  6 #[link]
861 #[link = "dl"]
  3 #[should_panic = "panicked safely"]
  3 #[should_panic = "the desired length"]
  3 #[should_panic = "the desired offset"]

Here's the list of root regressions:

altbitflags
appendbuf
arc-swap
async-serialization
bindgen
cvar
enumset
frunk_core
gcode
glutin
libc-extra
pcap-rs
rlbot
servo-glutin
spirit
systemfd
throw
toml
trust-dns
trust-dns-proto
vk-sys
winit

@petrochenkov
Copy link
Contributor Author

Conclusions:

#[derive(::EnumSetType)] is a plain bug, absolute paths are not supported in meta-items, needs fixing.
There's also a suspicious case of #[doc] without an argument apparently coming from a desugared doc comment and macro (?) in pcap-rs, needs investigating.

It seems quite reasonable to support #[deprecated = "reason"] and #[ignore = "reason"] without warnings.

ignore(...), inline = "...", link, link = "...", should_panic = "..." need to be lowered from errors to deprecation lints.

#[link = "dl"] never worked, ha-ha.

@Centril
Copy link
Contributor

Centril commented Jan 5, 2019

#[derive(::EnumSetType)] is a plain bug, absolute paths are not supported in meta-items, needs fixing.
There's also a suspicious case of #[doc] without an argument apparently coming from a desugared doc comment and macro (?) in pcap-rs, needs investigating.

👍

It seems quite reasonable to support #[deprecated = "reason"] and #[ignore = "reason"] without warnings.

Agreed; I suggested the same in #56896 (comment).

ignore(...), inline = "...", link, link = "...", should_panic = "..." need to be lowered from errors to deprecation lints.

re. should_panic = "..." we could maybe make that synonymous with should_panic(expected = "...") similarly to deprecated as above?

As for deprecation lints, do you mean C-future-compatibility or that we should keep this forever?

#[link = "dl"] never worked, ha-ha.

😂

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jan 5, 2019

?re. should_panic = "..." we could maybe make that synonymous with should_panic(expected = "...") similarly to deprecated as above?

Yes, didn't notice that. If should_panic already supports specifying the reason, then adding shortcut is ok.

As for deprecation lints, do you mean C-future-compatibility or that we should keep this forever?

What's the difference? Even if the warning ends up being kept forever, it's going to be framed as a deprecation lint anyway.

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Jan 15, 2019

📌 Commit d3411d3 has been approved by nikomatsakis

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 15, 2019
@Centril
Copy link
Contributor

Centril commented Jan 15, 2019

@bors p=7

Rollup fairness.

@bors
Copy link
Contributor

bors commented Jan 16, 2019

⌛ Testing commit d3411d3 with merge ceb2512...

bors added a commit that referenced this pull request Jan 16, 2019
Implement basic input validation for built-in attributes

Correct top-level shape (`#[attr]` vs `#[attr(...)]` vs `#[attr = ...]`) is enforced for built-in attributes, built-in attributes must also fit into the "meta-item" syntax (aka the "classic attribute syntax").

For some subset of attributes (found by crater run), errors are lowered to deprecation warnings.

NOTE: This PR previously included #57367 as well.
@bors
Copy link
Contributor

bors commented Jan 16, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: nikomatsakis
Pushing ceb2512 to master...

@bors bors merged commit d3411d3 into rust-lang:master Jan 16, 2019
bors added a commit that referenced this pull request Feb 25, 2019
Stabilize `unrestricted_attribute_tokens`

In accordance with a plan described in https://internals.rust-lang.org/t/unrestricted-attribute-tokens-feature-status/8561/3.

Delimited non-macro non-builtin attributes now support the same syntax as macro attributes:
```
PATH
PATH `(` TOKEN_STREAM `)`
PATH `[` TOKEN_STREAM `]`
PATH `{` TOKEN_STREAM `}`
```
Such attributes mostly serve as inert proc macro helpers or tool attributes.
To some extent these attributes are de-facto stable due to a hole in feature gate checking (feature gating is done too late - after macro expansion.)
So if macro *removes* such helper attributes during expansion (and it must remove them, unless it's a derive macro), then the code will work on stable.

Key-value non-macro non-builtin attributes are now restricted to bare minimum required to support what we support on stable - unsuffixed literals (#34981).
```
PATH `=` LITERAL
```
(Key-value macro attributes are not supported at all right now.)
Crater run in #57321 found no regressions for this change.
There are multiple possible ways to extend key-value attributes (#57321 (comment)), but I'd expect an RFC for that and it's not a pressing enough issue to block stabilization of delimited attributes.

Built-in attributes are still restricted to the "classic" meta-item syntax, nothing changes here.
#57321 goes further and adds some additional restrictions (more consistent input checking) to built-in attributes.

Closes #55208
@ehuss
Copy link
Contributor

ehuss commented Feb 27, 2019

@petrochenkov I was curious, was the test attribute intentionally excluded from BUILTIN_ATTRIBUTES? It is the only builtin attribute that is not listed (that I know of).

Also "no_builtins" and "no_mangle" are listed twice, was that intentional?

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Feb 27, 2019

@ehuss

I was curious, was the test attribute intentionally excluded from BUILTIN_ATTRIBUTES?

test (and also bench and test_case) are built-in attribute macros rather than inert attributes, they are registered in fn register_builtins together with other built-in macros.
(cfg, cfg_attr and derive being in BUILTIN_ATTRIBUTES is also a bit questionable since they are not inert, but that's how things work right now.)

Also "no_builtins" and "no_mangle" are listed twice, was that intentional?

Ha, good catch.
The duplicates need to be removed
(Built-in attributes are searched by name using linear search, so the second instance is never found.)

@ehuss
Copy link
Contributor

ehuss commented Feb 27, 2019

are built-in attribute macros

Ah, thanks!. The reason I ask is because I'm looking at improving the reference documentation for attributes, and now all the attributes have nice input validation, except test/bench which is unrestricted. I'm unsure if that should be documented. I see a handful of crates taking advantage of that (such as #[test(whatever)]), which seems odd. Should those also be validated?

@petrochenkov
Copy link
Contributor Author

@ehuss

Should those also be validated?

I think so, each macro will have to do that individually though, without help from the validation infra introduced in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants